diff --git a/doc/src/Manual.txt b/doc/src/Manual.txt
index a5a874fa8..bceb17401 100644
--- a/doc/src/Manual.txt
+++ b/doc/src/Manual.txt
@@ -1,340 +1,339 @@
<!-- HTML_ONLY -->
<HEAD>
<TITLE>LAMMPS Users Manual</TITLE>
-<META NAME="docnumber" CONTENT="11 Apr 2017 version">
+<META NAME="docnumber" CONTENT="4 May 2017 version">
<META NAME="author" CONTENT="http://lammps.sandia.gov - Sandia National Laboratories">
<META NAME="copyright" CONTENT="Copyright (2003) Sandia Corporation. This software and manual is distributed under the GNU General Public License.">
</HEAD>
<BODY>
<!-- END_HTML_ONLY -->
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
<H1></H1>
LAMMPS Documentation :c,h3
-11 Apr 2017 version :c,h4
+4 May 2017 version :c,h4
Version info: :h4
The LAMMPS "version" is the date when it was released, such as 1 May
2010. LAMMPS is updated continuously. Whenever we fix a bug or add a
feature, we release it immediately, and post a notice on "this page of
the WWW site"_bug. Every 2-4 months one of the incremental releases
is subjected to more thorough testing and labeled as a {stable} version.
Each dated copy of LAMMPS contains all the
features and bug-fixes up to and including that version date. The
version date is printed to the screen and logfile every time you run
LAMMPS. It is also in the file src/version.h and in the LAMMPS
directory name created when you unpack a tarball, and at the top of
the first page of the manual (this page).
If you browse the HTML doc pages on the LAMMPS WWW site, they always
describe the most current [development] version of LAMMPS. :ulb,l
If you browse the HTML doc pages included in your tarball, they
describe the version you have. :l
The "PDF file"_Manual.pdf on the WWW site or in the tarball is updated
about once per month. This is because it is large, and we don't want
it to be part of every patch. :l
There is also a "Developer.pdf"_Developer.pdf file in the doc
directory, which describes the internal structure and algorithms of
LAMMPS. :l
:ule
LAMMPS stands for Large-scale Atomic/Molecular Massively Parallel
Simulator.
LAMMPS is a classical molecular dynamics simulation code designed to
run efficiently on parallel computers. It was developed at Sandia
National Laboratories, a US Department of Energy facility, with
funding from the DOE. It is an open-source code, distributed freely
under the terms of the GNU General Public License (GPL).
The current core group of LAMMPS developers is at Sandia National
Labs and Temple University:
"Steve Plimpton"_sjp, sjplimp at sandia.gov :ulb,l
Aidan Thompson, athomps at sandia.gov :l
Stan Moore, stamoor at sandia.gov :l
"Axel Kohlmeyer"_ako, akohlmey at gmail.com :l
:ule
Past core developers include Paul Crozier, Ray Shan and Mark Stevens,
all at Sandia. The [LAMMPS home page] at
"http://lammps.sandia.gov"_http://lammps.sandia.gov has more information
about the code and its uses. Interaction with external LAMMPS developers,
bug reports and feature requests are mainly coordinated through the
"LAMMPS project on GitHub."_https://github.com/lammps/lammps
The lammps.org domain, currently hosting "public continuous integration
testing"_https://ci.lammps.org/job/lammps/ and "precompiled Linux
RPM and Windows installer packages"_http://rpm.lammps.org is located
at Temple University and managed by Richard Berger,
richard.berger at temple.edu.
:link(bug,http://lammps.sandia.gov/bug.html)
:link(sjp,http://www.sandia.gov/~sjplimp)
:link(ako,http://goo.gl/1wk0)
:line
The LAMMPS documentation is organized into the following sections. If
you find errors or omissions in this manual or have suggestions for
useful information to add, please send an email to the developers so
we can improve the LAMMPS documentation.
Once you are familiar with LAMMPS, you may want to bookmark "this
page"_Section_commands.html#comm at Section_commands.html#comm since
it gives quick access to documentation for all LAMMPS commands.
"PDF file"_Manual.pdf of the entire manual, generated by
"htmldoc"_http://freecode.com/projects/htmldoc
<!-- RST
.. toctree::
:maxdepth: 2
:numbered:
:caption: User Documentation
:name: userdoc
:includehidden:
Section_intro
Section_start
Section_commands
Section_packages
Section_accelerate
Section_howto
Section_example
Section_perf
Section_tools
Section_modify
Section_python
Section_errors
Section_history
.. toctree::
:caption: Index
:name: index
:hidden:
tutorials
commands
fixes
computes
pairs
bonds
angles
dihedrals
impropers
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
END_RST -->
<!-- HTML_ONLY -->
"Introduction"_Section_intro.html :olb,l
1.1 "What is LAMMPS"_intro_1 :ulb,b
1.2 "LAMMPS features"_intro_2 :b
1.3 "LAMMPS non-features"_intro_3 :b
1.4 "Open source distribution"_intro_4 :b
1.5 "Acknowledgments and citations"_intro_5 :ule,b
"Getting started"_Section_start.html :l
2.1 "What's in the LAMMPS distribution"_start_1 :ulb,b
2.2 "Making LAMMPS"_start_2 :b
2.3 "Making LAMMPS with optional packages"_start_3 :b
- 2.4 "Building LAMMPS via the Make.py script"_start_4 :b
- 2.5 "Building LAMMPS as a library"_start_5 :b
- 2.6 "Running LAMMPS"_start_6 :b
- 2.7 "Command-line options"_start_7 :b
- 2.8 "Screen output"_start_8 :b
- 2.9 "Tips for users of previous versions"_start_9 :ule,b
+ 2.4 "Building LAMMPS as a library"_start_4 :b
+ 2.5 "Running LAMMPS"_start_5 :b
+ 2.6 "Command-line options"_start_6 :b
+ 2.7 "Screen output"_start_7 :b
+ 2.8 "Tips for users of previous versions"_start_8 :ule,b
"Commands"_Section_commands.html :l
3.1 "LAMMPS input script"_cmd_1 :ulb,b
3.2 "Parsing rules"_cmd_2 :b
3.3 "Input script structure"_cmd_3 :b
3.4 "Commands listed by category"_cmd_4 :b
3.5 "Commands listed alphabetically"_cmd_5 :ule,b
"Packages"_Section_packages.html :l
4.1 "Standard packages"_pkg_1 :ulb,b
4.2 "User packages"_pkg_2 :ule,b
"Accelerating LAMMPS performance"_Section_accelerate.html :l
5.1 "Measuring performance"_acc_1 :ulb,b
5.2 "Algorithms and code options to boost performace"_acc_2 :b
5.3 "Accelerator packages with optimized styles"_acc_3 :b
5.3.1 "GPU package"_accelerate_gpu.html :ulb,b
5.3.2 "USER-INTEL package"_accelerate_intel.html :b
5.3.3 "KOKKOS package"_accelerate_kokkos.html :b
5.3.4 "USER-OMP package"_accelerate_omp.html :b
5.3.5 "OPT package"_accelerate_opt.html :ule,b
5.4 "Comparison of various accelerator packages"_acc_4 :ule,b
"How-to discussions"_Section_howto.html :l
6.1 "Restarting a simulation"_howto_1 :ulb,b
6.2 "2d simulations"_howto_2 :b
6.3 "CHARMM and AMBER force fields"_howto_3 :b
6.4 "Running multiple simulations from one input script"_howto_4 :b
6.5 "Multi-replica simulations"_howto_5 :b
6.6 "Granular models"_howto_6 :b
6.7 "TIP3P water model"_howto_7 :b
6.8 "TIP4P water model"_howto_8 :b
6.9 "SPC water model"_howto_9 :b
6.10 "Coupling LAMMPS to other codes"_howto_10 :b
6.11 "Visualizing LAMMPS snapshots"_howto_11 :b
6.12 "Triclinic (non-orthogonal) simulation boxes"_howto_12 :b
6.13 "NEMD simulations"_howto_13 :b
6.14 "Finite-size spherical and aspherical particles"_howto_14 :b
6.15 "Output from LAMMPS (thermo, dumps, computes, fixes, variables)"_howto_15 :b
6.16 "Thermostatting, barostatting, and compute temperature"_howto_16 :b
6.17 "Walls"_howto_17 :b
6.18 "Elastic constants"_howto_18 :b
6.19 "Library interface to LAMMPS"_howto_19 :b
6.20 "Calculating thermal conductivity"_howto_20 :b
6.21 "Calculating viscosity"_howto_21 :b
6.22 "Calculating a diffusion coefficient"_howto_22 :b
6.23 "Using chunks to calculate system properties"_howto_23 :b
6.24 "Setting parameters for pppm/disp"_howto_24 :b
6.25 "Polarizable models"_howto_25 :b
6.26 "Adiabatic core/shell model"_howto_26 :b
6.27 "Drude induced dipoles"_howto_27 :ule,b
"Example problems"_Section_example.html :l
"Performance & scalability"_Section_perf.html :l
"Additional tools"_Section_tools.html :l
"Modifying & extending LAMMPS"_Section_modify.html :l
10.1 "Atom styles"_mod_1 :ulb,b
10.2 "Bond, angle, dihedral, improper potentials"_mod_2 :b
10.3 "Compute styles"_mod_3 :b
10.4 "Dump styles"_mod_4 :b
10.5 "Dump custom output options"_mod_5 :b
10.6 "Fix styles"_mod_6 :b
10.7 "Input script commands"_mod_7 :b
10.8 "Kspace computations"_mod_8 :b
10.9 "Minimization styles"_mod_9 :b
10.10 "Pairwise potentials"_mod_10 :b
10.11 "Region styles"_mod_11 :b
10.12 "Body styles"_mod_12 :b
10.13 "Thermodynamic output options"_mod_13 :b
10.14 "Variable options"_mod_14 :b
10.15 "Submitting new features for inclusion in LAMMPS"_mod_15 :ule,b
"Python interface"_Section_python.html :l
11.1 "Overview of running LAMMPS from Python"_py_1 :ulb,b
11.2 "Overview of using Python from a LAMMPS script"_py_2 :b
11.3 "Building LAMMPS as a shared library"_py_3 :b
11.4 "Installing the Python wrapper into Python"_py_4 :b
11.5 "Extending Python with MPI to run in parallel"_py_5 :b
11.6 "Testing the Python-LAMMPS interface"_py_6 :b
11.7 "Using LAMMPS from Python"_py_7 :b
11.8 "Example Python scripts that use LAMMPS"_py_8 :ule,b
"Errors"_Section_errors.html :l
12.1 "Common problems"_err_1 :ulb,b
12.2 "Reporting bugs"_err_2 :b
12.3 "Error & warning messages"_err_3 :ule,b
"Future and history"_Section_history.html :l
13.1 "Coming attractions"_hist_1 :ulb,b
13.2 "Past versions"_hist_2 :ule,b
:ole
:link(intro_1,Section_intro.html#intro_1)
:link(intro_2,Section_intro.html#intro_2)
:link(intro_3,Section_intro.html#intro_3)
:link(intro_4,Section_intro.html#intro_4)
:link(intro_5,Section_intro.html#intro_5)
:link(start_1,Section_start.html#start_1)
:link(start_2,Section_start.html#start_2)
:link(start_3,Section_start.html#start_3)
:link(start_4,Section_start.html#start_4)
:link(start_5,Section_start.html#start_5)
:link(start_6,Section_start.html#start_6)
:link(start_7,Section_start.html#start_7)
:link(start_8,Section_start.html#start_8)
:link(start_9,Section_start.html#start_9)
:link(cmd_1,Section_commands.html#cmd_1)
:link(cmd_2,Section_commands.html#cmd_2)
:link(cmd_3,Section_commands.html#cmd_3)
:link(cmd_4,Section_commands.html#cmd_4)
:link(cmd_5,Section_commands.html#cmd_5)
:link(pkg_1,Section_packages.html#pkg_1)
:link(pkg_2,Section_packages.html#pkg_2)
:link(acc_1,Section_accelerate.html#acc_1)
:link(acc_2,Section_accelerate.html#acc_2)
:link(acc_3,Section_accelerate.html#acc_3)
:link(acc_4,Section_accelerate.html#acc_4)
:link(howto_1,Section_howto.html#howto_1)
:link(howto_2,Section_howto.html#howto_2)
:link(howto_3,Section_howto.html#howto_3)
:link(howto_4,Section_howto.html#howto_4)
:link(howto_5,Section_howto.html#howto_5)
:link(howto_6,Section_howto.html#howto_6)
:link(howto_7,Section_howto.html#howto_7)
:link(howto_8,Section_howto.html#howto_8)
:link(howto_9,Section_howto.html#howto_9)
:link(howto_10,Section_howto.html#howto_10)
:link(howto_11,Section_howto.html#howto_11)
:link(howto_12,Section_howto.html#howto_12)
:link(howto_13,Section_howto.html#howto_13)
:link(howto_14,Section_howto.html#howto_14)
:link(howto_15,Section_howto.html#howto_15)
:link(howto_16,Section_howto.html#howto_16)
:link(howto_17,Section_howto.html#howto_17)
:link(howto_18,Section_howto.html#howto_18)
:link(howto_19,Section_howto.html#howto_19)
:link(howto_20,Section_howto.html#howto_20)
:link(howto_21,Section_howto.html#howto_21)
:link(howto_22,Section_howto.html#howto_22)
:link(howto_23,Section_howto.html#howto_23)
:link(howto_24,Section_howto.html#howto_24)
:link(howto_25,Section_howto.html#howto_25)
:link(howto_26,Section_howto.html#howto_26)
:link(howto_27,Section_howto.html#howto_27)
:link(mod_1,Section_modify.html#mod_1)
:link(mod_2,Section_modify.html#mod_2)
:link(mod_3,Section_modify.html#mod_3)
:link(mod_4,Section_modify.html#mod_4)
:link(mod_5,Section_modify.html#mod_5)
:link(mod_6,Section_modify.html#mod_6)
:link(mod_7,Section_modify.html#mod_7)
:link(mod_8,Section_modify.html#mod_8)
:link(mod_9,Section_modify.html#mod_9)
:link(mod_10,Section_modify.html#mod_10)
:link(mod_11,Section_modify.html#mod_11)
:link(mod_12,Section_modify.html#mod_12)
:link(mod_13,Section_modify.html#mod_13)
:link(mod_14,Section_modify.html#mod_14)
:link(mod_15,Section_modify.html#mod_15)
:link(py_1,Section_python.html#py_1)
:link(py_2,Section_python.html#py_2)
:link(py_3,Section_python.html#py_3)
:link(py_4,Section_python.html#py_4)
:link(py_5,Section_python.html#py_5)
:link(py_6,Section_python.html#py_6)
:link(err_1,Section_errors.html#err_1)
:link(err_2,Section_errors.html#err_2)
:link(err_3,Section_errors.html#err_3)
:link(hist_1,Section_history.html#hist_1)
:link(hist_2,Section_history.html#hist_2)
<!-- END_HTML_ONLY -->
</BODY>
diff --git a/doc/src/Section_commands.txt b/doc/src/Section_commands.txt
index 3f1d6ff20..c71acfe06 100644
--- a/doc/src/Section_commands.txt
+++ b/doc/src/Section_commands.txt
@@ -1,1226 +1,1226 @@
"Previous Section"_Section_start.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_packages.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
3. Commands :h3
This section describes how a LAMMPS input script is formatted and the
input script commands used to define a LAMMPS simulation.
3.1 "LAMMPS input script"_#cmd_1
3.2 "Parsing rules"_#cmd_2
3.3 "Input script structure"_#cmd_3
3.4 "Commands listed by category"_#cmd_4
3.5 "Commands listed alphabetically"_#cmd_5 :all(b)
:line
:line
3.1 LAMMPS input script :link(cmd_1),h4
LAMMPS executes by reading commands from an input script (text file),
one line at a time. When the input script ends, LAMMPS exits. Each
command causes LAMMPS to take some action. It may set an internal
variable, read in a file, or run a simulation. Most commands have
default settings, which means you only need to use the command if you
wish to change the default.
In many cases, the ordering of commands in an input script is not
important. However, the following rules apply:
(1) LAMMPS does not read your entire input script and then perform a
simulation with all the settings. Rather, the input script is read
one line at a time and each command takes effect when it is read.
Thus this sequence of commands:
timestep 0.5
run 100
run 100 :pre
does something different than this sequence:
run 100
timestep 0.5
run 100 :pre
In the first case, the specified timestep (0.5 fmsec) is used for two
simulations of 100 timesteps each. In the 2nd case, the default
timestep (1.0 fmsec) is used for the 1st 100 step simulation and a 0.5
fmsec timestep is used for the 2nd one.
(2) Some commands are only valid when they follow other commands. For
example, you cannot set the temperature of a group of atoms until atoms
have been defined and a group command is used to define which atoms
belong to the group.
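For instance, a fragment like the following (a hypothetical sketch; it
assumes a simulation box and lattice have already been defined, and the
region name {box}, group name {mobile}, and seed are placeholders) must
keep this order:
create_atoms 1 box
group mobile region box
velocity mobile create 3.0 4928459 :pre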
(3) Sometimes command B will use values that can be set by command A.
This means command A must precede command B in the input script if it
is to have the desired effect. For example, the
"read_data"_read_data.html command initializes the system by setting
up the simulation box and assigning atoms to processors. If default
values are not desired, the "processors"_processors.html and
"boundary"_boundary.html commands need to be used before read_data to
tell LAMMPS how to map processors to the simulation box.
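For example, this (hypothetical) fragment sets a non-default processor
grid and boundary before the data file is read; the file name is a
placeholder and the 2x2x1 grid assumes a 4-processor run:
processors 2 2 1
boundary p p f
read_data data.slab :pre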
Many input script errors are detected by LAMMPS and an ERROR or
WARNING message is printed. "This section"_Section_errors.html gives
more information on what errors mean. The documentation for each
command lists restrictions on how the command can be used.
:line
3.2 Parsing rules :link(cmd_2),h4
Each non-blank line in the input script is treated as a command.
LAMMPS commands are case sensitive. Command names are lower-case, as
are specified command arguments. Upper case letters may be used in
file names or user-chosen ID strings.
Here is how each line in the input script is parsed by LAMMPS:
(1) If the last printable character on the line is a "&" character,
the command is assumed to continue on the next line. The next line is
concatenated to the previous line by removing the "&" character and
line break. This allows long commands to be continued across two or
more lines. See the discussion of triple quotes in (6) for how to
continue a command across multiple lines without using "&" characters.
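For example, a long command can be split like this (a sketch which
assumes computes {myTemp} and {myPress} were defined earlier):
fix 2 all ave/time 100 5 1000 &
    c_myTemp c_myPress file ave.out :pre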
(2) All characters from the first "#" character onward are treated as
a comment and discarded. See an exception in (6). Note that a
comment after a trailing "&" character will prevent the command from
continuing on the next line. Also note that for multi-line commands a
single leading "#" will comment out the entire command.
(3) The line is searched repeatedly for $ characters, which indicate
variables that are replaced with a text string. See an exception in
(6).
If the $ is followed by curly brackets, then the variable name is the
text inside the curly brackets. If no curly brackets follow the $,
then the variable name is the single character immediately following
the $. Thus $\{myTemp\} and $x refer to variable names "myTemp" and
"x".
How the variable is converted to a text string depends on what style
of variable it is; see the "variable"_variable.html doc page for details.
It can be a variable that stores multiple text strings and returns one
of them. The returned text string can be multiple "words" (space
separated) which will then be interpreted as multiple arguments in the
input command. The variable can also store a numeric formula which
will be evaluated and its numeric result returned as a string.
As a special case, if the $ is followed by parentheses, then the text
inside the parentheses is treated as an "immediate" variable and
evaluated as an "equal-style variable"_variable.html. This is a way
to use numeric formulas in an input script without having to assign
them to variable names. For example, these 3 input script lines:
variable X equal (xlo+xhi)/2+sqrt(v_area)
region 1 block $X 2 INF INF EDGE EDGE
variable X delete :pre
can be replaced by
region 1 block $((xlo+xhi)/2+sqrt(v_area)) 2 INF INF EDGE EDGE :pre
so that you do not have to define (or discard) a temporary variable X.
Note that neither the curly-bracket nor the immediate form of variables can
contain nested $ characters for other variables to substitute for.
Thus you cannot do this:
variable a equal 2
variable b2 equal 4
print "B2 = $\{b$a\}" :pre
Nor can you specify this $($x-1.0) for an immediate variable, but
you could use $(v_x-1.0), since the latter is valid syntax for an
"equal-style variable"_variable.html.
See the "variable"_variable.html command for more details of how
strings are assigned to variables and evaluated, and how they can be
used in input script commands.
(4) The line is broken into "words" separated by whitespace (tabs,
spaces). Note that words can thus contain letters, digits,
underscores, or punctuation characters.
(5) The first word is the command name. All successive words in the
line are arguments.
(6) If you want text with spaces to be treated as a single argument,
it can be enclosed in single, double, or triple quotes. A
long single argument enclosed in single or double quotes can span
multiple lines if the "&" character is used, as described above. When
the lines are concatenated together (and the "&" characters and line
breaks removed), the text will become a single line. If you want
multiple lines of an argument to retain their line breaks, the text
can be enclosed in triple quotes, in which case "&" characters are not
needed. For example:
print "Volume = $v"
print 'Volume = $v'
if "$\{steps\} > 1000" then quit
variable a string "red green blue &
purple orange cyan"
print """
System volume = $v
System temperature = $t
""" :pre
In each case, the single, double, or triple quotes are removed when
the single argument they enclose is stored internally.
See the "dump modify format"_dump_modify.html, "print"_print.html,
"if"_if.html, and "python"_python.html commands for examples.
A "#" or "$" character that is between quotes will not be treated as a
comment indicator in (2) or substituted for as a variable in (3).
NOTE: If the argument is itself a command that requires a quoted
argument (e.g. using a "print"_print.html command as part of an
"if"_if.html or "run every"_run.html command), then single, double, or
triple quotes can be nested in the usual manner. See the doc pages
for those commands for examples. Only one level of nesting is
allowed, but that should be sufficient for most use cases.
:line
3.3 Input script structure :h4,link(cmd_3)
This section describes the structure of a typical LAMMPS input script.
The "examples" directory in the LAMMPS distribution contains many
sample input scripts; the corresponding problems are discussed in
"Section 7"_Section_example.html, and animated on the "LAMMPS
WWW Site"_lws.
A LAMMPS input script typically has 4 parts:
Initialization
Atom definition
Settings
Run a simulation :ol
The last 2 parts can be repeated as many times as desired: run a
simulation, change some settings, run some more, etc. Each of the 4
parts is now described in more detail. Remember that almost all the
commands need only be used if a non-default value is desired.
(1) Initialization
Set parameters that need to be defined before atoms are created or
read in from a file.
The relevant commands are "units"_units.html,
"dimension"_dimension.html, "newton"_newton.html,
"processors"_processors.html, "boundary"_boundary.html,
"atom_style"_atom_style.html, "atom_modify"_atom_modify.html.
If force-field parameters appear in the files that will be read, these
commands tell LAMMPS what kinds of force fields are being used:
"pair_style"_pair_style.html, "bond_style"_bond_style.html,
"angle_style"_angle_style.html, "dihedral_style"_dihedral_style.html,
"improper_style"_improper_style.html.
(2) Atom definition
There are 3 ways to define atoms in LAMMPS. Read them in from a data
or restart file via the "read_data"_read_data.html or
"read_restart"_read_restart.html commands. These files can contain
molecular topology information. Or create atoms on a lattice (with no
molecular topology), using these commands: "lattice"_lattice.html,
"region"_region.html, "create_box"_create_box.html,
"create_atoms"_create_atoms.html. The entire set of atoms can be
duplicated to make a larger simulation using the
"replicate"_replicate.html command.
(3) Settings
Once atoms and molecular topology are defined, a variety of settings
can be specified: force field coefficients, simulation parameters,
output options, etc.
Force field coefficients are set by these commands (they can also be
set in the read-in files): "pair_coeff"_pair_coeff.html,
"bond_coeff"_bond_coeff.html, "angle_coeff"_angle_coeff.html,
"dihedral_coeff"_dihedral_coeff.html,
"improper_coeff"_improper_coeff.html,
"kspace_style"_kspace_style.html, "dielectric"_dielectric.html,
"special_bonds"_special_bonds.html.
Various simulation parameters are set by these commands:
"neighbor"_neighbor.html, "neigh_modify"_neigh_modify.html,
"group"_group.html, "timestep"_timestep.html,
"reset_timestep"_reset_timestep.html, "run_style"_run_style.html,
"min_style"_min_style.html, "min_modify"_min_modify.html.
Fixes impose a variety of boundary conditions, time integration, and
diagnostic options. The "fix"_fix.html command comes in many flavors.
Various computations can be specified for execution during a
simulation using the "compute"_compute.html,
"compute_modify"_compute_modify.html, and "variable"_variable.html
commands.
Output options are set by the "thermo"_thermo.html, "dump"_dump.html,
and "restart"_restart.html commands.
(4) Run a simulation
A molecular dynamics simulation is run using the "run"_run.html
command. Energy minimization (molecular statics) is performed using
the "minimize"_minimize.html command. A parallel tempering
(replica-exchange) simulation can be run using the
"temper"_temper.html command.
:line
3.4 Commands listed by category :link(cmd_4),h4
This section lists core LAMMPS commands, grouped by category.
The "next section"_#cmd_5 lists all commands alphabetically. The
next section also includes (long) lists of style options for entries
that appear in the following categories as a single command (fix,
compute, pair, etc). Commands that are added by user packages are not
included in the categories here, but they are in the next section.
Initialization:
"newton"_newton.html,
"package"_package.html,
"processors"_processors.html,
"suffix"_suffix.html,
"units"_units.html
Setup simulation box:
"boundary"_boundary.html,
"box"_box.html,
"change_box"_change_box.html,
"create_box"_create_box.html,
"dimension"_dimension.html,
"lattice"_lattice.html,
"region"_region.html
Setup atoms:
"atom_modify"_atom_modify.html,
"atom_style"_atom_style.html,
"balance"_balance.html,
"create_atoms"_create_atoms.html,
"create_bonds"_create_bonds.html,
"delete_atoms"_delete_atoms.html,
"delete_bonds"_delete_bonds.html,
"displace_atoms"_displace_atoms.html,
"group"_group.html,
"mass"_mass.html,
"molecule"_molecule.html,
"read_data"_read_data.html,
"read_dump"_read_dump.html,
"read_restart"_read_restart.html,
"replicate"_replicate.html,
"set"_set.html,
"velocity"_velocity.html
Force fields:
"angle_coeff"_angle_coeff.html,
"angle_style"_angle_style.html,
"bond_coeff"_bond_coeff.html,
"bond_style"_bond_style.html,
"bond_write"_bond_write.html,
"dielectric"_dielectric.html,
"dihedral_coeff"_dihedral_coeff.html,
"dihedral_style"_dihedral_style.html,
"improper_coeff"_improper_coeff.html,
"improper_style"_improper_style.html,
"kspace_modify"_kspace_modify.html,
"kspace_style"_kspace_style.html,
"pair_coeff"_pair_coeff.html,
"pair_modify"_pair_modify.html,
"pair_style"_pair_style.html,
"pair_write"_pair_write.html,
"special_bonds"_special_bonds.html
Settings:
"comm_modify"_comm_modify.html,
"comm_style"_comm_style.html,
"info"_info.html,
"min_modify"_min_modify.html,
"min_style"_min_style.html,
"neigh_modify"_neigh_modify.html,
"neighbor"_neighbor.html,
"partition"_partition.html,
"reset_timestep"_reset_timestep.html,
"run_style"_run_style.html,
"timer"_timer.html,
"timestep"_timestep.html
Operations within timestepping (fixes) and diagnostics (computes):
"compute"_compute.html,
"compute_modify"_compute_modify.html,
"fix"_fix.html,
"fix_modify"_fix_modify.html,
"uncompute"_uncompute.html,
"unfix"_unfix.html
Output:
"dump image"_dump_image.html,
"dump movie"_dump_image.html,
"dump"_dump.html,
"dump_modify"_dump_modify.html,
"restart"_restart.html,
"thermo"_thermo.html,
"thermo_modify"_thermo_modify.html,
"thermo_style"_thermo_style.html,
"undump"_undump.html,
"write_coeff"_write_coeff.html,
"write_data"_write_data.html,
"write_dump"_write_dump.html,
"write_restart"_write_restart.html
Actions:
"minimize"_minimize.html,
"neb"_neb.html,
"prd"_prd.html,
"rerun"_rerun.html,
"run"_run.html,
"tad"_tad.html,
"temper"_temper.html
Input script control:
"clear"_clear.html,
"echo"_echo.html,
"if"_if.html,
"include"_include.html,
"jump"_jump.html,
"label"_label.html,
"log"_log.html,
"next"_next.html,
"print"_print.html,
"python"_python.html,
"quit"_quit.html,
"shell"_shell.html,
"variable"_variable.html
:line
3.5 Individual commands :h4,link(cmd_5),link(comm)
This section lists all LAMMPS commands alphabetically, with a separate
listing below of styles within certain commands. The "previous
section"_#cmd_4 lists the same commands, grouped by category. Note
that some style options for some commands are part of specific LAMMPS
packages, which means they cannot be used unless the package was
included when LAMMPS was built. Not all packages are included in a
default LAMMPS build. These dependencies are listed as Restrictions
in the command's documentation.
"angle_coeff"_angle_coeff.html,
"angle_style"_angle_style.html,
"atom_modify"_atom_modify.html,
"atom_style"_atom_style.html,
"balance"_balance.html,
"bond_coeff"_bond_coeff.html,
"bond_style"_bond_style.html,
"bond_write"_bond_write.html,
"boundary"_boundary.html,
"box"_box.html,
"change_box"_change_box.html,
"clear"_clear.html,
"comm_modify"_comm_modify.html,
"comm_style"_comm_style.html,
"compute"_compute.html,
"compute_modify"_compute_modify.html,
"create_atoms"_create_atoms.html,
"create_bonds"_create_bonds.html,
"create_box"_create_box.html,
"delete_atoms"_delete_atoms.html,
"delete_bonds"_delete_bonds.html,
"dielectric"_dielectric.html,
"dihedral_coeff"_dihedral_coeff.html,
"dihedral_style"_dihedral_style.html,
"dimension"_dimension.html,
"displace_atoms"_displace_atoms.html,
"dump"_dump.html,
"dump image"_dump_image.html,
"dump_modify"_dump_modify.html,
"dump movie"_dump_image.html,
"echo"_echo.html,
"fix"_fix.html,
"fix_modify"_fix_modify.html,
"group"_group.html,
"if"_if.html,
"info"_info.html,
"improper_coeff"_improper_coeff.html,
"improper_style"_improper_style.html,
"include"_include.html,
"jump"_jump.html,
"kspace_modify"_kspace_modify.html,
"kspace_style"_kspace_style.html,
"label"_label.html,
"lattice"_lattice.html,
"log"_log.html,
"mass"_mass.html,
"minimize"_minimize.html,
"min_modify"_min_modify.html,
"min_style"_min_style.html,
"molecule"_molecule.html,
"neb"_neb.html,
"neigh_modify"_neigh_modify.html,
"neighbor"_neighbor.html,
"newton"_newton.html,
"next"_next.html,
"package"_package.html,
"pair_coeff"_pair_coeff.html,
"pair_modify"_pair_modify.html,
"pair_style"_pair_style.html,
"pair_write"_pair_write.html,
"partition"_partition.html,
"prd"_prd.html,
"print"_print.html,
"processors"_processors.html,
"python"_python.html,
"quit"_quit.html,
"read_data"_read_data.html,
"read_dump"_read_dump.html,
"read_restart"_read_restart.html,
"region"_region.html,
"replicate"_replicate.html,
"rerun"_rerun.html,
"reset_timestep"_reset_timestep.html,
"restart"_restart.html,
"run"_run.html,
"run_style"_run_style.html,
"set"_set.html,
"shell"_shell.html,
"special_bonds"_special_bonds.html,
"suffix"_suffix.html,
"tad"_tad.html,
"temper"_temper.html,
"thermo"_thermo.html,
"thermo_modify"_thermo_modify.html,
"thermo_style"_thermo_style.html,
"timer"_timer.html,
"timestep"_timestep.html,
"uncompute"_uncompute.html,
"undump"_undump.html,
"unfix"_unfix.html,
"units"_units.html,
"variable"_variable.html,
"velocity"_velocity.html,
"write_coeff"_write_coeff.html,
"write_data"_write_data.html,
"write_dump"_write_dump.html,
"write_restart"_write_restart.html :tb(c=6,ea=c)
These are additional commands in USER packages, which can be used if
"LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"dump custom/vtk"_dump_custom_vtk.html,
"dump nc"_dump_nc.html,
"dump nc/mpiio"_dump_nc.html,
"group2ndx"_group2ndx.html,
"ndx2group"_group2ndx.html,
"temper/grem"_temper_grem.html :tb(c=3,ea=c)
:line
Fix styles :h4
See the "fix"_fix.html command for one-line descriptions of each style
or click on the style itself for a full description. Some of the
styles have accelerated versions, which can be used if LAMMPS is built
with the "appropriate accelerated package"_Section_accelerate.html.
This is indicated by additional letters in parentheses: g = GPU, i =
USER-INTEL, k = KOKKOS, o = USER-OMP, t = OPT.
"adapt"_fix_adapt.html,
"addforce"_fix_addforce.html,
"append/atoms"_fix_append_atoms.html,
"atom/swap"_fix_atom_swap.html,
"aveforce"_fix_aveforce.html,
"ave/atom"_fix_ave_atom.html,
"ave/chunk"_fix_ave_chunk.html,
"ave/correlate"_fix_ave_correlate.html,
"ave/histo"_fix_ave_histo.html,
"ave/histo/weight"_fix_ave_histo.html,
"ave/time"_fix_ave_time.html,
"balance"_fix_balance.html,
"bond/break"_fix_bond_break.html,
"bond/create"_fix_bond_create.html,
"bond/swap"_fix_bond_swap.html,
"box/relax"_fix_box_relax.html,
"cmap"_fix_cmap.html,
"controller"_fix_controller.html,
"deform (k)"_fix_deform.html,
"deposit"_fix_deposit.html,
"drag"_fix_drag.html,
"dt/reset"_fix_dt_reset.html,
"efield"_fix_efield.html,
"ehex"_fix_ehex.html,
"enforce2d"_fix_enforce2d.html,
"evaporate"_fix_evaporate.html,
"external"_fix_external.html,
"freeze"_fix_freeze.html,
"gcmc"_fix_gcmc.html,
"gld"_fix_gld.html,
"gravity (o)"_fix_gravity.html,
"halt"_fix_halt.html,
"heat"_fix_heat.html,
"indent"_fix_indent.html,
"langevin (k)"_fix_langevin.html,
"lineforce"_fix_lineforce.html,
"momentum (k)"_fix_momentum.html,
"move"_fix_move.html,
"mscg"_fix_mscg.html,
"msst"_fix_msst.html,
"neb"_fix_neb.html,
"nph (ko)"_fix_nh.html,
"nphug (o)"_fix_nphug.html,
"nph/asphere (o)"_fix_nph_asphere.html,
"nph/body"_fix_nph_body.html,
"nph/sphere (o)"_fix_nph_sphere.html,
"npt (kio)"_fix_nh.html,
"npt/asphere (o)"_fix_npt_asphere.html,
"npt/body"_fix_npt_body.html,
"npt/sphere (o)"_fix_npt_sphere.html,
"nve (kio)"_fix_nve.html,
"nve/asphere (i)"_fix_nve_asphere.html,
"nve/asphere/noforce"_fix_nve_asphere_noforce.html,
"nve/body"_fix_nve_body.html,
"nve/limit"_fix_nve_limit.html,
"nve/line"_fix_nve_line.html,
"nve/noforce"_fix_nve_noforce.html,
"nve/sphere (o)"_fix_nve_sphere.html,
"nve/tri"_fix_nve_tri.html,
"nvt (iko)"_fix_nh.html,
"nvt/asphere (o)"_fix_nvt_asphere.html,
"nvt/body"_fix_nvt_body.html,
"nvt/sllod (io)"_fix_nvt_sllod.html,
"nvt/sphere (o)"_fix_nvt_sphere.html,
"oneway"_fix_oneway.html,
"orient/bcc"_fix_orient.html,
"orient/fcc"_fix_orient.html,
"planeforce"_fix_planeforce.html,
"poems"_fix_poems.html,
"pour"_fix_pour.html,
"press/berendsen"_fix_press_berendsen.html,
"print"_fix_print.html,
"property/atom"_fix_property_atom.html,
"qeq/comb (o)"_fix_qeq_comb.html,
"qeq/dynamic"_fix_qeq.html,
"qeq/fire"_fix_qeq.html,
"qeq/point"_fix_qeq.html,
"qeq/shielded"_fix_qeq.html,
"qeq/slater"_fix_qeq.html,
"rattle"_fix_shake.html,
"reax/bonds"_fix_reax_bonds.html,
"recenter"_fix_recenter.html,
"restrain"_fix_restrain.html,
"rigid (o)"_fix_rigid.html,
"rigid/nph (o)"_fix_rigid.html,
"rigid/npt (o)"_fix_rigid.html,
"rigid/nve (o)"_fix_rigid.html,
"rigid/nvt (o)"_fix_rigid.html,
"rigid/small (o)"_fix_rigid.html,
"rigid/small/nph (o)"_fix_rigid.html,
"rigid/small/npt (o)"_fix_rigid.html,
"rigid/small/nve (o)"_fix_rigid.html,
"rigid/small/nvt (o)"_fix_rigid.html,
"setforce (k)"_fix_setforce.html,
"shake"_fix_shake.html,
"spring"_fix_spring.html,
"spring/chunk"_fix_spring_chunk.html,
"spring/rg"_fix_spring_rg.html,
"spring/self"_fix_spring_self.html,
"srd"_fix_srd.html,
"store/force"_fix_store_force.html,
"store/state"_fix_store_state.html,
"temp/berendsen"_fix_temp_berendsen.html,
"temp/csld"_fix_temp_csvr.html,
"temp/csvr"_fix_temp_csvr.html,
"temp/rescale"_fix_temp_rescale.html,
"tfmc"_fix_tfmc.html,
"thermal/conductivity"_fix_thermal_conductivity.html,
"tmd"_fix_tmd.html,
"ttm"_fix_ttm.html,
"tune/kspace"_fix_tune_kspace.html,
"vector"_fix_vector.html,
"viscosity"_fix_viscosity.html,
"viscous"_fix_viscous.html,
"wall/colloid"_fix_wall.html,
"wall/gran"_fix_wall_gran.html,
"wall/gran/region"_fix_wall_gran_region.html,
"wall/harmonic"_fix_wall.html,
"wall/lj1043"_fix_wall.html,
"wall/lj126"_fix_wall.html,
"wall/lj93"_fix_wall.html,
"wall/piston"_fix_wall_piston.html,
"wall/reflect (k)"_fix_wall_reflect.html,
"wall/region"_fix_wall_region.html,
"wall/srd"_fix_wall_srd.html :tb(c=8,ea=c)
These are additional fix styles in USER packages, which can be used if
"LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"adapt/fep"_fix_adapt_fep.html,
"addtorque"_fix_addtorque.html,
"atc"_fix_atc.html,
"ave/correlate/long"_fix_ave_correlate_long.html,
"colvars"_fix_colvars.html,
"dpd/energy"_fix_dpd_energy.html,
"drude"_fix_drude.html,
"drude/transform/direct"_fix_drude_transform.html,
"drude/transform/reverse"_fix_drude_transform.html,
"eos/cv"_fix_eos_cv.html,
"eos/table"_fix_eos_table.html,
"eos/table/rx"_fix_eos_table_rx.html,
"filter/corotate"_fix_filter_corotate.html,
"flow/gauss"_fix_flow_gauss.html,
"gle"_fix_gle.html,
"grem"_fix_grem.html,
"imd"_fix_imd.html,
"ipi"_fix_ipi.html,
"langevin/drude"_fix_langevin_drude.html,
"langevin/eff"_fix_langevin_eff.html,
"lb/fluid"_fix_lb_fluid.html,
"lb/momentum"_fix_lb_momentum.html,
"lb/pc"_fix_lb_pc.html,
"lb/rigid/pc/sphere"_fix_lb_rigid_pc_sphere.html,
"lb/viscous"_fix_lb_viscous.html,
"meso"_fix_meso.html,
"manifoldforce"_fix_manifoldforce.html,
"meso/stationary"_fix_meso_stationary.html,
"nve/dot"_fix_nve_dot.html,
"nve/dotc/langevin"_fix_nve_dotc_langevin.html,
"nve/manifold/rattle"_fix_nve_manifold_rattle.html,
"nvk"_fix_nvk.html,
"nvt/manifold/rattle"_fix_nvt_manifold_rattle.html,
"nph/eff"_fix_nh_eff.html,
"npt/eff"_fix_nh_eff.html,
"nve/eff"_fix_nve_eff.html,
"nvt/eff"_fix_nh_eff.html,
"nvt/sllod/eff"_fix_nvt_sllod_eff.html,
"phonon"_fix_phonon.html,
"pimd"_fix_pimd.html,
"qbmsst"_fix_qbmsst.html,
"qeq/reax"_fix_qeq_reax.html,
"qmmm"_fix_qmmm.html,
"qtb"_fix_qtb.html,
"reax/c/bonds"_fix_reax_bonds.html,
"reax/c/species"_fix_reaxc_species.html,
"rx"_fix_rx.html,
"saed/vtk"_fix_saed_vtk.html,
"shardlow"_fix_shardlow.html,
"smd"_fix_smd.html,
"smd/adjust/dt"_fix_smd_adjust_dt.html,
"smd/integrate/tlsph"_fix_smd_integrate_tlsph.html,
"smd/integrate/ulsph"_fix_smd_integrate_ulsph.html,
"smd/move/triangulated/surface"_fix_smd_move_triangulated_surface.html,
"smd/setvel"_fix_smd_setvel.html,
"smd/wall/surface"_fix_smd_wall_surface.html,
"temp/rescale/eff"_fix_temp_rescale_eff.html,
"ti/spring"_fix_ti_spring.html,
"ttm/mod"_fix_ttm.html :tb(c=6,ea=c)
:line
Compute styles :h4
See the "compute"_compute.html command for one-line descriptions of
each style or click on the style itself for a full description. Some
of the styles have accelerated versions, which can be used if LAMMPS
is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k =
KOKKOS, o = USER-OMP, t = OPT.
"angle"_compute_angle.html,
"angle/local"_compute_angle_local.html,
"angmom/chunk"_compute_angmom_chunk.html,
"body/local"_compute_body_local.html,
"bond"_compute_bond.html,
"bond/local"_compute_bond_local.html,
"centro/atom"_compute_centro_atom.html,
"chunk/atom"_compute_chunk_atom.html,
"cluster/atom"_compute_cluster_atom.html,
"cna/atom"_compute_cna_atom.html,
"com"_compute_com.html,
"com/chunk"_compute_com_chunk.html,
"contact/atom"_compute_contact_atom.html,
"coord/atom"_compute_coord_atom.html,
"damage/atom"_compute_damage_atom.html,
"dihedral"_compute_dihedral.html,
"dihedral/local"_compute_dihedral_local.html,
"dilatation/atom"_compute_dilatation_atom.html,
"dipole/chunk"_compute_dipole_chunk.html,
"displace/atom"_compute_displace_atom.html,
"erotate/asphere"_compute_erotate_asphere.html,
"erotate/rigid"_compute_erotate_rigid.html,
"erotate/sphere"_compute_erotate_sphere.html,
"erotate/sphere/atom"_compute_erotate_sphere_atom.html,
"event/displace"_compute_event_displace.html,
"global/atom"_compute_global_atom.html,
"group/group"_compute_group_group.html,
"gyration"_compute_gyration.html,
"gyration/chunk"_compute_gyration_chunk.html,
"heat/flux"_compute_heat_flux.html,
"hexorder/atom"_compute_hexorder_atom.html,
"improper"_compute_improper.html,
"improper/local"_compute_improper_local.html,
"inertia/chunk"_compute_inertia_chunk.html,
"ke"_compute_ke.html,
"ke/atom"_compute_ke_atom.html,
"ke/rigid"_compute_ke_rigid.html,
"msd"_compute_msd.html,
"msd/chunk"_compute_msd_chunk.html,
"msd/nongauss"_compute_msd_nongauss.html,
"omega/chunk"_compute_omega_chunk.html,
"orientorder/atom"_compute_orientorder_atom.html,
"pair"_compute_pair.html,
"pair/local"_compute_pair_local.html,
"pe"_compute_pe.html,
"pe/atom"_compute_pe_atom.html,
"plasticity/atom"_compute_plasticity_atom.html,
"pressure"_compute_pressure.html,
"property/atom"_compute_property_atom.html,
"property/local"_compute_property_local.html,
"property/chunk"_compute_property_chunk.html,
"rdf"_compute_rdf.html,
"reduce"_compute_reduce.html,
"reduce/region"_compute_reduce.html,
"rigid/local"_compute_rigid_local.html,
"slice"_compute_slice.html,
"sna/atom"_compute_sna_atom.html,
"snad/atom"_compute_sna_atom.html,
"snav/atom"_compute_sna_atom.html,
"stress/atom"_compute_stress_atom.html,
"temp (k)"_compute_temp.html,
"temp/asphere"_compute_temp_asphere.html,
"temp/body"_compute_temp_body.html,
"temp/chunk"_compute_temp_chunk.html,
"temp/com"_compute_temp_com.html,
"temp/deform"_compute_temp_deform.html,
"temp/partial"_compute_temp_partial.html,
"temp/profile"_compute_temp_profile.html,
"temp/ramp"_compute_temp_ramp.html,
"temp/region"_compute_temp_region.html,
"temp/sphere"_compute_temp_sphere.html,
"ti"_compute_ti.html,
"torque/chunk"_compute_torque_chunk.html,
"vacf"_compute_vacf.html,
"vcm/chunk"_compute_vcm_chunk.html,
"voronoi/atom"_compute_voronoi_atom.html :tb(c=6,ea=c)
These are additional compute styles in USER packages, which can be
used if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"ackland/atom"_compute_ackland_atom.html,
"basal/atom"_compute_basal_atom.html,
"dpd"_compute_dpd.html,
"dpd/atom"_compute_dpd_atom.html,
"fep"_compute_fep.html,
"force/tally"_compute_tally.html,
"heat/flux/tally"_compute_tally.html,
"ke/eff"_compute_ke_eff.html,
"ke/atom/eff"_compute_ke_atom_eff.html,
"meso/e/atom"_compute_meso_e_atom.html,
"meso/rho/atom"_compute_meso_rho_atom.html,
"meso/t/atom"_compute_meso_t_atom.html,
"pe/tally"_compute_tally.html,
"pe/mol/tally"_compute_tally.html,
"saed"_compute_saed.html,
"smd/contact/radius"_compute_smd_contact_radius.html,
"smd/damage"_compute_smd_damage.html,
"smd/hourglass/error"_compute_smd_hourglass_error.html,
"smd/internal/energy"_compute_smd_internal_energy.html,
"smd/plastic/strain"_compute_smd_plastic_strain.html,
"smd/plastic/strain/rate"_compute_smd_plastic_strain_rate.html,
"smd/rho"_compute_smd_rho.html,
"smd/tlsph/defgrad"_compute_smd_tlsph_defgrad.html,
"smd/tlsph/dt"_compute_smd_tlsph_dt.html,
"smd/tlsph/num/neighs"_compute_smd_tlsph_num_neighs.html,
"smd/tlsph/shape"_compute_smd_tlsph_shape.html,
"smd/tlsph/strain"_compute_smd_tlsph_strain.html,
"smd/tlsph/strain/rate"_compute_smd_tlsph_strain_rate.html,
"smd/tlsph/stress"_compute_smd_tlsph_stress.html,
"smd/triangle/mesh/vertices"_compute_smd_triangle_mesh_vertices.html,
"smd/ulsph/num/neighs"_compute_smd_ulsph_num_neighs.html,
"smd/ulsph/strain"_compute_smd_ulsph_strain.html,
"smd/ulsph/strain/rate"_compute_smd_ulsph_strain_rate.html,
"smd/ulsph/stress"_compute_smd_ulsph_stress.html,
"smd/vol"_compute_smd_vol.html,
"stress/tally"_compute_tally.html,
"temp/drude"_compute_temp_drude.html,
"temp/eff"_compute_temp_eff.html,
"temp/deform/eff"_compute_temp_deform_eff.html,
"temp/region/eff"_compute_temp_region_eff.html,
"temp/rotate"_compute_temp_rotate.html,
"xrd"_compute_xrd.html :tb(c=6,ea=c)
:line
Pair_style potentials :h4
See the "pair_style"_pair_style.html command for an overview of pair
potentials. Click on the style itself for a full description. Many
of the styles have accelerated versions, which can be used if LAMMPS
is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k =
KOKKOS, o = USER-OMP, t = OPT.
"none"_pair_none.html,
"zero"_pair_zero.html,
"hybrid"_pair_hybrid.html,
"hybrid/overlay"_pair_hybrid.html,
"adp (o)"_pair_adp.html,
"airebo (o)"_pair_airebo.html,
"airebo/morse (o)"_pair_airebo.html,
"beck (go)"_pair_beck.html,
"body"_pair_body.html,
"bop"_pair_bop.html,
"born (go)"_pair_born.html,
"born/coul/dsf"_pair_born.html,
"born/coul/dsf/cs"_pair_born.html,
"born/coul/long (go)"_pair_born.html,
"born/coul/long/cs"_pair_born.html,
"born/coul/msm (o)"_pair_born.html,
"born/coul/wolf (go)"_pair_born.html,
"brownian (o)"_pair_brownian.html,
"brownian/poly (o)"_pair_brownian.html,
"buck (gkio)"_pair_buck.html,
"buck/coul/cut (gkio)"_pair_buck.html,
"buck/coul/long (gkio)"_pair_buck.html,
"buck/coul/long/cs"_pair_buck.html,
"buck/coul/msm (o)"_pair_buck.html,
"buck/long/coul/long (o)"_pair_buck_long.html,
"colloid (go)"_pair_colloid.html,
"comb (o)"_pair_comb.html,
"comb3"_pair_comb.html,
"coul/cut (gko)"_pair_coul.html,
"coul/debye (gko)"_pair_coul.html,
"coul/dsf (gko)"_pair_coul.html,
"coul/long (gko)"_pair_coul.html,
"coul/long/cs"_pair_coul.html,
"coul/msm"_pair_coul.html,
"coul/streitz"_pair_coul.html,
"coul/wolf (ko)"_pair_coul.html,
"dpd (go)"_pair_dpd.html,
"dpd/tstat (go)"_pair_dpd.html,
"dsmc"_pair_dsmc.html,
"eam (gkiot)"_pair_eam.html,
"eam/alloy (gkot)"_pair_eam.html,
"eam/fs (gkot)"_pair_eam.html,
"eim (o)"_pair_eim.html,
"gauss (go)"_pair_gauss.html,
"gayberne (gio)"_pair_gayberne.html,
"gran/hertz/history (o)"_pair_gran.html,
"gran/hooke (o)"_pair_gran.html,
"gran/hooke/history (o)"_pair_gran.html,
"hbond/dreiding/lj (o)"_pair_hbond_dreiding.html,
"hbond/dreiding/morse (o)"_pair_hbond_dreiding.html,
"kim"_pair_kim.html,
"lcbop"_pair_lcbop.html,
"line/lj"_pair_line_lj.html,
"lj/charmm/coul/charmm (ko)"_pair_charmm.html,
"lj/charmm/coul/charmm/implicit (ko)"_pair_charmm.html,
"lj/charmm/coul/long (giko)"_pair_charmm.html,
"lj/charmm/coul/msm"_pair_charmm.html,
"lj/charmmfsw/coul/charmmfsh"_pair_charmm.html,
"lj/charmmfsw/coul/long"_pair_charmm.html,
"lj/class2 (gko)"_pair_class2.html,
"lj/class2/coul/cut (ko)"_pair_class2.html,
"lj/class2/coul/long (gko)"_pair_class2.html,
"lj/cubic (go)"_pair_lj_cubic.html,
"lj/cut (gikot)"_pair_lj.html,
"lj/cut/coul/cut (gko)"_pair_lj.html,
"lj/cut/coul/debye (gko)"_pair_lj.html,
"lj/cut/coul/dsf (gko)"_pair_lj.html,
"lj/cut/coul/long (gikot)"_pair_lj.html,
"lj/cut/coul/long/cs"_pair_lj.html,
"lj/cut/coul/msm (go)"_pair_lj.html,
"lj/cut/dipole/cut (go)"_pair_dipole.html,
"lj/cut/dipole/long"_pair_dipole.html,
"lj/cut/tip4p/cut (o)"_pair_lj.html,
"lj/cut/tip4p/long (ot)"_pair_lj.html,
"lj/expand (gko)"_pair_lj_expand.html,
"lj/gromacs (gko)"_pair_gromacs.html,
"lj/gromacs/coul/gromacs (ko)"_pair_gromacs.html,
"lj/long/coul/long (o)"_pair_lj_long.html,
"lj/long/dipole/long"_pair_dipole.html,
"lj/long/tip4p/long"_pair_lj_long.html,
"lj/smooth (o)"_pair_lj_smooth.html,
"lj/smooth/linear (o)"_pair_lj_smooth_linear.html,
"lj96/cut (go)"_pair_lj96.html,
"lubricate (o)"_pair_lubricate.html,
"lubricate/poly (o)"_pair_lubricate.html,
"lubricateU"_pair_lubricateU.html,
"lubricateU/poly"_pair_lubricateU.html,
"meam"_pair_meam.html,
"mie/cut (o)"_pair_mie.html,
"morse (gkot)"_pair_morse.html,
"nb3b/harmonic (o)"_pair_nb3b_harmonic.html,
"nm/cut (o)"_pair_nm.html,
"nm/cut/coul/cut (o)"_pair_nm.html,
"nm/cut/coul/long (o)"_pair_nm.html,
"peri/eps"_pair_peri.html,
"peri/lps (o)"_pair_peri.html,
"peri/pmb (o)"_pair_peri.html,
"peri/ves"_pair_peri.html,
"polymorphic"_pair_polymorphic.html,
"reax"_pair_reax.html,
"rebo (o)"_pair_airebo.html,
"resquared (go)"_pair_resquared.html,
"snap"_pair_snap.html,
"soft (go)"_pair_soft.html,
"sw (gkio)"_pair_sw.html,
"table (gko)"_pair_table.html,
"tersoff (gkio)"_pair_tersoff.html,
"tersoff/mod (gko)"_pair_tersoff_mod.html,
"tersoff/mod/c (o)"_pair_tersoff_mod.html,
"tersoff/zbl (gko)"_pair_tersoff_zbl.html,
"tip4p/cut (o)"_pair_coul.html,
"tip4p/long (o)"_pair_coul.html,
"tri/lj"_pair_tri_lj.html,
"vashishta (ko)"_pair_vashishta.html,
"vashishta/table (o)"_pair_vashishta.html,
"yukawa (go)"_pair_yukawa.html,
"yukawa/colloid (go)"_pair_yukawa_colloid.html,
"zbl (go)"_pair_zbl.html :tb(c=4,ea=c)
These are additional pair styles in USER packages, which can be used
if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"agni (o)"_pair_agni.html,
"awpmd/cut"_pair_awpmd.html,
"buck/mdf"_pair_mdf.html,
"coul/cut/soft (o)"_pair_lj_soft.html,
"coul/diel (o)"_pair_coul_diel.html,
"coul/long/soft (o)"_pair_lj_soft.html,
"dpd/fdt"_pair_dpd_fdt.html,
"dpd/fdt/energy"_pair_dpd_fdt.html,
"eam/cd (o)"_pair_eam.html,
"edip (o)"_pair_edip.html,
"eff/cut"_pair_eff.html,
"exp6/rx"_pair_exp6_rx.html,
"gauss/cut"_pair_gauss.html,
"kolmogorov/crespi/z"_pair_kolmogorov_crespi_z.html,
"lennard/mdf"_pair_mdf.html,
"list"_pair_list.html,
"lj/charmm/coul/long/soft (o)"_pair_charmm.html,
"lj/cut/coul/cut/soft (o)"_pair_lj_soft.html,
"lj/cut/coul/long/soft (o)"_pair_lj_soft.html,
"lj/cut/dipole/sf (go)"_pair_dipole.html,
"lj/cut/soft (o)"_pair_lj_soft.html,
"lj/cut/thole/long (o)"_pair_thole.html,
"lj/cut/tip4p/long/soft (o)"_pair_lj_soft.html,
"lj/mdf"_pair_mdf.html,
"lj/sdk (gko)"_pair_sdk.html,
"lj/sdk/coul/long (go)"_pair_sdk.html,
"lj/sdk/coul/msm (o)"_pair_sdk.html,
"lj/sf (o)"_pair_lj_sf.html,
"meam/spline (o)"_pair_meam_spline.html,
"meam/sw/spline"_pair_meam_sw_spline.html,
"mgpt"_pair_mgpt.html,
"momb"_pair_momb.html,
"morse/smooth/linear"_pair_morse.html,
"morse/soft"_pair_morse.html,
"multi/lucy"_pair_multi_lucy.html,
"multi/lucy/rx"_pair_multi_lucy_rx.html,
"oxdna/coaxstk"_pair_oxdna.html,
"oxdna/excv"_pair_oxdna.html,
"oxdna/hbond"_pair_oxdna.html,
"oxdna/stk"_pair_oxdna.html,
"oxdna/xstk"_pair_oxdna.html,
"oxdna2/coaxstk"_pair_oxdna2.html,
"oxdna2/dh"_pair_oxdna2.html,
"oxdna2/excv"_pair_oxdna2.html,
"oxdna2/stk"_pair_oxdna2.html,
"quip"_pair_quip.html,
-"reax/c (k)"_pair_reax_c.html,
+"reax/c (k)"_pair_reaxc.html,
"smd/hertz"_pair_smd_hertz.html,
"smd/tlsph"_pair_smd_tlsph.html,
"smd/triangulated/surface"_pair_smd_triangulated_surface.html,
"smd/ulsph"_pair_smd_ulsph.html,
"smtbq"_pair_smtbq.html,
"sph/heatconduction"_pair_sph_heatconduction.html,
"sph/idealgas"_pair_sph_idealgas.html,
"sph/lj"_pair_sph_lj.html,
"sph/rhosum"_pair_sph_rhosum.html,
"sph/taitwater"_pair_sph_taitwater.html,
"sph/taitwater/morris"_pair_sph_taitwater_morris.html,
"srp"_pair_srp.html,
"table/rx"_pair_table_rx.html,
"tersoff/table (o)"_pair_tersoff.html,
"thole"_pair_thole.html,
"tip4p/long/soft (o)"_pair_lj_soft.html :tb(c=4,ea=c)
:line
Bond_style potentials :h4
See the "bond_style"_bond_style.html command for an overview of bond
potentials. Click on the style itself for a full description. Some
of the styles have accelerated versions, which can be used if LAMMPS
is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k =
KOKKOS, o = USER-OMP, t = OPT.
"none"_bond_none.html,
"zero"_bond_zero.html,
"hybrid"_bond_hybrid.html,
"class2 (ko)"_bond_class2.html,
"fene (iko)"_bond_fene.html,
"fene/expand (o)"_bond_fene_expand.html,
"harmonic (ko)"_bond_harmonic.html,
"morse (o)"_bond_morse.html,
"nonlinear (o)"_bond_nonlinear.html,
"quartic (o)"_bond_quartic.html,
"table (o)"_bond_table.html :tb(c=4,ea=c)
These are additional bond styles in USER packages, which can be used
if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"harmonic/shift (o)"_bond_harmonic_shift.html,
"harmonic/shift/cut (o)"_bond_harmonic_shift_cut.html,
"oxdna/fene"_bond_oxdna.html,
"oxdna2/fene"_bond_oxdna.html :tb(c=4,ea=c)
:line
Angle_style potentials :h4
See the "angle_style"_angle_style.html command for an overview of
angle potentials. Click on the style itself for a full description.
Some of the styles have accelerated versions, which can be used if
LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"none"_angle_none.html,
"zero"_angle_zero.html,
"hybrid"_angle_hybrid.html,
"charmm (ko)"_angle_charmm.html,
"class2 (ko)"_angle_class2.html,
"cosine (o)"_angle_cosine.html,
"cosine/delta (o)"_angle_cosine_delta.html,
"cosine/periodic (o)"_angle_cosine_periodic.html,
"cosine/squared (o)"_angle_cosine_squared.html,
"harmonic (iko)"_angle_harmonic.html,
"table (o)"_angle_table.html :tb(c=4,ea=c)
These are additional angle styles in USER packages, which can be used
if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"cosine/shift (o)"_angle_cosine_shift.html,
"cosine/shift/exp (o)"_angle_cosine_shift_exp.html,
"dipole (o)"_angle_dipole.html,
"fourier (o)"_angle_fourier.html,
"fourier/simple (o)"_angle_fourier_simple.html,
"quartic (o)"_angle_quartic.html,
"sdk"_angle_sdk.html :tb(c=4,ea=c)
:line
Dihedral_style potentials :h4
See the "dihedral_style"_dihedral_style.html command for an overview
of dihedral potentials. Click on the style itself for a full
description. Some of the styles have accelerated versions, which can
be used if LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"none"_dihedral_none.html,
"zero"_dihedral_zero.html,
"hybrid"_dihedral_hybrid.html,
"charmm (ko)"_dihedral_charmm.html,
"charmmfsw"_dihedral_charmm.html,
"class2 (ko)"_dihedral_class2.html,
"harmonic (io)"_dihedral_harmonic.html,
"helix (o)"_dihedral_helix.html,
"multi/harmonic (o)"_dihedral_multi_harmonic.html,
"opls (iko)"_dihedral_opls.html :tb(c=4,ea=c)
These are additional dihedral styles in USER packages, which can be
used if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"cosine/shift/exp (o)"_dihedral_cosine_shift_exp.html,
"fourier (o)"_dihedral_fourier.html,
"nharmonic (o)"_dihedral_nharmonic.html,
"quadratic (o)"_dihedral_quadratic.html,
"spherical (o)"_dihedral_spherical.html,
"table (o)"_dihedral_table.html :tb(c=4,ea=c)
:line
Improper_style potentials :h4
See the "improper_style"_improper_style.html command for an overview
of improper potentials. Click on the style itself for a full
description. Some of the styles have accelerated versions, which can
be used if LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"none"_improper_none.html,
"zero"_improper_zero.html,
"hybrid"_improper_hybrid.html,
"class2 (ko)"_improper_class2.html,
"cvff (io)"_improper_cvff.html,
"harmonic (ko)"_improper_harmonic.html,
"umbrella (o)"_improper_umbrella.html :tb(c=4,ea=c)
These are additional improper styles in USER packages, which can be
used if "LAMMPS is built with the appropriate
package"_Section_start.html#start_3.
"cossq (o)"_improper_cossq.html,
"distance"_improper_distance.html,
"fourier (o)"_improper_fourier.html,
"ring (o)"_improper_ring.html :tb(c=4,ea=c)
:line
Kspace solvers :h4
See the "kspace_style"_kspace_style.html command for an overview of
Kspace solvers. Click on the style itself for a full description.
Some of the styles have accelerated versions, which can be used if
LAMMPS is built with the "appropriate accelerated
package"_Section_accelerate.html. This is indicated by additional
letters in parentheses: g = GPU, i = USER-INTEL, k = KOKKOS, o =
USER-OMP, t = OPT.
"ewald (o)"_kspace_style.html,
"ewald/disp"_kspace_style.html,
"msm (o)"_kspace_style.html,
"msm/cg (o)"_kspace_style.html,
"pppm (go)"_kspace_style.html,
"pppm/cg (o)"_kspace_style.html,
"pppm/disp"_kspace_style.html,
"pppm/disp/tip4p"_kspace_style.html,
"pppm/stagger"_kspace_style.html,
"pppm/tip4p (o)"_kspace_style.html :tb(c=4,ea=c)
diff --git a/doc/src/Section_errors.txt b/doc/src/Section_errors.txt
index 832c5718a..5e0574b39 100644
--- a/doc/src/Section_errors.txt
+++ b/doc/src/Section_errors.txt
@@ -1,11922 +1,11928 @@
"Previous Section"_Section_python.html - "LAMMPS WWW Site"_lws -
"LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_history.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
12. Errors :h3
This section describes the errors you can encounter when using LAMMPS,
either conceptually, or as printed out by the program.
12.1 "Common problems"_#err_1
12.2 "Reporting bugs"_#err_2
12.3 "Error & warning messages"_#err_3 :all(b)
:line
:line
12.1 Common problems :link(err_1),h4
If two LAMMPS runs do not produce the exact same answer on different
machines or different numbers of processors, this is typically not a
bug. In theory you should get identical answers on any number of
processors and on any machine. In practice, numerical round-off can
cause slight differences and eventual divergence of molecular dynamics
phase space trajectories within a few 100s or 1000s of timesteps.
However, the statistical properties of the two runs (e.g. average
energy or temperature) should still be the same.
If the "velocity"_velocity.html command is used to set initial atom
velocities, a particular atom can be assigned a different velocity
when the problem is run on a different number of processors or on
different machines. If this happens, the phase space trajectories of
the two simulations will rapidly diverge. See the discussion of the
{loop} option in the "velocity"_velocity.html command for details and
options that avoid this issue.
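For example, with the {loop geom} option each atom's velocity is
generated from its coordinates, so the assignment does not depend on
the processor count (a hypothetical line; temperature and seed are
placeholders):
velocity all create 300.0 4928459 loop geom :pre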
Similarly, the "create_atoms"_create_atoms.html command generates a
lattice of atoms. For the same physical system, the ordering and
numbering of atoms by atom ID may be different depending on the number
of processors.
Some commands use random number generators which may be set up to
produce different random number streams on each processor and hence
will produce different effects when run on different numbers of
processors. A commonly-used example is the "fix
langevin"_fix_langevin.html command for thermostatting.
A LAMMPS simulation typically has two stages, setup and run. Most
LAMMPS errors are detected at setup time; others, like a bond
stretching too far, may not occur until the middle of a run.
LAMMPS tries to flag errors and print informative error messages so
you can fix the problem. For most errors it will also print the last
input script command that it was processing. Of course, LAMMPS cannot
figure out your physics or numerical mistakes, like choosing too big a
timestep, specifying erroneous force field coefficients, or putting 2
atoms on top of each other! If you run into errors that LAMMPS
doesn't catch that you think it should flag, please send an email to
the "developers"_http://lammps.sandia.gov/authors.html.
If you get an error message about an invalid command in your input
script, you can determine what command is causing the problem by
looking in the log.lammps file or using the "echo command"_echo.html
to see it on the screen. If you get an error like "Invalid ...
style", with ... being fix, compute, pair, etc, it means that you
mistyped the style name or that the command is part of an optional
package which was not compiled into your executable. The styles
available in your executable can be listed by using "the -h
command-line argument"_Section_start.html#start_7. The installation
and compilation of optional packages is explained in the "installation
instructions"_Section_start.html#start_3.
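As a small illustration, the "echo"_echo.html command near the top of
an input script makes LAMMPS print each command to the screen and log
file as it is read, so the offending line is easy to spot (choosing
{both} is just one of its options):

echo both :pre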
For a given command, LAMMPS expects certain arguments in a specified
order. If you mess this up, LAMMPS will often flag the error, but it
may also simply read a bogus argument and assign a value that is
valid, but not what you wanted. E.g. trying to read the string "abc"
as an integer value of 0. Careful reading of the associated doc page
for the command should allow you to fix these problems. In most cases
where LAMMPS expects to read a number, either integer or floating point,
it performs a stringent test on whether the provided input actually
is an integer or floating-point number, respectively, and rejects the
input with an error message (for instance, when an integer is required,
but a floating-point number such as 1.0 is provided):
ERROR: Expected integer parameter in input script or data file :pre
Some commands allow for using variable references in place of numeric
constants so that the value can be evaluated and may change over the
course of a run. This is typically done with the syntax {v_name} for a
parameter, where name is the name of the variable. On the other hand,
immediate variable expansion with the syntax ${name} is performed while
the input is read, before the command is parsed, so the substituted
value cannot change during a run.
NOTE: Using a variable reference (i.e. {v_name}) is only allowed if
the documentation of the corresponding command explicitly says it is.
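For illustration only (the variable name, temperature, damping, and
seed values below are arbitrary), the two mechanisms can be sketched as
follows; the ${T} reference is substituted with 300.0 the moment the
line is read, while the v_T reference is re-evaluated during the run,
which "fix langevin"_fix_langevin.html explicitly allows for its target
temperature:

variable T equal 300.0
velocity all create ${T} 4928459          # ${T} replaced by 300.0 when this line is read
fix 1 all langevin v_T v_T 10.0 699483    # v_T re-evaluated as the run proceeds :pre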
Generally, LAMMPS will print a message to the screen and logfile and
exit gracefully when it encounters a fatal error. Sometimes it will
print a WARNING to the screen and logfile and continue on; you can
decide if the WARNING is important or not. A WARNING message that is
generated in the middle of a run is only printed to the screen, not to
the logfile, to avoid cluttering up thermodynamic output. If LAMMPS
crashes or hangs without printing an error message first, then it
could be a bug (see "this section"_#err_2) or one of the following
cases:
LAMMPS runs within the memory a processor allows to be
allocated. Most reasonable MD runs are compute limited, not memory
limited, so this shouldn't be a bottleneck on most platforms. Almost
all large memory allocations in the code are done via C-style mallocs,
which will generate an error message if you run out of memory.
Smaller chunks of memory are allocated via C++ "new" statements. If
you are unlucky you could run out of memory just when one of these
small requests is made, in which case the code will crash or hang (in
parallel), since LAMMPS doesn't trap on those errors.
Illegal arithmetic can cause LAMMPS to run slowly or crash. This is
typically due to invalid physics and numerics that your simulation is
computing. If you see wild thermodynamic values or NaN values in your
LAMMPS output, something is wrong with your simulation. If you
suspect this is happening, it is a good idea to print out
thermodynamic info frequently (e.g. every timestep) via the
"thermo"_thermo.html command so you can monitor what is happening.
Visualizing the atom movement is also a good idea to ensure your model
is behaving as you expect.
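A minimal sketch of such monitoring (the output interval and dump file
name are arbitrary choices) is:

thermo 1
dump 1 all atom 100 dump.lammpstrj :pre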
In parallel, one way LAMMPS can hang is due to how different MPI
implementations handle buffering of messages. If the code hangs
without an error message, it may be that you need to specify an MPI
setting or two (usually via an environment variable) to enable
buffering or boost the sizes of messages that can be buffered.
:line
12.2 Reporting bugs :link(err_2),h4
If you are confident that you have found a bug in LAMMPS, follow these
steps.
Check the "New features and bug
fixes"_http://lammps.sandia.gov/bug.html section of the "LAMMPS WWW
site"_lws to see if the bug has already been reported or fixed or the
"Unfixed bug"_http://lammps.sandia.gov/unbug.html to see if a fix is
pending.
Check the "mailing list"_http://lammps.sandia.gov/mail.html
to see if it has been discussed before.
If not, send an email to the mailing list describing the problem with
any ideas you have as to what is causing it or where in the code the
problem might be. The developers will ask for more info if needed,
such as an input script or data files.
The most useful thing you can do to help us fix the bug is to isolate
the problem. Run it on the smallest number of atoms and fewest number
of processors and with the simplest input script that reproduces the
bug and try to identify what command or combination of commands is
causing the problem.
As a last resort, you can send an email directly to the
"developers"_http://lammps.sandia.gov/authors.html.
:line
12.3 Error & warning messages :h4,link(err_3)
These are two alphabetic lists of the "ERROR"_#error and
"WARNING"_#warn messages LAMMPS prints out and the reason why. If the
explanation here is not sufficient, the documentation for the
offending command may help.
Error and warning messages also list the source file and line number
where the error was generated. For example, this message
ERROR: Illegal velocity command (velocity.cpp:78) :pre
means that line #78 in the file src/velocity.cpp generated the error.
Looking in the source code may help you figure out what went wrong.
Note that error messages from "user-contributed
packages"_Section_start.html#start_3 are not listed here. If such an
error occurs and is not self-explanatory, you'll need to look in the
source code or contact the author of the package.
Errors: :h4,link(error)
:dlb
{1-3 bond count is inconsistent} :dt
An inconsistency was detected when computing the number of 1-3
neighbors for each atom. This likely means something is wrong with
the bond topologies you have defined. :dd
{1-4 bond count is inconsistent} :dt
An inconsistency was detected when computing the number of 1-4
neighbors for each atom. This likely means something is wrong with
the bond topologies you have defined. :dd
{Accelerator sharing is not currently supported on system} :dt
Multiple MPI processes cannot share the accelerator on your
system. For NVIDIA GPUs, see the nvidia-smi command to change this
setting. :dd
{All angle coeffs are not set} :dt
All angle coefficients must be set in the data file or by the
angle_coeff command before running a simulation. :dd
{All atom IDs = 0 but atom_modify id = yes} :dt
Self-explanatory. :dd
{All atoms of a swapped type must have same charge.} :dt
Self-explanatory. :dd
{All atoms of a swapped type must have the same charge.} :dt
Self-explanatory. :dd
{All bond coeffs are not set} :dt
All bond coefficients must be set in the data file or by the
bond_coeff command before running a simulation. :dd
{All dihedral coeffs are not set} :dt
All dihedral coefficients must be set in the data file or by the
dihedral_coeff command before running a simulation. :dd
{All improper coeffs are not set} :dt
All improper coefficients must be set in the data file or by the
improper_coeff command before running a simulation. :dd
{All masses are not set} :dt
For atom styles that define masses for each atom type, all masses must
be set in the data file or by the mass command before running a
simulation. They must also be set before using the velocity
command. :dd
{All mol IDs should be set for fix gcmc group atoms} :dt
The molecule flag is on, yet not all molecule ids in the fix group
have been set to non-zero positive values by the user. This is an
error since all atoms in the fix gcmc group are eligible for deletion,
rotation, and translation and therefore must have valid molecule ids. :dd
{All pair coeffs are not set} :dt
All pair coefficients must be set in the data file or by the
pair_coeff command before running a simulation. :dd
{All read_dump x,y,z fields must be specified for scaled, triclinic coords} :dt
For triclinic boxes and scaled coordinates you must specify all 3 of
the x,y,z fields, else LAMMPS cannot reconstruct the unscaled
coordinates. :dd
{All universe/uloop variables must have same # of values} :dt
Self-explanatory. :dd
{All variables in next command must be same style} :dt
Self-explanatory. :dd
{Angle atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
angle on a particular processor. The pairwise cutoff is too short or
the atoms are too far apart to make a valid angle. :dd
{Angle atom missing in set command} :dt
The set command cannot find one or more atoms in a particular angle on
a particular processor. The pairwise cutoff is too short or the atoms
are too far apart to make a valid angle. :dd
{Angle atoms %d %d %d missing on proc %d at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle atoms missing on proc %d at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle coeff for hybrid has invalid style} :dt
Angle style hybrid uses another angle style as one of its
coefficients. The angle style used in the angle_coeff command or read
from a restart file is not recognized. :dd
{Angle coeffs are not set} :dt
No angle coefficients have been assigned in the data file or via the
angle_coeff command. :dd
{Angle extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the angle atoms are so far apart that it is ambiguous
how the angle should be defined. :dd
{Angle potential must be defined for SHAKE} :dt
When shaking angles, an angle_style potential must be used. :dd
{Angle style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Angle style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Angle style hybrid cannot use same angle style twice} :dt
Self-explanatory. :dd
{Angle table must range from 0 to 180 degrees} :dt
Self-explanatory. :dd
{Angle table parameters did not set N} :dt
List of angle table parameters must include N setting. :dd
{Angle_coeff command before angle_style is defined} :dt
Coefficients cannot be set in the data file or via the angle_coeff
command until an angle_style has been assigned. :dd
{Angle_coeff command before simulation box is defined} :dt
The angle_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Angle_coeff command when no angles allowed} :dt
The chosen atom style does not allow for angles to be defined. :dd
{Angle_style command when no angles allowed} :dt
The chosen atom style does not allow for angles to be defined. :dd
{Angles assigned incorrectly} :dt
Angles read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Angles defined but no angle types} :dt
The data file header lists angles but no angle types. :dd
{Append boundary must be shrink/minimum} :dt
The boundary style of the face where atoms are added
must be of type m (shrink/minimum). :dd
{Arccos of invalid value in variable formula} :dt
Argument of arccos() must be between -1 and 1. :dd
{Arcsin of invalid value in variable formula} :dt
Argument of arcsin() must be between -1 and 1. :dd
{Assigning body parameters to non-body atom} :dt
Self-explanatory. :dd
{Assigning ellipsoid parameters to non-ellipsoid atom} :dt
Self-explanatory. :dd
{Assigning line parameters to non-line atom} :dt
Self-explanatory. :dd
{Assigning quat to non-body atom} :dt
Self-explanatory. :dd
{Assigning tri parameters to non-tri atom} :dt
Self-explanatory. :dd
{At least one atom of each swapped type must be present to define charges.} :dt
Self-explanatory. :dd
{Atom IDs must be consecutive for velocity create loop all} :dt
Self-explanatory. :dd
{Atom IDs must be used for molecular systems} :dt
Atom IDs are used to identify and find partner atoms in bonds. :dd
{Atom count changed in fix neb} :dt
This is not allowed in a NEB calculation. :dd
{Atom count is inconsistent, cannot write data file} :dt
The sum of atoms across processors does not equal the global number
of atoms. Probably some atoms have been lost. :dd
{Atom count is inconsistent, cannot write restart file} :dt
Sum of atoms across processors does not equal initial total count.
This is probably because you have lost some atoms. :dd
{Atom in too many rigid bodies - boost MAXBODY} :dt
Fix poems has a parameter MAXBODY (in fix_poems.cpp) which determines
the maximum number of rigid bodies a single atom can belong to (i.e. a
multibody joint). The bodies you have defined exceed this limit. :dd
{Atom sort did not operate correctly} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Atom sorting has bin size = 0.0} :dt
The neighbor cutoff is being used as the bin size, but it is zero.
Thus you must explicitly list a bin size in the atom_modify sort
command or turn off sorting. :dd
{Atom style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Atom style hybrid cannot use same atom style twice} :dt
Self-explanatory. :dd
{Atom style template molecule must have atom types} :dt
The defined molecule(s) does not specify atom types. :dd
{Atom style was redefined after using fix property/atom} :dt
This is not allowed. :dd
{Atom type must be zero in fix gcmc mol command} :dt
Self-explanatory. :dd
{Atom vector in equal-style variable formula} :dt
Atom vectors generate one value per atom which is not allowed
in an equal-style variable. :dd
{Atom-style variable in equal-style variable formula} :dt
Atom-style variables generate one value per atom which is not allowed
in an equal-style variable. :dd
{Atom_modify id command after simulation box is defined} :dt
The atom_modify id command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_modify map command after simulation box is defined} :dt
The atom_modify map command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_modify sort and first options cannot be used together} :dt
Self-explanatory. :dd
{Atom_style command after simulation box is defined} :dt
The atom_style command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Atom_style line can only be used in 2d simulations} :dt
Self-explanatory. :dd
{Atom_style tri can only be used in 3d simulations} :dt
Self-explanatory. :dd
{Atomfile variable could not read values} :dt
Check the file assigned to the variable. :dd
{Atomfile variable in equal-style variable formula} :dt
Self-explanatory. :dd
{Atomfile-style variable in equal-style variable formula} :dt
Self-explanatory. :dd
{Attempt to pop empty stack in fix box/relax} :dt
Internal LAMMPS error. Please report it to the developers. :dd
{Attempt to push beyond stack limit in fix box/relax} :dt
Internal LAMMPS error. Please report it to the developers. :dd
{Attempting to rescale a 0.0 temperature} :dt
Cannot rescale a temperature that is already 0.0. :dd
{Bad FENE bond} :dt
Two atoms in a FENE bond have become so far apart that the bond cannot
be computed. :dd
{Bad TIP4P angle type for PPPM/TIP4P} :dt
Specified angle type is not valid. :dd
{Bad TIP4P angle type for PPPMDisp/TIP4P} :dt
Specified angle type is not valid. :dd
{Bad TIP4P bond type for PPPM/TIP4P} :dt
Specified bond type is not valid. :dd
{Bad TIP4P bond type for PPPMDisp/TIP4P} :dt
Specified bond type is not valid. :dd
{Bad fix ID in fix append/atoms command} :dt
The value of the fix_id for keyword spatial must start with 'f_'. :dd
{Bad grid of processors} :dt
The 3d grid of processors defined by the processors command does not
match the number of processors LAMMPS is being run on. :dd
{Bad kspace_modify kmax/ewald parameter} :dt
Kspace_modify values for the kmax/ewald keyword must be integers > 0 :dd
{Bad kspace_modify slab parameter} :dt
Kspace_modify value for the slab/volume keyword must be >= 2.0. :dd
{Bad matrix inversion in mldivide3} :dt
This error should not occur unless the matrix is badly formed. :dd
{Bad principal moments} :dt
Fix rigid did not compute the principal moments of inertia of a rigid
group of atoms correctly. :dd
{Bad quadratic solve for particle/line collision} :dt
This is an internal error. It should normally not occur. :dd
{Bad quadratic solve for particle/tri collision} :dt
This is an internal error. It should normally not occur. :dd
{Bad real space Coulomb cutoff in fix tune/kspace} :dt
Fix tune/kspace tried to find the optimal real space Coulomb cutoff using
the Newton-Raphson method, but found a non-positive or NaN cutoff. :dd
{Balance command before simulation box is defined} :dt
The balance command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Balance produced bad splits} :dt
This should not occur. It means two or more cutting plane locations
are on top of each other or out of order. Report the problem to the
developers. :dd
{Balance rcb cannot be used with comm_style brick} :dt
Comm_style tiled must be used instead. :dd
{Balance shift string is invalid} :dt
The string can only contain the characters "x", "y", or "z". :dd
{Bias compute does not calculate a velocity bias} :dt
The specified compute must compute a bias for temperature. :dd
{Bias compute does not calculate temperature} :dt
The specified compute must compute temperature. :dd
{Bias compute group does not match compute group} :dt
The specified compute must operate on the same group as the parent
compute. :dd
{Big particle in fix srd cannot be point particle} :dt
Big particles must be extended spheroids or ellipsoids. :dd
{Bigint setting in lmptype.h is invalid} :dt
Size of bigint is less than size of tagint. :dd
{Bigint setting in lmptype.h is not compatible} :dt
Format of bigint stored in restart file is not consistent with LAMMPS
version you are running. See the settings in src/lmptype.h :dd
{Bitmapped lookup tables require int/float be same size} :dt
Cannot use pair tables on this machine, because of word sizes. Use
the pair_modify command with table 0 instead. :dd
{Bitmapped table in file does not match requested table} :dt
Setting for bitmapped table in pair_coeff command must match table
in file exactly. :dd
{Bitmapped table is incorrect length in table file} :dt
Number of table entries is not a correct power of 2. :dd
{Bond and angle potentials must be defined for TIP4P} :dt
Cannot use TIP4P pair potential unless bond and angle potentials
are defined. :dd
{Bond atom missing in box size check} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
bond on a particular processor. The pairwise cutoff is too short or
the atoms are too far apart to make a valid bond. :dd
{Bond atom missing in image check} :dt
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away. :dd
{Bond atom missing in set command} :dt
The set command cannot find one or more atoms in a particular bond on
a particular processor. The pairwise cutoff is too short or the atoms
are too far apart to make a valid bond. :dd
{Bond atoms %d %d missing on proc %d at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atoms missing on proc %d at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond coeff for hybrid has invalid style} :dt
Bond style hybrid uses another bond style as one of its coefficients.
The bond style used in the bond_coeff command or read from a restart
file is not recognized. :dd
{Bond coeffs are not set} :dt
No bond coefficients have been assigned in the data file or via the
bond_coeff command. :dd
{Bond extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the bond atoms are so far apart that it is ambiguous
how the bond should be defined. :dd
{Bond potential must be defined for SHAKE} :dt
Cannot use fix shake unless bond potential is defined. :dd
{Bond style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Bond style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Bond style hybrid cannot use same bond style twice} :dt
Self-explanatory. :dd
{Bond style quartic cannot be used with 3,4-body interactions} :dt
No angle, dihedral, or improper styles can be defined when using
bond style quartic. :dd
{Bond style quartic cannot be used with atom style template} :dt
This bond style can change the bond topology which is not
allowed with this atom style. :dd
{Bond style quartic requires special_bonds = 1,1,1} :dt
This is a restriction of the current bond quartic implementation. :dd
{Bond table parameters did not set N} :dt
List of bond table parameters must include N setting. :dd
{Bond table values are not increasing} :dt
The values in the tabulated file must be monotonically increasing. :dd
{BondAngle coeff for hybrid angle has invalid format} :dt
No "ba" field should appear in data file entry. :dd
{BondBond coeff for hybrid angle has invalid format} :dt
No "bb" field should appear in data file entry. :dd
{Bond_coeff command before bond_style is defined} :dt
Coefficients cannot be set in the data file or via the bond_coeff
command until a bond_style has been assigned. :dd
{Bond_coeff command before simulation box is defined} :dt
The bond_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Bond_coeff command when no bonds allowed} :dt
The chosen atom style does not allow for bonds to be defined. :dd
{Bond_style command when no bonds allowed} :dt
The chosen atom style does not allow for bonds to be defined. :dd
{Bonds assigned incorrectly} :dt
Bonds read in from the data file were not assigned correctly to atoms.
This means there is something invalid about the topology definitions. :dd
{Bonds defined but no bond types} :dt
The data file header lists bonds but no bond types. :dd
{Both restart files must use % or neither} :dt
Self-explanatory. :dd
{Both restart files must use MPI-IO or neither} :dt
Self-explanatory. :dd
{Both sides of boundary must be periodic} :dt
Cannot specify a boundary as periodic only on the lo or hi side. Must
be periodic on both sides. :dd
{Boundary command after simulation box is defined} :dt
The boundary command cannot be used after a read_data, read_restart,
or create_box command. :dd
{Box bounds are invalid} :dt
The box boundaries specified in the read_data file are invalid. The
lo value must be less than the hi value for all 3 dimensions. :dd
{Box command after simulation box is defined} :dt
The box command cannot be used after a read_data, read_restart, or
create_box command. :dd
{CPU neighbor lists must be used for ellipsoid/sphere mix.} :dt
When using Gay-Berne or RE-squared pair styles with both ellipsoidal and
spherical particles, the neighbor list must be built on the CPU :dd
{Can not specify Pxy/Pxz/Pyz in fix box/relax with non-triclinic box} :dt
Only triclinic boxes can be used with off-diagonal pressure components.
See the region prism command for details. :dd
{Can not specify Pxy/Pxz/Pyz in fix nvt/npt/nph with non-triclinic box} :dt
Only triclinic boxes can be used with off-diagonal pressure components.
See the region prism command for details. :dd
{Can only use -plog with multiple partitions} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Can only use -pscreen with multiple partitions} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Can only use Kokkos supported regions with Kokkos package} :dt
Self-explanatory. :dd
{Can only use NEB with 1-processor replicas} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Can only use TAD with 1-processor replicas for NEB} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Cannot (yet) do analytic differentiation with pppm/gpu} :dt
This is a current restriction of this command. :dd
{Cannot (yet) request ghost atoms with Kokkos half neighbor list} :dt
This feature is not yet supported. :dd
{Cannot (yet) use 'electron' units with dipoles} :dt
This feature is not yet supported. :dd
{Cannot (yet) use Ewald with triclinic box and slab correction} :dt
This feature is not yet supported. :dd
{Cannot (yet) use K-space slab correction with compute group/group for triclinic systems} :dt
This option is not yet supported. :dd
{Cannot (yet) use MSM with 2d simulation} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and TIP4P} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and kspace_modify diff ad} :dt
This feature is not yet supported. :dd
{Cannot (yet) use PPPM with triclinic box and slab correction} :dt
This feature is not yet supported. :dd
{Cannot (yet) use kspace slab correction with long-range dipoles and non-neutral systems or per-atom energy} :dt
This feature is not yet supported. :dd
{Cannot (yet) use kspace_modify diff ad with compute group/group} :dt
This option is not yet supported. :dd
{Cannot (yet) use kspace_style pppm/stagger with triclinic systems} :dt
This feature is not yet supported. :dd
{Cannot (yet) use molecular templates with Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use respa with Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use rigid bodies with fix deform and Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use rigid bodies with fix nh and Kokkos} :dt
Self-explanatory. :dd
{Cannot (yet) use single precision with MSM (remove -DFFT_SINGLE from Makefile and recompile)} :dt
Single precision cannot be used with MSM. :dd
{Cannot add atoms to fix move variable} :dt
Atoms cannot be added afterwards when using this fix option. :dd
{Cannot append atoms to a triclinic box} :dt
The simulation box must be defined with edges aligned with the
Cartesian axes. :dd
{Cannot balance in z dimension for 2d simulation} :dt
Self-explanatory. :dd
{Cannot change box ortho/triclinic with certain fixes defined} :dt
This is because those fixes store the shape of the box. You need to
use unfix to discard the fix, change the box, then redefine a new
fix. :dd
{Cannot change box ortho/triclinic with dumps defined} :dt
This is because some dumps store the shape of the box. You need to
use undump to discard the dump, change the box, then redefine a new
dump. :dd
{Cannot change box tilt factors for orthogonal box} :dt
Cannot use tilt factors unless the simulation box is non-orthogonal. :dd
{Cannot change box to orthogonal when tilt is non-zero} :dt
Self-explanatory. :dd
{Cannot change box z boundary to nonperiodic for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot change dump_modify every for dump dcd} :dt
The frequency of writing dump dcd snapshots cannot be changed. :dd
{Cannot change dump_modify every for dump xtc} :dt
The frequency of writing dump xtc snapshots cannot be changed. :dd
{Cannot change timestep once fix srd is setup} :dt
This is because various SRD properties depend on the timestep
size. :dd
{Cannot change timestep with fix pour} :dt
This is because fix pour pre-computes the time delay for particles to
fall out of the insertion volume due to gravity. :dd
{Cannot change to comm_style brick from tiled layout} :dt
Self-explanatory. :dd
{Cannot change_box after reading restart file with per-atom info} :dt
This is because the restart file info cannot be migrated with the
atoms. You can get around this by performing a 0-timestep run which
will assign the restart file info to actual atoms. :dd
{Cannot change_box in xz or yz for 2d simulation} :dt
Self-explanatory. :dd
{Cannot change_box in z dimension for 2d simulation} :dt
Self-explanatory. :dd
{Cannot clear group all} :dt
This operation is not allowed. :dd
{Cannot close restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot compute initial g_ewald_disp} :dt
LAMMPS failed to compute an initial guess for the PPPM_disp g_ewald_6
factor that partitions the computation between real space and k-space
for Dispersion interactions. :dd
{Cannot create an atom map unless atoms have IDs} :dt
The simulation requires a mapping from global atom IDs to local atoms,
but the atoms that have been defined have no IDs. :dd
{Cannot create atoms with undefined lattice} :dt
Must use the lattice command before using the create_atoms
command. :dd
{Cannot create/grow a vector/array of pointers for %s} :dt
LAMMPS code is making an illegal call to the templated memory
allocators, to create a vector or array of pointers. :dd
{Cannot create_atoms after reading restart file with per-atom info} :dt
The per-atom info was stored to be used by a fix that you may
re-define. If you add atoms before re-defining the fix, then there
will not be a correct amount of per-atom info. :dd
{Cannot create_box after simulation box is defined} :dt
A simulation box can only be defined once. :dd
{Cannot currently use pair reax with pair hybrid} :dt
This is not yet supported. :dd
{Cannot currently use pppm/gpu with fix balance.} :dt
Self-explanatory. :dd
{Cannot delete group all} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a compute} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a dump} :dt
Self-explanatory. :dd
{Cannot delete group currently used by a fix} :dt
Self-explanatory. :dd
{Cannot delete group currently used by atom_modify first} :dt
Self-explanatory. :dd
{Cannot delete_atoms bond yes for non-molecular systems} :dt
Self-explanatory. :dd
{Cannot displace_atoms after reading restart file with per-atom info} :dt
This is because the restart file info cannot be migrated with the
atoms. You can get around this by performing a 0-timestep run which
will assign the restart file info to actual atoms. :dd
{Cannot do GCMC on atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command. :dd
{Cannot do atom/swap on atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command. :dd
{Cannot dump sort on atom IDs with no atom IDs defined} :dt
Self-explanatory. :dd
{Cannot dump sort when multiple dump files are written} :dt
In this mode, each processor dumps its atoms to a file, so
no sorting is allowed. :dd
{Cannot embed Python when also extending Python with LAMMPS} :dt
When running LAMMPS via Python through the LAMMPS library interface
you cannot also use the input script python command. :dd
{Cannot evaporate atoms in atom_modify first group} :dt
This is a restriction due to the way atoms are organized in
a list to enable the atom_modify first command. :dd
{Cannot find create_bonds group ID} :dt
Self-explanatory. :dd
{Cannot find delete_bonds group ID} :dt
Group ID used in the delete_bonds command does not exist. :dd
{Cannot find specified group ID for core particles} :dt
Self-explanatory. :dd
{Cannot find specified group ID for shell particles} :dt
Self-explanatory. :dd
{Cannot have both pair_modify shift and tail set to yes} :dt
These 2 options are contradictory. :dd
{Cannot intersect groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot mix molecular and molecule template atom styles} :dt
Self-explanatory. :dd
{Cannot open -reorder file} :dt
Self-explanatory. :dd
{Cannot open ADP potential file %s} :dt
The specified ADP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open AIREBO potential file %s} :dt
The specified AIREBO potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open BOP potential file %s} :dt
The specified BOP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open COMB potential file %s} :dt
The specified COMB potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open COMB3 lib.comb3 file} :dt
The COMB3 library file cannot be opened. Check that the path and name
are correct. :dd
{Cannot open COMB3 potential file %s} :dt
The specified COMB3 potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open EAM potential file %s} :dt
The specified EAM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open EIM potential file %s} :dt
The specified EIM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open LCBOP potential file %s} :dt
The specified LCBOP potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open MEAM potential file %s} :dt
The specified MEAM potential file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open SNAP coefficient file %s} :dt
The specified SNAP coefficient file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open SNAP parameter file %s} :dt
The specified SNAP parameter file cannot be opened. Check that the
path and name are correct. :dd
{Cannot open Stillinger-Weber potential file %s} :dt
The specified SW potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open Tersoff potential file %s} :dt
The specified potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open Vashishta potential file %s} :dt
The specified Vashishta potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open balance output file} :dt
Self-explanatory. :dd
{Cannot open coul/streitz potential file %s} :dt
The specified coul/streitz potential file cannot be opened. Check
that the path and name are correct. :dd
{Cannot open custom file} :dt
Self-explanatory. :dd
{Cannot open data file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open dir to search for restart file} :dt
Using a "*" in the name of the restart file will open the current
directory to search for matching file names. :dd
{Cannot open dump file} :dt
Self-explanatory. :dd
{Cannot open dump file %s} :dt
The output file for the dump command cannot be opened. Check that the
path and name are correct. :dd
{Cannot open file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. If the file is a compressed file, also check that the gzip
executable can be found and run. :dd
{Cannot open file variable file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/chunk file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/correlate file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/histo file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/spatial file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix ave/time file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix balance output file} :dt
Self-explanatory. :dd
{Cannot open fix poems file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix print file %s} :dt
The output file generated by the fix print command cannot be opened :dd
{Cannot open fix qeq parameter file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix qeq/comb file %s} :dt
The output file for the fix qeq/comb command cannot be opened.
Check that the path and name are correct. :dd
{Cannot open fix reax/bonds file %s} :dt
The output file for the fix reax/bonds command cannot be opened.
Check that the path and name are correct. :dd
{Cannot open fix rigid infile %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix rigid restart file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix rigid/small infile %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open fix tmd file %s} :dt
The output file for the fix tmd command cannot be opened. Check that
the path and name are correct. :dd
{Cannot open fix ttm file %s} :dt
The output file for the fix ttm command cannot be opened. Check that
the path and name are correct. :dd
{Cannot open gzipped file} :dt
LAMMPS was compiled without -DLAMMPS_GZIP, which enables support for
reading and writing gzipped files through a pipeline to the gzip program. :dd
{Cannot open input script %s} :dt
Self-explanatory. :dd
{Cannot open log.cite file} :dt
This file is created when you use some LAMMPS features, to indicate
what paper you should cite on behalf of those who implemented
the feature. Check that you have write privileges into the directory
you are running in. :dd
{Cannot open log.lammps for writing} :dt
The default LAMMPS log file cannot be opened. Check that the
directory you are running in allows for files to be created. :dd
{Cannot open logfile} :dt
The LAMMPS log file named in a command-line argument cannot be opened.
Check that the path and name are correct. :dd
{Cannot open logfile %s} :dt
The LAMMPS log file specified in the input script cannot be opened.
Check that the path and name are correct. :dd
{Cannot open molecule file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct. :dd
{Cannot open nb3b/harmonic potential file %s} :dt
The specified potential file cannot be opened. Check that the path
and name are correct. :dd
{Cannot open pair_write file} :dt
The specified output file for pair energies and forces cannot be
opened. Check that the path and name are correct. :dd
{Cannot open polymorphic potential file %s} :dt
The specified polymorphic potential file cannot be opened. Check that
the path and name are correct. :dd
{Cannot open print file %s} :dt
Self-explanatory. :dd
{Cannot open processors output file} :dt
Self-explanatory. :dd
{Cannot open restart file %s} :dt
Self-explanatory. :dd
{Cannot open restart file for reading - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot open restart file for writing - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot open screen file} :dt
The screen file specified as a command-line argument cannot be
opened. Check that the directory you are running in allows for files
to be created. :dd
{Cannot open temporary file for world counter.} :dt
Self-explanatory. :dd
{Cannot open universe log file} :dt
For a multi-partition run, the master log file cannot be opened.
Check that the directory you are running in allows for files to be
created. :dd
{Cannot open universe screen file} :dt
For a multi-partition run, the master screen file cannot be opened.
Check that the directory you are running in allows for files to be
created. :dd
{Cannot read from restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot read_data without add keyword after simulation box is defined} :dt
Self-explanatory. :dd
{Cannot read_restart after simulation box is defined} :dt
The read_restart command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Cannot redefine variable as a different style} :dt
An equal-style variable can be re-defined but only if it was
originally an equal-style variable. :dd
{Cannot replicate 2d simulation in z dimension} :dt
The replicate command cannot replicate a 2d simulation in the z
dimension. :dd
{Cannot replicate with fixes that store atom quantities} :dt
Either fixes are defined that create and store atom-based vectors or a
restart file was read which included atom-based vectors for fixes.
The replicate command cannot duplicate that information for new atoms.
You should use the replicate command before fixes are applied to the
system. :dd
{Cannot reset timestep with a dynamic region defined} :dt
Dynamic regions (see the region command) have a time dependence.
Thus you cannot change the timestep when one or more of these
are defined. :dd
{Cannot reset timestep with a time-dependent fix defined} :dt
You cannot reset the timestep when a fix that keeps track of elapsed
time is in place. :dd
{Cannot run 2d simulation with nonperiodic Z dimension} :dt
Use the boundary command to make the z dimension periodic in order to
run a 2d simulation. :dd
{Cannot set bond topology types for atom style template} :dt
The bond, angle, etc types cannot be changed for this atom style since
they are static settings in the molecule template files. :dd
{Cannot set both respa pair and inner/middle/outer} :dt
In the rRESPA integrator, you must compute pairwise potentials either
all together (pair), or in pieces (inner/middle/outer). You can't do
both. :dd
{Cannot set cutoff/multi before simulation box is defined} :dt
Self-explanatory. :dd
{Cannot set dpd/theta for this atom style} :dt
Self-explanatory. :dd
{Cannot set dump_modify flush for dump xtc} :dt
Self-explanatory. :dd
{Cannot set mass for this atom style} :dt
This atom style does not support mass settings for each atom type.
Instead they are defined on a per-atom basis in the data file. :dd
{Cannot set meso/cv for this atom style} :dt
Self-explanatory. :dd
{Cannot set meso/e for this atom style} :dt
Self-explanatory. :dd
{Cannot set meso/rho for this atom style} :dt
Self-explanatory. :dd
{Cannot set non-zero image flag for non-periodic dimension} :dt
Self-explanatory. :dd
{Cannot set non-zero z velocity for 2d simulation} :dt
Self-explanatory. :dd
{Cannot set quaternion for atom that has none} :dt
Self-explanatory. :dd
{Cannot set quaternion with xy components for 2d system} :dt
Self-explanatory. :dd
{Cannot set respa hybrid and any of pair/inner/middle/outer} :dt
In the rRESPA integrator, you must compute pairwise potentials either
all together (pair), with different cutoff regions (inner/middle/outer),
or per hybrid sub-style (hybrid). You cannot mix those. :dd
{Cannot set respa middle without inner/outer} :dt
In the rRESPA integrator, you must define both an inner and outer
setting in order to use a middle setting. :dd
{Cannot set restart file size - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot set smd/contact/radius for this atom style} :dt
Self-explanatory. :dd
{Cannot set smd/mass/density for this atom style} :dt
Self-explanatory. :dd
{Cannot set temperature for fix rigid/nph} :dt
The temp keyword cannot be specified. :dd
{Cannot set theta for atom that is not a line} :dt
Self-explanatory. :dd
{Cannot set this attribute for this atom style} :dt
The attribute being set does not exist for the defined atom style. :dd
{Cannot set variable z velocity for 2d simulation} :dt
Self-explanatory. :dd
{Cannot skew triclinic box in z for 2d simulation} :dt
Self-explanatory. :dd
{Cannot subtract groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot union groups using a dynamic group} :dt
This operation is not allowed. :dd
{Cannot use -cuda on and -kokkos on together} :dt
This is not allowed since both packages can use GPUs. :dd
{Cannot use -cuda on without USER-CUDA installed} :dt
The USER-CUDA package must be installed via "make yes-user-cuda"
before LAMMPS is built. :dd
{Cannot use -kokkos on without KOKKOS installed} :dt
Self-explanatory. :dd
{Cannot use -reorder after -partition} :dt
Self-explanatory. See doc page discussion of command-line switches. :dd
{Cannot use Ewald with 2d simulation} :dt
The kspace style ewald cannot be used in 2d simulations. You can use
2d Ewald in a 3d simulation; see the kspace_modify command. :dd
{Cannot use Ewald/disp solver on system with no charge, dipole, or LJ particles} :dt
No atoms in system have a non-zero charge or dipole, or are LJ
particles. Change charges/dipoles or change options of the kspace
solver/pair style. :dd
{Cannot use EwaldDisp with 2d simulation} :dt
This is a current restriction of this command. :dd
{Cannot use GPU package with USER-CUDA package enabled} :dt
You cannot use both the GPU and USER-CUDA packages
together. Use one or the other. :dd
{Cannot use Kokkos pair style with rRESPA inner/middle} :dt
Self-explanatory. :dd
{Cannot use NEB unless atom map exists} :dt
Use the atom_modify command to create an atom map. :dd
{Cannot use NEB with a single replica} :dt
Self-explanatory. :dd
{Cannot use NEB with atom_modify sort enabled} :dt
This is a current restriction for NEB as implemented in LAMMPS. :dd
{Cannot use PPPM with 2d simulation} :dt
The kspace style pppm cannot be used in 2d simulations. You can use
2d PPPM in a 3d simulation; see the kspace_modify command. :dd
{Cannot use PPPMDisp with 2d simulation} :dt
The kspace style pppm/disp cannot be used in 2d simulations. You can
use 2d pppm/disp in a 3d simulation; see the kspace_modify command. :dd
{Cannot use PRD with a changing box} :dt
The current box dimensions are not copied between replicas :dd
{Cannot use PRD with a time-dependent fix defined} :dt
PRD alters the timestep in ways that will mess up these fixes. :dd
{Cannot use PRD with a time-dependent region defined} :dt
PRD alters the timestep in ways that will mess up these regions. :dd
{Cannot use PRD with atom_modify sort enabled} :dt
This is a current restriction of PRD. You must turn off sorting,
which is enabled by default, via the atom_modify command. :dd
{Cannot use PRD with multi-processor replicas unless atom map exists} :dt
Use the atom_modify command to create an atom map. :dd
{Cannot use TAD unless atom map exists for NEB} :dt
See atom_modify map command to set this. :dd
{Cannot use TAD with a single replica for NEB} :dt
NEB requires multiple replicas. :dd
{Cannot use TAD with atom_modify sort enabled for NEB} :dt
This is a current restriction of NEB. :dd
{Cannot use a damped dynamics min style with fix box/relax} :dt
This is a current restriction in LAMMPS. Use another minimizer
style. :dd
{Cannot use a damped dynamics min style with per-atom DOF} :dt
This is a current restriction in LAMMPS. Use another minimizer
style. :dd
{Cannot use append/atoms in periodic dimension} :dt
The boundary style of the face where atoms are added can not be of
type p (periodic). :dd
{Cannot use atomfile-style variable unless atom map exists} :dt
Self-explanatory. See the atom_modify command to create a map. :dd
{Cannot use both com and bias with compute temp/chunk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/coul/cut/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with buck/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/debye/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with coul/dsf/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with coul/wolf/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/charmm/implicit/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/charmm/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/charmm/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/coul/cut/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/coul/long/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/class2/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/debye/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/cut/coul/long/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/cut/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with lj/expand/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/gromacs/coul/gromacs/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/gromacs/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with lj/sdk/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with pair eam/kk} :dt
That style is not supported by Kokkos. :dd
{Cannot use chosen neighbor list style with pair eam/kk/alloy} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with pair eam/kk/fs} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with pair sw/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with tersoff/kk} :dt
Self-explanatory. :dd
{Cannot use chosen neighbor list style with tersoff/zbl/kk} :dt
Self-explanatory. :dd
{Cannot use compute chunk/atom bin z for 2d model} :dt
Self-explanatory. :dd
{Cannot use compute cluster/atom unless atoms have IDs} :dt
Atom IDs are used to identify clusters. :dd
{Cannot use create_atoms rotate unless single style} :dt
Self-explanatory. :dd
{Cannot use create_bonds unless atoms have IDs} :dt
This command requires a mapping from global atom IDs to local atoms,
but the atoms that have been defined have no IDs. :dd
{Cannot use create_bonds with non-molecular system} :dt
Self-explanatory. :dd
{Cannot use cwiggle in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use delete_atoms bond yes with atom_style template} :dt
This is because the bonds for that atom style are hardwired in the
molecule template. :dd
{Cannot use delete_atoms unless atoms have IDs} :dt
Your atoms do not have IDs, so the delete_atoms command cannot be
used. :dd
{Cannot use delete_bonds with non-molecular system} :dt
Your choice of atom style does not have bonds. :dd
{Cannot use dump_modify fileper without % in dump file name} :dt
Self-explanatory. :dd
{Cannot use dump_modify nfile without % in dump file name} :dt
Self-explanatory. :dd
{Cannot use dynamic group with fix adapt atom} :dt
This is not yet supported. :dd
{Cannot use fix TMD unless atom map exists} :dt
Using this fix requires the ability to look up an atom index, which is
provided by an atom map. An atom map does not exist (by default) for
non-molecular problems. Using the atom_modify map command will force
an atom map to be created. :dd
{Cannot use fix ave/spatial z for 2 dimensional model} :dt
Self-explanatory. :dd
{Cannot use fix bond/break with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix bond/create with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix bond/swap with non-molecular systems} :dt
Only systems with bonds that can be changed can be used. Atom_style
template does not qualify. :dd
{Cannot use fix box/relax on a 2nd non-periodic dimension} :dt
When specifying an off-diagonal pressure component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix box/relax on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix box/relax with both relaxation and scaling on a tilt factor} :dt
When specifying scaling on a tilt factor component, that component can not
also be controlled by the barostat. E.g. if scalexy yes is specified and
also keyword tri or xy, this is wrong. :dd
{Cannot use fix box/relax with tilt factor scaling on a 2nd non-periodic dimension} :dt
When specifying scaling on a tilt factor component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix deform on a shrink-wrapped boundary} :dt
The x, y, z options cannot be applied to shrink-wrapped
dimensions. :dd
{Cannot use fix deform tilt on a shrink-wrapped 2nd dim} :dt
This is because the shrink-wrapping will change the value
of the strain implied by the tilt factor. :dd
{Cannot use fix deform trate on a box with zero tilt} :dt
The trate style alters the current strain. :dd
{Cannot use fix deposit rigid and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix deposit rigid and shake} :dt
These two attributes are conflicting. :dd
{Cannot use fix deposit shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix enforce2d with 3d simulation} :dt
Self-explanatory. :dd
{Cannot use fix gcmc in a 2d simulation} :dt
Fix gcmc is set up to run in 3d only. No 2d simulations with fix gcmc
are allowed. :dd
{Cannot use fix gcmc shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix msst without per-type mass defined} :dt
Self-explanatory. :dd
{Cannot use fix npt and fix deform on same component of stress tensor} :dt
This would be changing the same box dimension twice. :dd
{Cannot use fix nvt/npt/nph on a 2nd non-periodic dimension} :dt
When specifying an off-diagonal pressure component, the 2nd of the two
dimensions must be periodic. E.g. if the xy component is specified,
then the y dimension must be periodic. :dd
{Cannot use fix nvt/npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix nvt/npt/nph with both xy dynamics and xy scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with both xz dynamics and xz scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with both yz dynamics and yz scaling} :dt
Self-explanatory. :dd
{Cannot use fix nvt/npt/nph with xy scaling when y is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix nvt/npt/nph with xz scaling when z is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix nvt/npt/nph with yz scaling when z is non-periodic dimension} :dt
The 2nd dimension in the barostatted tilt factor must be periodic. :dd
{Cannot use fix pour rigid and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix pour rigid and shake} :dt
These two attributes are conflicting. :dd
{Cannot use fix pour shake and not molecule} :dt
Self-explanatory. :dd
{Cannot use fix pour with triclinic box} :dt
This option is not yet supported. :dd
{Cannot use fix press/berendsen and fix deform on same component of stress tensor} :dt
These commands both change the box size/shape, so you cannot use both
together. :dd
{Cannot use fix press/berendsen on a non-periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix press/berendsen with triclinic box} :dt
Self-explanatory. :dd
{Cannot use fix reax/bonds without pair_style reax} :dt
Self-explanatory. :dd
{Cannot use fix rigid npt/nph and fix deform on same component of stress tensor} :dt
This would be changing the same box dimension twice. :dd
{Cannot use fix rigid npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix rigid/small npt/nph on a non-periodic dimension} :dt
When specifying a diagonal pressure component, the dimension must be
periodic. :dd
{Cannot use fix shake with non-molecular system} :dt
Your choice of atom style does not have bonds. :dd
{Cannot use fix ttm with 2d simulation} :dt
This is a current restriction of this fix due to the grid it creates. :dd
{Cannot use fix ttm with triclinic box} :dt
This is a current restriction of this fix due to the grid it creates. :dd
{Cannot use fix tune/kspace without a kspace style} :dt
Self-explanatory. :dd
{Cannot use fix tune/kspace without a pair style} :dt
This fix (tune/kspace) can only be used when a pair style has been specified. :dd
{Cannot use fix wall in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix wall/reflect in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall/reflect zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd more than once} :dt
Nor is there a need to, since multiple walls can be specified
in one command. :dd
{Cannot use fix wall/srd without fix srd} :dt
Self-explanatory. :dd
{Cannot use fix wall/srd zlo/zhi for a 2d simulation} :dt
Self-explanatory. :dd
{Cannot use fix_deposit unless atoms have IDs} :dt
Self-explanatory. :dd
{Cannot use fix_pour unless atoms have IDs} :dt
Self-explanatory. :dd
{Cannot use include command within an if command} :dt
Self-explanatory. :dd
{Cannot use lines with fix srd unless overlap is set} :dt
This is because line segments are connected to each other. :dd
{Cannot use multiple fix wall commands with pair brownian} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricate} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricate/poly} :dt
Self-explanatory. :dd
{Cannot use multiple fix wall commands with pair lubricateU} :dt
Self-explanatory. :dd
{Cannot use neigh_modify exclude with GPU neighbor builds} :dt
This is a current limitation of the GPU implementation
in LAMMPS. :dd
{Cannot use neighbor bins - box size << cutoff} :dt
Too many neighbor bins will be created. This typically happens when
the simulation box is very small in some dimension, compared to the
neighbor cutoff. Use the "nsq" style instead of "bin" style. :dd
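A minimal illustration of switching styles (the 2.0 skin value is only an
example in the current distance units): :dd
neighbor 2.0 nsq :pre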
{Cannot use newton pair with beck/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/coul/wolf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with born/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with buck/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with colloid/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/debye/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/dsf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dipole/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dipole/sf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dpd/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with dpd/tstat/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/alloy/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/fs/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with eam/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with gauss/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with gayberne/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/charmm/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/class2/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/class2/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cubic/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/debye/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/dsf/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/coul/msm/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/expand/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/gromacs/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/sdk/coul/long/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj/sdk/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with lj96/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with mie/cut/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with morse/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with resquared/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with soft/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with table/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with yukawa/colloid/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with yukawa/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use newton pair with zbl/gpu pair style} :dt
Self-explanatory. :dd
{Cannot use non-zero forces in an energy minimization} :dt
Fix setforce cannot be used in this manner. Use fix addforce
instead. :dd
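E.g. a constant pushing force can be applied during a minimization with a
line such as the following (the fix ID, group, and force components are
only illustrative): :dd
fix push all addforce 1.0 0.0 0.0 :pre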
{Cannot use nonperiodic boundares with fix ttm} :dt
This fix requires a fully periodic simulation box. :dd
{Cannot use nonperiodic boundaries with Ewald} :dt
For kspace style ewald, all 3 dimensions must have periodic boundaries
unless you use the kspace_modify command to define a 2d slab with a
non-periodic z dimension. :dd
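A typical 2d-slab setup, assuming z is the non-periodic dimension and
using the commonly recommended slab factor of 3.0, looks like: :dd
boundary p p f
kspace_modify slab 3.0 :pre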
{Cannot use nonperiodic boundaries with EwaldDisp} :dt
For kspace style ewald/disp, all 3 dimensions must have periodic
boundaries unless you use the kspace_modify command to define a 2d
slab with a non-periodic z dimension. :dd
{Cannot use nonperiodic boundaries with PPPM} :dt
For kspace style pppm, all 3 dimensions must have periodic boundaries
unless you use the kspace_modify command to define a 2d slab with a
non-periodic z dimension. :dd
{Cannot use nonperiodic boundaries with PPPMDisp} :dt
For kspace style pppm/disp, all 3 dimensions must have periodic
boundaries unless you use the kspace_modify command to define a 2d
slab with a non-periodic z dimension. :dd
{Cannot use order greater than 8 with pppm/gpu.} :dt
Self-explanatory. :dd
{Cannot use package gpu neigh yes with triclinic box} :dt
This is a current restriction in LAMMPS. :dd
{Cannot use pair hybrid with GPU neighbor list builds} :dt
Neighbor list builds must be done on the CPU for this pair style. :dd
{Cannot use pair tail corrections with 2d simulations} :dt
The correction factors are only currently defined for 3d systems. :dd
{Cannot use processors part command without using partitions} :dt
See the command-line -partition switch. :dd
{Cannot use ramp in variable formula between runs} :dt
This is because the ramp() function is time dependent. :dd
{Cannot use read_data add before simulation box is defined} :dt
Self-explanatory. :dd
{Cannot use read_data extra with add flag} :dt
Self-explanatory. :dd
{Cannot use read_data offset without add flag} :dt
Self-explanatory. :dd
{Cannot use read_data shift without add flag} :dt
Self-explanatory. :dd
{Cannot use region INF or EDGE when box does not exist} :dt
Regions that extend to the box boundaries can only be used after the
create_box command has been used. :dd
{Cannot use set atom with no atom IDs defined} :dt
Atom IDs are not defined, so they cannot be used to identify an atom. :dd
{Cannot use set mol with no molecule IDs defined} :dt
Self-explanatory. :dd
{Cannot use swiggle in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use tris with fix srd unless overlap is set} :dt
This is because triangles are connected to each other. :dd
{Cannot use variable energy with constant efield in fix efield} :dt
LAMMPS computes the energy itself when the E-field is constant. :dd
{Cannot use variable energy with constant force in fix addforce} :dt
This is because for constant force, LAMMPS can compute the change
in energy directly. :dd
{Cannot use variable every setting for dump dcd} :dt
The format of DCD dump files requires snapshots be output
at a constant frequency. :dd
{Cannot use variable every setting for dump xtc} :dt
The format of this file requires snapshots at regular intervals. :dd
{Cannot use vdisplace in variable formula between runs} :dt
This is a function of elapsed time. :dd
{Cannot use velocity bias command without temp keyword} :dt
Self-explanatory. :dd
{Cannot use velocity create loop all unless atoms have IDs} :dt
Atoms in the simulation do not have IDs, so this style
of velocity creation cannot be performed. :dd
{Cannot use wall in periodic dimension} :dt
Self-explanatory. :dd
{Cannot use write_restart fileper without % in restart file name} :dt
Self-explanatory. :dd
{Cannot use write_restart nfile without % in restart file name} :dt
Self-explanatory. :dd
{Cannot wiggle and shear fix wall/gran} :dt
Cannot specify both options at the same time. :dd
{Cannot write to restart file - MPI error: %s} :dt
This error was generated by MPI when reading/writing an MPI-IO restart
file. :dd
{Cannot yet use KSpace solver with grid with comm style tiled} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use comm_style tiled with multi-mode comm} :dt
Self-explanatory. :dd
{Cannot yet use comm_style tiled with triclinic box} :dt
Self-explanatory. :dd
{Cannot yet use compute tally with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot yet use fix bond/break with this improper style} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use fix bond/create with this improper style} :dt
This is a current restriction in LAMMPS. :dd
{Cannot yet use minimize with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot yet use pair hybrid with Kokkos} :dt
This feature is not yet supported. :dd
{Cannot zero Langevin force of 0 atoms} :dt
The group has zero atoms, so you cannot request its force
be zeroed. :dd
{Cannot zero gld force for zero atoms} :dt
There are no atoms currently in the group. :dd
{Cannot zero momentum of no atoms} :dt
Self-explanatory. :dd
{Change_box command before simulation box is defined} :dt
Self-explanatory. :dd
{Change_box volume used incorrectly} :dt
The "dim volume" option must be used immediately following one or two
settings for "dim1 ..." (and optionally "dim2 ...") and must be for a
different dimension, i.e. dim != dim1 and dim != dim2. :dd
{Chunk/atom compute does not exist for compute angmom/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute com/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute gyration/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute inertia/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute msd/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute omega/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute property/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute temp/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute torque/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for compute vcm/chunk} :dt
Self-explanatory. :dd
{Chunk/atom compute does not exist for fix ave/chunk} :dt
Self-explanatory. :dd
{Comm tiled invalid index in box drop brick} :dt
Internal error check in comm_style tiled which should not occur.
Contact the developers. :dd
{Comm tiled mis-match in box drop brick} :dt
Internal error check in comm_style tiled which should not occur.
Contact the developers. :dd
{Comm_modify group != atom_modify first group} :dt
Self-explanatory. :dd
{Communication cutoff for comm_style tiled cannot exceed periodic box length} :dt
Self-explanatory. :dd
{Communication cutoff too small for SNAP micro load balancing} :dt
This can happen if you change the neighbor skin after your pair_style
command or if your box dimensions grow during a run. You can set the
cutoff explicitly via the comm_modify cutoff command. :dd
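E.g. (the 12.0 value is only illustrative and should exceed the required
ghost-atom cutoff): :dd
comm_modify cutoff 12.0 :pre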
{Compute %s does not allow use of dynamic group} :dt
Dynamic groups have not yet been enabled for this compute. :dd
{Compute ID for compute chunk /atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute reduce does not exist} :dt
Self-explanatory. :dd
{Compute ID for compute slice does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix store/state does not exist} :dt
Self-explanatory. :dd
{Compute ID for fix vector does not exist} :dt
Self-explanatory. :dd
{Compute ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Compute angle/local used when angles are not allowed} :dt
The atom style does not support angles. :dd
{Compute angmom/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute body/local requires atom style body} :dt
Self-explanatory. :dd
{Compute bond/local used when bonds are not allowed} :dt
The atom style does not support bonds. :dd
{Compute centro/atom requires a pair style be defined} :dt
This is because the computation of the centro-symmetry values
uses a pairwise neighbor list. :dd
{Compute chunk/atom bin/cylinder radius is too large for periodic box} :dt
Radius cannot be bigger than 1/2 of a non-axis periodic dimension. :dd
{Compute chunk/atom bin/sphere radius is too large for periodic box} :dt
Radius cannot be bigger than 1/2 of any periodic dimension. :dd
{Compute chunk/atom compute array is accessed out-of-range} :dt
The index for the array is out of bounds. :dd
{Compute chunk/atom compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute chunk/atom compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute chunk/atom compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Compute chunk/atom cylinder axis must be z for 2d} :dt
Self-explanatory. :dd
{Compute chunk/atom fix array is accessed out-of-range} :dt
The index for the array is out of bounds. :dd
{Compute chunk/atom fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute chunk/atom fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute chunk/atom fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Compute chunk/atom for triclinic boxes requires units reduced} :dt
Self-explanatory. :dd
{Compute chunk/atom ids once but nchunk is not once} :dt
You cannot assign chunk IDs to atoms permanently if the number of
chunks may change. :dd
{Compute chunk/atom molecule for non-molecular system} :dt
Self-explanatory. :dd
{Compute chunk/atom sphere z origin must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Compute chunk/atom stores no IDs for compute property/chunk} :dt
It will only store IDs if its compress option is enabled. :dd
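E.g. (the compute ID and the molecule binning style below are only
illustrative): :dd
compute cc1 all chunk/atom molecule compress yes :pre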
{Compute chunk/atom stores no coord1 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom stores no coord2 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom stores no coord3 for compute property/chunk} :dt
Only certain binning options for compute chunk/atom store coordinates. :dd
{Compute chunk/atom variable is not atom-style variable} :dt
Self-explanatory. :dd
{Compute chunk/atom without bins cannot use discard mixed} :dt
That discard option only applies to the binning styles. :dd
{Compute cluster/atom cutoff is longer than pairwise cutoff} :dt
Cannot identify clusters beyond cutoff. :dd
{Compute cluster/atom requires a pair style be defined} :dt
This is so that the pair style defines a cutoff distance which
is used to find clusters. :dd
{Compute cna/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute cna/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute com/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute contact/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute contact/atom requires atom style sphere} :dt
Self-explanatory. :dd
{Compute coord/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute coordination at distances longer than the pair cutoff,
since those atoms are not in the neighbor list. :dd
{Compute coord/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute damage/atom requires peridynamic potential} :dt
Damage is a Peridynamic-specific metric. It requires you
to be running a Peridynamics simulation. :dd
{Compute dihedral/local used when dihedrals are not allowed} :dt
The atom style does not support dihedrals. :dd
{Compute dilatation/atom cannot be used with this pair style} :dt
Self-explanatory. :dd
{Compute dilatation/atom requires Peridynamic pair style} :dt
Self-explanatory. :dd
{Compute does not allow an extra compute or fix to be reset} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Compute erotate/asphere requires atom style ellipsoid or line or tri} :dt
Self-explanatory. :dd
{Compute erotate/asphere requires extended particles} :dt
This compute cannot be used with point particles. :dd
{Compute erotate/rigid with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Compute erotate/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Compute erotate/sphere/atom requires atom style sphere} :dt
Self-explanatory. :dd
{Compute event/displace has invalid fix event assigned} :dt
This is an internal LAMMPS error. Please report it to the
developers. :dd
{Compute group/group group ID does not exist} :dt
Self-explanatory. :dd
{Compute gyration/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute heat/flux compute ID does not compute ke/atom} :dt
Self-explanatory. :dd
{Compute heat/flux compute ID does not compute pe/atom} :dt
Self-explanatory. :dd
{Compute heat/flux compute ID does not compute stress/atom} :dt
Self-explanatory. :dd
{Compute hexorder/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute order parameter beyond cutoff. :dd
{Compute hexorder/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute improper/local used when impropers are not allowed} :dt
The atom style does not support impropers. :dd
{Compute inertia/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute ke/rigid with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Compute msd/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute msd/chunk nchunk is not static} :dt
This is required because the MSD cannot be computed consistently if
the number of chunks is changing. Compute chunk/atom allows setting
nchunk to be static. :dd
{Compute nve/asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute nvt/nph/npt asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute nvt/nph/npt body requires atom style body} :dt
Self-explanatory. :dd
{Compute omega/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute orientorder/atom cutoff is longer than pairwise cutoff} :dt
Cannot compute order parameter beyond cutoff. :dd
{Compute orientorder/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute pair must use group all} :dt
Pair styles accumulate energy on all atoms. :dd
{Compute pe must use group all} :dt
Energies computed by potentials (pair, bond, etc) are computed on all
atoms. :dd
{Compute plasticity/atom cannot be used with this pair style} :dt
Self-explanatory. :dd
{Compute plasticity/atom requires Peridynamic pair style} :dt
Self-explanatory. :dd
{Compute pressure must use group all} :dt
Virial contributions computed by potentials (pair, bond, etc) are
computed on all atoms. :dd
{Compute pressure requires temperature ID to include kinetic energy} :dt
The keflag cannot be used unless a temperature compute is provided. :dd
{Compute pressure temperature ID does not compute temperature} :dt
The compute ID assigned to a pressure computation must compute
temperature. :dd
{Compute property/atom floating point vector does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Compute property/atom for atom property that isn't allocated} :dt
Self-explanatory. :dd
{Compute property/atom integer vector does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Compute property/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute property/local cannot use these inputs together} :dt
Only inputs that generate the same number of datums can be used
together. E.g. bond and angle quantities cannot be mixed. :dd
{Compute property/local does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{Compute property/local for property that isn't allocated} :dt
Self-explanatory. :dd
{Compute rdf requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute reduce compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute reduce compute calculates global values} :dt
A compute that calculates peratom or local values is required. :dd
{Compute reduce compute does not calculate a local array} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a local vector} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute reduce compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute reduce fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute reduce fix calculates global values} :dt
A fix that calculates peratom or local values is required. :dd
{Compute reduce fix does not calculate a local array} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a local vector} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Compute reduce fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Compute reduce replace requires min or max mode} :dt
Self-explanatory. :dd
{Compute reduce variable is not atom-style variable} :dt
Self-explanatory. :dd
{Compute slice compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute slice compute does not calculate a global array} :dt
Self-explanatory. :dd
{Compute slice compute does not calculate a global vector} :dt
Self-explanatory. :dd
{Compute slice compute does not calculate global vector or array} :dt
Self-explanatory. :dd
{Compute slice compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Compute slice fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Compute slice fix does not calculate a global array} :dt
Self-explanatory. :dd
{Compute slice fix does not calculate a global vector} :dt
Self-explanatory. :dd
{Compute slice fix does not calculate global vector or array} :dt
Self-explanatory. :dd
{Compute slice fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Compute sna/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute sna/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute snad/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute snad/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute snav/atom cutoff is longer than pairwise cutoff} :dt
Self-explanatory. :dd
{Compute snav/atom requires a pair style be defined} :dt
Self-explanatory. :dd
{Compute stress/atom temperature ID does not compute temperature} :dt
The specified compute must compute temperature. :dd
{Compute temp/asphere requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Compute temp/asphere requires extended particles} :dt
This compute cannot be used with point particles. :dd
{Compute temp/body requires atom style body} :dt
Self-explanatory. :dd
{Compute temp/body requires bodies} :dt
This compute can only be applied to body particles. :dd
{Compute temp/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute temp/cs requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
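E.g. add a line such as: :dd
comm_modify vel yes :pre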
{Compute temp/cs used when bonds are not allowed} :dt
This compute only works on pairs of bonded particles. :dd
{Compute temp/partial cannot use vz for 2d systemx} :dt
Self-explanatory. :dd
{Compute temp/profile cannot bin z for 2d systems} :dt
Self-explanatory. :dd
{Compute temp/profile cannot use vz for 2d systemx} :dt
Self-explanatory. :dd
{Compute temp/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Compute ti kspace style does not exist} :dt
Self-explanatory. :dd
{Compute ti pair style does not exist} :dt
Self-explanatory. :dd
{Compute ti tail when pair style does not compute tail corrections} :dt
Self-explanatory. :dd
{Compute torque/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Compute used in dump between runs is not current} :dt
The compute was not invoked on the current timestep, therefore it
cannot be used in a dump between runs. :dd
{Compute used in variable between runs is not current} :dt
Computes cannot be invoked by a variable in between runs. Thus they
must have been evaluated on the last timestep of the previous run in
order for their value(s) to be accessed. See the doc page for the
variable command for more info. :dd
{Compute used in variable thermo keyword between runs is not current} :dt
Some thermo keywords rely on a compute to calculate their value(s).
Computes cannot be invoked by a variable in between runs. Thus they
must have been evaluated on the last timestep of the previous run in
order for their value(s) to be accessed. See the doc page for the
variable command for more info. :dd
{Compute vcm/chunk does not use chunk/atom compute} :dt
The style of the specified compute is not chunk/atom. :dd
{Computed temperature for fix temp/berendsen cannot be 0.0} :dt
Self-explanatory. :dd
{Computed temperature for fix temp/rescale cannot be 0.0} :dt
Cannot rescale the temperature to a new value if the current
temperature is 0.0. :dd
{Core/shell partner atom not found} :dt
Could not find one of the atoms in the bond pair. :dd
{Core/shell partners were not all found} :dt
Could not find one or more atoms in the bond pairs. :dd
{Could not adjust g_ewald_6} :dt
The Newton-Raphson solver failed to converge to a good value for
g_ewald. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute g_ewald} :dt
The Newton-Raphson solver failed to converge to a good value for
g_ewald. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size for Coulomb interaction} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not compute grid size for Dispersion} :dt
The code is unable to compute a grid size consistent with the desired
accuracy. This error should not occur for typical problems. Please
send an email to the developers. :dd
{Could not create 3d FFT plan} :dt
The FFT setup for the PPPM solver failed, typically due
to lack of memory. This is an unusual error. Check the
size of the FFT grid you are requesting. :dd
{Could not create 3d grid of processors} :dt
The specified constraints did not allow a Px by Py by Pz grid to be
created where Px * Py * Pz = P = total number of processors. :dd
{Could not create 3d remap plan} :dt
The FFT setup in pppm failed. :dd
{Could not create Python function arguments} :dt
This is an internal Python error, possibly because the number
of inputs to the function is too large. :dd
{Could not create numa grid of processors} :dt
The specified constraints did not allow this style of grid to be
created. Usually this is because the total processor count is not a
multiple of the cores/node or the user specified processor count is >
1 in one of the dimensions. :dd
{Could not create twolevel 3d grid of processors} :dt
The specified constraints did not allow this style of grid to be
created. :dd
{Could not evaluate Python function input variable} :dt
Self-explanatory. :dd
{Could not find Python function} :dt
The provided Python code was run successfully, but it did not
define a callable function with the required name. :dd
{Could not find atom_modify first group ID} :dt
Self-explanatory. :dd
{Could not find change_box group ID} :dt
Group ID used in the change_box command does not exist. :dd
{Could not find compute ID for PRD} :dt
Self-explanatory. :dd
{Could not find compute ID for TAD} :dt
Self-explanatory. :dd
{Could not find compute ID for temperature bias} :dt
Self-explanatory. :dd
{Could not find compute ID to delete} :dt
Self-explanatory. :dd
{Could not find compute displace/atom fix ID} :dt
Self-explanatory. :dd
{Could not find compute event/displace fix ID} :dt
Self-explanatory. :dd
{Could not find compute group ID} :dt
Self-explanatory. :dd
{Could not find compute heat/flux compute ID} :dt
Self-explanatory. :dd
{Could not find compute msd fix ID} :dt
Self-explanatory. :dd
{Could not find compute msd/chunk fix ID} :dt
The compute creates an internal fix, which has been deleted. :dd
{Could not find compute pressure temperature ID} :dt
The compute ID for calculating temperature does not exist. :dd
{Could not find compute stress/atom temperature ID} :dt
Self-explanatory. :dd
{Could not find compute vacf fix ID} :dt
Self-explanatory. :dd
{Could not find compute/voronoi surface group ID} :dt
Self-explanatory. :dd
{Could not find compute_modify ID} :dt
Self-explanatory. :dd
{Could not find custom per-atom property ID} :dt
Self-explanatory. :dd
{Could not find delete_atoms group ID} :dt
Group ID used in the delete_atoms command does not exist. :dd
{Could not find delete_atoms region ID} :dt
Region ID used in the delete_atoms command does not exist. :dd
{Could not find displace_atoms group ID} :dt
Group ID used in the displace_atoms command does not exist. :dd
{Could not find dump custom compute ID} :dt
Self-explanatory. :dd
{Could not find dump custom fix ID} :dt
Self-explanatory. :dd
{Could not find dump custom variable name} :dt
Self-explanatory. :dd
{Could not find dump group ID} :dt
A group ID used in the dump command does not exist. :dd
{Could not find dump local compute ID} :dt
Self-explanatory. :dd
{Could not find dump local fix ID} :dt
Self-explanatory. :dd
{Could not find dump modify compute ID} :dt
Self-explanatory. :dd
{Could not find dump modify custom atom floating point property ID} :dt
Self-explanatory. :dd
{Could not find dump modify custom atom integer property ID} :dt
Self-explanatory. :dd
{Could not find dump modify fix ID} :dt
Self-explanatory. :dd
{Could not find dump modify variable name} :dt
Self-explanatory. :dd
{Could not find fix ID to delete} :dt
Self-explanatory. :dd
{Could not find fix adapt storage fix ID} :dt
This should not happen unless you explicitly deleted
a secondary fix that fix adapt created internally. :dd
{Could not find fix gcmc exclusion group ID} :dt
Self-explanatory. :dd
{Could not find fix gcmc rotation group ID} :dt
Self-explanatory. :dd
{Could not find fix group ID} :dt
A group ID used in the fix command does not exist. :dd
{Could not find fix msst compute ID} :dt
Self-explanatory. :dd
{Could not find fix poems group ID} :dt
A group ID used in the fix poems command does not exist. :dd
{Could not find fix recenter group ID} :dt
A group ID used in the fix recenter command does not exist. :dd
{Could not find fix rigid group ID} :dt
A group ID used in the fix rigid command does not exist. :dd
{Could not find fix srd group ID} :dt
Self-explanatory. :dd
{Could not find fix_modify ID} :dt
A fix ID used in the fix_modify command does not exist. :dd
{Could not find fix_modify pressure ID} :dt
The compute ID for computing pressure does not exist. :dd
{Could not find fix_modify temperature ID} :dt
The compute ID for computing temperature does not exist. :dd
{Could not find group clear group ID} :dt
Self-explanatory. :dd
{Could not find group delete group ID} :dt
Self-explanatory. :dd
{Could not find pair fix ID} :dt
A fix is created internally by the pair style to store shear
history information. You cannot delete it. :dd
{Could not find set group ID} :dt
Group ID specified in set command does not exist. :dd
{Could not find specified fix gcmc group ID} :dt
Self-explanatory. :dd
{Could not find thermo compute ID} :dt
Compute ID specified in thermo_style command does not exist. :dd
{Could not find thermo custom compute ID} :dt
The compute ID needed by thermo style custom to compute a requested
quantity does not exist. :dd
{Could not find thermo custom fix ID} :dt
The fix ID needed by thermo style custom to compute a requested
quantity does not exist. :dd
{Could not find thermo custom variable name} :dt
Self-explanatory. :dd
{Could not find thermo fix ID} :dt
Fix ID specified in thermo_style command does not exist. :dd
{Could not find thermo variable name} :dt
Self-explanatory. :dd
{Could not find thermo_modify pressure ID} :dt
The compute ID needed by thermo style custom to compute pressure does
not exist. :dd
{Could not find thermo_modify temperature ID} :dt
The compute ID needed by thermo style custom to compute temperature does
not exist. :dd
{Could not find undump ID} :dt
A dump ID used in the undump command does not exist. :dd
{Could not find velocity group ID} :dt
A group ID used in the velocity command does not exist. :dd
{Could not find velocity temperature ID} :dt
The compute ID needed by the velocity command to compute temperature
does not exist. :dd
{Could not find/initialize a specified accelerator device} :dt
Could not initialize at least one of the devices specified for the gpu
package. :dd
{Could not grab element entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not grab global entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not grab pair entry from EIM potential file} :dt
Self-explanatory. :dd
{Could not initialize embedded Python} :dt
The main module in Python was not accessible. :dd
{Could not open Python file} :dt
The specified file of Python code cannot be opened. Check that the
path and name are correct. :dd
{Could not process Python file} :dt
The Python code in the specified file was not run successfully by
Python, probably due to errors in the Python code. :dd
{Could not process Python string} :dt
The Python code in the here string was not run successfully by Python,
probably due to errors in the Python code. :dd
{Coulomb PPPMDisp order has been reduced below minorder} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{Coulomb cut not supported in pair_style buck/long/coul/coul} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cut not supported in pair_style lj/long/coul/long} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cut not supported in pair_style lj/long/tip4p/long} :dt
Must use long-range Coulombic interactions. :dd
{Coulomb cutoffs of pair hybrid sub-styles do not match} :dt
If using a Kspace solver, all Coulomb cutoffs of long pair styles must
be the same. :dd
{Coulombic cut not supported in pair_style lj/long/dipole/long} :dt
Must use long-range Coulombic interactions. :dd
{Cound not find dump_modify ID} :dt
Self-explanatory. :dd
{Create_atoms command before simulation box is defined} :dt
The create_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Create_atoms molecule has atom IDs, but system does not} :dt
The atom_modify id command can be used to force atom IDs to be stored. :dd
{Create_atoms molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Create_atoms molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Create_atoms region ID does not exist} :dt
A region ID used in the create_atoms command does not exist. :dd
{Create_bonds command before simulation box is defined} :dt
Self-explanatory. :dd
{Create_bonds command requires no kspace_style be defined} :dt
This is so that atom pairs that are already bonded do not appear
in the neighbor list. :dd
{Create_bonds command requires special_bonds 1-2 weights be 0.0} :dt
This is so that atom pairs that are already bonded do not appear in
the neighbor list. :dd
{Create_bonds max distance > neighbor cutoff} :dt
Can only create bonds for atom pairs that will be in neighbor list. :dd
{Create_bonds requires a pair style be defined} :dt
Self-explanatory. :dd
{Create_box region ID does not exist} :dt
Self-explanatory. :dd
{Create_box region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the create_box command. :dd
{Custom floating point vector for fix store/state does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Custom integer vector for fix store/state does not exist} :dt
The command is accessing a vector added by the fix property/atom
command, that does not exist. :dd
{Custom per-atom property ID is not floating point} :dt
Self-explanatory. :dd
{Custom per-atom property ID is not integer} :dt
Self-explanatory. :dd
{Cut-offs missing in pair_style lj/long/dipole/long} :dt
Self-explanatory. :dd
{Cutoffs missing in pair_style buck/long/coul/long} :dt
Self-explanatory. :dd
{Cutoffs missing in pair_style lj/long/coul/long} :dt
Self-explanatory. :dd
{Cyclic loop in joint connections} :dt
Fix poems cannot (yet) work with coupled bodies whose joints connect
the bodies in a ring (or cycle). :dd
{Degenerate lattice primitive vectors} :dt
Invalid set of 3 lattice vectors for lattice command. :dd
{Delete region ID does not exist} :dt
Self-explanatory. :dd
{Delete_atoms command before simulation box is defined} :dt
The delete_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Delete_atoms cutoff > max neighbor cutoff} :dt
Can only delete atoms in atom pairs that will be in neighbor list. :dd
{Delete_atoms mol yes requires atom attribute molecule} :dt
Cannot use this option with a non-molecular system. :dd
{Delete_atoms requires a pair style be defined} :dt
This is because atom deletion within a cutoff uses a pairwise
neighbor list. :dd
{Delete_bonds command before simulation box is defined} :dt
The delete_bonds command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Delete_bonds command with no atoms existing} :dt
No atoms are yet defined so the delete_bonds command cannot be used. :dd
{Deposition region extends outside simulation box} :dt
Self-explanatory. :dd
{Did not assign all atoms correctly} :dt
Atoms read in from a data file were not assigned correctly to
processors. This is likely due to some atom coordinates being
outside a non-periodic simulation box. :dd
{Did not assign all restart atoms correctly} :dt
Atoms read in from the restart file were not assigned correctly to
processors. This is likely due to some atom coordinates being outside
a non-periodic simulation box. Normally this should not happen. You
may wish to use the "remap" option on the read_restart command to see
if this helps. :dd
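E.g. (the restart file name below is only illustrative): :dd
read_restart restart.equil remap :pre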
{Did not find all elements in MEAM library file} :dt
The requested elements were not found in the MEAM file. :dd
{Did not find fix shake partner info} :dt
Could not find bond partners implied by fix shake command. This error
can be triggered if the delete_bonds command was used before fix
shake, and it removed bonds without resetting the 1-2, 1-3, 1-4
weighting list via the special keyword. :dd
{Did not find keyword in table file} :dt
Keyword used in pair_coeff command was not found in table file. :dd
{Did not set pressure for fix rigid/nph} :dt
The press keyword must be specified. :dd
{Did not set temp for fix rigid/nvt/small} :dt
Self-explanatory. :dd
{Did not set temp or press for fix rigid/npt/small} :dt
Self-explanatory. :dd
{Did not set temperature for fix rigid/nvt} :dt
The temp keyword must be specified. :dd
{Did not set temperature or pressure for fix rigid/npt} :dt
The temp and press keywords must be specified. :dd
{Dihedral atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
dihedral on a particular processor. The pairwise cutoff is too short
or the atoms are too far apart to make a valid dihedral. :dd
{Dihedral atom missing in set command} :dt
The set command cannot find one or more atoms in a particular dihedral
on a particular processor. The pairwise cutoff is too short or the
atoms are too far apart to make a valid dihedral. :dd
{Dihedral atoms %d %d %d %d missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral atoms missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral charmm is incompatible with Pair style} :dt
Dihedral style charmm must be used with a pair style charmm
in order for the 1-4 epsilon/sigma parameters to be defined. :dd
{Dihedral coeff for hybrid has invalid style} :dt
Dihedral style hybrid uses another dihedral style as one of its
coefficients. The dihedral style used in the dihedral_coeff command
or read from a restart file is not recognized. :dd
{Dihedral coeffs are not set} :dt
No dihedral coefficients have been assigned in the data file or via
the dihedral_coeff command. :dd
{Dihedral style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Dihedral style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Dihedral style hybrid cannot use same dihedral style twice} :dt
Self-explanatory. :dd
{Dihedral/improper extent > half of periodic box length} :dt
This error was detected by the neigh_modify check yes setting. It is
an error because the dihedral atoms are so far apart that it is ambiguous
how the dihedral should be defined. :dd
{Dihedral_coeff command before dihedral_style is defined} :dt
Coefficients cannot be set in the data file or via the dihedral_coeff
command until a dihedral_style has been assigned. :dd
{Dihedral_coeff command before simulation box is defined} :dt
The dihedral_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Dihedral_coeff command when no dihedrals allowed} :dt
The chosen atom style does not allow for dihedrals to be defined. :dd
{Dihedral_style command when no dihedrals allowed} :dt
The chosen atom style does not allow for dihedrals to be defined. :dd
{Dihedrals assigned incorrectly} :dt
Dihedrals read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Dihedrals defined but no dihedral types} :dt
The data file header lists dihedrals but no dihedral types. :dd
{Dimension command after simulation box is defined} :dt
The dimension command cannot be used after a read_data,
read_restart, or create_box command. :dd
{Dispersion PPPMDisp order has been reduced below minorder} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{Displace_atoms command before simulation box is defined} :dt
The displace_atoms command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Distance must be > 0 for compute event/displace} :dt
Self-explanatory. :dd
{Divide by 0 in influence function} :dt
This should not normally occur. It is likely a problem with your
model. :dd
{Divide by 0 in influence function of pair peri/lps} :dt
This should not normally occur. It is likely a problem with your
model. :dd
{Divide by 0 in variable formula} :dt
Self-explanatory. :dd
{Domain too large for neighbor bins} :dt
The domain has become extremely large so that neighbor bins cannot be
used. Most likely, one or more atoms have been blown out of the
simulation box to a great distance. :dd
{Double precision is not supported on this accelerator} :dt
Self-explanatory. :dd
{Dump atom/gz only writes compressed files} :dt
The dump atom/gz output file name must have a .gz suffix. :dd
{Dump cfg arguments can not mix xs|ys|zs with xsu|ysu|zsu} :dt
Self-explanatory. :dd
{Dump cfg arguments must start with 'mass type xs ys zs' or 'mass type xsu ysu zsu'} :dt
This is a requirement of the CFG output format. See the dump cfg doc
page for more details. :dd
{Dump cfg requires one snapshot per file} :dt
Use the wildcard "*" character in the filename. :dd
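E.g. (the dump ID, group, output interval, and file name are only
illustrative); the "*" is replaced by the timestep so each snapshot is
written to its own file: :dd
dump cfg1 all cfg 100 snap.*.cfg mass type xs ys zs :pre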
{Dump cfg/gz only writes compressed files} :dt
The dump cfg/gz output file name must have a .gz suffix. :dd
{Dump custom and fix not computed at compatible times} :dt
The fix must produce per-atom quantities on the timesteps when dump custom
needs them. :dd
{Dump custom compute does not calculate per-atom array} :dt
Self-explanatory. :dd
{Dump custom compute does not calculate per-atom vector} :dt
Self-explanatory. :dd
{Dump custom compute does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump custom compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump custom fix does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump custom fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump custom variable is not atom-style variable} :dt
Only atom-style variables generate per-atom quantities, needed for
dump output. :dd
{Dump custom/gz only writes compressed files} :dt
The dump custom/gz output file name must have a .gz suffix. :dd
{Dump dcd of non-matching # of atoms} :dt
Every snapshot written by dump dcd must contain the same # of atoms. :dd
{Dump dcd requires sorting by atom ID} :dt
Use the dump_modify sort command to enable this. :dd
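E.g., assuming the dump ID is 1: :dd
dump_modify 1 sort id :pre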
{Dump every variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Dump file MPI-IO output not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Dump file does not contain requested snapshot} :dt
Self-explanatory. :dd
{Dump file is incorrectly formatted} :dt
Self-explanatory. :dd
{Dump image body yes requires atom style body} :dt
Self-explanatory. :dd
{Dump image bond not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump image cannot perform sorting} :dt
Self-explanatory. :dd
{Dump image line requires atom style line} :dt
Self-explanatory. :dd
{Dump image persp option is not yet supported} :dt
Self-explanatory. :dd
{Dump image requires one snapshot per file} :dt
Use a "*" in the filename. :dd
{Dump image tri requires atom style tri} :dt
Self-explanatory. :dd
{Dump local and fix not computed at compatible times} :dt
The fix must produce per-atom quantities on the timesteps when dump local
needs them. :dd
{Dump local attributes contain no compute or fix} :dt
Self-explanatory. :dd
{Dump local cannot sort by atom ID} :dt
This is because dump local does not really dump per-atom info. :dd
{Dump local compute does not calculate local array} :dt
Self-explanatory. :dd
{Dump local compute does not calculate local vector} :dt
Self-explanatory. :dd
{Dump local compute does not compute local info} :dt
Self-explanatory. :dd
{Dump local compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump local count is not consistent across input fields} :dt
Every column of output must be the same length. :dd
{Dump local fix does not compute local array} :dt
Self-explanatory. :dd
{Dump local fix does not compute local info} :dt
Self-explanatory. :dd
{Dump local fix does not compute local vector} :dt
Self-explanatory. :dd
{Dump local fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Dump modify bcolor not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump modify bdiam not allowed with no bond types} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump modify compute ID does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump modify compute ID vector is not large enough} :dt
Self-explanatory. :dd
{Dump modify element names do not match atom types} :dt
Number of element names must equal number of atom types. :dd
{Dump modify fix ID does not compute per-atom array} :dt
Self-explanatory. :dd
{Dump modify fix ID does not compute per-atom info} :dt
Self-explanatory. :dd
{Dump modify fix ID does not compute per-atom vector} :dt
Self-explanatory. :dd
{Dump modify fix ID vector is not large enough} :dt
Self-explanatory. :dd
{Dump modify variable is not atom-style variable} :dt
Self-explanatory. :dd
{Dump sort column is invalid} :dt
Self-explanatory. :dd
{Dump xtc requires sorting by atom ID} :dt
Use the dump_modify sort command to enable this. :dd
{Dump xyz/gz only writes compressed files} :dt
The dump xyz/gz output file name must have a .gz suffix. :dd
{Dump_modify buffer yes not allowed for this style} :dt
Self-explanatory. :dd
{Dump_modify format string is too short} :dt
There are more fields to be dumped in a line of output than your
format string specifies. :dd
{Dump_modify region ID does not exist} :dt
Self-explanatory. :dd
{Dumping an atom property that isn't allocated} :dt
The chosen atom style does not define the per-atom quantity being
dumped. :dd
{Duplicate atom IDs exist} :dt
Self-explanatory. :dd
{Duplicate fields in read_dump command} :dt
Self-explanatory. :dd
{Duplicate particle in PeriDynamic bond - simulation box is too small} :dt
This is likely because your box length is shorter than 2 times
the bond length. :dd
{Electronic temperature dropped below zero} :dt
Something has gone wrong with the fix ttm electron temperature model. :dd
{Element not defined in potential file} :dt
The specified element is not in the potential file. :dd
{Empty brackets in variable} :dt
There is no variable syntax that uses empty brackets. Check
the variable doc page. :dd
{Energy was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied energy, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Epsilon or sigma reference not set by pair style in PPPMDisp} :dt
Self-explanatory. :dd
{Epsilon or sigma reference not set by pair style in ewald/n} :dt
The pair style is not providing the needed epsilon or sigma values. :dd
{Error in vdw spline: inner radius > outer radius} :dt
A pre-tabulated spline is invalid. Likely a problem with the
potential parameters. :dd
{Error writing averaged chunk data} :dt
Something in the output to the file triggered an error. :dd
{Error writing file header} :dt
Something in the output to the file triggered an error. :dd
{Error writing out correlation data} :dt
Something in the output to the file triggered an error. :dd
{Error writing out histogram data} :dt
Something in the output to the file triggered an error. :dd
{Error writing out time averaged data} :dt
Something in the output to the file triggered an error. :dd
{Failed to allocate %ld bytes for array %s} :dt
Your LAMMPS simulation has run out of memory. You need to run a
smaller simulation or on more processors. :dd
{Failed to open FFmpeg pipeline to file %s} :dt
The specified file cannot be opened. Check that the path and name are
correct and writable and that the FFmpeg executable can be found and run. :dd
{Failed to reallocate %ld bytes for array %s} :dt
Your LAMMPS simulation has run out of memory. You need to run a
smaller simulation or on more processors. :dd
{Fewer SRD bins than processors in some dimension} :dt
This is not allowed. Make your SRD bin size smaller. :dd
{File variable could not read value} :dt
Check the file assigned to the variable. :dd
{Final box dimension due to fix deform is < 0.0} :dt
Self-explanatory. :dd
{Fix %s does not allow use of dynamic group} :dt
Dynamic groups have not yet been enabled for this fix. :dd
{Fix ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute erotate/rigid does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute ke/rigid does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute reduce does not exist} :dt
Self-explanatory. :dd
{Fix ID for compute slice does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix store/state does not exist} :dt
Self-explanatory. :dd
{Fix ID for fix vector does not exist} :dt
Self-explanatory. :dd
{Fix ID for read_data does not exist} :dt
Self-explanatory. :dd
{Fix ID for velocity does not exist} :dt
Self-explanatory. :dd
{Fix ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Fix SRD: bad bin assignment for SRD advection} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: bad search bin assignment} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: bad stencil bin for big particle} :dt
Something has gone wrong in your SRD model; try using more
conservative settings. :dd
{Fix SRD: too many big particles in bin} :dt
Reset the ATOMPERBIN parameter at the top of fix_srd.cpp
to a larger value, and re-compile the code. :dd
{Fix SRD: too many walls in bin} :dt
This should not happen unless your system has been setup incorrectly. :dd
{Fix adapt interface to this pair style not supported} :dt
New coding for the pair style would need to be done. :dd
{Fix adapt kspace style does not exist} :dt
Self-explanatory. :dd
{Fix adapt pair style does not exist} :dt
Self-explanatory. :dd
{Fix adapt pair style param not supported} :dt
The pair style does not know about the parameter you specified. :dd
{Fix adapt requires atom attribute charge} :dt
The atom style being used does not specify an atom charge. :dd
{Fix adapt requires atom attribute diameter} :dt
The atom style being used does not specify an atom diameter. :dd
{Fix adapt type pair range is not valid for pair hybrid sub-style} :dt
Self-explanatory. :dd
{Fix append/atoms requires a lattice be defined} :dt
Use the lattice command for this purpose. :dd
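For example, a lattice could be defined before this fix; the lattice
style and scale shown here are illustrative values, not required ones: :dd
lattice fcc 0.8442 :pre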
{Fix ave/atom compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/atom compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/atom compute does not calculate a per-atom vector} :dt
A compute used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom compute does not calculate per-atom values} :dt
A compute used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/atom fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/atom fix does not calculate a per-atom vector} :dt
A fix used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom fix does not calculate per-atom values} :dt
A fix used by fix ave/atom must generate per-atom values. :dd
{Fix ave/atom variable is not atom-style variable} :dt
A variable used by fix ave/atom must generate per-atom values. :dd
{Fix ave/chunk compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/chunk compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/chunk compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/chunk compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/chunk does not use chunk/atom compute} :dt
The specified compute is not for a compute chunk/atom command. :dd
{Fix ave/chunk fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/chunk fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/chunk fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/chunk fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/chunk variable is not atom-style variable} :dt
Self-explanatory. :dd
{Fix ave/correlate compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/correlate compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/correlate compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/correlate fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/correlate fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/correlate fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/correlate variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix ave/histo cannot input local values in scalar mode} :dt
Self-explanatory. :dd
{Fix ave/histo cannot input per-atom values in scalar mode} :dt
Self-explanatory. :dd
{Fix ave/histo compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global scalar} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a global vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a local array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a local vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate local values} :dt
Self-explanatory. :dd
{Fix ave/histo compute does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/histo compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global scalar} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a global vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a local array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a local vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate a per-atom vector} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate local values} :dt
Self-explanatory. :dd
{Fix ave/histo fix does not calculate per-atom values} :dt
Self-explanatory. :dd
{Fix ave/histo fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid compute} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid fix} :dt
Self-explanatory. :dd
{Fix ave/histo input is invalid variable} :dt
Self-explanatory. :dd
{Fix ave/histo inputs are not all global, peratom, or local} :dt
All inputs in a single fix ave/histo command must be of the
same style. :dd
{Fix ave/histo/weight value and weight vector lengths do not match} :dt
Self-explanatory. :dd
{Fix ave/spatial compute does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/spatial compute does not calculate a per-atom vector} :dt
A compute used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial compute does not calculate per-atom values} :dt
A compute used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/spatial fix does not calculate a per-atom array} :dt
Self-explanatory. :dd
{Fix ave/spatial fix does not calculate a per-atom vector} :dt
A fix used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial fix does not calculate per-atom values} :dt
A fix used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/spatial fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/spatial for triclinic boxes requires units reduced} :dt
Self-explanatory. :dd
{Fix ave/spatial settings invalid with changing box size} :dt
If the box size changes, only the units reduced option can be
used. :dd
{Fix ave/spatial variable is not atom-style variable} :dt
A variable used by fix ave/spatial must generate per-atom values. :dd
{Fix ave/time cannot set output array intensive/extensive from these inputs} :dt
One or more of the vector inputs has individual elements which are
flagged as intensive or extensive. Such an input cannot be flagged as
all intensive/extensive when turned into an array by fix ave/time. :dd
{Fix ave/time cannot use variable with vector mode} :dt
Variables produce scalar values. :dd
{Fix ave/time columns are inconsistent lengths} :dt
Self-explanatory. :dd
{Fix ave/time compute array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Fix ave/time compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/time compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/time compute does not calculate an array} :dt
Self-explanatory. :dd
{Fix ave/time compute vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/time fix array cannot be variable length} :dt
Self-explanatory. :dd
{Fix ave/time fix array is accessed out-of-range} :dt
An index for the array is out of bounds. :dd
{Fix ave/time fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix ave/time fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix ave/time fix does not calculate an array} :dt
Self-explanatory. :dd
{Fix ave/time fix vector cannot be variable length} :dt
Self-explanatory. :dd
{Fix ave/time fix vector is accessed out-of-range} :dt
The index for the vector is out of bounds. :dd
{Fix ave/time variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix balance rcb cannot be used with comm_style brick} :dt
Comm_style tiled must be used instead. :dd
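A minimal sketch of the required setup; the fix ID, rebalancing
interval, and imbalance threshold below are illustrative values only: :dd
comm_style tiled
fix 2 all balance 1000 1.1 rcb :pre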
{Fix balance shift string is invalid} :dt
The string can only contain the characters "x", "y", or "z". :dd
{Fix bond/break needs ghost atoms from further away} :dt
This is because the fix needs to walk bonds to a certain distance to
acquire needed info. The comm_modify cutoff command can be used to
extend the communication range. :dd
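A minimal sketch of extending the communication cutoff; the 10.0
distance-unit value is illustrative and should be chosen for your system: :dd
comm_modify cutoff 10.0 :pre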
{Fix bond/create angle type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create cutoff is longer than pairwise cutoff} :dt
This is not allowed because bond creation is done using the
pairwise neighbor list. :dd
{Fix bond/create dihedral type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create improper type is invalid} :dt
Self-explanatory. :dd
{Fix bond/create induced too many angles/dihedrals/impropers per atom} :dt
See the read_data command for info on setting the "extra angle per
atom", etc. header values to allow additional angles, etc. to be
formed. :dd
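For example, header lines like the following could be added to the
data file read by read_data; the counts are illustrative: :dd
10 extra angle per atom
10 extra dihedral per atom
10 extra improper per atom :pre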
{Fix bond/create needs ghost atoms from further away} :dt
This is because the fix needs to walk bonds to a certain distance to
acquire needed info. The comm_modify cutoff command can be used to
extend the communication range. :dd
{Fix bond/swap cannot use dihedral or improper styles} :dt
These styles cannot be defined when using this fix. :dd
{Fix bond/swap requires pair and bond styles} :dt
Self-explanatory. :dd
{Fix bond/swap requires special_bonds = 0,1,1} :dt
Self-explanatory. :dd
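For example, this setting can be made with the special_bonds command: :dd
special_bonds lj/coul 0.0 1.0 1.0 :pre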
{Fix box/relax generated negative box length} :dt
The pressure being applied is likely too large. Try applying
it incrementally, building up to the high pressure. :dd
{Fix command before simulation box is defined} :dt
The fix command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Fix deform cannot use yz variable with xy} :dt
The yz setting cannot be a variable if xy deformation is also
specified. This is because LAMMPS cannot determine if the yz setting
will induce a box flip which would be invalid if xy is also changing. :dd
{Fix deform is changing yz too much with xy} :dt
When both yz and xy are changing, it induces changes in xz if the
box must flip from one tilt extreme to another. Thus it is not
allowed for yz to grow so much that a flip is induced. :dd
{Fix deform tilt factors require triclinic box} :dt
Cannot deform the tilt factors of a simulation box unless it
is a triclinic (non-orthogonal) box. :dd
{Fix deform volume setting is invalid} :dt
Cannot use volume style unless other dimensions are being controlled. :dd
{Fix deposit and fix rigid/small not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix deposit and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix deposit molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix deposit molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix deposit molecule template ID must be same as atom_style template ID} :dt
When using atom_style template, you cannot deposit molecules that are
not in that template. :dd
{Fix deposit region cannot be dynamic} :dt
Only static regions can be used with fix deposit. :dd
{Fix deposit region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix deposit command. :dd
{Fix deposit shake fix does not exist} :dt
Self-explanatory. :dd
{Fix efield requires atom attribute q or mu} :dt
The atom style defined does not have this attribute. :dd
{Fix efield with dipoles cannot use atom-style variables} :dt
This option is not supported. :dd
{Fix evaporate molecule requires atom attribute molecule} :dt
The atom style being used does not define a molecule ID. :dd
{Fix external callback function not set} :dt
This must be done by an external program in order to use this fix. :dd
{Fix for fix ave/atom not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/atom is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/chunk not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/chunk is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/correlate not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/correlate
is requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/histo not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/histo is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/spatial not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/spatial is
requesting a value on a non-allowed timestep. :dd
{Fix for fix ave/time not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix ave/time
is requesting a value on a non-allowed timestep. :dd
{Fix for fix store/state not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix store/state
is requesting a value on a non-allowed timestep. :dd
{Fix for fix vector not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Fix vector is
requesting a value on a non-allowed timestep. :dd
{Fix freeze requires atom attribute torque} :dt
The atom style defined does not have this attribute. :dd
{Fix gcmc and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix gcmc atom has charge, but atom style does not} :dt
Self-explanatory. :dd
{Fix gcmc cannot exchange individual atoms belonging to a molecule} :dt
This is an error since you should not delete only one atom of a
molecule. The user has specified atomic (non-molecular) gas
exchanges, but an atom belonging to a molecule could be deleted. :dd
{Fix gcmc does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{Fix gcmc molecule command requires that atoms have molecule attributes} :dt
Should not choose the gcmc molecule feature if no molecules are being
simulated. The general molecule flag is off, but gcmc's molecule flag
is on. :dd
{Fix gcmc molecule has charges, but atom style does not} :dt
Self-explanatory. :dd
{Fix gcmc molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix gcmc molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix gcmc molecule template ID must be same as atom_style template ID} :dt
When using atom_style template, you cannot insert molecules that are
not in that template. :dd
{Fix gcmc put atom outside box} :dt
This should not normally happen. Contact the developers. :dd
{Fix gcmc ran out of available atom IDs} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Fix gcmc ran out of available molecule IDs} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Fix gcmc region cannot be dynamic} :dt
Only static regions can be used with fix gcmc. :dd
{Fix gcmc region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix gcmc command. :dd
{Fix gcmc region extends outside simulation box} :dt
Self-explanatory. :dd
{Fix gcmc shake fix does not exist} :dt
Self-explanatory. :dd
{Fix gld c coefficients must be >= 0} :dt
Self-explanatory. :dd
{Fix gld needs more prony series coefficients} :dt
Self-explanatory. :dd
{Fix gld prony terms must be > 0} :dt
Self-explanatory. :dd
{Fix gld series type must be pprony for now} :dt
Self-explanatory. :dd
{Fix gld start temperature must be >= 0} :dt
Self-explanatory. :dd
{Fix gld stop temperature must be >= 0} :dt
Self-explanatory. :dd
{Fix gld tau coefficients must be > 0} :dt
Self-explanatory. :dd
{Fix heat group has no atoms} :dt
Self-explanatory. :dd
{Fix heat kinetic energy of an atom went negative} :dt
This will cause the velocity rescaling about to be performed by fix
heat to be invalid. :dd
{Fix heat kinetic energy went negative} :dt
This will cause the velocity rescaling about to be performed by fix
heat to be invalid. :dd
{Fix in variable not computed at compatible time} :dt
Fixes generate their values on specific timesteps. The variable is
requesting the values on a non-allowed timestep. :dd
{Fix langevin angmom is not yet implemented with kokkos} :dt
This option is not yet available. :dd
{Fix langevin angmom requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Fix langevin angmom requires extended particles} :dt
This fix option cannot be used with point particles. :dd
{Fix langevin omega is not yet implemented with kokkos} :dt
This option is not yet available. :dd
{Fix langevin omega requires atom style sphere} :dt
Self-explanatory. :dd
{Fix langevin omega requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix langevin period must be > 0.0} :dt
The time window for temperature relaxation must be > 0. :dd
{Fix langevin variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix momentum group has no atoms} :dt
Self-explanatory. :dd
{Fix move cannot define z or vz variable for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot rotate aroung non z-axis for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot set linear z motion for 2d problem} :dt
Self-explanatory. :dd
{Fix move cannot set wiggle z motion for 2d problem} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute potential energy} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute pressure} :dt
Self-explanatory. :dd
{Fix msst compute ID does not compute temperature} :dt
Self-explanatory. :dd
{Fix msst requires a periodic box} :dt
Self-explanatory. :dd
{Fix msst tscale must satisfy 0 <= tscale < 1} :dt
Self-explanatory. :dd
{Fix npt/nph has tilted box too far in one step - periodic cell is too far from equilibrium state} :dt
Self-explanatory. The change in the box tilt is too extreme
on a short timescale. :dd
{Fix nve/asphere requires extended particles} :dt
This fix can only be used for particles with a shape setting. :dd
{Fix nve/asphere/noforce requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Fix nve/asphere/noforce requires extended particles} :dt
One of the particles is not an ellipsoid. :dd
{Fix nve/body requires atom style body} :dt
Self-explanatory. :dd
{Fix nve/body requires bodies} :dt
This fix can only be used for particles that are bodies. :dd
{Fix nve/line can only be used for 2d simulations} :dt
Self-explanatory. :dd
{Fix nve/line requires atom style line} :dt
Self-explanatory. :dd
{Fix nve/line requires line particles} :dt
Self-explanatory. :dd
{Fix nve/sphere dipole requires atom attribute mu} :dt
An atom style with this attribute is needed. :dd
{Fix nve/sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Fix nve/sphere requires extended particles} :dt
This fix can only be used for particles of a finite size. :dd
{Fix nve/tri can only be used for 3d simulations} :dt
Self-explanatory. :dd
{Fix nve/tri requires atom style tri} :dt
Self-explanatory. :dd
{Fix nve/tri requires tri particles} :dt
Self-explanatory. :dd
{Fix nvt/nph/npt asphere requires extended particles} :dt
The shape setting for a particle in the fix group has shape = 0.0,
which means it is a point particle. :dd
{Fix nvt/nph/npt body requires bodies} :dt
Self-explanatory. :dd
{Fix nvt/nph/npt sphere requires atom style sphere} :dt
Self-explanatory. :dd
{Fix nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix nvt/npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix nvt/sphere requires extended particles} :dt
This fix can only be used for particles of a finite size. :dd
{Fix orient/fcc file open failed} :dt
The fix orient/fcc command could not open a specified file. :dd
{Fix orient/fcc file read failed} :dt
The fix orient/fcc command could not read the needed parameters from a
specified file. :dd
{Fix orient/fcc found self twice} :dt
The neighbor lists used by fix orient/fcc are messed up. If this
error occurs, it is likely a bug, so send an email to the
"developers"_http://lammps.sandia.gov/authors.html. :dd
{Fix peri neigh does not exist} :dt
Somehow a fix that the pair style defines has been deleted. :dd
{Fix pour and fix rigid/small not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix pour and fix shake not using same molecule template ID} :dt
Self-explanatory. :dd
{Fix pour insertion count per timestep is 0} :dt
Self-explanatory. :dd
{Fix pour molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix pour molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix pour molecule template ID must be same as atom style template ID} :dt
When using atom_style template, you cannot pour molecules that are
not in that template. :dd
{Fix pour polydisperse fractions do not sum to 1.0} :dt
Self-explanatory. :dd
{Fix pour region ID does not exist} :dt
Self-explanatory. :dd
{Fix pour region cannot be dynamic} :dt
Only static regions can be used with fix pour. :dd
{Fix pour region does not support a bounding box} :dt
Not all regions represent bounded volumes. You cannot use
such a region with the fix pour command. :dd
{Fix pour requires atom attributes radius, rmass} :dt
The atom style defined does not have these attributes. :dd
{Fix pour rigid fix does not exist} :dt
Self-explanatory. :dd
{Fix pour shake fix does not exist} :dt
Self-explanatory. :dd
{Fix press/berendsen damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix property/atom cannot specify mol twice} :dt
Self-explanatory. :dd
{Fix property/atom cannot specify q twice} :dt
Self-explanatory. :dd
{Fix property/atom mol when atom_style already has molecule attribute} :dt
Self-explanatory. :dd
{Fix property/atom q when atom_style already has charge attribute} :dt
Self-explanatory. :dd
{Fix property/atom vector name already exists} :dt
The name for an integer or floating-point vector must be unique. :dd
{Fix qeq has negative upper Taper radius cutoff} :dt
Self-explanatory. :dd
{Fix qeq/comb group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/comb requires atom attribute q} :dt
An atom style with charge must be used to perform charge equilibration. :dd
{Fix qeq/dynamic group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/dynamic requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/fire group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/fire requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/point group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/point has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increases too much
during a run. Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/point requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/shielded group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/shielded has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increases too much
during a run. Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/shielded requires atom attribute q} :dt
Self-explanatory. :dd
{Fix qeq/slater could not extract params from pair coul/streitz} :dt
This should not happen unless pair coul/streitz has been altered. :dd
{Fix qeq/slater group has no atoms} :dt
Self-explanatory. :dd
{Fix qeq/slater has insufficient QEq matrix size} :dt
Occurs when the number of neighbor atoms for an atom increases too much
during a run. Increase SAFE_ZONE and MIN_CAP in fix_qeq.h and
recompile. :dd
{Fix qeq/slater requires atom attribute q} :dt
Self-explanatory. :dd
{Fix reax/bonds numbonds > nsbmax_most} :dt
The limit of the number of bonds expected by the ReaxFF force field
was exceeded. :dd
{Fix recenter group has no atoms} :dt
Self-explanatory. :dd
{Fix restrain requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Fix rigid atom has non-zero image flag in a non-periodic dimension} :dt
Image flags for non-periodic dimensions should not be set. :dd
{Fix rigid file has no lines} :dt
Self-explanatory. :dd
{Fix rigid langevin period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid molecule requires atom attribute molecule} :dt
Self-explanatory. :dd
{Fix rigid npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix rigid npt/nph does not yet allow triclinic box} :dt
This is a current restriction in LAMMPS. :dd
{Fix rigid npt/nph period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid npt/small t_chain should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid npt/small t_order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_chain should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_iter should not be less than 1} :dt
Self-explanatory. :dd
{Fix rigid nvt/small t_order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid xy torque cannot be on for 2d simulation} :dt
Self-explanatory. :dd
{Fix rigid z force cannot be on for 2d simulation} :dt
Self-explanatory. :dd
{Fix rigid/npt period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/npt temperature order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid/npt/small period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/nvt period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/nvt temperature order must be 3 or 5} :dt
Self-explanatory. :dd
{Fix rigid/nvt/small period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small atom has non-zero image flag in a non-periodic dimension} :dt
Image flags for non-periodic dimensions should not be set. :dd
{Fix rigid/small langevin period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small molecule must have atom types} :dt
The defined molecule does not specify atom types. :dd
{Fix rigid/small molecule must have coordinates} :dt
The defined molecule does not specify coordinates. :dd
{Fix rigid/small npt/nph period must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small nvt/npt/nph damping parameters must be > 0.0} :dt
Self-explanatory. :dd
{Fix rigid/small nvt/npt/nph dilate group ID does not exist} :dt
Self-explanatory. :dd
{Fix rigid/small requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Fix rigid/small requires atom attribute molecule} :dt
Self-explanatory. :dd
{Fix rigid: Bad principal moments} :dt
The principal moments of inertia computed for a rigid body
are not within the required tolerances. :dd
{Fix shake cannot be used with minimization} :dt
Cannot use fix shake while doing an energy minimization since
it turns off bonds that should contribute to the energy. :dd
{Fix shake molecule template must have shake info} :dt
The defined molecule does not specify SHAKE information. :dd
{Fix spring couple group ID does not exist} :dt
Self-explanatory. :dd
{Fix srd can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{Fix srd lamda must be >= 0.6 of SRD grid size} :dt
This is a requirement for accuracy reasons. :dd
{Fix srd no-slip requires atom attribute torque} :dt
This is because the SRD collisions will impart torque to the solute
particles. :dd
{Fix srd requires SRD particles all have same mass} :dt
Self-explanatory. :dd
{Fix srd requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
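For example, adding this command to the input script enables it: :dd
comm_modify vel yes :pre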
{Fix srd requires newton pair on} :dt
Self-explanatory. :dd
{Fix store/state compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix store/state compute does not calculate a per-atom array} :dt
The compute calculates a per-atom vector. :dd
{Fix store/state compute does not calculate a per-atom vector} :dt
The compute calculates a per-atom array. :dd
{Fix store/state compute does not calculate per-atom values} :dt
Computes that calculate global or local quantities cannot be used
with fix store/state. :dd
{Fix store/state fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix store/state fix does not calculate a per-atom array} :dt
The fix calculates a per-atom vector. :dd
{Fix store/state fix does not calculate a per-atom vector} :dt
The fix calculates a per-atom array. :dd
{Fix store/state fix does not calculate per-atom values} :dt
Fixes that calculate global or local quantities cannot be used with
fix store/state. :dd
{Fix store/state for atom property that isn't allocated} :dt
Self-explanatory. :dd
{Fix store/state variable is not atom-style variable} :dt
Only atom-style variables calculate per-atom quantities. :dd
{Fix temp/berendsen period must be > 0.0} :dt
Self-explanatory. :dd
{Fix temp/berendsen variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/csld is not compatible with fix rattle or fix shake} :dt
These two commands cannot currently be used together with fix temp/csld. :dd
{Fix temp/csld variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/csvr variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix temp/rescale variable returned negative temperature} :dt
Self-explanatory. :dd
{Fix tfmc displacement length must be > 0} :dt
Self-explanatory. :dd
{Fix tfmc is not compatible with fix shake} :dt
These two commands cannot currently be used together. :dd
{Fix tfmc temperature must be > 0} :dt
Self-explanatory. :dd
{Fix thermal/conductivity swap value must be positive} :dt
Self-explanatory. :dd
{Fix tmd must come after integration fixes} :dt
Any fix tmd command must appear in the input script after all time
integration fixes (nve, nvt, npt). See the fix tmd documentation for
details. :dd
{Fix ttm electron temperatures must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_density must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_specific_heat must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm electronic_thermal_conductivity must be >= 0.0} :dt
Self-explanatory. :dd
{Fix ttm gamma_p must be > 0.0} :dt
Self-explanatory. :dd
{Fix ttm gamma_s must be >= 0.0} :dt
Self-explanatory. :dd
{Fix ttm number of nodes must be > 0} :dt
Self-explanatory. :dd
{Fix ttm v_0 must be >= 0.0} :dt
Self-explanatory. :dd
{Fix used in compute chunk/atom not computed at compatible time} :dt
The chunk/atom compute cannot query the output of the fix on a
timestep on which it is needed. :dd
{Fix used in compute reduce not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Compute reduce is
requesting a value on a non-allowed timestep. :dd
{Fix used in compute slice not computed at compatible time} :dt
Fixes generate their values on specific timesteps. Compute slice is
requesting a value on a non-allowed timestep. :dd
{Fix vector cannot set output array intensive/extensive from these inputs} :dt
The inputs to the command have conflicting intensive/extensive attributes.
You need to use more than one fix vector command. :dd
{Fix vector compute does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix vector compute does not calculate a vector} :dt
Self-explanatory. :dd
{Fix vector compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix vector fix does not calculate a scalar} :dt
Self-explanatory. :dd
{Fix vector fix does not calculate a vector} :dt
Self-explanatory. :dd
{Fix vector fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Fix vector variable is not equal-style variable} :dt
Self-explanatory. :dd
{Fix viscosity swap value must be positive} :dt
Self-explanatory. :dd
{Fix viscosity vtarget value must be positive} :dt
Self-explanatory. :dd
{Fix wall cutoff <= 0.0} :dt
Self-explanatory. :dd
{Fix wall/colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/colloid requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix wall/gran is incompatible with Pair style} :dt
Must use a granular pair style to define the parameters needed for
this fix. :dd
{Fix wall/gran requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/piston command only available at zlo} :dt
The face keyword must be zlo. :dd
{Fix wall/region colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Fix wall/region colloid requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Fix wall/region cutoff <= 0.0} :dt
Self-explanatory. :dd
{Fix_modify pressure ID does not compute pressure} :dt
The compute ID assigned to the fix must compute pressure. :dd
{Fix_modify temperature ID does not compute temperature} :dt
The compute ID assigned to the fix must compute temperature. :dd
{For triclinic deformation, specified target stress must be hydrostatic} :dt
Triclinic pressure control is allowed using the tri keyword, but
non-hydrostatic pressure control can not be used in this case. :dd
{Found no restart file matching pattern} :dt
When using a "*" in the restart file name, no matching file was found. :dd
{GPU library not compiled for this accelerator} :dt
Self-explanatory. :dd
{GPU package does not (yet) work with atom_style template} :dt
Self-explanatory. :dd
{GPU particle split must be set to 1 for this pair style.} :dt
For this pair style, you cannot run part of the force calculation on
the host. See the package command. :dd
{GPU split param must be positive for hybrid pair styles} :dt
See the package gpu command. :dd
{GPUs are requested but Kokkos has not been compiled for CUDA} :dt
Recompile Kokkos with CUDA support to use GPUs. :dd
{Ghost velocity forward comm not yet implemented with Kokkos} :dt
This is a current restriction. :dd
{Gmask function in equal-style variable formula} :dt
Gmask is a per-atom operation. :dd
{Gravity changed since fix pour was created} :dt
The gravity vector defined by fix gravity must be static. :dd
{Gravity must point in -y to use with fix pour in 2d} :dt
Self-explanatory. :dd
{Gravity must point in -z to use with fix pour in 3d} :dt
Self-explanatory. :dd
{Grmask function in equal-style variable formula} :dt
Grmask is a per-atom operation. :dd
{Group ID does not exist} :dt
A group ID used in the group command does not exist. :dd
{Group ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Group all cannot be made dynamic} :dt
This operation is not allowed. :dd
{Group command before simulation box is defined} :dt
The group command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Group dynamic cannot reference itself} :dt
Self-explanatory. :dd
{Group dynamic parent group cannot be dynamic} :dt
Self-explanatory. :dd
{Group dynamic parent group does not exist} :dt
Self-explanatory. :dd
{Group region ID does not exist} :dt
A region ID used in the group command does not exist. :dd
{If read_dump purges it cannot replace or trim} :dt
These operations are not compatible. See the read_dump doc
page for details. :dd
{Illegal ... command} :dt
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line. :dd
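For example, assuming a serial executable named lmp_serial and an
input script named in.melt (both placeholders): :dd
lmp_serial -echo screen -in in.melt :pre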
{Illegal COMB parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal COMB3 parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Stillinger-Weber parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Tersoff parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal Vashishta parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal compute voronoi/atom command (occupation and (surface or edges))} :dt
Self-explanatory. :dd
{Illegal coul/streitz parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal dump_modify sfactor value (must be > 0.0)} :dt
Self-explanatory. :dd
{Illegal dump_modify tfactor value (must be > 0.0)} :dt
Self-explanatory. :dd
{Illegal fix gcmc gas mass <= 0} :dt
The computed mass of the designated gas molecule or atom type was less
than or equal to zero. :dd
{Illegal fix tfmc random seed} :dt
Seeds can only be nonzero positive integers. :dd
{Illegal fix wall/piston velocity} :dt
The piston velocity must be positive. :dd
{Illegal integrate style} :dt
Self-explanatory. :dd
{Illegal nb3b/harmonic parameter} :dt
One or more of the coefficients defined in the potential file is
invalid. :dd
{Illegal number of angle table entries} :dt
There must be at least 2 table entries. :dd
{Illegal number of bond table entries} :dt
There must be at least 2 table entries. :dd
{Illegal number of pair table entries} :dt
There must be at least 2 table entries. :dd
{Illegal or unset periodicity in restart} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal range increment value} :dt
The increment must be >= 1. :dd
{Illegal simulation box} :dt
The lower bound of the simulation box is greater than the upper bound. :dd
{Illegal size double vector read requested} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal size integer vector read requested} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Illegal size string or corrupt restart} :dt
This error should not normally occur unless the restart file is invalid. :dd
{Imageint setting in lmptype.h is invalid} :dt
Imageint must be as large as or larger than smallint. :dd
{Imageint setting in lmptype.h is not compatible} :dt
Format of imageint stored in restart file is not consistent with
LAMMPS version you are running. See the settings in src/lmptype.h :dd
{Improper atom missing in delete_bonds} :dt
The delete_bonds command cannot find one or more atoms in a particular
improper on a particular processor. The pairwise cutoff is too short
or the atoms are too far apart to make a valid improper. :dd
{Improper atom missing in set command} :dt
The set command cannot find one or more atoms in a particular improper
on a particular processor. The pairwise cutoff is too short or the
atoms are too far apart to make a valid improper. :dd
{Improper atoms %d %d %d %d missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper atoms missing on proc %d at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper coeff for hybrid has invalid style} :dt
Improper style hybrid uses another improper style as one of its
coefficients. The improper style used in the improper_coeff command
or read from a restart file is not recognized. :dd
{Improper coeffs are not set} :dt
No improper coefficients have been assigned in the data file or via
the improper_coeff command. :dd
{Improper style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Improper style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Improper style hybrid cannot use same improper style twice} :dt
Self-explanatory. :dd
{Improper_coeff command before improper_style is defined} :dt
Coefficients cannot be set in the data file or via the improper_coeff
command until an improper_style has been assigned. :dd
{Improper_coeff command before simulation box is defined} :dt
The improper_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Improper_coeff command when no impropers allowed} :dt
The chosen atom style does not allow for impropers to be defined. :dd
{Improper_style command when no impropers allowed} :dt
The chosen atom style does not allow for impropers to be defined. :dd
{Impropers assigned incorrectly} :dt
Impropers read in from the data file were not assigned correctly to
atoms. This means there is something invalid about the topology
definitions. :dd
{Impropers defined but no improper types} :dt
The data file header lists improper but no improper types. :dd
{Incomplete use of variables in create_atoms command} :dt
The var and set options must be used together. :dd
{Inconsistent iparam/jparam values in fix bond/create command} :dt
If itype and jtype are the same, then their maxbond and newtype
settings must also be the same. :dd
{Inconsistent line segment in data file} :dt
The end points of the line segment are not equidistant from the
center point, which is the atom coordinate. :dd
{Inconsistent triangle in data file} :dt
The centroid of the triangle as defined by the corner points is not
the atom coordinate. :dd
{Inconsistent use of finite-size particles by molecule template molecules} :dt
Not all of the molecules define a radius for their constituent
particles. :dd
{Incorrect # of floating-point values in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect # of integer values in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect %s format in data file} :dt
A section of the data file being read by fix property/atom does
not have the correct number of values per line. :dd
{Incorrect SNAP parameter file} :dt
The file cannot be parsed correctly, check its internal syntax. :dd
{Incorrect args for angle coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for bond coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for improper coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args for pair coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect args in pair_style command} :dt
Self-explanatory. :dd
{Incorrect atom format in data file} :dt
Number of values per atom line in the data file is not consistent with
the atom style. :dd
{Incorrect atom format in neb file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect bonus data format in data file} :dt
See the read_data doc page for a description of how various kinds of
bonus data must be formatted for certain atom styles. :dd
{Incorrect boundaries with slab Ewald} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with Ewald. :dd
{Incorrect boundaries with slab EwaldDisp} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with EwaldDisp. :dd
{Incorrect boundaries with slab PPPM} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with PPPM. :dd
{Incorrect boundaries with slab PPPMDisp} :dt
Must have periodic x,y dimensions and non-periodic z dimension to use
2d slab option with pppm/disp. :dd
{Incorrect element names in ADP potential file} :dt
The element names in the ADP file do not match those requested. :dd
{Incorrect element names in EAM potential file} :dt
The element names in the EAM file do not match those requested. :dd
{Incorrect format in COMB potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in COMB3 potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in MEAM potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in SNAP coefficient file} :dt
Incorrect number of words per line in the coefficient file. :dd
{Incorrect format in SNAP parameter file} :dt
Incorrect number of words per line in the parameter file. :dd
{Incorrect format in Stillinger-Weber potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in TMD target file} :dt
Format of file read by fix tmd command is incorrect. :dd
{Incorrect format in Tersoff potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in Vashishta potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in coul/streitz potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect format in nb3b/harmonic potential file} :dt
Incorrect number of words per line in the potential file. :dd
{Incorrect integer value in Bodies section of data file} :dt
See doc page for body style. :dd
{Incorrect multiplicity arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect number of elements in potential file} :dt
Self-explanatory. :dd
{Incorrect rigid body format in fix rigid file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect rigid body format in fix rigid/small file} :dt
The number of fields per line is not what is expected. :dd
{Incorrect sign arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Incorrect table format check for element types} :dt
Self-explanatory. :dd
{Incorrect velocity format in data file} :dt
Each atom style defines a format for the Velocity section
of the data file. The read-in lines do not match. :dd
{Incorrect weight arg for dihedral coefficients} :dt
Self-explanatory. Check the input script or data file. :dd
{Index between variable brackets must be positive} :dt
Self-explanatory. :dd
{Indexed per-atom vector in variable formula without atom map} :dt
Accessing a value from an atom vector requires the ability to look up
an atom index, which is provided by an atom map. An atom map does not
exist (by default) for non-molecular problems. Using the atom_modify
map command will force an atom map to be created. :dd
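A minimal sketch; note that the map setting must appear before the
simulation box is defined: :dd
atom_modify map array :pre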
{Initial temperatures not all set in fix ttm} :dt
Self-explanatory. :dd
{Input line quote not followed by whitespace} :dt
An end quote must be followed by whitespace. :dd
{Insertion region extends outside simulation box} :dt
Self-explanatory. :dd
{Insufficient Jacobi rotations for POEMS body} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for body nparticle} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for rigid body} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for rigid molecule} :dt
Eigensolve for rigid body was not sufficiently accurate. :dd
{Insufficient Jacobi rotations for triangle} :dt
The calculation of the inertia tensor of the triangle failed. This
should not happen if it is a reasonably shaped triangle. :dd
{Insufficient memory on accelerator} :dt
There is insufficient memory on one of the devices specified for the gpu
package :dd
{Internal error in atom_style body} :dt
This error should not occur. Contact the developers. :dd
{Invalid -reorder N value} :dt
Self-explanatory. :dd
{Invalid Angles section in molecule file} :dt
Self-explanatory. :dd
{Invalid Bonds section in molecule file} :dt
Self-explanatory. :dd
{Invalid Boolean syntax in if command} :dt
Self-explanatory. :dd
{Invalid Charges section in molecule file} :dt
Self-explanatory. :dd
{Invalid Coords section in molecule file} :dt
Self-explanatory. :dd
{Invalid Diameters section in molecule file} :dt
Self-explanatory. :dd
{Invalid Dihedrals section in molecule file} :dt
Self-explanatory. :dd
{Invalid Impropers section in molecule file} :dt
Self-explanatory. :dd
{Invalid Kokkos command-line args} :dt
Self-explanatory. See Section 2.7 of the manual for details. :dd
{Invalid LAMMPS restart file} :dt
The file does not appear to be a LAMMPS restart file since
it doesn't contain the correct magic string at the beginning. :dd
{Invalid Masses section in molecule file} :dt
Self-explanatory. :dd
{Invalid REAX atom type} :dt
There is a mis-match between LAMMPS atom types and the elements
listed in the ReaxFF force field file. :dd
{Invalid Special Bond Counts section in molecule file} :dt
Self-explanatory. :dd
{Invalid Types section in molecule file} :dt
Self-explanatory. :dd
{Invalid angle count in molecule file} :dt
Self-explanatory. :dd
{Invalid angle table length} :dt
Length must be 2 or greater. :dd
{Invalid angle type in Angles section of data file} :dt
Angle type must be positive integer and within range of specified angle
types. :dd
{Invalid angle type in Angles section of molecule file} :dt
Self-explanatory. :dd
{Invalid angle type index for fix shake} :dt
Self-explanatory. :dd
{Invalid args for non-hybrid pair coefficients} :dt
"NULL" is only supported in pair_coeff calls when using pair hybrid :dd
{Invalid argument to factorial %d} :dt
N must be >= 0 and <= 167, otherwise the factorial result is too
large. :dd
{Invalid atom ID in %s section of data file} :dt
An atom in a section of the data file being read by fix property/atom
has an invalid atom ID that is <= 0 or > the maximum existing atom ID. :dd
{Invalid atom ID in Angles section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Angles section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in Atoms section of data file} :dt
Atom IDs must be positive integers. :dd
{Invalid atom ID in Bodies section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Bonds section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Bonds section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in Bonus section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Dihedrals section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Impropers section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in Velocities section of data file} :dt
Atom IDs must be positive integers and within range of defined
atoms. :dd
{Invalid atom ID in dihedrals section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in impropers section of molecule file} :dt
Self-explanatory. :dd
{Invalid atom ID in variable file} :dt
Self-explanatory. :dd
{Invalid atom IDs in neb file} :dt
An ID in the file was not found in the system. :dd
{Invalid atom diameter in molecule file} :dt
Diameters must be >= 0.0. :dd
{Invalid atom mass for fix shake} :dt
Mass specified in fix shake command must be > 0.0. :dd
{Invalid atom mass in molecule file} :dt
Masses must be > 0.0. :dd
{Invalid atom type in Atoms section of data file} :dt
Atom types must range from 1 to specified # of types. :dd
{Invalid atom type in create_atoms command} :dt
The create_box command specified the range of valid atom types.
An invalid type is being requested. :dd
{Invalid atom type in create_atoms mol command} :dt
The atom types in the defined molecule are added to the value
specified in the create_atoms command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in fix atom/swap command} :dt
The atom type specified in the atom/swap command does not exist. :dd
{Invalid atom type in fix bond/create command} :dt
Self-explanatory. :dd
{Invalid atom type in fix deposit command} :dt
Self-explanatory. :dd
{Invalid atom type in fix deposit mol command} :dt
The atom types in the defined molecule are added to the value
specified in the fix deposit command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in fix gcmc command} :dt
The atom type specified in the gcmc command does not exist. :dd
{Invalid atom type in fix pour command} :dt
Self-explanatory. :dd
{Invalid atom type in fix pour mol command} :dt
The atom types in the defined molecule are added to the value
specified in the fix pour command, as an offset. The final value
for each atom must be between 1 and N, where N is the number of atom
types. :dd
{Invalid atom type in molecule file} :dt
Atom types must range from 1 to specified # of types. :dd
{Invalid atom type in neighbor exclusion list} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom type index for fix shake} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom types in pair_write command} :dt
Atom types must range from 1 to Ntypes inclusive. :dd
{Invalid atom vector in variable formula} :dt
The atom vector is not recognized. :dd
{Invalid atom_style body command} :dt
No body style argument was provided. :dd
{Invalid atom_style command} :dt
Self-explanatory. :dd
{Invalid attribute in dump custom command} :dt
Self-explanatory. :dd
{Invalid attribute in dump local command} :dt
Self-explanatory. :dd
{Invalid attribute in dump modify command} :dt
Self-explanatory. :dd
{Invalid basis setting in create_atoms command} :dt
The basis index must be between 1 and N, where N is the number of basis
atoms in the lattice. The type index must be between 1 and N, where N
is the number of atom types. :dd
{Invalid basis setting in fix append/atoms command} :dt
The basis index must be between 1 and N, where N is the number of basis
atoms in the lattice. The type index must be between 1 and N, where N
is the number of atom types. :dd
{Invalid bin bounds in compute chunk/atom} :dt
The lo/hi values are inconsistent. :dd
{Invalid bin bounds in fix ave/spatial} :dt
The lo/hi values are inconsistent. :dd
{Invalid body nparticle command} :dt
Arguments in atom-style command are not correct. :dd
{Invalid bond count in molecule file} :dt
Self-explanatory. :dd
{Invalid bond table length} :dt
Length must be 2 or greater. :dd
{Invalid bond type in Bonds section of data file} :dt
Bond type must be positive integer and within range of specified bond
types. :dd
{Invalid bond type in Bonds section of molecule file} :dt
Self-explanatory. :dd
{Invalid bond type in create_bonds command} :dt
Self-explanatory. :dd
{Invalid bond type in fix bond/break command} :dt
Self-explanatory. :dd
{Invalid bond type in fix bond/create command} :dt
Self-explanatory. :dd
{Invalid bond type index for fix shake} :dt
Self-explanatory. Check the fix shake command in the input script. :dd
{Invalid coeffs for this dihedral style} :dt
Cannot set class 2 coeffs in data file for this dihedral style. :dd
{Invalid color in dump_modify command} :dt
The specified color name was not in the list of recognized colors.
See the dump_modify doc page. :dd
{Invalid color map min/max values} :dt
The min/max values are not consistent either with each other or
with values in the color map. :dd
{Invalid command-line argument} :dt
One or more command-line arguments is invalid. Check the syntax of
the command you are using to launch LAMMPS. :dd
{Invalid compute ID in variable formula} :dt
The compute is not recognized. :dd
{Invalid create_atoms rotation vector for 2d model} :dt
The rotation vector can only have a z component. :dd
{Invalid custom OpenCL parameter string.} :dt
There are either too few or too many parameters in the custom string for
the GPU package. :dd
{Invalid cutoff in comm_modify command} :dt
Specified cutoff must be >= 0.0. :dd
{Invalid cutoffs in pair_write command} :dt
Inner cutoff must be larger than 0.0 and less than outer cutoff. :dd
{Invalid d1 or d2 value for pair colloid coeff} :dt
Neither d1 nor d2 can be < 0. :dd
{Invalid data file section: Angle Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: AngleAngle Coeffs} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: AngleAngleTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: AngleTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Angles} :dt
Atom style does not allow angles. :dd
{Invalid data file section: Bodies} :dt
Atom style does not allow bodies. :dd
{Invalid data file section: Bond Coeffs} :dt
Atom style does not allow bonds. :dd
{Invalid data file section: BondAngle Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: BondBond Coeffs} :dt
Atom style does not allow angles. :dd
{Invalid data file section: BondBond13 Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Bonds} :dt
Atom style does not allow bonds. :dd
{Invalid data file section: Dihedral Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Dihedrals} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Ellipsoids} :dt
Atom style does not allow ellipsoids. :dd
{Invalid data file section: EndBondTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Improper Coeffs} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: Impropers} :dt
Atom style does not allow impropers. :dd
{Invalid data file section: Lines} :dt
Atom style does not allow lines. :dd
{Invalid data file section: MiddleBondTorsion Coeffs} :dt
Atom style does not allow dihedrals. :dd
{Invalid data file section: Triangles} :dt
Atom style does not allow triangles. :dd
{Invalid delta_conf in tad command} :dt
The value must be between 0 and 1 inclusive. :dd
{Invalid density in Atoms section of data file} :dt
Density value cannot be <= 0.0. :dd
{Invalid density in set command} :dt
Density must be > 0.0. :dd
{Invalid diameter in set command} :dt
Self-explanatory. :dd
{Invalid dihedral count in molecule file} :dt
Self-explanatory. :dd
{Invalid dihedral type in Dihedrals section of data file} :dt
Dihedral type must be a positive integer and within the range of
specified dihedral types. :dd
{Invalid dihedral type in dihedrals section of molecule file} :dt
Self-explanatory. :dd
{Invalid dipole length in set command} :dt
Self-explanatory. :dd
{Invalid displace_atoms rotate axis for 2d} :dt
Axis must be in z direction. :dd
{Invalid dump dcd filename} :dt
Filenames used with the dump dcd style cannot be binary or compressed
or cause multiple files to be written. :dd
{Invalid dump frequency} :dt
Dump frequency must be 1 or greater. :dd
{Invalid dump image element name} :dt
The specified element name was not in the standard list of elements.
See the dump_modify doc page. :dd
{Invalid dump image filename} :dt
The file produced by dump image cannot be binary and must
be for a single processor. :dd
{Invalid dump image persp value} :dt
Persp value must be >= 0.0. :dd
{Invalid dump image theta value} :dt
Theta must be between 0.0 and 180.0 inclusive. :dd
{Invalid dump image zoom value} :dt
Zoom value must be > 0.0. :dd
{Invalid dump movie filename} :dt
The file produced by dump movie cannot be binary or compressed
and must be a single file for a single processor. :dd
{Invalid dump xtc filename} :dt
Filenames used with the dump xtc style cannot be binary or compressed
or cause multiple files to be written. :dd
{Invalid dump xyz filename} :dt
Filenames used with the dump xyz style cannot be binary or cause files
to be written by each processor. :dd
{Invalid dump_modify threshold operator} :dt
Operator keyword used for threshold specification is not recognized. :dd
{Invalid entry in -reorder file} :dt
Self-explanatory. :dd
{Invalid fix ID in variable formula} :dt
The fix is not recognized. :dd
{Invalid fix ave/time off column} :dt
Self-explanatory. :dd
{Invalid fix box/relax command for a 2d simulation} :dt
Fix box/relax styles involving the z dimension cannot be used in
a 2d simulation. :dd
{Invalid fix box/relax command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be specified. :dd
{Invalid fix box/relax pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix nvt/npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix nvt/npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid fix nvt/npt/nph pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix press/berendsen for a 2d simulation} :dt
The z component of pressure cannot be controlled for a 2d model. :dd
{Invalid fix press/berendsen pressure settings} :dt
Settings for coupled dimensions must be the same. :dd
{Invalid fix qeq parameter file} :dt
Element index > number of atom types. :dd
{Invalid fix rigid npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix rigid npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid fix rigid/small npt/nph command for a 2d simulation} :dt
Cannot control z dimension in a 2d model. :dd
{Invalid fix rigid/small npt/nph command pressure settings} :dt
If multiple dimensions are coupled, those dimensions must be
specified. :dd
{Invalid flag in force field section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid flag in header section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid flag in peratom section of restart file} :dt
The format of this section of the file is not correct. :dd
{Invalid flag in type arrays section of restart file} :dt
Unrecognized entry in restart file. :dd
{Invalid frequency in temper command} :dt
Nevery must be > 0. :dd
{Invalid group ID in neigh_modify command} :dt
A group ID used in the neigh_modify command does not exist. :dd
{Invalid group function in variable formula} :dt
Group function is not recognized. :dd
{Invalid group in comm_modify command} :dt
Self-explanatory. :dd
{Invalid image up vector} :dt
Up vector cannot be (0,0,0). :dd
{Invalid immediate variable} :dt
Syntax of immediate value is incorrect. :dd
{Invalid improper count in molecule file} :dt
Self-explanatory. :dd
{Invalid improper type in Impropers section of data file} :dt
Improper type must be a positive integer and within the range of
specified improper types. :dd
{Invalid improper type in impropers section of molecule file} :dt
Self-explanatory. :dd
{Invalid index for non-body particles in compute body/local command} :dt
Only indices 1,2,3 can be used for non-body particles. :dd
{Invalid index in compute body/local command} :dt
Self-explanatory. :dd
{Invalid is_active() function in variable formula} :dt
Self-explanatory. :dd
{Invalid is_available() function in variable formula} :dt
Self-explanatory. :dd
{Invalid is_defined() function in variable formula} :dt
Self-explanatory. :dd
{Invalid keyword in angle table parameters} :dt
Self-explanatory. :dd
{Invalid keyword in bond table parameters} :dt
Self-explanatory. :dd
{Invalid keyword in compute angle/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute bond/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute dihedral/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute improper/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute pair/local command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/atom command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/chunk command} :dt
Self-explanatory. :dd
{Invalid keyword in compute property/local command} :dt
Self-explanatory. :dd
{Invalid keyword in dump cfg command} :dt
Self-explanatory. :dd
{Invalid keyword in pair table parameters} :dt
Keyword used in list of table parameters is not recognized. :dd
{Invalid length in set command} :dt
Self-explanatory. :dd
{Invalid mass in set command} :dt
Self-explanatory. :dd
{Invalid mass line in data file} :dt
Self-explanatory. :dd
{Invalid mass value} :dt
Self-explanatory. :dd
{Invalid math function in variable formula} :dt
Self-explanatory. :dd
{Invalid math/group/special function in variable formula} :dt
Self-explanatory. :dd
{Invalid option in lattice command for non-custom style} :dt
Certain lattice keywords are not supported unless the
lattice style is "custom". :dd
{Invalid order of forces within respa levels} :dt
For respa, the ordering of force computations within respa levels must
obey certain rules. E.g. bonds cannot be computed less frequently than
angles, pairwise forces cannot be computed less frequently than
kspace, etc. :dd
{Invalid pair table cutoff} :dt
Cutoffs in pair_coeff command are not valid with read-in pair table. :dd
{Invalid pair table length} :dt
Length of read-in pair table is invalid. :dd
{Invalid param file for fix qeq/shielded} :dt
Invalid value of gamma. :dd
{Invalid param file for fix qeq/slater} :dt
Zeta value is 0.0. :dd
{Invalid partitions in processors part command} :dt
Valid partitions are numbered 1 to N and the sender and receiver
cannot be the same partition. :dd
{Invalid python command} :dt
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line. :dd
{Invalid radius in Atoms section of data file} :dt
Radius must be >= 0.0. :dd
{Invalid random number seed in fix ttm command} :dt
Random number seed must be > 0. :dd
{Invalid random number seed in set command} :dt
Random number seed must be > 0. :dd
{Invalid replace values in compute reduce} :dt
Self-explanatory. :dd
{Invalid rigid body ID in fix rigid file} :dt
The ID does not match that of any of the rigid bodies
defined by the fix rigid command. :dd
{Invalid rigid body ID in fix rigid/small file} :dt
The ID does not match that of any of the rigid bodies
defined by the fix rigid/small command. :dd
{Invalid run command N value} :dt
The number of timesteps must fit in a 32-bit integer. If you want to
run for more steps than this, perform multiple shorter runs. :dd
{Invalid run command start/stop value} :dt
Self-explanatory. :dd
{Invalid run command upto value} :dt
Self-explanatory. :dd
{Invalid seed for Marsaglia random # generator} :dt
The initial seed for this random number generator must be a positive
integer less than or equal to 900 million. :dd
{Invalid seed for Park random # generator} :dt
The initial seed for this random number generator must be a positive
integer. :dd
{Invalid shake angle type in molecule file} :dt
Self-explanatory. :dd
{Invalid shake atom in molecule file} :dt
Self-explanatory. :dd
{Invalid shake bond type in molecule file} :dt
Self-explanatory. :dd
{Invalid shake flag in molecule file} :dt
Self-explanatory. :dd
{Invalid shape in Ellipsoids section of data file} :dt
Self-explanatory. :dd
{Invalid shape in Triangles section of data file} :dt
Two or more of the triangle corners are duplicate points. :dd
{Invalid shape in set command} :dt
Self-explanatory. :dd
{Invalid shear direction for fix wall/gran} :dt
Self-explanatory. :dd
{Invalid special atom index in molecule file} :dt
Self-explanatory. :dd
{Invalid special function in variable formula} :dt
Self-explanatory. :dd
{Invalid style in pair_write command} :dt
Self-explanatory. Check the input script. :dd
{Invalid syntax in variable formula} :dt
Self-explanatory. :dd
{Invalid t_event in prd command} :dt
Self-explanatory. :dd
{Invalid t_event in tad command} :dt
The value must be greater than 0. :dd
{Invalid template atom in Atoms section of data file} :dt
The atom indices must be from 1 to N, where N is the number of
atoms in the template molecule the atom belongs to. :dd
{Invalid template index in Atoms section of data file} :dt
The template indices must be from 1 to N, where N is the number of
molecules in the template. :dd
{Invalid thermo keyword in variable formula} :dt
The keyword is not recognized. :dd
{Invalid threads_per_atom specified.} :dt
For 3-body potentials on the GPU, the threads_per_atom setting cannot be
greater than 4 for NVIDIA GPUs. :dd
{Invalid timestep reset for fix ave/atom} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/chunk} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/correlate} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/histo} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/spatial} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid timestep reset for fix ave/time} :dt
Resetting the timestep has invalidated the sequence of timesteps this
fix needs to process. :dd
{Invalid tmax in tad command} :dt
The value must be greater than 0.0. :dd
{Invalid type for mass set} :dt
Mass command must set a type from 1-N where N is the number of atom
types. :dd
{Invalid use of library file() function} :dt
This function is called through the library interface. This
error should not occur. Contact the developers if it does. :dd
{Invalid value in set command} :dt
The value specified for the setting is invalid, likely because it is
too small or too large. :dd
{Invalid variable evaluation in variable formula} :dt
A variable used in a formula could not be evaluated. :dd
{Invalid variable in next command} :dt
Self-explanatory. :dd
{Invalid variable name} :dt
Variable name used in an input script line is invalid. :dd
{Invalid variable name in variable formula} :dt
Variable name is not recognized. :dd
{Invalid variable style in special function next} :dt
Only file-style or atomfile-style variables can be used with next(). :dd
{Invalid variable style with next command} :dt
Variable styles {equal} and {world} cannot be used in a next
command. :dd
{Invalid volume in set command} :dt
Volume must be > 0.0. :dd
{Invalid wiggle direction for fix wall/gran} :dt
Self-explanatory. :dd
{Invoked angle equil angle on angle style none} :dt
Self-explanatory. :dd
{Invoked angle single on angle style none} :dt
Self-explanatory. :dd
{Invoked bond equil distance on bond style none} :dt
Self-explanatory. :dd
{Invoked bond single on bond style none} :dt
Self-explanatory. :dd
{Invoked pair single on pair style none} :dt
A command (e.g. a dump) attempted to invoke the single() function on a
pair style none, which is illegal. You are probably attempting to
compute per-atom quantities with an undefined pair style. :dd
{Invoking coulombic in pair style lj/coul requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Invoking coulombic in pair style lj/long/dipole/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{KIM neighbor iterator exceeded range} :dt
This should not happen. It likely indicates a bug
in the KIM implementation of the interatomic potential
where it is requesting neighbors incorrectly. :dd
{KOKKOS package does not yet support comm_style tiled} :dt
Self-explanatory. :dd
{KOKKOS package requires a kokkos enabled atom_style} :dt
Self-explanatory. :dd
{KSpace accuracy must be > 0} :dt
The kspace accuracy designated in the input must be greater than zero. :dd
{KSpace accuracy too large to estimate G vector} :dt
Reduce the accuracy request or specify gwald explicitly
via the kspace_modify command. :dd
{KSpace accuracy too low} :dt
Requested accuracy must be less than 1.0. :dd
{KSpace solver requires a pair style} :dt
No pair style is defined. :dd
{KSpace style does not yet support triclinic geometries} :dt
The specified kspace style does not allow for non-orthogonal
simulation boxes. :dd
{KSpace style has not yet been set} :dt
Cannot use kspace_modify command until a kspace style is set. :dd
{KSpace style is incompatible with Pair style} :dt
Setting a kspace style requires that a pair style with matching
long-range Coulombic or dispersion components be used. :dd
{Keyword %s in MEAM parameter file not recognized} :dt
Self-explanatory. :dd
{Kokkos has been compiled for CUDA but no GPUs are requested} :dt
One or more GPUs must be used when Kokkos is compiled for CUDA. :dd
{Kspace style does not support compute group/group} :dt
Self-explanatory. :dd
{Kspace style pppm/disp/tip4p requires newton on} :dt
Self-explanatory. :dd
{Kspace style pppm/tip4p requires newton on} :dt
Self-explanatory. :dd
{Kspace style requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Kspace_modify eigtol must be smaller than one} :dt
Self-explanatory. :dd
{LAMMPS is not built with Python embedded} :dt
Python is embedded by including the PYTHON package before LAMMPS is
built. This is required to use python-style variables. :dd
{LAMMPS unit_style lj not supported by KIM models} :dt
Self-explanatory. Check the input script or data file. :dd
{LJ6 off not supported in pair_style buck/long/coul/long} :dt
Self-explanatory. :dd
{Label wasn't found in input script} :dt
Self-explanatory. :dd
{Lattice orient vectors are not orthogonal} :dt
The three specified lattice orientation vectors must be mutually
orthogonal. :dd
{Lattice orient vectors are not right-handed} :dt
The three specified lattice orientation vectors must create a
right-handed coordinate system such that a1 cross a2 = a3. :dd
{Lattice primitive vectors are collinear} :dt
The specified lattice primitive vectors do not form a unit cell with
non-zero volume. :dd
{Lattice settings are not compatible with 2d simulation} :dt
One or more of the specified lattice vectors has a non-zero z
component. :dd
{Lattice spacings are invalid} :dt
Each x,y,z spacing must be > 0. :dd
{Lattice style incompatible with simulation dimension} :dt
A 2d simulation can use an sq, sq2, or hex lattice. A 3d simulation
can use an sc, bcc, or fcc lattice. :dd
{Log of zero/negative value in variable formula} :dt
Self-explanatory. :dd
{Lost atoms via balance: original %ld current %ld} :dt
This should not occur. Report the problem to the developers. :dd
{Lost atoms: original %ld current %ld} :dt
Lost atoms are checked for each time thermo output is done. See the
thermo_modify lost command for options. Lost atoms usually indicate
bad dynamics, e.g. atoms have been blown far out of the simulation
box, or moved further than one processor's sub-domain away before
reneighboring. :dd
{MEAM library error %d} :dt
A call to the MEAM Fortran library returned an error. :dd
{MPI_LMP_BIGINT and bigint in lmptype.h are not compatible} :dt
The size of the MPI datatype does not match the size of a bigint. :dd
{MPI_LMP_TAGINT and tagint in lmptype.h are not compatible} :dt
The size of the MPI datatype does not match the size of a tagint. :dd
{MSM can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{MSM grid is too large} :dt
The global MSM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 16384. You likely need to decrease the
requested accuracy. :dd
{MSM order must be 4, 6, 8, or 10} :dt
This is a limitation of the MSM implementation in LAMMPS:
the MSM order can only be 4, 6, 8, or 10. :dd
{Mass command before simulation box is defined} :dt
The mass command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Matrix factorization to split dispersion coefficients failed} :dt
This should not normally happen. Contact the developers. :dd
{Min_style command before simulation box is defined} :dt
The min_style command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Minimization could not find thermo_pe compute} :dt
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command. :dd
{Minimize command before simulation box is defined} :dt
The minimize command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Mismatched brackets in variable} :dt
Self-explanatory. :dd
{Mismatched compute in variable formula} :dt
A compute is referenced incorrectly or a compute that produces per-atom
values is used in an equal-style variable formula. :dd
{Mismatched fix in variable formula} :dt
A fix is referenced incorrectly or a fix that produces per-atom
values is used in an equal-style variable formula. :dd
{Mismatched variable in variable formula} :dt
A variable is referenced incorrectly or an atom-style variable that
produces per-atom values is used in an equal-style variable
formula. :dd
{Modulo 0 in variable formula} :dt
Self-explanatory. :dd
{Molecule IDs too large for compute chunk/atom} :dt
The IDs must not be larger than can be stored in a 32-bit integer
since chunk IDs are 32-bit integers. :dd
{Molecule auto special bond generation overflow} :dt
The counts exceed the maxspecial setting for other atoms in the system. :dd
{Molecule file has angles but no nangles setting} :dt
Self-explanatory. :dd
{Molecule file has body params but no setting for them} :dt
Self-explanatory. :dd
{Molecule file has bonds but no nbonds setting} :dt
Self-explanatory. :dd
{Molecule file has dihedrals but no ndihedrals setting} :dt
Self-explanatory. :dd
{Molecule file has impropers but no nimpropers setting} :dt
Self-explanatory. :dd
{Molecule file has no Body Doubles section} :dt
Self-explanatory. :dd
{Molecule file has no Body Integers section} :dt
Self-explanatory. :dd
{Molecule file has special flags but no bonds} :dt
Self-explanatory. :dd
{Molecule file needs both Special Bond sections} :dt
Self-explanatory. :dd
{Molecule file requires atom style body} :dt
Self-explanatory. :dd
{Molecule file shake flags not before shake atoms} :dt
The order of the two sections is important. :dd
{Molecule file shake flags not before shake bonds} :dt
The order of the two sections is important. :dd
{Molecule file shake info is incomplete} :dt
All 3 SHAKE sections are needed. :dd
{Molecule file special list does not match special count} :dt
The number of values in an atom's special list does not match the
specified count. :dd
{Molecule file z center-of-mass must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Molecule file z coord must be 0.0 for 2d} :dt
Self-explanatory. :dd
{Molecule natoms must be 1 for body particle} :dt
Self-explanatory. :dd
{Molecule sizescale must be 1.0 for body particle} :dt
Self-explanatory. :dd
{Molecule template ID for atom_style template does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for create_atoms does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix deposit does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix gcmc does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix pour does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix rigid/small does not exist} :dt
Self-explanatory. :dd
{Molecule template ID for fix shake does not exist} :dt
Self-explanatory. :dd
{Molecule template ID must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Molecule topology/atom exceeds system topology/atom} :dt
The number of bonds, angles, etc. per atom in the molecule exceeds the
system setting. See the create_box command for how to specify these
values. :dd
{Molecule topology type exceeds system topology type} :dt
The number of bond, angle, etc. types in the molecule exceeds the
system setting. See the create_box command for how to specify these
values. :dd
{More than one fix deform} :dt
Only one fix deform can be defined at a time. :dd
{More than one fix freeze} :dt
Only one of these fixes can be defined, since the granular pair
potentials access it. :dd
{More than one fix shake} :dt
Only one fix shake can be defined. :dd
{Mu not allowed when not using semi-grand in fix atom/swap command} :dt
Self-explanatory. :dd
{Must define angle_style before Angle Coeffs} :dt
Must use an angle_style command before reading a data file that
defines Angle Coeffs. :dd
{Must define angle_style before BondAngle Coeffs} :dt
Must use an angle_style command before reading a data file that
defines BondAngle Coeffs. :dd
{Must define angle_style before BondBond Coeffs} :dt
Must use an angle_style command before reading a data file that
defines BondBond Coeffs. :dd
{Must define bond_style before Bond Coeffs} :dt
Must use a bond_style command before reading a data file that
defines Bond Coeffs. :dd
{Must define dihedral_style before AngleAngleTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines AngleAngleTorsion Coeffs. :dd
{Must define dihedral_style before AngleTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines AngleTorsion Coeffs. :dd
{Must define dihedral_style before BondBond13 Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines BondBond13 Coeffs. :dd
{Must define dihedral_style before Dihedral Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines Dihedral Coeffs. :dd
{Must define dihedral_style before EndBondTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines EndBondTorsion Coeffs. :dd
{Must define dihedral_style before MiddleBondTorsion Coeffs} :dt
Must use a dihedral_style command before reading a data file that
defines MiddleBondTorsion Coeffs. :dd
{Must define improper_style before AngleAngle Coeffs} :dt
Must use an improper_style command before reading a data file that
defines AngleAngle Coeffs. :dd
{Must define improper_style before Improper Coeffs} :dt
Must use an improper_style command before reading a data file that
defines Improper Coeffs. :dd
{Must define pair_style before Pair Coeffs} :dt
Must use a pair_style command before reading a data file that defines
Pair Coeffs. :dd
{Must define pair_style before PairIJ Coeffs} :dt
Must use a pair_style command before reading a data file that defines
PairIJ Coeffs. :dd
{Must have more than one processor partition to temper} :dt
Cannot use the temper command with only one processor partition. Use
the -partition command-line option. :dd
{Must read Atoms before Angles} :dt
The Atoms section of a data file must come before an Angles section. :dd
{Must read Atoms before Bodies} :dt
The Atoms section of a data file must come before a Bodies section. :dd
{Must read Atoms before Bonds} :dt
The Atoms section of a data file must come before a Bonds section. :dd
{Must read Atoms before Dihedrals} :dt
The Atoms section of a data file must come before a Dihedrals section. :dd
{Must read Atoms before Ellipsoids} :dt
The Atoms section of a data file must come before an Ellipsoids
section. :dd
{Must read Atoms before Impropers} :dt
The Atoms section of a data file must come before an Impropers
section. :dd
{Must read Atoms before Lines} :dt
The Atoms section of a data file must come before a Lines section. :dd
{Must read Atoms before Triangles} :dt
The Atoms section of a data file must come before a Triangles section. :dd
{Must read Atoms before Velocities} :dt
The Atoms section of a data file must come before a Velocities
section. :dd
{Must set both respa inner and outer} :dt
Cannot use just the inner or outer option with respa without using the
other. :dd
{Must set number of threads via package omp command} :dt
Because you are using the USER-OMP package, set the number of threads
via its settings, not by the pair_style snap nthreads setting. :dd
{Must shrink-wrap piston boundary} :dt
The boundary style of the face where the piston is applied must be of
type s (shrink-wrapped). :dd
{Must specify a region in fix deposit} :dt
The region keyword must be specified with this fix. :dd
{Must specify a region in fix pour} :dt
Self-explanatory. :dd
{Must specify at least 2 types in fix atom/swap command} :dt
Self-explanatory. :dd
{Must use 'kspace_modify pressure/scalar no' for rRESPA with kspace_style MSM} :dt
The kspace scalar pressure option cannot (yet) be used with rRESPA. :dd
{Must use 'kspace_modify pressure/scalar no' for tensor components with kspace_style msm} :dt
Otherwise MSM will compute only a scalar pressure. See the kspace_modify
command for details on this setting. :dd
{Must use 'kspace_modify pressure/scalar no' to obtain per-atom virial with kspace_style MSM} :dt
The kspace scalar pressure option cannot be used to obtain per-atom virial. :dd
{Must use 'kspace_modify pressure/scalar no' with GPU MSM Pair styles} :dt
The kspace scalar pressure option is not (yet) compatible with GPU MSM Pair styles. :dd
{Must use 'kspace_modify pressure/scalar no' with kspace_style msm/cg} :dt
The kspace scalar pressure option is not compatible with kspace_style msm/cg. :dd
{Must use -in switch with multiple partitions} :dt
A multi-partition simulation cannot read the input script from stdin.
The -in command-line option must be used to specify a file. :dd
{Must use Kokkos half/thread or full neighbor list with threads or GPUs} :dt
Using Kokkos half-neighbor lists with threading is not allowed. :dd
{Must use a block or cylinder region with fix pour} :dt
Self-explanatory. :dd
{Must use a block region with fix pour for 2d simulations} :dt
Self-explanatory. :dd
{Must use a bond style with TIP4P potential} :dt
TIP4P potentials assume bond lengths in water are constrained
by a fix shake command. :dd
{Must use a molecular atom style with fix poems molecule} :dt
Self-explanatory. :dd
{Must use a z-axis cylinder region with fix pour} :dt
Self-explanatory. :dd
{Must use an angle style with TIP4P potential} :dt
TIP4P potentials assume angles in water are constrained by a fix shake
command. :dd
{Must use atom map style array with Kokkos} :dt
See the atom_modify map command. :dd
{Must use atom style with molecule IDs with fix bond/swap} :dt
Self-explanatory. :dd
{Must use pair_style comb or comb3 with fix qeq/comb} :dt
Self-explanatory. :dd
{Must use variable energy with fix addforce} :dt
Must define an energy variable when applying a dynamic
force during minimization. :dd
{Must use variable energy with fix efield} :dt
You must define an energy when performing a minimization with a
variable E-field. :dd
{NEB command before simulation box is defined} :dt
Self-explanatory. :dd
{NEB requires damped dynamics minimizer} :dt
Use a different minimization style. :dd
{NEB requires use of fix neb} :dt
Self-explanatory. :dd
{NL ramp in wall/piston only implemented in zlo for now} :dt
The ramp keyword can only be used for a piston applied to the zlo face. :dd
{Need nswaptypes mu values in fix atom/swap command} :dt
Self-explanatory. :dd
{Needed bonus data not in data file} :dt
Some atom styles require bonus data. See the read_data doc page for
details. :dd
{Needed molecular topology not in data file} :dt
The header of the data file indicated bonds, angles, etc would be
included, but they are not present. :dd
{Neigh_modify exclude molecule requires atom attribute molecule} :dt
Self-explanatory. :dd
{Neigh_modify include group != atom_modify first group} :dt
Self-explanatory. :dd
{Neighbor delay must be 0 or multiple of every setting} :dt
The delay and every parameters set via the neigh_modify command are
inconsistent. If the delay setting is non-zero, then it must be a
multiple of the every setting. :dd
{Neighbor include group not allowed with ghost neighbors} :dt
This is a current restriction within LAMMPS. :dd
{Neighbor list overflow, boost neigh_modify one} :dt
There are too many neighbors of a single atom. Use the neigh_modify
command to increase the max number of neighbors allowed for one atom.
You may also want to boost the page size. :dd
{Neighbor multi not yet enabled for ghost neighbors} :dt
This is a current restriction within LAMMPS. :dd
{Neighbor multi not yet enabled for granular} :dt
Self-explanatory. :dd
{Neighbor multi not yet enabled for rRESPA} :dt
Self-explanatory. :dd
{Neighbor page size must be >= 10x the one atom setting} :dt
This is required to prevent wasting too much memory. :dd
{New atom IDs exceed maximum allowed ID} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{New bond exceeded bonds per atom in create_bonds} :dt
See the read_data command for info on setting the "extra bond per
atom" header value to allow for additional bonds to be formed. :dd
{New bond exceeded bonds per atom in fix bond/create} :dt
See the read_data command for info on setting the "extra bond per
atom" header value to allow for additional bonds to be formed. :dd
{New bond exceeded special list size in fix bond/create} :dt
See the special_bonds extra command for info on how to leave space in
the special bonds list to allow for additional bonds to be formed. :dd
{Newton bond change after simulation box is defined} :dt
The newton command cannot be used to change the newton bond value
after a read_data, read_restart, or create_box command. :dd
{Next command must list all universe and uloop variables} :dt
This is to ensure they stay in sync. :dd
{No Kspace style defined for compute group/group} :dt
Self-explanatory. :dd
{No OpenMP support compiled in} :dt
An OpenMP flag is set, but LAMMPS was not built with
OpenMP support. :dd
{No angle style is defined for compute angle/local} :dt
Self-explanatory. :dd
{No angles allowed with this atom style} :dt
Self-explanatory. :dd
{No atoms in data file} :dt
The header of the data file indicated that atoms would be included,
but they are not present. :dd
{No basis atoms in lattice} :dt
Basis atoms must be defined for lattice style custom. :dd
{No bodies allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No bond style is defined for compute bond/local} :dt
Self-explanatory. :dd
{No bonds allowed with this atom style} :dt
Self-explanatory. :dd
{No box information in dump. You have to use 'box no'} :dt
Self-explanatory. :dd
{No count or invalid atom count in molecule file} :dt
The number of atoms must be specified. :dd
{No dihedral style is defined for compute dihedral/local} :dt
Self-explanatory. :dd
{No dihedrals allowed with this atom style} :dt
Self-explanatory. :dd
{No dump custom arguments specified} :dt
The dump custom command requires that atom quantities be specified to
output to dump file. :dd
{No dump local arguments specified} :dt
Self-explanatory. :dd
{No ellipsoids allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No fix gravity defined for fix pour} :dt
Gravity is required to use fix pour. :dd
{No improper style is defined for compute improper/local} :dt
Self-explanatory. :dd
{No impropers allowed with this atom style} :dt
Self-explanatory. :dd
{No input values for fix ave/spatial} :dt
Self-explanatory. :dd
{No lines allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No matching element in ADP potential file} :dt
The ADP potential file does not contain elements that match the
requested elements. :dd
{No matching element in EAM potential file} :dt
The EAM potential file does not contain elements that match the
requested elements. :dd
{No molecule topology allowed with atom style template} :dt
The data file cannot specify the number of bonds, angles, etc.,
because this info is inferred from the molecule templates. :dd
{No overlap of box and region for create_atoms} :dt
Self-explanatory. :dd
{No pair coul/streitz for fix qeq/slater} :dt
These commands must be used together. :dd
{No pair hbond/dreiding coefficients set} :dt
Self-explanatory. :dd
{No pair style defined for compute group/group} :dt
Cannot calculate group interactions without a pair style defined. :dd
{No pair style is defined for compute pair/local} :dt
Self-explanatory. :dd
{No pair style is defined for compute property/local} :dt
Self-explanatory. :dd
{No rigid bodies defined} :dt
The fix specification did not end up defining any rigid bodies. :dd
{No triangles allowed with this atom style} :dt
Self-explanatory. Check data file. :dd
{No values in fix ave/chunk command} :dt
Self-explanatory. :dd
{No values in fix ave/time command} :dt
Self-explanatory. :dd
{Non digit character between brackets in variable} :dt
Self-explanatory. :dd
{Non integer # of swaps in temper command} :dt
Swap frequency in temper command must evenly divide the total # of
timesteps. :dd
{Non-numeric box dimensions - simulation unstable} :dt
The box size has apparently blown up. :dd
{Non-zero atom IDs with atom_modify id = no} :dt
Self-explanatory. :dd
{Non-zero read_data shift z value for 2d simulation} :dt
Self-explanatory. :dd
{Nprocs not a multiple of N for -reorder} :dt
Self-explanatory. :dd
{Number of core atoms != number of shell atoms} :dt
There must be a one-to-one pairing of core and shell atoms. :dd
{Numeric index is out of bounds} :dt
A command with an argument that specifies an integer or range of
integers is using a value that is less than 1 or greater than the
maximum allowed limit. :dd
{One or more Atom IDs is negative} :dt
Atom IDs must be positive integers. :dd
{One or more atom IDs is too big} :dt
The limit on atom IDs is set by the SMALLBIG, BIGBIG, or SMALLSMALL
setting in your Makefile. See Section_start 2.2 of the manual for
more details. :dd
{One or more atom IDs is zero} :dt
Either all atom IDs must be zero or none of them. :dd
{One or more atoms belong to multiple rigid bodies} :dt
Two or more rigid bodies defined by the fix rigid command cannot
contain the same atom. :dd
{One or more rigid bodies are a single particle} :dt
Self-explanatory. :dd
{One or zero atoms in rigid body} :dt
Any rigid body defined by the fix rigid command must contain 2 or more
atoms. :dd
{Only 2 types allowed when not using semi-grand in fix atom/swap command} :dt
Self-explanatory. :dd
{Only one cut-off allowed when requesting all long} :dt
Self-explanatory. :dd
{Only one cutoff allowed when requesting all long} :dt
Self-explanatory. :dd
{Only zhi currently implemented for fix append/atoms} :dt
Self-explanatory. :dd
{Out of range atoms - cannot compute MSM} :dt
One or more atoms are attempting to map their charge to an MSM grid point
that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Out of range atoms - cannot compute PPPM} :dt
One or more atoms are attempting to map their charge to a PPPM grid
point that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Out of range atoms - cannot compute PPPMDisp} :dt
One or more atoms are attempting to map their charge to a PPPM grid
point that is not owned by a processor. This is likely for one of two
reasons, both of them bad. First, it may mean that an atom near the
boundary of a processor's sub-domain has moved more than 1/2 the
"neighbor skin distance"_neighbor.html without neighbor lists being
rebuilt and atoms being migrated to new processors. This also means
you may be missing pairwise interactions that need to be computed.
The solution is to change the re-neighboring criteria via the
"neigh_modify"_neigh_modify.html command. The safest settings are
"delay 0 every 1 check yes". Second, it may mean that an atom has
moved far outside a processor's sub-domain or even the entire
simulation box. This indicates bad physics, e.g. due to highly
overlapping atoms, too large a timestep, etc. :dd
{Overflow of allocated fix vector storage} :dt
This should not normally happen if the fix correctly calculated
how long the vector would grow. Contact the developers. :dd
{Overlapping large/large in pair colloid} :dt
This potential is infinite when there is an overlap. :dd
{Overlapping small/large in pair colloid} :dt
This potential is infinite when there is an overlap. :dd
{POEMS fix must come before NPT/NPH fix} :dt
The NPT/NPH fix must be defined in the input script after all poems fixes,
otherwise the fix contribution to the pressure virial is incorrect. :dd
{PPPM can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{PPPM grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPM grid stencil extends beyond nearest neighbor processor} :dt
This is not allowed if the kspace_modify overlap setting is no. :dd
{PPPM order < minimum allowed order} :dt
The default minimum order is 2. This can be reset by the
kspace_modify minorder command. :dd
{PPPM order cannot be < 2 or > than %d} :dt
This is a limitation of the PPPM implementation in LAMMPS. :dd
{PPPMDisp Coulomb grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPMDisp Dispersion grid is too large} :dt
The global PPPM grid is larger than OFFSET in one or more dimensions.
OFFSET is currently set to 4096. You likely need to decrease the
requested accuracy. :dd
{PPPMDisp can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{PPPMDisp coulomb order cannot be greater than %d} :dt
This is a limitation of the PPPM implementation in LAMMPS. :dd
{PPPMDisp used but no parameters set, for further information please see the pppm/disp documentation} :dt
Efficient and accurate use of pppm/disp requires settings via the
kspace_modify command. See the pppm/disp documentation for further
instructions. :dd
{PRD command before simulation box is defined} :dt
The prd command cannot be used before a read_data,
read_restart, or create_box command. :dd
{PRD nsteps must be multiple of t_event} :dt
Self-explanatory. :dd
{PRD t_corr must be multiple of t_event} :dt
Self-explanatory. :dd
{Package command after simulation box is defined} :dt
The package command cannot be used after a read_data, read_restart, or
create_box command. :dd
{Package cuda command without USER-CUDA package enabled} :dt
The USER-CUDA package must be installed via "make yes-user-cuda"
before LAMMPS is built, and the "-c on" command-line switch must be
used to enable the package. :dd
{Package gpu command without GPU package installed} :dt
The GPU package must be installed via "make yes-gpu" before LAMMPS is
built. :dd
{Package intel command without USER-INTEL package installed} :dt
The USER-INTEL package must be installed via "make yes-user-intel"
before LAMMPS is built. :dd
{Package kokkos command without KOKKOS package enabled} :dt
The KOKKOS package must be installed via "make yes-kokkos" before
LAMMPS is built, and the "-k on" command-line switch must be used to
enable the package. :dd
{Package omp command without USER-OMP package installed} :dt
The USER-OMP package must be installed via "make yes-user-omp" before
LAMMPS is built. :dd
{Pair body requires atom style body} :dt
Self-explanatory. :dd
{Pair body requires body style nparticle} :dt
This pair style is specific to the nparticle body style. :dd
{Pair brownian requires atom style sphere} :dt
Self-explanatory. :dd
{Pair brownian requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair brownian requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair brownian/poly requires atom style sphere} :dt
Self-explanatory. :dd
{Pair brownian/poly requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair brownian/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair coeff for hybrid has invalid style} :dt
Style in pair coeff must have been listed in pair_style command. :dd
{Pair coul/wolf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair cutoff < Respa interior cutoff} :dt
One or more pairwise cutoffs are too short to use with the specified
rRESPA cutoffs. :dd
{Pair dipole/cut requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/cut/gpu requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/long requires atom attributes q, mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair dipole/sf/gpu requires atom attributes q, mu, torque} :dt
The atom style defined does not have one or more of these attributes. :dd
{Pair distance < table inner cutoff} :dt
Two atoms are closer together than the pairwise table allows. :dd
{Pair distance > table outer cutoff} :dt
Two atoms are further apart than the pairwise table allows. :dd
{Pair dpd requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair gayberne epsilon a,b,c coeffs are not all set} :dt
Each atom type involved in pair_style gayberne must
have these 3 coefficients set at least once. :dd
{Pair gayberne requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair gayberne requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair gayberne/gpu requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair gayberne/gpu requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair granular requires atom attributes radius, rmass} :dt
The atom style defined does not have these attributes. :dd
{Pair granular requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair granular with shear history requires newton pair off} :dt
This is a current restriction of the implementation of pair
granular styles with history. :dd
{Pair hybrid single calls do not support per sub-style special bond values} :dt
Self-explanatory. :dd
{Pair hybrid sub-style does not support single call} :dt
You are attempting to invoke a single() call on a pair style
that doesn't support it. :dd
{Pair hybrid sub-style is not used} :dt
No pair_coeff command used a sub-style specified in the pair_style
command. :dd
{Pair inner cutoff < Respa interior cutoff} :dt
One or more pairwise cutoffs are too short to use with the specified
rRESPA cutoffs. :dd
{Pair inner cutoff >= Pair outer cutoff} :dt
The specified cutoffs for the pair style are inconsistent. :dd
{Pair line/lj requires atom style line} :dt
Self-explanatory. :dd
{Pair lj/long/dipole/long requires atom attributes mu, torque} :dt
The atom style defined does not have these attributes. :dd
{Pair lubricate requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricate requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricate requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair lubricate/poly requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricate/poly requires extended particles} :dt
One of the particles has radius 0.0. :dd
{Pair lubricate/poly requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricate/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair lubricateU requires atom style sphere} :dt
Self-explanatory. :dd
{Pair lubricateU requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricateU requires monodisperse particles} :dt
All particles must be the same finite size. :dd
{Pair lubricateU/poly requires ghost atoms store velocity} :dt
Use the comm_modify vel yes command to enable this. :dd
{Pair lubricateU/poly requires newton pair off} :dt
Self-explanatory. :dd
{Pair peri lattice is not identical in x, y, and z} :dt
The lattice defined by the lattice command must be cubic. :dd
{Pair peri requires a lattice be defined} :dt
Use the lattice command for this purpose. :dd
{Pair peri requires an atom map, see atom_modify} :dt
Even for atomic systems, an atom map is required to find Peridynamic
bonds. Use the atom_modify command to define one. :dd
{Pair resquared epsilon a,b,c coeffs are not all set} :dt
Self-explanatory. :dd
{Pair resquared epsilon and sigma coeffs are not all set} :dt
Self-explanatory. :dd
{Pair resquared requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair resquared requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair resquared/gpu requires atom style ellipsoid} :dt
Self-explanatory. :dd
{Pair resquared/gpu requires atoms with same type have same shape} :dt
Self-explanatory. :dd
{Pair style AIREBO requires atom IDs} :dt
This is a requirement to use the AIREBO potential. :dd
{Pair style AIREBO requires newton pair on} :dt
See the newton command. This is a restriction to use the AIREBO
potential. :dd
{Pair style BOP requires atom IDs} :dt
This is a requirement to use the BOP potential. :dd
{Pair style BOP requires newton pair on} :dt
See the newton command. This is a restriction to use the BOP
potential. :dd
{Pair style COMB requires atom IDs} :dt
This is a requirement to use the COMB potential. :dd
{Pair style COMB requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style COMB requires newton pair on} :dt
See the newton command. This is a restriction to use the COMB
potential. :dd
{Pair style COMB3 requires atom IDs} :dt
This is a requirement to use the COMB3 potential. :dd
{Pair style COMB3 requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style COMB3 requires newton pair on} :dt
See the newton command. This is a restriction to use the COMB3
potential. :dd
{Pair style LCBOP requires atom IDs} :dt
This is a requirement to use the LCBOP potential. :dd
{Pair style LCBOP requires newton pair on} :dt
See the newton command. This is a restriction to use the LCBOP
potential. :dd
{Pair style MEAM requires newton pair on} :dt
See the newton command. This is a restriction to use the MEAM
potential. :dd
{Pair style SNAP requires newton pair on} :dt
See the newton command. This is a restriction to use the SNAP
potential. :dd
{Pair style Stillinger-Weber requires atom IDs} :dt
This is a requirement to use the SW potential. :dd
{Pair style Stillinger-Weber requires newton pair on} :dt
See the newton command. This is a restriction to use the SW
potential. :dd
{Pair style Tersoff requires atom IDs} :dt
This is a requirement to use the Tersoff potential. :dd
{Pair style Tersoff requires newton pair on} :dt
See the newton command. This is a restriction to use the Tersoff
potential. :dd
{Pair style Vashishta requires atom IDs} :dt
This is a requirement to use the Vashishta potential. :dd
{Pair style Vashishta requires newton pair on} :dt
See the newton command. This is a restriction to use the Vashishta
potential. :dd
{Pair style bop requires comm ghost cutoff at least 3x larger than %g} :dt
Use the communicate ghost command to set this. See the pair bop
doc page for more details. :dd
{Pair style born/coul/long requires atom attribute q} :dt
An atom style that defines this attribute must be used. :dd
{Pair style born/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style born/coul/wolf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style buck/long/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/cut/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/debye/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/dsf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/dsf/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style coul/streitz requires atom attribute q} :dt
Self-explanatory. :dd
{Pair style does not have extra field requested by compute pair/local} :dt
The pair style does not support the pN value requested by the compute
pair/local command. :dd
{Pair style does not support bond_style quartic} :dt
The pair style does not have a single() function, so it can
not be invoked by bond_style quartic. :dd
{Pair style does not support compute group/group} :dt
The pair_style does not have a single() function, so it cannot be
invoked by the compute group/group command. :dd
{Pair style does not support compute pair/local} :dt
The pair style does not have a single() function, so it can
not be invoked by compute pair/local. :dd
{Pair style does not support compute property/local} :dt
The pair style does not have a single() function, so it can
not be invoked by compute property/local. :dd
{Pair style does not support fix bond/swap} :dt
The pair style does not have a single() function, so it can
not be invoked by fix bond/swap. :dd
{Pair style does not support pair_write} :dt
The pair style does not have a single() function, so it can
not be invoked by pair_write. :dd
{Pair style does not support rRESPA inner/middle/outer} :dt
You are attempting to use rRESPA options with a pair style that
does not support them. :dd
{Pair style granular with history requires atoms have IDs} :dt
Atoms in the simulation do not have IDs, so history effects
cannot be tracked by the granular pair potential. :dd
{Pair style hbond/dreiding requires an atom map, see atom_modify} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires atom IDs} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires molecular system} :dt
Self-explanatory. :dd
{Pair style hbond/dreiding requires newton pair on} :dt
See the newton command for details. :dd
{Pair style hybrid cannot have hybrid as an argument} :dt
Self-explanatory. :dd
{Pair style hybrid cannot have none as an argument} :dt
Self-explanatory. :dd
{Pair style is incompatible with KSpace style} :dt
If a pair style with a long-range Coulombic component is selected,
then a kspace style must also be used. :dd
{Pair style is incompatible with TIP4P KSpace style} :dt
The pair style does not have the required TIP4P settings. :dd
{Pair style lj/charmm/coul/charmm requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/charmm/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/charmm/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/class2/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/cut/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/debye/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/dsf requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/dsf/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/cut requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style lj/cut/tip4p/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/cut requires newton pair on} :dt
See the newton command. This is a restriction to use this
potential. :dd
{Pair style lj/cut/tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find the O,H atoms within a water molecule. :dd
{Pair style lj/cut/tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/cut/tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair style lj/gromacs/coul/gromacs requires atom attribute q} :dt
An atom_style with this attribute is needed. :dd
{Pair style lj/long/dipole/long does not currently support respa} :dt
This feature is not yet supported. :dd
{Pair style lj/long/tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find the O,H atoms within a water molecule. :dd
{Pair style lj/long/tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style lj/long/tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair style lj/sdk/coul/long/gpu requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style nb3b/harmonic requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style nb3b/harmonic requires newton pair on} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style nm/cut/coul/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style nm/cut/coul/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style peri requires atom style peri} :dt
Self-explanatory. :dd
{Pair style polymorphic requires atom IDs} :dt
This is a requirement to use the polymorphic potential. :dd
{Pair style polymorphic requires newton pair on} :dt
See the newton command. This is a restriction to use the polymorphic
potential. :dd
{Pair style reax requires atom IDs} :dt
This is a requirement to use the ReaxFF potential. :dd
{Pair style reax requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style reax requires newton pair on} :dt
This is a requirement to use the ReaxFF potential. :dd
{Pair style requires a KSpace style} :dt
No kspace style is defined. :dd
{Pair style requires use of kspace_style ewald/disp} :dt
Self-explanatory. :dd
{Pair style sw/gpu requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style sw/gpu requires newton pair off} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style tersoff/gpu requires atom IDs} :dt
This is a requirement to use the tersoff/gpu potential. :dd
{Pair style tersoff/gpu requires newton pair off} :dt
See the newton command. This is a restriction to use this pair style. :dd
{Pair style tip4p/cut requires atom IDs} :dt
This is a requirement to use this potential. :dd
{Pair style tip4p/cut requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style tip4p/cut requires newton pair on} :dt
See the newton command. This is a restriction to use this potential. :dd
{Pair style tip4p/long requires atom IDs} :dt
There are no atom IDs defined in the system and the TIP4P potential
requires them to find the O,H atoms within a water molecule. :dd
{Pair style tip4p/long requires atom attribute q} :dt
The atom style defined does not have this attribute. :dd
{Pair style tip4p/long requires newton pair on} :dt
This is because the computation of constraint forces within a water
molecule adds forces to atoms owned by other processors. :dd
{Pair table cutoffs must all be equal to use with KSpace} :dt
When using pair style table with a long-range KSpace solver, the
cutoffs for all atom type pairs must all be the same, since the
long-range solver starts at that cutoff. :dd
{Pair table parameters did not set N} :dt
List of pair table parameters must include N setting. :dd
{Pair tersoff/zbl requires metal or real units} :dt
This is a current restriction of this pair potential. :dd
{Pair tersoff/zbl/kk requires metal or real units} :dt
This is a current restriction of this pair potential. :dd
{Pair tri/lj requires atom style tri} :dt
Self-explanatory. :dd
{Pair yukawa/colloid requires atom style sphere} :dt
Self-explanatory. :dd
{Pair yukawa/colloid requires atoms with same type have same radius} :dt
Self-explanatory. :dd
{Pair yukawa/colloid/gpu requires atom style sphere} :dt
Self-explanatory. :dd
{PairKIM only works with 3D problems} :dt
This is a current limitation. :dd
{Pair_coeff command before pair_style is defined} :dt
Self-explanatory. :dd
{Pair_coeff command before simulation box is defined} :dt
The pair_coeff command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Pair_modify command before pair_style is defined} :dt
Self-explanatory. :dd
{Pair_modify special setting for pair hybrid incompatible with global special_bonds setting} :dt
Cannot override a setting of 0.0 or 1.0 or change a setting between
0.0 and 1.0. :dd
{Pair_write command before pair_style is defined} :dt
Self-explanatory. :dd
{Particle on or inside fix wall surface} :dt
Particles must be "exterior" to the wall in order for energy/force to
be calculated. :dd
{Particle outside surface of region used in fix wall/region} :dt
Particles must be inside the region for energy/force to be calculated.
A particle outside the region generates an error. :dd
{Per-atom compute in equal-style variable formula} :dt
Equal-style variables cannot use per-atom quantities. :dd
{Per-atom energy was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied energy, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Per-atom fix in equal-style variable formula} :dt
Equal-style variables cannot use per-atom quantities. :dd
{Per-atom virial was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to have
tallied the virial, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Per-processor system is too big} :dt
The number of owned atoms plus ghost atoms on a single
processor must fit in a 32-bit integer. :dd
{Potential energy ID for fix neb does not exist} :dt
Self-explanatory. :dd
{Potential energy ID for fix nvt/nph/npt does not exist} :dt
A compute for potential energy must be defined. :dd
{Potential file has duplicate entry} :dt
The potential file has more than one entry for the same element. :dd
{Potential file is missing an entry} :dt
The potential file does not have a needed entry. :dd
{Power by 0 in variable formula} :dt
Self-explanatory. :dd
{Pressure ID for fix box/relax does not exist} :dt
The compute ID needed to compute pressure for the fix does not
exist. :dd
{Pressure ID for fix modify does not exist} :dt
Self-explanatory. :dd
{Pressure ID for fix npt/nph does not exist} :dt
Self-explanatory. :dd
{Pressure ID for fix press/berendsen does not exist} :dt
The compute ID needed to compute pressure for the fix does not
exist. :dd
{Pressure ID for fix rigid npt/nph does not exist} :dt
Self-explanatory. :dd
{Pressure ID for thermo does not exist} :dt
The compute ID needed to compute pressure for thermodynamics does not
exist. :dd
{Pressure control can not be used with fix nvt} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/asphere} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/body} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/sllod} :dt
Self-explanatory. :dd
{Pressure control can not be used with fix nvt/sphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/asphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/body} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/small} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nph/sphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix nphug} :dt
A pressure control keyword (iso, aniso, tri, x, y, or z) must be
provided. :dd
{Pressure control must be used with fix npt} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/asphere} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/body} :dt
Self-explanatory. :dd
{Pressure control must be used with fix npt/sphere} :dt
Self-explanatory. :dd
{Processor count in z must be 1 for 2d simulation} :dt
Self-explanatory. :dd
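For example, the processors command can force a single processor
layer in z for a 2d run:

processors * * 1 :pre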
{Processor partitions do not match number of allocated processors} :dt
The total number of processors in all partitions must match the number
of processors LAMMPS is running on. :dd
{Processors command after simulation box is defined} :dt
The processors command cannot be used after a read_data, read_restart,
or create_box command. :dd
{Processors custom grid file is inconsistent} :dt
The values in the custom file are not consistent with the number of
processors you are running on or the Px,Py,Pz settings of the
processors command. Or there was not a setting for every processor. :dd
{Processors grid numa and map style are incompatible} :dt
Using numa for gstyle in the processors command requires using
cart for the map option. :dd
{Processors part option and grid style are incompatible} :dt
Cannot use gstyle numa or custom with the part option. :dd
{Processors twogrid requires proc count be a multiple of core count} :dt
Self-explanatory. :dd
{Pstart and Pstop must have the same value} :dt
Self-explanatory. :dd
{Python function evaluation failed} :dt
The Python function did not run successfully and/or did not return a
value (if it is supposed to return a value). This is probably due to
some error condition in the function. :dd
{Python function is not callable} :dt
The provided Python code was run successfully, but it did not
define a callable function with the required name. :dd
{Python invoke of undefined function} :dt
Cannot invoke a function that has not been previously defined. :dd
{Python variable does not match Python function} :dt
This matching is defined by the python-style variable and the python
command. :dd
{Python variable has no function} :dt
No python command was used to define the function associated with the
python-style variable. :dd
{QEQ with 'newton pair off' not supported} :dt
See the newton command. This is a restriction to use the QEQ fixes. :dd
{R0 < 0 for fix spring command} :dt
Equilibrium spring length is invalid. :dd
{RATTLE coordinate constraints are not satisfied up to desired tolerance} :dt
Self-explanatory. :dd
{RATTLE determinant = 0.0} :dt
The determinant of the matrix being solved for a single cluster
specified by the fix rattle command is numerically invalid. :dd
{RATTLE failed} :dt
Certain constraints were not satisfied. :dd
{RATTLE velocity constraints are not satisfied up to desired tolerance} :dt
Self-explanatory. :dd
{Read data add offset is too big} :dt
It cannot be larger than the size of atom IDs, e.g. the maximum 32-bit
integer. :dd
{Read dump of atom property that isn't allocated} :dt
Self-explanatory. :dd
{Read rerun dump file timestep > specified stop} :dt
Self-explanatory. :dd
{Read restart MPI-IO input not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Read_data shrink wrap did not assign all atoms correctly} :dt
This is typically because the box-size specified in the data file is
large compared to the actual extent of atoms in a shrink-wrapped
dimension. When LAMMPS shrink-wraps the box, atoms will be lost if the
processor they are re-assigned to is too far away. Choose a box
size closer to the actual extent of the atoms. :dd
{Read_dump command before simulation box is defined} :dt
The read_dump command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Read_dump field not found in dump file} :dt
Self-explanatory. :dd
{Read_dump triclinic status does not match simulation} :dt
Both the dump snapshot and the current LAMMPS simulation must
be using either an orthogonal or triclinic box. :dd
{Read_dump xyz fields do not have consistent scaling/wrapping} :dt
Self-explanatory. :dd
{Reading from MPI-IO filename when MPIIO package is not installed} :dt
Self-explanatory. :dd
{Reax_defs.h setting for NATDEF is too small} :dt
Edit the setting in the ReaxFF library and re-compile the
library and re-build LAMMPS. :dd
{Reax_defs.h setting for NNEIGHMAXDEF is too small} :dt
Edit the setting in the ReaxFF library and re-compile the
library and re-build LAMMPS. :dd
{Receiving partition in processors part command is already a receiver} :dt
Cannot specify a partition to be a receiver twice. :dd
{Region ID for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Region ID for compute reduce/region does not exist} :dt
Self-explanatory. :dd
{Region ID for compute temp/region does not exist} :dt
Self-explanatory. :dd
{Region ID for dump custom does not exist} :dt
Self-explanatory. :dd
{Region ID for fix addforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix atom/swap does not exist} :dt
Self-explanatory. :dd
{Region ID for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Region ID for fix aveforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix deposit does not exist} :dt
Self-explanatory. :dd
{Region ID for fix efield does not exist} :dt
Self-explanatory. :dd
{Region ID for fix evaporate does not exist} :dt
Self-explanatory. :dd
{Region ID for fix gcmc does not exist} :dt
Self-explanatory. :dd
{Region ID for fix heat does not exist} :dt
Self-explanatory. :dd
{Region ID for fix setforce does not exist} :dt
Self-explanatory. :dd
{Region ID for fix wall/region does not exist} :dt
Self-explanatory. :dd
{Region ID for group dynamic does not exist} :dt
Self-explanatory. :dd
{Region ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Region cannot have 0 length rotation vector} :dt
Self-explanatory. :dd
{Region for fix oneway does not exist} :dt
Self-explanatory. :dd
{Region intersect region ID does not exist} :dt
Self-explanatory. :dd
{Region union or intersect cannot be dynamic} :dt
The sub-regions can be dynamic, but not the combined region. :dd
{Region union region ID does not exist} :dt
One or more of the region IDs specified by the region union command
does not exist. :dd
{Replacing a fix, but new style != old style} :dt
A fix ID can be used a 2nd time, but only if the style matches the
previous fix. In this case it is assumed you wish to reset a fix's
parameters. This error may mean you are mistakenly re-using a fix ID
when you do not intend to. :dd
{Replicate command before simulation box is defined} :dt
The replicate command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Replicate did not assign all atoms correctly} :dt
Atoms replicated by the replicate command were not assigned correctly
to processors. This is likely due to some atom coordinates being
outside a non-periodic simulation box. :dd
{Replicated system atom IDs are too big} :dt
See the setting for tagint in the src/lmptype.h file. :dd
{Replicated system is too big} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{Required border comm not yet implemented with Kokkos} :dt
There are various limitations in the communication options supported
by Kokkos. :dd
{Rerun command before simulation box is defined} :dt
The rerun command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Rerun dump file does not contain requested snapshot} :dt
Self-explanatory. :dd
{Resetting timestep size is not allowed with fix move} :dt
This is because fix move is moving atoms based on elapsed time. :dd
{Respa inner cutoffs are invalid} :dt
The first cutoff must be <= the second cutoff. :dd
{Respa levels must be >= 1} :dt
Self-explanatory. :dd
{Respa middle cutoffs are invalid} :dt
The first cutoff must be <= the second cutoff. :dd
{Restart file MPI-IO output not allowed with % in filename} :dt
This is because a % signifies one file per processor and MPI-IO
creates one large file for all processors. :dd
{Restart file byte ordering is not recognized} :dt
The file does not appear to be a LAMMPS restart file since it doesn't
contain a recognized byte-ordering flag at the beginning. :dd
{Restart file byte ordering is swapped} :dt
The file was written on a machine with different byte-ordering than
the machine you are reading it on. Convert it to a text data file
instead, on the machine you wrote it on. :dd
{Restart file incompatible with current version} :dt
This is probably because you are trying to read a file created with a
version of LAMMPS that is too old compared to the current version.
Use your older version of LAMMPS and convert the restart file
to a data file. :dd
{Restart file is a MPI-IO file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is a multi-proc file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is not a MPI-IO file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart file is not a multi-proc file} :dt
The file is inconsistent with the filename you specified for it. :dd
{Restart variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Restrain atoms %d %d %d %d missing on proc %d at step %ld} :dt
The 4 atoms in a restrain dihedral specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Restrain atoms %d %d %d missing on proc %d at step %ld} :dt
The 3 atoms in a restrain angle specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Restrain atoms %d %d missing on proc %d at step %ld} :dt
The 2 atoms in a restrain bond specified by the fix restrain
command are not all accessible to a processor. This probably means an
atom has moved too far. :dd
{Reuse of compute ID} :dt
A compute ID cannot be used twice. :dd
{Reuse of dump ID} :dt
A dump ID cannot be used twice. :dd
{Reuse of molecule template ID} :dt
The template IDs must be unique. :dd
{Reuse of region ID} :dt
A region ID cannot be used twice. :dd
{Rigid body atoms %d %d missing on proc %d at step %ld} :dt
This means that an atom cannot find the atom that owns the rigid body
it is part of, or vice versa. The solution is to use the communicate
cutoff command to ensure ghost atoms are acquired from far enough away
to encompass the max distance printed when the fix rigid/small command
was invoked. :dd
{Rigid body has degenerate moment of inertia} :dt
Fix poems will only work with bodies (collections of atoms) that have
non-zero principal moments of inertia. This means each body must
consist of 3 or more non-collinear atoms, even with joint atoms removed. :dd
{Rigid fix must come before NPT/NPH fix} :dt
NPT/NPH fix must be defined in input script after all rigid fixes,
else the rigid fix contribution to the pressure virial is
incorrect. :dd
{Rmask function in equal-style variable formula} :dt
Rmask is a per-atom operation. :dd
{Run command before simulation box is defined} :dt
The run command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Run command start value is after start of run} :dt
Self-explanatory. :dd
{Run command stop value is before end of run} :dt
Self-explanatory. :dd
{Run_style command before simulation box is defined} :dt
The run_style command cannot be used before a read_data,
read_restart, or create_box command. :dd
{SRD bin size for fix srd differs from user request} :dt
Fix SRD had to adjust the bin size to fit the simulation box. See the
cubic keyword if you want this message to be an error vs warning. :dd
{SRD bins for fix srd are not cubic enough} :dt
The bin shape is not within tolerance of cubic. See the cubic
keyword if you want this message to be an error vs warning. :dd
{SRD particle %d started inside big particle %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{SRD particle %d started inside wall %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{Same dimension twice in fix ave/spatial} :dt
Self-explanatory. :dd
{Sending partition in processors part command is already a sender} :dt
Cannot specify a partition to be a sender twice. :dd
{Set command before simulation box is defined} :dt
The set command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Set command floating point vector does not exist} :dt
Self-explanatory. :dd
{Set command integer vector does not exist} :dt
Self-explanatory. :dd
{Set command with no atoms existing} :dt
No atoms are yet defined so the set command cannot be used. :dd
{Set region ID does not exist} :dt
Region ID specified in set command does not exist. :dd
{Shake angles have different bond types} :dt
All 3-atom angle-constrained SHAKE clusters specified by the fix shake
command that are the same angle type, must also have the same bond
types for the 2 bonds in the angle. :dd
{Shake atoms %d %d %d %d missing on proc %d at step %ld} :dt
The 4 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake atoms %d %d %d missing on proc %d at step %ld} :dt
The 3 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake atoms %d %d missing on proc %d at step %ld} :dt
The 2 atoms in a single shake cluster specified by the fix shake
command are not all accessible to a processor. This probably means
an atom has moved too far. :dd
{Shake cluster of more than 4 atoms} :dt
A single cluster specified by the fix shake command can have no more
than 4 atoms. :dd
{Shake clusters are connected} :dt
A single cluster specified by the fix shake command must have a single
central atom with up to 3 other atoms bonded to it. :dd
{Shake determinant = 0.0} :dt
The determinant of the matrix being solved for a single cluster
specified by the fix shake command is numerically invalid. :dd
{Shake fix must come before NPT/NPH fix} :dt
NPT fix must be defined in input script after SHAKE fix, else the
SHAKE fix contribution to the pressure virial is incorrect. :dd
{Shear history overflow, boost neigh_modify one} :dt
There are too many neighbors of a single atom. Use the neigh_modify
command to increase the max number of neighbors allowed for one atom.
You may also want to boost the page size. :dd
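For example (the values are only illustrative), both limits can be
raised together; the page setting must stay at least 10x the one setting:

neigh_modify one 10000 page 100000 :pre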
{Small to big integers are not sized correctly} :dt
This error occurs when the sizes of smallint, imageint, tagint, bigint,
as defined in src/lmptype.h are not what is expected. Contact
the developers if this occurs. :dd
{Smallint setting in lmptype.h is invalid} :dt
It has to be the size of an integer. :dd
{Smallint setting in lmptype.h is not compatible} :dt
Smallint stored in restart file is not consistent with LAMMPS version
you are running. :dd
{Special list size exceeded in fix bond/create} :dt
See the read_data command for info on setting the "extra special per
atom" header value to allow for additional special values to be
stored. :dd
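As an illustrative sketch (file name and value are placeholders),
extra special-neighbor storage can be reserved when the data file is
read:

read_data data.polymer extra/special/per/atom 2 :pre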
{Specified processors != physical processors} :dt
The 3d grid of processors defined by the processors command does not
match the number of processors LAMMPS is being run on. :dd
{Specified target stress must be uniaxial or hydrostatic} :dt
Self-explanatory. :dd
{Sqrt of negative value in variable formula} :dt
Self-explanatory. :dd
{Subsequent read data induced too many angles per atom} :dt
See the create_box extra/angle/per/atom or read_data "extra angle per
atom" header value to set this limit larger. :dd
{Subsequent read data induced too many bonds per atom} :dt
See the create_box extra/bond/per/atom or read_data "extra bond per
atom" header value to set this limit larger. :dd
{Subsequent read data induced too many dihedrals per atom} :dt
See the create_box extra/dihedral/per/atom or read_data "extra
dihedral per atom" header value to set this limit larger. :dd
{Subsequent read data induced too many impropers per atom} :dt
See the create_box extra/improper/per/atom or read_data "extra
improper per atom" header value to set this limit larger. :dd
{Substitution for illegal variable} :dt
Input script line contained a variable that could not be substituted
for. :dd
{Support for writing images in JPEG format not included} :dt
LAMMPS was not built with the -DLAMMPS_JPEG switch in the Makefile. :dd
{Support for writing images in PNG format not included} :dt
LAMMPS was not built with the -DLAMMPS_PNG switch in the Makefile. :dd
{Support for writing movies not included} :dt
LAMMPS was not built with the -DLAMMPS_FFMPEG switch in the Makefile. :dd
{System in data file is too big} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{System is not charge neutral, net charge = %g} :dt
The total charge on all atoms in the system is not 0.0.
For some KSpace solvers this is an error. :dd
{TAD nsteps must be multiple of t_event} :dt
Self-explanatory. :dd
{TIP4P hydrogen has incorrect atom type} :dt
The TIP4P pairwise computation found an H atom whose type does not
agree with the specified H type. :dd
{TIP4P hydrogen is missing} :dt
The TIP4P pairwise computation failed to find the correct H atom
within a water molecule. :dd
{TMD target file did not list all group atoms} :dt
The target file for the fix tmd command did not list all atoms in the
fix group. :dd
{Tad command before simulation box is defined} :dt
Self-explanatory. :dd
{Tagint setting in lmptype.h is invalid} :dt
Tagint must be as large or larger than smallint. :dd
{Tagint setting in lmptype.h is not compatible} :dt
Format of tagint stored in restart file is not consistent with LAMMPS
version you are running. See the settings in src/lmptype.h :dd
{Target pressure for fix rigid/nph cannot be < 0.0} :dt
Self-explanatory. :dd
{Target pressure for fix rigid/npt/small cannot be < 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix nvt/npt/nph cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/npt cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/npt/small cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/nvt cannot be 0.0} :dt
Self-explanatory. :dd
{Target temperature for fix rigid/nvt/small cannot be 0.0} :dt
Self-explanatory. :dd
{Temper command before simulation box is defined} :dt
The temper command cannot be used before a read_data, read_restart, or
create_box command. :dd
{Temperature ID for fix bond/swap does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix box/relax does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix nvt/npt does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix press/berendsen does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix rigid nvt/npt/nph does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/berendsen does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/csld does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/csvr does not exist} :dt
Self-explanatory. :dd
{Temperature ID for fix temp/rescale does not exist} :dt
Self-explanatory. :dd
{Temperature compute degrees of freedom < 0} :dt
This should not happen if you are calculating the temperature
on a valid set of atoms. :dd
{Temperature control can not be used with fix nph} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/asphere} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/body} :dt
Self-explanatory. :dd
{Temperature control can not be used with fix nph/sphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nphug} :dt
The temp keyword must be provided. :dd
{Temperature control must be used with fix npt} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/asphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/body} :dt
Self-explanatory. :dd
{Temperature control must be used with fix npt/sphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/asphere} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/body} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/sllod} :dt
Self-explanatory. :dd
{Temperature control must be used with fix nvt/sphere} :dt
Self-explanatory. :dd
{Temperature control must not be used with fix nph/small} :dt
Self-explanatory. :dd
{Temperature for fix nvt/sllod does not have a bias} :dt
The specified compute must compute temperature with a bias. :dd
{Tempering could not find thermo_pe compute} :dt
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command. :dd
{Tempering fix ID is not defined} :dt
The fix ID specified by the temper command does not exist. :dd
{Tempering temperature fix is not valid} :dt
The fix specified by the temper command is not one that controls
temperature (nvt or langevin). :dd
{Test_descriptor_string already allocated} :dt
This is an internal error. Contact the developers. :dd
{The package gpu command is required for gpu styles} :dt
Self-explanatory. :dd
{Thermo and fix not computed at compatible times} :dt
Fixes generate values on specific timesteps. The thermo output
does not match these timesteps. :dd
{Thermo compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo compute does not compute array} :dt
Self-explanatory. :dd
{Thermo compute does not compute scalar} :dt
Self-explanatory. :dd
{Thermo compute does not compute vector} :dt
Self-explanatory. :dd
{Thermo compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo custom variable cannot be indexed} :dt
Self-explanatory. :dd
{Thermo custom variable is not equal-style variable} :dt
Only equal-style variables can be output with thermodynamics, not
atom-style variables. :dd
{Thermo every variable returned a bad timestep} :dt
The variable must return a timestep greater than the current timestep. :dd
{Thermo fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo fix does not compute array} :dt
Self-explanatory. :dd
{Thermo fix does not compute scalar} :dt
Self-explanatory. :dd
{Thermo fix does not compute vector} :dt
Self-explanatory. :dd
{Thermo fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Thermo keyword in variable requires thermo to use/init pe} :dt
You are using a thermo keyword in a variable that requires
potential energy to be calculated, but your thermo output
does not use it. Add it to your thermo output. :dd
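For example, adding the pe keyword to the thermo output avoids this
error:

thermo_style custom step temp press pe :pre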
{Thermo keyword in variable requires thermo to use/init press} :dt
You are using a thermo keyword in a variable that requires pressure to
be calculated, but your thermo output does not use it. Add it to your
thermo output. :dd
{Thermo keyword in variable requires thermo to use/init temp} :dt
You are using a thermo keyword in a variable that requires temperature
to be calculated, but your thermo output does not use it. Add it to
your thermo output. :dd
{Thermo style does not use press} :dt
Cannot use thermo_modify to set this parameter since the thermo_style
is not computing this quantity. :dd
{Thermo style does not use temp} :dt
Cannot use thermo_modify to set this parameter since the thermo_style
is not computing this quantity. :dd
{Thermo_modify every variable returned a bad timestep} :dt
The returned timestep is less than or equal to the current timestep. :dd
{Thermo_modify int format does not contain d character} :dt
Self-explanatory. :dd
{Thermo_modify pressure ID does not compute pressure} :dt
The specified compute ID does not compute pressure. :dd
{Thermo_modify temperature ID does not compute temperature} :dt
The specified compute ID does not compute temperature. :dd
{Thermo_style command before simulation box is defined} :dt
The thermo_style command cannot be used before a read_data,
read_restart, or create_box command. :dd
{This variable thermo keyword cannot be used between runs} :dt
Keywords that refer to time (such as cpu, elapsed) do not
make sense in between runs. :dd
{Threshhold for an atom property that isn't allocated} :dt
A dump threshold has been requested on a quantity that is
not defined by the atom style used in this simulation. :dd
{Timestep must be >= 0} :dt
Specified timestep is invalid. :dd
{Too big a problem to use velocity create loop all} :dt
The system size must fit in a 32-bit integer to use this option. :dd
{Too big a timestep for dump dcd} :dt
The timestep must fit in a 32-bit integer to use this dump style. :dd
{Too big a timestep for dump xtc} :dt
The timestep must fit in a 32-bit integer to use this dump style. :dd
{Too few bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too few lines in %s section of data file} :dt
Self-explanatory. :dd
{Too few values in body lines in data file} :dt
Self-explanatory. :dd
{Too few values in body section of molecule file} :dt
Self-explanatory. :dd
{Too many -pk arguments in command line} :dt
The string formed by concatenating the arguments is too long. Use a
package command in the input script instead. :dd
{Too many MSM grid levels} :dt
The max number of MSM grid levels is hardwired to 10. :dd
{Too many args in variable function} :dt
More args are used than any variable function allows. :dd
{Too many atom pairs for pair bop} :dt
The number of atomic pairs exceeds the expected number. Check your
atomic structure to ensure that it is realistic. :dd
{Too many atom sorting bins} :dt
This is likely due to an immense simulation box that has blown up
to a large size. :dd
{Too many atom triplets for pair bop} :dt
The number of three atom groups for angle determinations exceeds the
expected number. Check your atomic structure to ensure that it is
realistic. :dd
{Too many atoms for dump dcd} :dt
The system size must fit in a 32-bit integer to use this dump
style. :dd
{Too many atoms for dump xtc} :dt
The system size must fit in a 32-bit integer to use this dump
style. :dd
{Too many atoms to dump sort} :dt
Cannot sort when running with more than 2^31 atoms. :dd
{Too many exponent bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too many groups} :dt
The maximum number of atom groups (including the "all" group) is
given by MAX_GROUP in group.cpp and is 32. :dd
{Too many iterations} :dt
The number of iterations used for minimization must fit in a
32-bit integer. :dd
{Too many lines in one body in data file - boost MAXBODY} :dt
MAXBODY is a setting at the top of the src/read_data.cpp file.
Set it larger and re-compile the code. :dd
{Too many local+ghost atoms for neighbor list} :dt
The number of nlocal + nghost atoms on a processor
is limited by the size of a 32-bit integer with 2 bits
removed for masking 1-2, 1-3, 1-4 neighbors. :dd
{Too many mantissa bits for lookup table} :dt
Table size specified via pair_modify command does not work with your
machine's floating point representation. :dd
{Too many masses for fix shake} :dt
The fix shake command cannot list more masses than there are atom
types. :dd
{Too many molecules for fix poems} :dt
The limit is 2^31 = ~2 billion molecules. :dd
{Too many molecules for fix rigid} :dt
The limit is 2^31 = ~2 billion molecules. :dd
{Too many neighbor bins} :dt
This is likely due to an immense simulation box that has blown up
to a large size. :dd
{Too many timesteps} :dt
The cumulative timesteps must fit in a 64-bit integer. :dd
{Too many timesteps for NEB} :dt
The number of timesteps used for NEB must fit in a 32-bit integer. :dd
{Too many total atoms} :dt
See the setting for bigint in the src/lmptype.h file. :dd
{Too many total bits for bitmapped lookup table} :dt
Table size specified via pair_modify command is too large. Note that
a value of N generates a 2^N size table. :dd
{Too many values in body lines in data file} :dt
Self-explanatory. :dd
{Too many values in body section of molecule file} :dt
Self-explanatory. :dd
{Too much buffered per-proc info for dump} :dt
The size of the buffered string must fit in a 32-bit integer for a
dump. :dd
{Too much per-proc info for dump} :dt
Number of local atoms times number of columns must fit in a 32-bit
integer for dump. :dd
{Tree structure in joint connections} :dt
Fix poems cannot (yet) work with coupled bodies whose joints connect
the bodies in a tree structure. :dd
{Triclinic box skew is too large} :dt
The displacement in a skewed direction must be less than half the box
length in that dimension. E.g. the xy tilt must be between -half and
+half of the x box length. This constraint can be relaxed by using
the box tilt command. :dd
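As noted above, the constraint can be relaxed with the box command,
issued before the simulation box is defined:

box tilt large :pre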
{Tried to convert a double to int, but input_double > INT_MAX} :dt
Self-explanatory. :dd
{Trying to build an occasional neighbor list before initialization completed} :dt
This is not allowed. Source code caller needs to be modified. :dd
{Two fix ave commands using same compute chunk/atom command in incompatible ways} :dt
They are both attempting to "lock" the chunk/atom command so that the
chunk assignments persist for some number of timesteps, but are doing
it in different ways. :dd
{Two groups cannot be the same in fix spring couple} :dt
Self-explanatory. :dd
{USER-CUDA mode requires CUDA variant of min style} :dt
CUDA mode is enabled, so the min style must include a cuda suffix. :dd
{USER-CUDA mode requires CUDA variant of run style} :dt
CUDA mode is enabled, so the run style must include a cuda suffix. :dd
{USER-CUDA package does not yet support comm_style tiled} :dt
Self-explanatory. :dd
{USER-CUDA package requires a cuda enabled atom_style} :dt
Self-explanatory. :dd
{Unable to initialize accelerator for use} :dt
There was a problem initializing an accelerator for the GPU package. :dd
{Unbalanced quotes in input line} :dt
No matching end double quote was found following a leading double
quote. :dd
{Unexpected end of -reorder file} :dt
Self-explanatory. :dd
{Unexpected end of AngleCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of BondCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of DihedralCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of ImproperCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of PairCoeffs section} :dt
Read a blank line. :dd
{Unexpected end of custom file} :dt
Self-explanatory. :dd
{Unexpected end of data file} :dt
LAMMPS hit the end of the data file while attempting to read a
section. Something is wrong with the format of the data file. :dd
{Unexpected end of dump file} :dt
A read operation from the file failed. :dd
{Unexpected end of fix rigid file} :dt
A read operation from the file failed. :dd
{Unexpected end of fix rigid/small file} :dt
A read operation from the file failed. :dd
{Unexpected end of molecule file} :dt
Self-explanatory. :dd
{Unexpected end of neb file} :dt
A read operation from the file failed. :dd
{Units command after simulation box is defined} :dt
The units command cannot be used after a read_data, read_restart, or
create_box command. :dd
{Universe/uloop variable count < # of partitions} :dt
A universe- or uloop-style variable must specify at least as many
values as there are processor partitions. :dd
{Unknown angle style} :dt
The choice of angle style is unknown. :dd
{Unknown atom style} :dt
The choice of atom style is unknown. :dd
{Unknown body style} :dt
The choice of body style is unknown. :dd
{Unknown bond style} :dt
The choice of bond style is unknown. :dd
{Unknown category for info is_active()} :dt
Self-explanatory. :dd
{Unknown category for info is_available()} :dt
Self-explanatory. :dd
{Unknown category for info is_defined()} :dt
Self-explanatory. :dd
{Unknown command: %s} :dt
The command is not known to LAMMPS. Check the input script. :dd
{Unknown compute style} :dt
The choice of compute style is unknown. :dd
{Unknown dihedral style} :dt
The choice of dihedral style is unknown. :dd
{Unknown dump reader style} :dt
The choice of dump reader style via the format keyword is unknown. :dd
{Unknown dump style} :dt
The choice of dump style is unknown. :dd
{Unknown error in GPU library} :dt
Self-explanatory. :dd
{Unknown fix style} :dt
The choice of fix style is unknown. :dd
{Unknown identifier in data file: %s} :dt
A section of the data file cannot be read by LAMMPS. :dd
{Unknown improper style} :dt
The choice of improper style is unknown. :dd
{Unknown keyword in thermo_style custom command} :dt
One or more specified keywords are not recognized. :dd
{Unknown kspace style} :dt
The choice of kspace style is unknown. :dd
{Unknown name for info newton category} :dt
Self-explanatory. :dd
{Unknown name for info package category} :dt
Self-explanatory. :dd
{Unknown name for info pair category} :dt
Self-explanatory. :dd
{Unknown pair style} :dt
The choice of pair style is unknown. :dd
{Unknown pair_modify hybrid sub-style} :dt
The choice of sub-style is unknown. :dd
{Unknown region style} :dt
The choice of region style is unknown. :dd
{Unknown section in molecule file} :dt
Self-explanatory. :dd
{Unknown table style in angle style table} :dt
Self-explanatory. :dd
{Unknown table style in bond style table} :dt
Self-explanatory. :dd
{Unknown table style in pair_style command} :dt
Style of table is invalid for use with pair_style table command. :dd
{Unknown unit_style} :dt
Self-explanatory. Check the input script or data file. :dd
{Unrecognized lattice type in MEAM file 1} :dt
The lattice type in an entry of the MEAM library file is not
valid. :dd
{Unrecognized lattice type in MEAM file 2} :dt
The lattice type in an entry of the MEAM parameter file is not
valid. :dd
{Unrecognized pair style in compute pair command} :dt
Self-explanatory. :dd
{Unrecognized virial argument in pair_style command} :dt
Only two options are supported: LAMMPSvirial and KIMvirial. :dd
{Unsupported mixing rule in kspace_style ewald/disp} :dt
Only geometric mixing is supported. :dd
{Unsupported order in kspace_style ewald/disp} :dt
Only 1/r^6 dispersion or dipole terms are supported. :dd
{Unsupported order in kspace_style pppm/disp, pair_style %s} :dt
Only pair styles with 1/r and 1/r^6 dependence are currently supported. :dd
{Use cutoff keyword to set cutoff in single mode} :dt
Mode is single so cutoff/multi keyword cannot be used. :dd
{Use cutoff/multi keyword to set cutoff in multi mode} :dt
Mode is multi so cutoff keyword cannot be used. :dd
{Using fix nvt/sllod with inconsistent fix deform remap option} :dt
Fix nvt/sllod requires that deforming atoms have a velocity profile
provided by "remap v" as a fix deform option. :dd
{Using fix nvt/sllod with no fix deform defined} :dt
Self-explanatory. :dd
{Using fix srd with inconsistent fix deform remap option} :dt
When shearing the box in an SRD simulation, the remap v option for fix
deform needs to be used. :dd
{Using pair lubricate with inconsistent fix deform remap option} :dt
Must use remap v option with fix deform with this pair style. :dd
{Using pair lubricate/poly with inconsistent fix deform remap option} :dt
If fix deform is used, the remap v option is required. :dd
{Using suffix cuda without USER-CUDA package enabled} :dt
Self-explanatory. :dd
{Using suffix gpu without GPU package installed} :dt
Self-explanatory. :dd
{Using suffix intel without USER-INTEL package installed} :dt
Self-explanatory. :dd
{Using suffix kk without KOKKOS package enabled} :dt
Self-explanatory. :dd
{Using suffix omp without USER-OMP package installed} :dt
Self-explanatory. :dd
{Using update dipole flag requires atom attribute mu} :dt
Self-explanatory. :dd
{Using update dipole flag requires atom style sphere} :dt
Self-explanatory. :dd
{Variable ID in variable formula does not exist} :dt
Self-explanatory. :dd
{Variable atom ID is too large} :dt
Specified ID is larger than the maximum allowed atom ID. :dd
{Variable evaluation before simulation box is defined} :dt
Cannot evaluate a compute or fix or atom-based value in a variable
before the simulation has been setup. :dd
{Variable evaluation in fix wall gave bad value} :dt
The returned value for epsilon or sigma < 0.0. :dd
{Variable evaluation in region gave bad value} :dt
Variable returned a radius < 0.0. :dd
{Variable for compute ti is invalid style} :dt
Self-explanatory. :dd
{Variable for create_atoms is invalid style} :dt
The variables must be equal-style variables. :dd
{Variable for displace_atoms is invalid style} :dt
It must be an equal-style or atom-style variable. :dd
{Variable for dump every is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for dump image center is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image persp is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image phi is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image theta is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for dump image zoom is invalid style} :dt
Must be an equal-style variable. :dd
{Variable for fix adapt is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix addforce is invalid style} :dt
Self-explanatory. :dd
{Variable for fix aveforce is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix deform is invalid style} :dt
The variable must be an equal-style variable. :dd
{Variable for fix efield is invalid style} :dt
The variable must be an equal- or atom-style variable. :dd
{Variable for fix gravity is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix heat is invalid style} :dt
Only equal-style or atom-style variables can be used. :dd
{Variable for fix indent is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix indent is not equal style} :dt
Only equal-style variables can be used. :dd
{Variable for fix langevin is invalid style} :dt
It must be an equal-style variable. :dd
{Variable for fix move is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix setforce is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/berendsen is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/csld is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/csvr is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix temp/rescale is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall/reflect is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for fix wall/srd is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for group dynamic is invalid style} :dt
The variable must be an atom-style variable. :dd
{Variable for group is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for region cylinder is invalid style} :dt
Only equal-style variables are allowed. :dd
{Variable for region is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for region is not equal style} :dt
Self-explanatory. :dd
{Variable for region sphere is invalid style} :dt
Only equal-style variables are allowed. :dd
{Variable for restart is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for set command is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for thermo every is invalid style} :dt
Only equal-style variables can be used. :dd
{Variable for velocity set is invalid style} :dt
Only atom-style variables can be used. :dd
{Variable for voronoi radius is not atom style} :dt
Self-explanatory. :dd
{Variable formula compute array is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula compute vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula fix array is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable formula fix vector is accessed out-of-range} :dt
Self-explanatory. :dd
{Variable has circular dependency} :dt
A circular dependency is when variable "a" is used by variable "b" and
variable "b" is also used by variable "a". Circular dependencies with
longer chains of dependence are also not allowed. :dd
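A minimal illustration of such a disallowed cycle (the variable names
are arbitrary):

variable a equal v_b
variable b equal v_a :pre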
{Variable name between brackets must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Variable name for compute chunk/atom does not exist} :dt
Self-explanatory. :dd
{Variable name for compute reduce does not exist} :dt
Self-explanatory. :dd
{Variable name for compute ti does not exist} :dt
Self-explanatory. :dd
{Variable name for create_atoms does not exist} :dt
Self-explanatory. :dd
{Variable name for displace_atoms does not exist} :dt
Self-explanatory. :dd
{Variable name for dump every does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image center does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image persp does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image phi does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image theta does not exist} :dt
Self-explanatory. :dd
{Variable name for dump image zoom does not exist} :dt
Self-explanatory. :dd
{Variable name for fix adapt does not exist} :dt
Self-explanatory. :dd
{Variable name for fix addforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/atom does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/chunk does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/correlate does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/histo does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/spatial does not exist} :dt
Self-explanatory. :dd
{Variable name for fix ave/time does not exist} :dt
Self-explanatory. :dd
{Variable name for fix aveforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix deform does not exist} :dt
Self-explanatory. :dd
{Variable name for fix efield does not exist} :dt
Self-explanatory. :dd
{Variable name for fix gravity does not exist} :dt
Self-explanatory. :dd
{Variable name for fix heat does not exist} :dt
Self-explanatory. :dd
{Variable name for fix indent does not exist} :dt
Self-explanatory. :dd
{Variable name for fix langevin does not exist} :dt
Self-explanatory. :dd
{Variable name for fix move does not exist} :dt
Self-explanatory. :dd
{Variable name for fix setforce does not exist} :dt
Self-explanatory. :dd
{Variable name for fix store/state does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/berendsen does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/csld does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/csvr does not exist} :dt
Self-explanatory. :dd
{Variable name for fix temp/rescale does not exist} :dt
Self-explanatory. :dd
{Variable name for fix vector does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall/reflect does not exist} :dt
Self-explanatory. :dd
{Variable name for fix wall/srd does not exist} :dt
Self-explanatory. :dd
{Variable name for group does not exist} :dt
Self-explanatory. :dd
{Variable name for group dynamic does not exist} :dt
Self-explanatory. :dd
{Variable name for region cylinder does not exist} :dt
Self-explanatory. :dd
{Variable name for region does not exist} :dt
Self-explanatory. :dd
{Variable name for region sphere does not exist} :dt
Self-explanatory. :dd
{Variable name for restart does not exist} :dt
Self-explanatory. :dd
{Variable name for set command does not exist} :dt
Self-explanatory. :dd
{Variable name for thermo every does not exist} :dt
Self-explanatory. :dd
{Variable name for velocity set does not exist} :dt
Self-explanatory. :dd
{Variable name for voronoi radius does not exist} :dt
Self-explanatory. :dd
{Variable name must be alphanumeric or underscore characters} :dt
Self-explanatory. :dd
{Variable uses atom property that isn't allocated} :dt
Self-explanatory. :dd
{Velocity command before simulation box is defined} :dt
The velocity command cannot be used before a read_data, read_restart,
or create_box command. :dd
{Velocity command with no atoms existing} :dt
A velocity command has been used, but no atoms yet exist. :dd
{Velocity ramp in z for a 2d problem} :dt
Self-explanatory. :dd
{Velocity rigid used with non-rigid fix-ID} :dt
Self-explanatory. :dd
{Velocity temperature ID does calculate a velocity bias} :dt
The specified compute must compute a bias for temperature. :dd
{Velocity temperature ID does not compute temperature} :dt
The compute ID given to the velocity command must compute
temperature. :dd
{Verlet/split can only currently be used with comm_style brick} :dt
This is a current restriction in LAMMPS. :dd
{Verlet/split does not yet support TIP4P} :dt
This is a current limitation. :dd
{Verlet/split requires 2 partitions} :dt
See the -partition command-line switch. :dd
{Verlet/split requires Rspace partition layout be multiple of Kspace partition layout in each dim} :dt
This is controlled by the processors command. :dd
{Verlet/split requires Rspace partition size be multiple of Kspace partition size} :dt
This is so there is an equal number of Rspace processors for every
Kspace processor. :dd
{Virial was not tallied on needed timestep} :dt
You are using a thermo keyword that requires potentials to
have tallied the virial, but they didn't on this timestep. See the
variable doc page for ideas on how to make this work. :dd
{Voro++ error: narea and neigh have a different size} :dt
This error is returned by the Voro++ library. :dd
{Wall defined twice in fix wall command} :dt
Self-explanatory. :dd
{Wall defined twice in fix wall/reflect command} :dt
Self-explanatory. :dd
{Wall defined twice in fix wall/srd command} :dt
Self-explanatory. :dd
{Water H epsilon must be 0.0 for pair style lj/cut/tip4p/cut} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{Water H epsilon must be 0.0 for pair style lj/cut/tip4p/long} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{Water H epsilon must be 0.0 for pair style lj/long/tip4p/long} :dt
This is because LAMMPS does not compute the Lennard-Jones interactions
with these particles for efficiency reasons. :dd
{World variable count doesn't match # of partitions} :dt
A world-style variable must specify a number of values equal to the
number of processor partitions. :dd
{Write_data command before simulation box is defined} :dt
Self-explanatory. :dd
{Write_restart command before simulation box is defined} :dt
The write_restart command cannot be used before a read_data,
read_restart, or create_box command. :dd
{Writing to MPI-IO filename when MPIIO package is not installed} :dt
Self-explanatory. :dd
{Zero length rotation vector with displace_atoms} :dt
Self-explanatory. :dd
{Zero length rotation vector with fix move} :dt
Self-explanatory. :dd
{Zero-length lattice orient vector} :dt
Self-explanatory. :dd
:dle
Warnings: :h4,link(warn)
:dlb
{Adjusting Coulombic cutoff for MSM, new cutoff = %g} :dt
The adjust/cutoff command is turned on and the Coulombic cutoff has been
adjusted to match the user-specified accuracy. :dd
{Angle atoms missing at step %ld} :dt
One or more of 3 atoms needed to compute a particular angle are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the angle has blown apart and an atom is
too far away. :dd
{Angle style in data file differs from currently defined angle style} :dt
Self-explanatory. :dd
{Atom style in data file differs from currently defined atom style} :dt
Self-explanatory. :dd
{Bond atom missing in box size check} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond atom missing in image check} :dt
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away. :dd
{Bond atoms missing at step %ld} :dt
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away. :dd
{Bond style in data file differs from currently defined bond style} :dt
Self-explanatory. :dd
{Bond/angle/dihedral extent > half of periodic box length} :dt
This is a restriction because LAMMPS can be confused about which image
of an atom in the bonded interaction is the correct one to use.
"Extent" in this context means the maximum end-to-end length of the
bond/angle/dihedral. LAMMPS computes this by taking the maximum bond
length, multiplying by the number of bonds in the interaction (e.g. 3
for a dihedral) and adding a small amount of stretch. :dd
{Both groups in compute group/group have a net charge; the Kspace boundary correction to energy will be non-zero} :dt
Self-explanatory. :dd
{Calling write_dump before a full system init.} :dt
The write_dump command is used before the system has been fully
initialized as part of a 'run' or 'minimize' command. Not all dump
styles and features are fully supported at this point and thus the
command may fail or produce incomplete or incorrect output. Insert
a "run 0" command if a full system init is required; a brief sketch
appears at the end of this list. :dd
{Cannot count rigid body degrees-of-freedom before bodies are fully initialized} :dt
This means the temperature associated with the rigid bodies may be
incorrect on this timestep. :dd
{Cannot count rigid body degrees-of-freedom before bodies are initialized} :dt
This means the temperature associated with the rigid bodies may be
incorrect on this timestep. :dd
{Cannot include log terms without 1/r terms; setting flagHI to 1} :dt
Self-explanatory. :dd
{Cannot include log terms without 1/r terms; setting flagHI to 1.} :dt
Self-explanatory. :dd
{Charges are set, but coulombic solver is not used} :dt
Self-explanatory. :dd
{Charges did not converge at step %ld: %lg} :dt
Self-explanatory. :dd
{Communication cutoff is too small for SNAP micro load balancing, increased to %lf} :dt
Self-explanatory. :dd
{Compute cna/atom cutoff may be too large to find ghost atom neighbors} :dt
The neighbor cutoff used may not encompass enough ghost atoms
to perform this operation correctly. :dd
{Computing temperature of portions of rigid bodies} :dt
The group defined by the temperature compute does not encompass all
the atoms in one or more rigid bodies, so the change in
degrees-of-freedom for the atoms in those partial rigid bodies will
not be accounted for. :dd
{Create_bonds max distance > minimum neighbor cutoff} :dt
This means atom pairs for some atom types may not be in the neighbor
list and thus no bond can be created between them. :dd
{Delete_atoms cutoff > minimum neighbor cutoff} :dt
This means atom pairs for some atom types may not be in the neighbor
list and thus an atom in that pair cannot be deleted. :dd
{Dihedral atoms missing at step %ld} :dt
One or more of 4 atoms needed to compute a particular dihedral are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the dihedral has blown apart and an atom is
too far away. :dd
{Dihedral problem} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Dihedral problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Dihedral style in data file differs from currently defined dihedral style} :dt
Self-explanatory. :dd
{Dump dcd/xtc timestamp may be wrong with fix dt/reset} :dt
If the fix changes the timestep, the dump dcd file will not
reflect the change. :dd
+{Energy due to X extra global DOFs will be included in minimizer energies} :dt
+
+When using fixes like box/relax, the potential energy used by the minimizer
+is augmented by an additional energy provided by the fix. Thus the printed
+converged energy may be different from the total potential energy. :dd
+
{Energy tally does not account for 'zero yes'} :dt
The energy removed by using the 'zero yes' flag is not accounted
for in the energy tally and thus energy conservation cannot be
monitored in this case. :dd
{Estimated error in splitting of dispersion coeffs is %g} :dt
Error is greater than 0.0001 percent. :dd
{Ewald/disp Newton solver failed, using old method to estimate g_ewald} :dt
Self-explanatory. Choosing a different cutoff value may help. :dd
{FENE bond too long} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{FENE bond too long: %ld %d %d %g} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{FENE bond too long: %ld %g} :dt
A FENE bond has stretched dangerously far. Its interaction strength
will be truncated to attempt to prevent the bond from blowing up. :dd
{Fix SRD walls overlap but fix srd overlap not set} :dt
You likely want to set this in your input script. :dd
{Fix bond/swap will ignore defined angles} :dt
See the doc page for fix bond/swap for more info on this
restriction. :dd
{Fix deposit near setting < possible overlap separation %g} :dt
This test is performed for finite size particles with a diameter, not
for point particles. The near setting is smaller than the particle
diameter which can lead to overlaps. :dd
{Fix evaporate may delete atom with non-zero molecule ID} :dt
This is probably an error, since you should not delete only one atom
of a molecule. :dd
{Fix gcmc using full_energy option} :dt
Fix gcmc has automatically turned on the full_energy option since it
is required for systems like the one specified by the user. User input
included one or more of the following: kspace, triclinic, a hybrid
pair style, an eam pair style, or no "single" function for the pair
style. :dd
{Fix property/atom mol or charge w/out ghost communication} :dt
A model typically needs these properties defined for ghost atoms. :dd
{Fix qeq CG convergence failed (%g) after %d iterations at %ld step} :dt
Self-explanatory. :dd
{Fix qeq has non-zero lower Taper radius cutoff} :dt
Absolute value must be <= 0.01. :dd
{Fix qeq has very low Taper radius cutoff} :dt
Value should typically be >= 5.0. :dd
{Fix qeq/dynamic tolerance may be too small for damped dynamics} :dt
Self-explanatory. :dd
{Fix qeq/fire tolerance may be too small for damped fires} :dt
Self-explanatory. :dd
{Fix rattle should come after all other integration fixes} :dt
This fix is designed to work after all other integration fixes change
atom positions. Thus it should be the last integration fix specified.
If not, it will not satisfy the desired constraints as well as it
otherwise would. :dd
{Fix recenter should come after all other integration fixes} :dt
Other fixes may change the position of the center-of-mass, so
fix recenter should come last. :dd
{Fix srd SRD moves may trigger frequent reneighboring} :dt
This is because the SRD particles may move long distances. :dd
{Fix srd grid size > 1/4 of big particle diameter} :dt
This may cause accuracy problems. :dd
{Fix srd particle moved outside valid domain} :dt
This may indicate a problem with your simulation parameters. :dd
{Fix srd particles may move > big particle diameter} :dt
This may cause accuracy problems. :dd
{Fix srd viscosity < 0.0 due to low SRD density} :dt
This may cause accuracy problems. :dd
{Fix thermal/conductivity comes before fix ave/spatial} :dt
The order of these 2 fixes in your input script is such that fix
thermal/conductivity comes first. If you are using fix ave/spatial to
measure the temperature profile induced by fix thermal/conductivity, then this
may cause a glitch in the profile since you are averaging immediately
after swaps have occurred. Flipping the order of the 2 fixes
typically helps. :dd
{Fix viscosity comes before fix ave/spatial} :dt
The order of these 2 fixes in your input script is such that
fix viscosity comes first. If you are using fix ave/spatial
to measure the velocity profile induced by fix viscosity, then
this may cause a glitch in the profile since you are averaging
immediately after swaps have occurred. Flipping the order
of the 2 fixes typically helps. :dd
{Fixes cannot send data in Kokkos communication, switching to classic communication} :dt
This is current restriction with Kokkos. :dd
{For better accuracy use 'pair_modify table 0'} :dt
The user-specified force accuracy cannot be achieved unless the table
feature is disabled by using 'pair_modify table 0'. :dd
{Geometric mixing assumed for 1/r^6 coefficients} :dt
Self-explanatory. :dd
{Group for fix_modify temp != fix group} :dt
The fix_modify command is specifying a temperature computation that
computes a temperature on a different group of atoms than the fix
itself operates on. This is probably not what you want to do. :dd
{H matrix size has been exceeded: m_fill=%d H.m=%d\n} :dt
This is the size of the matrix. :dd
{Ignoring unknown or incorrect info command flag} :dt
Self-explanatory. An unknown argument was given to the info command.
Compare your input with the documentation. :dd
{Improper atoms missing at step %ld} :dt
One or more of 4 atoms needed to compute a particular improper are
missing on this processor. Typically this is because the pairwise
cutoff is set too short or the improper has blown apart and an atom is
too far away. :dd
{Improper problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed improper atoms is extreme; you may want
to check your simulation geometry. :dd
{Improper style in data file differs from currently defined improper style} :dt
Self-explanatory. :dd
{Inconsistent image flags} :dt
The image flags for a pair on bonded atoms appear to be inconsistent.
Inconsistent means that when the coordinates of the two atoms are
unwrapped using the image flags, the two atoms are far apart.
Specifically they are further apart than half a periodic box length.
Or they are more than a box length apart in a non-periodic dimension.
This is usually due to the initial data file not having correct image
flags for the 2 atoms in a bond that straddles a periodic boundary.
They should be different by 1 in that case. This is a warning because
inconsistent image flags will not cause problems for dynamics or most
LAMMPS simulations. However they can cause problems when such atoms
are used with the fix rigid or replicate commands. Note that if you
have an infinite periodic crystal with bonds then it is impossible to
have fully consistent image flags, since some bonds will cross
periodic boundaries and connect two atoms with the same image
flag. :dd
{KIM Model does not provide 'energy'; Potential energy will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'forces'; Forces will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'particleEnergy'; energy per atom will be zero} :dt
Self-explanatory. :dd
{KIM Model does not provide 'particleVirial'; virial per atom will be zero} :dt
Self-explanatory. :dd
{Kspace_modify slab param < 2.0 may cause unphysical behavior} :dt
The kspace_modify slab parameter should be larger to insure periodic
grids padded with empty space do not overlap. :dd
{Less insertions than requested} :dt
The fix pour command was unsuccessful at finding open space
for as many particles as it tried to insert. :dd
{Library error in lammps_gather_atoms} :dt
This library function cannot be used if atom IDs are not defined
or are not consecutively numbered. :dd
{Library error in lammps_scatter_atoms} :dt
This library function cannot be used if atom IDs are not defined or
are not consecutively numbered, or if no atom map is defined. See the
atom_modify command for details about atom maps. :dd
{Lost atoms via change_box: original %ld current %ld} :dt
The command options you have used caused atoms to be lost. :dd
{Lost atoms via displace_atoms: original %ld current %ld} :dt
The command options you have used caused atoms to be lost. :dd
{Lost atoms: original %ld current %ld} :dt
Lost atoms are checked for each time thermo output is done. See the
thermo_modify lost command for options. Lost atoms usually indicate
bad dynamics, e.g. atoms have been blown far out of the simulation
box, or moved further than one processor's sub-domain away before
reneighboring. :dd
{MSM mesh too small, increasing to 2 points in each direction} :dt
Self-explanatory. :dd
{Mismatch between velocity and compute groups} :dt
The temperature computation used by the velocity command will not be
on the same group of atoms that velocities are being set for. :dd
{Mixing forced for lj coefficients} :dt
Self-explanatory. :dd
{Molecule attributes do not match system attributes} :dt
An attribute is specified (e.g. diameter, charge) that is
not defined for the specified atom style. :dd
{Molecule has bond topology but no special bond settings} :dt
This means the bonded atoms will not be excluded in pair-wise
interactions. :dd
{Molecule template for create_atoms has multiple molecules} :dt
The create_atoms command will only create molecules of a single type,
i.e. the first molecule in the template. :dd
{Molecule template for fix gcmc has multiple molecules} :dt
The fix gcmc command will only create molecules of a single type,
i.e. the first molecule in the template. :dd
{Molecule template for fix shake has multiple molecules} :dt
The fix shake command will only recognize molecules of a single
type, i.e. the first molecule in the template. :dd
{More than one compute centro/atom} :dt
It is not efficient to use compute centro/atom more than once. :dd
{More than one compute cluster/atom} :dt
It is not efficient to use compute cluster/atom more than once. :dd
{More than one compute cna/atom defined} :dt
It is not efficient to use compute cna/atom more than once. :dd
{More than one compute contact/atom} :dt
It is not efficient to use compute contact/atom more than once. :dd
{More than one compute coord/atom} :dt
It is not efficient to use compute coord/atom more than once. :dd
{More than one compute damage/atom} :dt
It is not efficient to use compute damage/atom more than once. :dd
{More than one compute dilatation/atom} :dt
Self-explanatory. :dd
{More than one compute erotate/sphere/atom} :dt
It is not efficient to use compute erotate/sphere/atom more than once. :dd
{More than one compute hexorder/atom} :dt
It is not efficient to use compute hexorder/atom more than once. :dd
{More than one compute ke/atom} :dt
It is not efficient to use compute ke/atom more than once. :dd
{More than one compute orientorder/atom} :dt
It is not efficient to use compute orientorder/atom more than once. :dd
{More than one compute plasticity/atom} :dt
Self-explanatory. :dd
{More than one compute sna/atom} :dt
Self-explanatory. :dd
{More than one compute snad/atom} :dt
Self-explanatory. :dd
{More than one compute snav/atom} :dt
Self-explanatory. :dd
{More than one fix poems} :dt
It is not efficient to use fix poems more than once. :dd
{More than one fix rigid} :dt
It is not efficient to use fix rigid more than once. :dd
{Neighbor exclusions used with KSpace solver may give inconsistent Coulombic energies} :dt
This is because excluding specific pair interactions also excludes
them from long-range interactions which may not be the desired effect.
The special_bonds command handles this by ensuring that excluded (or
weighted) 1-2, 1-3, 1-4 interactions are treated
consistently by both the short-range pair style and the long-range
solver. This is not done for exclusions of charged atom pairs via the
neigh_modify exclude command. :dd
{New thermo_style command, previous thermo_modify settings will be lost} :dt
If a thermo_style command is used after a thermo_modify command, the
settings changed by the thermo_modify command will be reset to their
default values. This is because the thermo_modify command acts on
the currently defined thermo style, and a thermo_style command creates
a new style. :dd
{No Kspace calculation with verlet/split} :dt
The 2nd partition performs a kspace calculation so the kspace_style
command must be used. :dd
{No automatic unit conversion to XTC file format conventions possible for units lj} :dt
This means no scaling will be performed. :dd
{No fixes defined, atoms won't move} :dt
If you are not using a fix like nve, nvt, npt then atom velocities and
coordinates will not be updated during timestepping. :dd
{No joints between rigid bodies, use fix rigid instead} :dt
The bodies defined by fix poems are not connected by joints. POEMS
will integrate the body motion, but it would be more efficient to use
fix rigid. :dd
{Not using real units with pair reax} :dt
This is most likely an error, unless you have created your own ReaxFF
parameter file in a different set of units. :dd
{Number of MSM mesh points changed to be a multiple of 2} :dt
MSM requires that the number of grid points in each direction be a multiple
of two and the number of grid points in one or more directions have been
adjusted to meet this requirement. :dd
{OMP_NUM_THREADS environment is not set.} :dt
This environment variable must be set appropriately to use the
USER-OMP package. :dd
{One or more atoms are time integrated more than once} :dt
This is probably an error since you typically do not want to
advance the positions or velocities of an atom more than once
per timestep. :dd
{One or more chunks do not contain all atoms in molecule} :dt
This may not be what you intended. :dd
{One or more dynamic groups may not be updated at correct point in timestep} :dt
If there are other fixes that act immediately after the initial stage
of time integration within a timestep (i.e. after atoms move), then
the command that sets up the dynamic group should appear after those
fixes. This will insure that dynamic group assignments are made
after all atoms have moved. :dd
{One or more respa levels compute no forces} :dt
This is computationally inefficient. :dd
{Pair COMB charge %.10f with force %.10f hit max barrier} :dt
Something is possibly wrong with your model. :dd
{Pair COMB charge %.10f with force %.10f hit min barrier} :dt
Something is possibly wrong with your model. :dd
{Pair brownian needs newton pair on for momentum conservation} :dt
Self-explanatory. :dd
{Pair dpd needs newton pair on for momentum conservation} :dt
Self-explanatory. :dd
{Pair dsmc: num_of_collisions > number_of_A} :dt
Collision model in DSMC is breaking down. :dd
{Pair dsmc: num_of_collisions > number_of_B} :dt
Collision model in DSMC is breaking down. :dd
{Pair style in data file differs from currently defined pair style} :dt
Self-explanatory. :dd
{Particle deposition was unsuccessful} :dt
The fix deposit command was not able to insert as many atoms as
needed. The requested volume fraction may be too high, or other atoms
may be in the insertion region. :dd
{Proc sub-domain size < neighbor skin, could lead to lost atoms} :dt
The decomposition of the physical domain (likely due to load
balancing) has led to a processor's sub-domain being smaller than the
neighbor skin in one or more dimensions. Since reneighboring is
triggered by atoms moving the skin distance, this may lead to lost
atoms, if an atom moves all the way across a neighboring processor's
sub-domain before reneighboring is triggered. :dd
{Reducing PPPM order b/c stencil extends beyond nearest neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Reducing PPPMDisp Coulomb order b/c stencil extends beyond neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Reducing PPPMDisp dispersion order b/c stencil extends beyond neighbor processor} :dt
This may lead to a larger grid than desired. See the kspace_modify overlap
command to prevent changing of the PPPM order. :dd
{Replacing a fix, but new group != old group} :dt
The ID and style of a fix match for a fix you are changing with a fix
command, but the new group you are specifying does not match the old
group. :dd
{Replicating in a non-periodic dimension} :dt
The parameters for a replicate command will cause a non-periodic
dimension to be replicated; this may cause unwanted behavior. :dd
{Resetting reneighboring criteria during PRD} :dt
A PRD simulation requires that neigh_modify settings be delay = 0,
every = 1, check = yes. Since these settings were not in place,
LAMMPS changed them and will restore them to their original values
after the PRD simulation. :dd
{Resetting reneighboring criteria during TAD} :dt
A TAD simulation requires that neigh_modify settings be delay = 0,
every = 1, check = yes. Since these settings were not in place,
LAMMPS changed them and will restore them to their original values
after the TAD simulation. :dd
{Resetting reneighboring criteria during minimization} :dt
Minimization requires that neigh_modify settings be delay = 0, every =
1, check = yes. Since these settings were not in place, LAMMPS
changed them and will restore them to their original values after the
minimization. :dd
{Restart file used different # of processors} :dt
The restart file was written out by a LAMMPS simulation running on a
different number of processors. Due to round-off, the trajectories of
your restarted simulation may diverge a little more quickly than if
you ran on the same # of processors. :dd
{Restart file used different 3d processor grid} :dt
The restart file was written out by a LAMMPS simulation running on a
different 3d grid of processors. Due to round-off, the trajectories
of your restarted simulation may diverge a little more quickly than if
you ran on the same # of processors. :dd
{Restart file used different boundary settings, using restart file values} :dt
Your input script cannot change these restart file settings. :dd
{Restart file used different newton bond setting, using restart file value} :dt
The restart file value will override the setting in the input script. :dd
{Restart file used different newton pair setting, using input script value} :dt
The input script value will override the setting in the restart file. :dd
{Restrain problem: %d %ld %d %d %d %d} :dt
Conformation of the 4 listed dihedral atoms is extreme; you may want
to check your simulation geometry. :dd
{Running PRD with only one replica} :dt
This is allowed, but you will get no parallel speed-up. :dd
{SRD bin shifting turned on due to small lamda} :dt
This is done to try to preserve accuracy. :dd
{SRD bin size for fix srd differs from user request} :dt
Fix SRD had to adjust the bin size to fit the simulation box. See the
cubic keyword if you want this message to be an error vs warning. :dd
{SRD bins for fix srd are not cubic enough} :dt
The bin shape is not within tolerance of cubic. See the cubic
keyword if you want this message to be an error vs warning. :dd
{SRD particle %d started inside big particle %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{SRD particle %d started inside wall %d on step %ld bounce %d} :dt
See the inside keyword if you want this message to be an error vs
warning. :dd
{Shake determinant < 0.0} :dt
The determinant of the quadratic equation being solved for a single
cluster specified by the fix shake command is numerically suspect. LAMMPS
will set it to 0.0 and continue. :dd
{Shell command '%s' failed with error '%s'} :dt
Self-explanatory. :dd
{Shell command returned with non-zero status} :dt
This may indicate the shell command did not operate as expected. :dd
{Should not allow rigid bodies to bounce off relecting walls} :dt
LAMMPS allows this, but their dynamics are not computed correctly. :dd
{Should not use fix nve/limit with fix shake or fix rattle} :dt
This will lead to invalid constraint forces in the SHAKE/RATTLE
computation. :dd
{Simulations might be very slow because of large number of structure factors} :dt
Self-explanatory. :dd
{Slab correction not needed for MSM} :dt
Slab correction is intended to be used with Ewald or PPPM and is not needed by MSM. :dd
{System is not charge neutral, net charge = %g} :dt
The total charge on all atoms on the system is not 0.0.
For some KSpace solvers this is only a warning. :dd
{Table inner cutoff >= outer cutoff} :dt
You specified an inner cutoff for a Coulombic table that is longer
than the global cutoff. Probably not what you wanted. :dd
{Temperature for MSST is not for group all} :dt
User-assigned temperature to MSST fix does not compute temperature for
all atoms. Since MSST computes a global pressure, the kinetic energy
contribution from the temperature is assumed to also be for all atoms.
Thus the pressure used by MSST could be inaccurate. :dd
{Temperature for NPT is not for group all} :dt
User-assigned temperature to NPT fix does not compute temperature for
all atoms. Since NPT computes a global pressure, the kinetic energy
contribution from the temperature is assumed to also be for all atoms.
Thus the pressure used by NPT could be inaccurate. :dd
{Temperature for fix modify is not for group all} :dt
The temperature compute is being used with a pressure calculation
which does operate on group all, so this may be inconsistent. :dd
{Temperature for thermo pressure is not for group all} :dt
User-assigned temperature to thermo via the thermo_modify command does
not compute temperature for all atoms. Since thermo computes a global
pressure, the kinetic energy contribution from the temperature is
assumed to also be for all atoms. Thus the pressure printed by thermo
could be inaccurate. :dd
{The fix ave/spatial command has been replaced by the more flexible fix ave/chunk and compute chunk/atom commands -- fix ave/spatial will be removed in the summer of 2015} :dt
Self-explanatory. :dd
{The minimizer does not re-orient dipoles when using fix efield} :dt
This means that only the atom coordinates will be minimized,
not the orientation of the dipoles. :dd
{Too many common neighbors in CNA %d times} :dt
More than the maximum # of neighbors was found multiple times. This
was unexpected. :dd
{Too many inner timesteps in fix ttm} :dt
Self-explanatory. :dd
{Too many neighbors in CNA for %d atoms} :dt
More than the maximum # of neighbors was found multiple times. This
was unexpected. :dd
{Triclinic box skew is large} :dt
The displacement in a skewed direction is normally required to be less
than half the box length in that dimension. E.g. the xy tilt must be
between -half and +half of the x box length. You have relaxed the
constraint using the box tilt command, but the warning means that a
LAMMPS simulation may be inefficient as a result. :dd
{Use special bonds = 0,1,1 with bond style fene} :dt
Most FENE models need this setting for the special_bonds command. :dd
{Use special bonds = 0,1,1 with bond style fene/expand} :dt
Most FENE models need this setting for the special_bonds command. :dd
{Using a manybody potential with bonds/angles/dihedrals and special_bond exclusions} :dt
This is likely not what you want to do. The exclusion settings will
eliminate neighbors in the neighbor list, which the manybody potential
needs to calculate its terms correctly. :dd
{Using compute temp/deform with inconsistent fix deform remap option} :dt
Fix nvt/sllod assumes deforming atoms have a velocity profile provided
by "remap v" or "remap none" as a fix deform option. :dd
{Using compute temp/deform with no fix deform defined} :dt
This is probably an error, since it makes little sense to use
compute temp/deform in this case. :dd
{Using fix srd with box deformation but no SRD thermostat} :dt
The deformation will heat the SRD particles so this can
be dangerous. :dd
{Using kspace solver on system with no charge} :dt
Self-explanatory. :dd
{Using largest cut-off for lj/long/dipole/long long long} :dt
Self-explanatory. :dd
{Using largest cutoff for buck/long/coul/long} :dt
Self-explanatory. :dd
{Using largest cutoff for lj/long/coul/long} :dt
Self-explanatory. :dd
{Using largest cutoff for pair_style lj/long/tip4p/long} :dt
Self-explanatory. :dd
{Using package gpu without any pair style defined} :dt
Self-explanatory. :dd
{Using pair potential shift with pair_modify compute no} :dt
The shift effects will thus not be computed. :dd
{Using pair tail corrections with nonperiodic system} :dt
This is probably a bogus thing to do, since tail corrections are
computed by integrating the density of a periodic system out to
infinity. :dd
{Using pair tail corrections with pair_modify compute no} :dt
The tail corrections will thus not be computed. :dd
{pair style reax is now deprecated and will soon be retired. Users should switch to pair_style reax/c} :dt
Self-explanatory. :dd
:dle
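As a minimal sketch of the "run 0" workaround mentioned in the
{Calling write_dump before a full system init.} entry above (the dump
file name is an arbitrary placeholder):

run 0
write_dump all atom dump.initial :pre

The "run 0" command forces a full system init without advancing any
timesteps, so the subsequent write_dump has complete information
available.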
diff --git a/doc/src/Section_intro.txt b/doc/src/Section_intro.txt
index 33c3cf395..bfb6ef390 100644
--- a/doc/src/Section_intro.txt
+++ b/doc/src/Section_intro.txt
@@ -1,540 +1,544 @@
"Previous Section"_Manual.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_start.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
1. Introduction :h3
This section provides an overview of what LAMMPS can and can't do,
describes what it means for LAMMPS to be an open-source code, and
acknowledges the funding and people who have contributed to LAMMPS
over the years.
1.1 "What is LAMMPS"_#intro_1
1.2 "LAMMPS features"_#intro_2
1.3 "LAMMPS non-features"_#intro_3
1.4 "Open source distribution"_#intro_4
1.5 "Acknowledgments and citations"_#intro_5 :all(b)
:line
:line
1.1 What is LAMMPS :link(intro_1),h4
LAMMPS is a classical molecular dynamics code that models an ensemble
of particles in a liquid, solid, or gaseous state. It can model
atomic, polymeric, biological, metallic, granular, and coarse-grained
systems using a variety of force fields and boundary conditions.
For examples of LAMMPS simulations, see the Publications page of the
"LAMMPS WWW Site"_lws.
LAMMPS runs efficiently on single-processor desktop or laptop
machines, but is designed for parallel computers. It will run on any
parallel machine that compiles C++ and supports the "MPI"_mpi
message-passing library. This includes distributed- or shared-memory
parallel machines and Beowulf-style clusters.
:link(mpi,http://www-unix.mcs.anl.gov/mpi)
LAMMPS can model systems with only a few particles up to millions or
billions. See "Section 8"_Section_perf.html for information on
LAMMPS performance and scalability, or the Benchmarks section of the
"LAMMPS WWW Site"_lws.
LAMMPS is a freely-available open-source code, distributed under the
terms of the "GNU Public License"_gnu, which means you can use or
modify the code however you wish. See "this section"_#intro_4 for a
brief discussion of the open-source philosophy.
:link(gnu,http://www.gnu.org/copyleft/gpl.html)
LAMMPS is designed to be easy to modify or extend with new
capabilities, such as new force fields, atom types, boundary
conditions, or diagnostics. See "Section 10"_Section_modify.html
for more details.
The current version of LAMMPS is written in C++. Earlier versions
were written in F77 and F90. See
"Section 13"_Section_history.html for more information on
different versions. All versions can be downloaded from the "LAMMPS
WWW Site"_lws.
LAMMPS was originally developed under a US Department of Energy CRADA
(Cooperative Research and Development Agreement) between two DOE labs
and 3 companies. It is distributed by "Sandia National Labs"_snl.
See "this section"_#intro_5 for more information on LAMMPS funding and
individuals who have contributed to LAMMPS.
:link(snl,http://www.sandia.gov)
In the most general sense, LAMMPS integrates Newton's equations of
motion for collections of atoms, molecules, or macroscopic particles
that interact via short- or long-range forces with a variety of
initial and/or boundary conditions. For computational efficiency
LAMMPS uses neighbor lists to keep track of nearby particles. The
lists are optimized for systems with particles that are repulsive at
short distances, so that the local density of particles never becomes
too large. On parallel machines, LAMMPS uses spatial-decomposition
techniques to partition the simulation domain into small 3d
sub-domains, one of which is assigned to each processor. Processors
communicate and store "ghost" atom information for atoms that border
their sub-domain. LAMMPS is most efficient (in a parallel sense) for
systems whose particles fill a 3d rectangular box with roughly uniform
density. Papers with technical details of the algorithms used in
LAMMPS are listed in "this section"_#intro_5.
:line
1.2 LAMMPS features :link(intro_2),h4
This section highlights LAMMPS features, with pointers to specific
commands which give more details. If LAMMPS doesn't have your
favorite interatomic potential, boundary condition, or atom type, see
"Section 10"_Section_modify.html, which describes how you can add
it to LAMMPS.
General features :h5
runs on a single processor or in parallel
distributed-memory message-passing parallelism (MPI)
spatial-decomposition of simulation domain for parallelism
open-source distribution
highly portable C++
optional libraries used: MPI and single-processor FFT
GPU (CUDA and OpenCL), Intel(R) Xeon Phi(TM) coprocessors, and OpenMP support for many code features
easy to extend with new features and functionality
runs from an input script
syntax for defining and using variables and formulas
syntax for looping over runs and breaking out of loops
run one or multiple simulations simultaneously (in parallel) from one script
build as library, invoke LAMMPS thru library interface or provided Python wrapper
couple with other codes: LAMMPS calls other code, other code calls LAMMPS, umbrella code calls both :ul
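As a small illustration of the input-script variable and looping
syntax listed above (a sketch only; the variable names and values are
arbitrary):

variable i loop 3
label start
variable t equal 100.0*v_i
print "Iteration $i, temperature $t"
next i
jump SELF start :pre

Each pass through such a loop could, for example, reset a thermostat
temperature and issue another run command.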
Particle and model types :h5
("atom style"_atom_style.html command)
atoms
coarse-grained particles (e.g. bead-spring polymers)
united-atom polymers or organic molecules
all-atom polymers, organic molecules, proteins, DNA
metals
granular materials
coarse-grained mesoscale models
finite-size spherical and ellipsoidal particles
finite-size line segment (2d) and triangle (3d) particles
point dipole particles
rigid collections of particles
hybrid combinations of these :ul
Force fields :h5
("pair style"_pair_style.html, "bond style"_bond_style.html,
"angle style"_angle_style.html, "dihedral style"_dihedral_style.html,
"improper style"_improper_style.html, "kspace style"_kspace_style.html
commands)
pairwise potentials: Lennard-Jones, Buckingham, Morse, Born-Mayer-Huggins, \
Yukawa, soft, class 2 (COMPASS), hydrogen bond, tabulated
charged pairwise potentials: Coulombic, point-dipole
manybody potentials: EAM, Finnis/Sinclair EAM, modified EAM (MEAM), \
embedded ion method (EIM), EDIP, ADP, Stillinger-Weber, Tersoff, \
REBO, AIREBO, ReaxFF, COMB, SNAP, Streitz-Mintmire, 3-body polymorphic
long-range interactions for charge, point-dipoles, and LJ dispersion: \
Ewald, Wolf, PPPM (similar to particle-mesh Ewald)
polarization models: "QEq"_fix_qeq.html, \
"core/shell model"_Section_howto.html#howto_26, \
"Drude dipole model"_Section_howto.html#howto_27
charge equilibration (QEq via dynamic, point, shielded, Slater methods)
coarse-grained potentials: DPD, GayBerne, REsquared, colloidal, DLVO
mesoscopic potentials: granular, Peridynamics, SPH
electron force field (eFF, AWPMD)
bond potentials: harmonic, FENE, Morse, nonlinear, class 2, \
quartic (breakable)
angle potentials: harmonic, CHARMM, cosine, cosine/squared, cosine/periodic, \
class 2 (COMPASS)
dihedral potentials: harmonic, CHARMM, multi-harmonic, helix, \
class 2 (COMPASS), OPLS
improper potentials: harmonic, cvff, umbrella, class 2 (COMPASS)
polymer potentials: all-atom, united-atom, bead-spring, breakable
water potentials: TIP3P, TIP4P, SPC
implicit solvent potentials: hydrodynamic lubrication, Debye
force-field compatibility with common CHARMM, AMBER, DREIDING, \
OPLS, GROMACS, COMPASS options
access to "KIM archive"_http://openkim.org of potentials via \
"pair kim"_pair_kim.html
hybrid potentials: multiple pair, bond, angle, dihedral, improper \
potentials can be used in one simulation
overlaid potentials: superposition of multiple pair potentials :ul
Atom creation :h5
("read_data"_read_data.html, "lattice"_lattice.html,
"create_atoms"_create_atoms.html, "delete_atoms"_delete_atoms.html,
"displace_atoms"_displace_atoms.html, "replicate"_replicate.html commands)
read in atom coords from files
create atoms on one or more lattices (e.g. grain boundaries)
delete geometric or logical groups of atoms (e.g. voids)
replicate existing atoms multiple times
displace atoms :ul
Ensembles, constraints, and boundary conditions :h5
("fix"_fix.html command)
2d or 3d systems
orthogonal or non-orthogonal (triclinic symmetry) simulation domains
constant NVE, NVT, NPT, NPH, Parrinello/Rahman integrators
thermostatting options for groups and geometric regions of atoms
pressure control via Nose/Hoover or Berendsen barostatting in 1 to 3 dimensions
simulation box deformation (tensile and shear)
harmonic (umbrella) constraint forces
rigid body constraints
SHAKE bond and angle constraints
Monte Carlo bond breaking, formation, swapping
atom/molecule insertion and deletion
walls of various kinds
non-equilibrium molecular dynamics (NEMD)
variety of additional boundary conditions and constraints :ul
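As a hedged sketch of how a few of the options above appear in an
input script (this assumes a molecular system with bond type 1 and
angle type 1 is already defined; the numeric values are placeholders):

fix 1 all npt temp 300.0 300.0 100.0 iso 1.0 1.0 1000.0
fix 2 all shake 0.0001 20 0 b 1 a 1 :pre

The first fix applies Nose/Hoover thermostatting and barostatting; the
second constrains the specified bonds and angles with SHAKE.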
Integrators :h5
("run"_run.html, "run_style"_run_style.html, "minimize"_minimize.html commands)
velocity-Verlet integrator
Brownian dynamics
rigid body integration
energy minimization via conjugate gradient or steepest descent relaxation
rRESPA hierarchical timestepping
rerun command for post-processing of dump files :ul
Diagnostics :h5
see the various flavors of the "fix"_fix.html and "compute"_compute.html commands :ul
Output :h5
("dump"_dump.html, "restart"_restart.html commands)
log file of thermodynamic info
text dump files of atom coords, velocities, other per-atom quantities
binary restart files
parallel I/O of dump and restart files
per-atom quantities (energy, stress, centro-symmetry parameter, CNA, etc)
user-defined system-wide (log file) or per-atom (dump file) calculations
spatial and time averaging of per-atom quantities
time averaging of system-wide quantities
atom snapshots in native, XYZ, XTC, DCD, CFG formats :ul
Multi-replica models :h5
"nudged elastic band"_neb.html
"parallel replica dynamics"_prd.html
"temperature accelerated dynamics"_tad.html
"parallel tempering"_temper.html
Pre- and post-processing :h5
Various pre- and post-processing serial tools are packaged
with LAMMPS; see these "doc pages"_Section_tools.html. :ulb,l
Our group has also written and released a separate toolkit called
"Pizza.py"_pizza which provides tools for doing setup, analysis,
plotting, and visualization for LAMMPS simulations. Pizza.py is
written in "Python"_python and is available for download from "the
Pizza.py WWW site"_pizza. :l
:ule
:link(pizza,http://www.sandia.gov/~sjplimp/pizza.html)
:link(python,http://www.python.org)
Specialized features :h5
-These are LAMMPS capabilities which you may not think of as typical
-molecular dynamics options:
+LAMMPS can be built with optional packages which implement a variety
+of additional capabilities. An overview of all the packages is "given
+here"_Section_packages.html.
+
+These are some LAMMPS capabilities which you may not think of as
+typical classical molecular dynamics options:
"static"_balance.html and "dynamic load-balancing"_fix_balance.html
"generalized aspherical particles"_body.html
"stochastic rotation dynamics (SRD)"_fix_srd.html
"real-time visualization and interactive MD"_fix_imd.html
calculate "virtual diffraction patterns"_compute_xrd.html
"atom-to-continuum coupling"_fix_atc.html with finite elements
coupled rigid body integration via the "POEMS"_fix_poems.html library
"QM/MM coupling"_fix_qmmm.html
"path-integral molecular dynamics (PIMD)"_fix_ipi.html and "this as well"_fix_pimd.html
Monte Carlo via "GCMC"_fix_gcmc.html and "tfMC"_fix_tfmc.html "atom swapping"_fix_atom_swap.html and "bond swapping"_fix_bond_swap.html
"Direct Simulation Monte Carlo"_pair_dsmc.html for low-density fluids
"Peridynamics mesoscale modeling"_pair_peri.html
"Lattice Boltzmann fluid"_fix_lb_fluid.html
"targeted"_fix_tmd.html and "steered"_fix_smd.html molecular dynamics
"two-temperature electron model"_fix_ttm.html :ul
:line
1.3 LAMMPS non-features :link(intro_3),h4
LAMMPS is designed to efficiently compute Newton's equations of motion
for a system of interacting particles. Many of the tools needed to
pre- and post-process the data for such simulations are not included
in the LAMMPS kernel for several reasons:
the desire to keep LAMMPS simple
they are not parallel operations
other codes already do them
limited development resources :ul
Specifically, LAMMPS itself does not:
run thru a GUI
build molecular systems
assign force-field coefficients automagically
perform sophisticated analyses of your MD simulation
visualize your MD simulation
plot your output data :ul
A few tools for pre- and post-processing tasks are provided as part of
the LAMMPS package; they are described in "this
section"_Section_tools.html. However, many people use other codes or
write their own tools for these tasks.
As noted above, our group has also written and released a separate
toolkit called "Pizza.py"_pizza which addresses some of the listed
bullets. It provides tools for doing setup, analysis, plotting, and
visualization for LAMMPS simulations. Pizza.py is written in
"Python"_python and is available for download from "the Pizza.py WWW
site"_pizza.
LAMMPS requires as input a list of initial atom coordinates and types,
molecular topology information, and force-field coefficients assigned
to all atoms and bonds. LAMMPS will not build molecular systems and
assign force-field parameters for you.
For atomic systems LAMMPS provides a "create_atoms"_create_atoms.html
command which places atoms on solid-state lattices (fcc, bcc,
user-defined, etc). Assigning small numbers of force field
coefficients can be done via the "pair coeff"_pair_coeff.html, "bond
coeff"_bond_coeff.html, "angle coeff"_angle_coeff.html, etc commands.
For molecular systems or more complicated simulation geometries, users
typically use another code as a builder and convert its output to
LAMMPS input format, or write their own code to generate atom
coordinates and molecular topology for LAMMPS to read in.
For complicated molecular systems (e.g. a protein), a multitude of
topology information and hundreds of force-field coefficients must
typically be specified. We suggest you use a program like
"CHARMM"_charmm or "AMBER"_amber or other molecular builders to setup
such problems and dump its information to a file. You can then
reformat the file as LAMMPS input. Some of the tools in "this
section"_Section_tools.html can assist in this process.
Similarly, LAMMPS creates output files in a simple format. Most users
post-process these files with their own analysis tools or re-format
them for input into other programs, including visualization packages.
If you are convinced you need to compute something on-the-fly as
LAMMPS runs, see "Section 10"_Section_modify.html for a discussion
of how you can use the "dump"_dump.html and "compute"_compute.html and
"fix"_fix.html commands to print out data of your choosing. Keep in
mind that complicated computations can slow down the molecular
dynamics timestepping, particularly if the computations are not
parallel, so it is often better to leave such analysis to
post-processing codes.
For high-quality visualization we recommend the
following packages:
"VMD"_http://www.ks.uiuc.edu/Research/vmd
"AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A
"OVITO"_http://www.ovito.org/
"ParaView"_http://www.paraview.org/
"PyMol"_http://www.pymol.org
"Raster3d"_http://www.bmsc.washington.edu/raster3d/raster3d.html
"RasMol"_http://www.openrasmol.org :ul
Other features that LAMMPS does not yet (and may never) support are
discussed in "Section 13"_Section_history.html.
Finally, these are freely-available molecular dynamics codes, most of
them parallel, which may be well-suited to the problems you want to
model. They can also be used in conjunction with LAMMPS to perform
complementary modeling tasks.
"CHARMM"_charmm
"AMBER"_amber
"NAMD"_namd
"NWCHEM"_nwchem
"DL_POLY"_dlpoly
"Tinker"_tinker :ul
:link(charmm,http://www.charmm.org)
:link(amber,http://ambermd.org)
:link(namd,http://www.ks.uiuc.edu/Research/namd/)
:link(nwchem,http://www.emsl.pnl.gov/docs/nwchem/nwchem.html)
:link(dlpoly,http://www.ccp5.ac.uk/DL_POLY_CLASSIC)
:link(tinker,http://dasher.wustl.edu/tinker)
CHARMM, AMBER, NAMD, NWCHEM, and Tinker are designed primarily for
modeling biological molecules. CHARMM and AMBER use
atom-decomposition (replicated-data) strategies for parallelism; NAMD
and NWCHEM use spatial-decomposition approaches, similar to LAMMPS.
Tinker is a serial code. DL_POLY includes potentials for a variety of
biological and non-biological materials; both a replicated-data and
spatial-decomposition version exist.
:line
1.4 Open source distribution :link(intro_4),h4
LAMMPS comes with no warranty of any kind. As each source file states
in its header, it is a copyrighted code that is distributed free-of-
charge, under the terms of the "GNU Public License"_gnu (GPL). This
is often referred to as open-source distribution - see
"www.gnu.org"_gnuorg or "www.opensource.org"_opensource for more
details. The legal text of the GPL is in the LICENSE file that is
included in the LAMMPS distribution.
:link(gnuorg,http://www.gnu.org)
:link(opensource,http://www.opensource.org)
Here is a summary of what the GPL means for LAMMPS users:
(1) Anyone is free to use, modify, or extend LAMMPS in any way they
choose, including for commercial purposes.
(2) If you distribute a modified version of LAMMPS, it must remain
open-source, meaning you distribute it under the terms of the GPL.
You should clearly annotate such a code as a derivative version of
LAMMPS.
(3) If you release any code that includes LAMMPS source code, then it
must also be open-sourced, meaning you distribute it under the terms
of the GPL.
(4) If you give LAMMPS files to someone else, the GPL LICENSE file and
source file headers (including the copyright and GPL notices) should
remain part of the code.
In the spirit of an open-source code, these are various ways you can
contribute to making LAMMPS better. You can send email to the
"developers"_http://lammps.sandia.gov/authors.html on any of these
items.
Point prospective users to the "LAMMPS WWW Site"_lws. Mention it in
talks or link to it from your WWW site. :ulb,l
If you find an error or omission in this manual or on the "LAMMPS WWW
Site"_lws, or have a suggestion for something to clarify or include,
send an email to the
"developers"_http://lammps.sandia.gov/authors.html. :l
If you find a bug, "Section 12.2"_Section_errors.html#err_2
describes how to report it. :l
If you publish a paper using LAMMPS results, send the citation (and
any cool pictures or movies if you like) to add to the Publications,
Pictures, and Movies pages of the "LAMMPS WWW Site"_lws, with links
and attributions back to you. :l
Create a new Makefile.machine that can be added to the src/MAKE
directory. :l
The tools sub-directory of the LAMMPS distribution has various
stand-alone codes for pre- and post-processing of LAMMPS data. More
details are given in "Section 9"_Section_tools.html. If you write
a new tool that users will find useful, it can be added to the LAMMPS
distribution. :l
LAMMPS is designed to be easy to extend with new code for features
like potentials, boundary conditions, diagnostic computations, etc.
"This section"_Section_modify.html gives details. If you add a
feature of general interest, it can be added to the LAMMPS
distribution. :l
The Benchmark page of the "LAMMPS WWW Site"_lws lists LAMMPS
performance on various platforms. The files needed to run the
benchmarks are part of the LAMMPS distribution. If your machine is
sufficiently different from those listed, your timing data can be
added to the page. :l
You can send feedback for the User Comments page of the "LAMMPS WWW
Site"_lws. It might be added to the page. No promises. :l
Cash. Small denominations, unmarked bills preferred. Paper sack OK.
Leave on desk. VISA also accepted. Chocolate chip cookies
encouraged. :l
:ule
:line
1.5 Acknowledgments and citations :h4,link(intro_5)
LAMMPS development has been funded by the "US Department of
Energy"_doe (DOE), through its CRADA, LDRD, ASCI, and Genomes-to-Life
programs and its "OASCR"_oascr and "OBER"_ober offices.
Specifically, work on the latest version was funded in part by the US
Department of Energy's Genomics:GTL program
("www.doegenomestolife.org"_gtl) under the "project"_ourgtl, "Carbon
Sequestration in Synechococcus Sp.: From Molecular Machines to
Hierarchical Modeling".
:link(doe,http://www.doe.gov)
:link(gtl,http://www.doegenomestolife.org)
:link(ourgtl,http://www.genomes2life.org)
:link(oascr,http://www.sc.doe.gov/ascr/home.html)
:link(ober,http://www.er.doe.gov/production/ober/ober_top.html)
The following paper describes the basic parallel algorithms used in
LAMMPS. If you use LAMMPS results in your published work, please cite
this paper and include a pointer to the "LAMMPS WWW Site"_lws
(http://lammps.sandia.gov):
S. Plimpton, [Fast Parallel Algorithms for Short-Range Molecular
Dynamics], J Comp Phys, 117, 1-19 (1995).
Other papers describing specific algorithms used in LAMMPS are listed
under the "Citing LAMMPS link"_http://lammps.sandia.gov/cite.html of
the LAMMPS WWW page.
The "Publications link"_http://lammps.sandia.gov/papers.html on the
LAMMPS WWW page lists papers that have cited LAMMPS. If your paper is
not listed there for some reason, feel free to send us the info. If
the simulations in your paper produced cool pictures or animations,
we'll be pleased to add them to the
"Pictures"_http://lammps.sandia.gov/pictures.html or
"Movies"_http://lammps.sandia.gov/movies.html pages of the LAMMPS WWW
site.
The core group of LAMMPS developers is at Sandia National Labs:
Steve Plimpton, sjplimp at sandia.gov
Aidan Thompson, athomps at sandia.gov
Paul Crozier, pscrozi at sandia.gov :ul
The following folks are responsible for significant contributions to
the code, or other aspects of the LAMMPS development effort. Many of
the packages they have written are somewhat unique to LAMMPS and the
code would not be as general-purpose as it is without their expertise
and efforts.
-Axel Kohlmeyer (Temple U), akohlmey at gmail.com, SVN and Git repositories, indefatigable mail list responder, USER-CG-CMM and USER-OMP packages
+Axel Kohlmeyer (Temple U), akohlmey at gmail.com, SVN and Git repositories, indefatigable mail list responder, USER-CGSDK and USER-OMP packages
Roy Pollock (LLNL), Ewald and PPPM solvers
Mike Brown (ORNL), brownw at ornl.gov, GPU package
Greg Wagner (Sandia), gjwagne at sandia.gov, MEAM package for MEAM potential
Mike Parks (Sandia), mlparks at sandia.gov, PERI package for Peridynamics
Rudra Mukherjee (JPL), Rudranarayan.M.Mukherjee at jpl.nasa.gov, POEMS package for articulated rigid body motion
Reese Jones (Sandia) and collaborators, rjones at sandia.gov, USER-ATC package for atom/continuum coupling
Ilya Valuev (JIHT), valuev at physik.hu-berlin.de, USER-AWPMD package for wave-packet MD
Christian Trott (U Tech Ilmenau), christian.trott at tu-ilmenau.de, USER-CUDA package
Andres Jaramillo-Botero (Caltech), ajaramil at wag.caltech.edu, USER-EFF package for electron force field
Christoph Kloss (JKU), Christoph.Kloss at jku.at, USER-LIGGGHTS package for granular models and granular/fluid coupling
Metin Aktulga (LBL), hmaktulga at lbl.gov, USER-REAXC package for C version of ReaxFF
Georg Gunzenmuller (EMI), georg.ganzenmueller at emi.fhg.de, USER-SPH package :ul
As discussed in "Section 13"_Section_history.html, LAMMPS
originated as a cooperative project between DOE labs and industrial
partners. Folks involved in the design and testing of the original
version of LAMMPS were the following:
John Carpenter (Mayo Clinic, formerly at Cray Research)
Terry Stouch (Lexicon Pharmaceuticals, formerly at Bristol Myers Squibb)
Steve Lustig (Dupont)
Jim Belak (LLNL) :ul
diff --git a/doc/src/Section_packages.txt b/doc/src/Section_packages.txt
index b327b7b1c..2a0a8386e 100644
--- a/doc/src/Section_packages.txt
+++ b/doc/src/Section_packages.txt
@@ -1,1904 +1,2601 @@
"Previous Section"_Section_commands.html - "LAMMPS WWW Site"_lws -
"LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_accelerate.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
4. Packages :h3
-This section gives an overview of the add-on optional packages that
-extend LAMMPS functionality. Packages are groups of files that enable
-a specific set of features. For example, force fields for molecular
-systems or granular systems are in packages. You can see the list of
-all packages by typing "make package" from within the src directory of
-the LAMMPS distribution.
-
-Here are links for two tables below, which list standard and user
-packages.
-
-4.1 "Standard packages"_#pkg_1
-4.2 "User packages"_#pkg_2 :all(b)
-
-"Section 2.3"_Section_start.html#start_3 of the manual describes
-the difference between standard packages and user packages. It also
-has general details on how to include/exclude specific packages as
-part of the LAMMPS build process, and on how to build auxiliary
-libraries or modify a machine Makefile if a package requires it.
-
-Following the two tables below, is a sub-section for each package. It
-has a summary of what the package contains. It has specific
-instructions on how to install it, build or obtain any auxiliary
-library it requires, and any Makefile.machine changes it requires. It
-also lists pointers to examples of its use or documentation provided
-in the LAMMPS distribution. If you want to know the complete list of
-commands that a package adds to LAMMPS, simply list the files in its
-directory, e.g. "ls src/GRANULAR". Source files with names that start
-with compute, fix, pair, bond, etc correspond to command styles with
-the same names.
-
-NOTE: The USER package sub-sections below are still being filled in,
-as of March 2016.
-
-Unless otherwise noted below, every package is independent of all the
-others. I.e. any package can be included or excluded in a LAMMPS
-build, independent of all other packages. However, note that some
-packages include commands derived from commands in other packages. If
-the other package is not installed, the derived command from the new
-package will also not be installed when you include the new one.
-E.g. the pair lj/cut/coul/long/omp command from the USER-OMP package
-will not be installed as part of the USER-OMP package if the KSPACE
-package is not also installed, since it contains the pair
-lj/cut/coul/long command. If you later install the KSPACE package and
-the USER-OMP package is already installed, both the pair
-lj/cut/coul/long and lj/cut/coul/long/omp commands will be installed.
-
-:line
-
-4.1 Standard packages :h4,link(pkg_1)
-
-The current list of standard packages is as follows. Each package
-name links to a sub-section below with more details.
-
-Package, Description, Author(s), Doc page, Example, Library
-"ASPHERE"_#ASPHERE, aspherical particles, -, "Section 6.6.14"_Section_howto.html#howto_14, ellipse, -
-"BODY"_#BODY, body-style particles, -, "body"_body.html, body, -
-"CLASS2"_#CLASS2, class 2 force fields, -, "pair_style lj/class2"_pair_class2.html, -, -
-"COLLOID"_#COLLOID, colloidal particles, Kumar (1), "atom_style colloid"_atom_style.html, colloid, -
-"COMPRESS"_#COMPRESS, I/O compression, Axel Kohlmeyer (Temple U), "dump */gz"_dump.html, -, -
-"CORESHELL"_#CORESHELL, adiabatic core/shell model, Hendrik Heenen (Technical U of Munich), "Section 6.6.25"_Section_howto.html#howto_25, coreshell, -
-"DIPOLE"_#DIPOLE, point dipole particles, -, "pair_style dipole/cut"_pair_dipole.html, dipole, -
-"GPU"_#GPU, GPU-enabled styles, Mike Brown (ORNL), "Section 5.3.1"_accelerate_gpu.html, gpu, lib/gpu
-"GRANULAR"_#GRANULAR, granular systems, -, "Section 6.6.6"_Section_howto.html#howto_6, pour, -
-"KIM"_#KIM, openKIM potentials, Smirichinski & Elliot & Tadmor (3), "pair_style kim"_pair_kim.html, kim, KIM
-"KOKKOS"_#KOKKOS, Kokkos-enabled styles, Trott & Moore (4), "Section 5.3.3"_accelerate_kokkos.html, kokkos, lib/kokkos
-"KSPACE"_#KSPACE, long-range Coulombic solvers, -, "kspace_style"_kspace_style.html, peptide, -
-"MANYBODY"_#MANYBODY, many-body potentials, -, "pair_style tersoff"_pair_tersoff.html, shear, -
-"MEAM"_#MEAM, modified EAM potential, Greg Wagner (Sandia), "pair_style meam"_pair_meam.html, meam, lib/meam
-"MC"_#MC, Monte Carlo options, -, "fix gcmc"_fix_gcmc.html, -, -
-"MOLECULE"_#MOLECULE, molecular system force fields, -, "Section 6.6.3"_Section_howto.html#howto_3, peptide, -
-"OPT"_#OPT, optimized pair styles, Fischer & Richie & Natoli (2), "Section 5.3.5"_accelerate_opt.html, -, -
-"PERI"_#PERI, Peridynamics models, Mike Parks (Sandia), "pair_style peri"_pair_peri.html, peri, -
-"POEMS"_#POEMS, coupled rigid body motion, Rudra Mukherjee (JPL), "fix poems"_fix_poems.html, rigid, lib/poems
-"PYTHON"_#PYTHON, embed Python code in an input script, -, "python"_python.html, python, lib/python
-"REAX"_#REAX, ReaxFF potential, Aidan Thompson (Sandia), "pair_style reax"_pair_reax.html, reax, lib/reax
-"REPLICA"_#REPLICA, multi-replica methods, -, "Section 6.6.5"_Section_howto.html#howto_5, tad, -
-"RIGID"_#RIGID, rigid bodies, -, "fix rigid"_fix_rigid.html, rigid, -
-"SHOCK"_#SHOCK, shock loading methods, -, "fix msst"_fix_msst.html, -, -
-"SNAP"_#SNAP, quantum-fit potential, Aidan Thompson (Sandia), "pair snap"_pair_snap.html, snap, -
-"SRD"_#SRD, stochastic rotation dynamics, -, "fix srd"_fix_srd.html, srd, -
-"VORONOI"_#VORONOI, Voronoi tesselations, Daniel Schwen (LANL), "compute voronoi/atom"_compute_voronoi_atom.html, -, Voro++
-:tb(ea=c)
-
-The "Authors" column lists a name(s) if a specific person is
-responsible for creating and maintaining the package.
-
-(1) The COLLOID package includes Fast Lubrication Dynamics pair styles
-which were created by Amit Kumar and Michael Bybee from Jonathan
-Higdon's group at UIUC.
-
-(2) The OPT package was created by James Fischer (High Performance
-Technologies), David Richie, and Vincent Natoli (Stone Ridge
-Technolgy).
-
-(3) The KIM package was created by Valeriu Smirichinski, Ryan Elliott,
-and Ellad Tadmor (U Minn).
-
-(4) The KOKKOS package was created primarily by Christian Trott and
-Stan Moore (Sandia). It uses the Kokkos library which was developed
-by Carter Edwards, Christian Trott, and others at Sandia.
+This section gives an overview of the optional packages that extend
+LAMMPS functionality with instructions on how to build LAMMPS with
+each of them. Packages are groups of files that enable a specific set
+of features. For example, force fields for molecular systems or
+granular systems are in packages. You can see the list of all
+packages and "make" commands to manage them by typing "make package"
+from within the src directory of the LAMMPS distribution. "Section
+2.3"_Section_start.html#start_3 gives general info on how to install
+and un-install packages as part of the LAMMPS build process.
+
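+For example, from within the src directory you can list the available
+packages and (if your LAMMPS version supports it) report which ones
+are currently installed; the directory name below is a placeholder
+for wherever you unpacked the distribution:
+
+cd lammps/src
+make package          # list packages and package-related make commands
+make package-status   # report which packages are installed :pre
+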
+There are two kinds of packages in LAMMPS, standard and user packages:
+
+"Table of standard packages"_#table_standard
+"Table of user packages"_#table_user :ul
+
+Standard packages are supported by the LAMMPS developers and are
+written in a syntax and style consistent with the rest of LAMMPS.
+This means the developers will answer questions about them, debug and
+fix them if necessary, and keep them compatible with future changes to
+LAMMPS.
+
+User packages have been contributed by users, and begin with the
+"user" prefix. If they are a single command (single file), they are
+typically in the user-misc package. User packages don't necessarily
+meet the requirements of the standard packages. If you have problems
+using a feature provided in a user package, you may need to contact
+the contributor directly to get help. Information on how to submit
+additions you make to LAMMPS as single files or as a standard or user
+package is given in "this section"_Section_modify.html#mod_15 of the
+manual.
+
+Following the next two tables is a sub-section for each package. It
+lists authors (if applicable) and summarizes the package contents. It
+has specific instructions on how to install the package, including (if
+necessary) downloading or building any extra library it requires. It
+also gives links to documentation, example scripts, and
+pictures/movies (if available) that illustrate use of the package.
+
+NOTE: To see the complete list of commands a package adds to LAMMPS,
+just look at the files in its src directory, e.g. "ls src/GRANULAR".
+Files with names that start with fix, compute, atom, pair, bond,
+angle, etc correspond to commands with the same style names.
+
+In these two tables, the "Example" column is a sub-directory in the
+examples directory of the distribution which has an input script that
+uses the package. E.g. "peptide" refers to the examples/peptide
+directory; USER/atc refers to the examples/USER/atc directory. The
+"Library" column indicates whether an extra library is needed to build
+and use the package:
+
+dash = no library
+sys = system library: you likely have it on your machine
+int = internal library: provided with LAMMPS, but you may need to build it
+ext = external library: you will need to download and install it on your machine :ul
-The "Doc page" column links to either a sub-section of the
-"Section 6"_Section_howto.html of the manual, or an input script
-command implemented as part of the package, or to additional
-documentation provided within the package.
-
-The "Example" column is a sub-directory in the examples directory of
-the distribution which has an input script that uses the package.
-E.g. "peptide" refers to the examples/peptide directory.
+:line
+:line
-The "Library" column lists an external library which must be built
-first and which LAMMPS links to when it is built. If it is listed as
-lib/package, then the code for the library is under the lib directory
-of the LAMMPS distribution. See the lib/package/README file for info
-on how to build the library. If it is not listed as lib/package, then
-it is a third-party library not included in the LAMMPS distribution.
-See details on all of this below for individual packages.
+[Standard packages] :link(table_standard),p
+
+Package, Description, Doc page, Example, Library
+"ASPHERE"_#ASPHERE, aspherical particle models, "Section 6.6.14"_Section_howto.html#howto_14, ellipse, -
+"BODY"_#BODY, body-style particles, "body"_body.html, body, -
+"CLASS2"_#CLASS2, class 2 force fields, "pair_style lj/class2"_pair_class2.html, -, -
+"COLLOID"_#COLLOID, colloidal particles, "atom_style colloid"_atom_style.html, colloid, -
+"COMPRESS"_#COMPRESS, I/O compression, "dump */gz"_dump.html, -, sys
+"CORESHELL"_#CORESHELL, adiabatic core/shell model, "Section 6.6.25"_Section_howto.html#howto_25, coreshell, -
+"DIPOLE"_#DIPOLE, point dipole particles, "pair_style dipole/cut"_pair_dipole.html, dipole, -
+"GPU"_#GPU, GPU-enabled styles, "Section 5.3.1"_accelerate_gpu.html, WWW bench, int
+"GRANULAR"_#GRANULAR, granular systems, "Section 6.6.6"_Section_howto.html#howto_6, pour, -
+"KIM"_#KIM, openKIM wrapper, "pair_style kim"_pair_kim.html, kim, ext
+"KOKKOS"_#KOKKOS, Kokkos-enabled styles, "Section 5.3.3"_accelerate_kokkos.html, WWW bench, -
+"KSPACE"_#KSPACE, long-range Coulombic solvers, "kspace_style"_kspace_style.html, peptide, -
+"MANYBODY"_#MANYBODY, many-body potentials, "pair_style tersoff"_pair_tersoff.html, shear, -
+"MC"_#MC, Monte Carlo options, "fix gcmc"_fix_gcmc.html, -, -
+"MEAM"_#MEAM, modified EAM potential, "pair_style meam"_pair_meam.html, meam, int
+"MISC"_#MISC, miscellanous single-file commands, -, -, -
+"MOLECULE"_#MOLECULE, molecular system force fields, "Section 6.6.3"_Section_howto.html#howto_3, peptide, -
+"MPIIO"_#MPIIO, MPI parallel I/O dump and restart, "dump"_dump.html, -, -
+"MSCG"_#MSCG, multi-scale coarse-graining wrapper, "fix mscg"_fix_mscg.html, mscg, ext
+"OPT"_#OPT, optimized pair styles, "Section 5.3.5"_accelerate_opt.html, WWW bench, -
+"PERI"_#PERI, Peridynamics models, "pair_style peri"_pair_peri.html, peri, -
+"POEMS"_#POEMS, coupled rigid body motion, "fix poems"_fix_poems.html, rigid, int
+"PYTHON"_#PYTHON, embed Python code in an input script, "python"_python.html, python, sys
+"QEQ"_#QEQ, QEq charge equilibration, "fix qeq"_fix_qeq.html, qeq, -
+"REAX"_#REAX, ReaxFF potential (Fortran), "pair_style reax"_pair_reax.html, reax, int
+"REPLICA"_#REPLICA, multi-replica methods, "Section 6.6.5"_Section_howto.html#howto_5, tad, -
+"RIGID"_#RIGID, rigid bodies and constraints, "fix rigid"_fix_rigid.html, rigid, -
+"SHOCK"_#SHOCK, shock loading methods, "fix msst"_fix_msst.html, -, -
+"SNAP"_#SNAP, quantum-fitted potential, "pair snap"_pair_snap.html, snap, -
+"SRD"_#SRD, stochastic rotation dynamics, "fix srd"_fix_srd.html, srd, -
+"VORONOI"_#VORONOI, Voronoi tesselation, "compute voronoi/atom"_compute_voronoi_atom.html, -, ext
+:tb(ea=c,ca1=l)
+
+[USER packages] :link(table_user),p
+
+Package, Description, Doc page, Example, Library
+"USER-ATC"_#USER-ATC, atom-to-continuum coupling, "fix atc"_fix_atc.html, USER/atc, int
+"USER-AWPMD"_#USER-AWPMD, wave-packet MD, "pair_style awpmd/cut"_pair_awpmd.html, USER/awpmd, int
+"USER-CGDNA"_#USER-CGDNA, coarse-grained DNA force fields, src/USER-CGDNA/README, USER/cgdna, -
+"USER-CGSDK"_#USER-CGSDK, SDK coarse-graining model, "pair_style lj/sdk"_pair_sdk.html, USER/cgsdk, -
+"USER-COLVARS"_#USER-COLVARS, collective variables library, "fix colvars"_fix_colvars.html, USER/colvars, int
+"USER-DIFFRACTION"_#USER-DIFFRACTION, virtual x-ray and electron diffraction,"compute xrd"_compute_xrd.html, USER/diffraction, -
+"USER-DPD"_#USER-DPD, reactive dissipative particle dynamics, src/USER-DPD/README, USER/dpd, -
+"USER-DRUDE"_#USER-DRUDE, Drude oscillators, "tutorial"_tutorial_drude.html, USER/drude, -
+"USER-EFF"_#USER-EFF, electron force field,"pair_style eff/cut"_pair_eff.html, USER/eff, -
+"USER-FEP"_#USER-FEP, free energy perturbation,"compute fep"_compute_fep.html, USER/fep, -
+"USER-H5MD"_#USER-H5MD, dump output via HDF5,"dump h5md"_dump_h5md.html, -, ext
+"USER-INTEL"_#USER-INTEL, optimized Intel CPU and KNL styles,"Section 5.3.2"_accelerate_intel.html, WWW bench, -
+"USER-LB"_#USER-LB, Lattice Boltzmann fluid,"fix lb/fluid"_fix_lb_fluid.html, USER/lb, -
+"USER-MANIFOLD"_#USER-MANIFOLD, motion on 2d surfaces,"fix manifoldforce"_fix_manifoldforce.html, USER/manifold, -
+"USER-MGPT"_#USER-MGPT, fast MGPT multi-ion potentials, "pair_style mgpt"_pair_mgpt.html, USER/mgpt, -
+"USER-MISC"_#USER-MISC, single-file contributions, USER-MISC/README, USER/misc, -
+"USER-MOLFILE"_#USER-MOLFILE, "VMD"_VMD molfile plug-ins,"dump molfile"_dump_molfile.html, -, ext
+"USER-NETCDF"_#USER-NETCDF, dump output via NetCDF,"dump netcdf"_dump_netcdf.html, -, ext
+"USER-OMP"_#USER-OMP, OpenMP-enabled styles,"Section 5.3.4"_accelerate_omp.html, WWW bench, -
+"USER-PHONON"_#USER-PHONON, phonon dynamical matrix,"fix phonon"_fix_phonon.html, USER/phonon, -
+"USER-QMMM"_#USER-QMMM, QM/MM coupling,"fix qmmm"_fix_qmmm.html, USER/qmmm, ext
+"USER-QTB"_#USER-QTB, quantum nuclear effects,"fix qtb"_fix_qtb.html "fix qbmsst"_fix_qbmsst.html, qtb, -
+"USER-QUIP"_#USER-QUIP, QUIP/libatoms interface,"pair_style quip"_pair_quip.html, USER/quip, ext
+"USER-REAXC"_#USER-REAXC, ReaxFF potential (C/C++) ,"pair_style reaxc"_pair_reaxc.html, reax, -
+"USER-SMD"_#USER-SMD, smoothed Mach dynamics,"SMD User Guide"_PDF/SMD_LAMMPS_userguide.pdf, USER/smd, ext
+"USER-SMTBQ"_#USER-SMTBQ, second moment tight binding QEq potential,"pair_style smtbq"_pair_smtbq.html, USER/smtbq, -
+"USER-SPH"_#USER-SPH, smoothed particle hydrodynamics,"SPH User Guide"_PDF/SPH_LAMMPS_userguide.pdf, USER/sph, -
+"USER-TALLY"_#USER-TALLY, pairwise tally computes,"compute XXX/tally"_compute_tally.html, USER/tally, -
+"USER-VTK"_#USER-VTK, dump output via VTK, "compute custom/vtk"_dump_custom_vtk.html, -, ext
+:tb(ea=c,ca1=l)
:line
+:line
+
+ASPHERE package :link(ASPHERE),h4
-ASPHERE package :link(ASPHERE),h5
+[Contents:]
-Contents: Several computes, time-integration fixes, and pair styles
-for aspherical particle models: ellipsoids, 2d lines, 3d triangles.
+Computes, time-integration fixes, and pair styles for aspherical
+particle models including ellipsoids, 2d lines, and 3d triangles.
-To install via make or Make.py:
+[Install or un-install:]
make yes-asphere
make machine :pre
-Make.py -p asphere -a machine :pre
-
-To un-install via make or Make.py:
-
make no-asphere
make machine :pre
-Make.py -p ^asphere -a machine :pre
+[Supporting info:]
-Supporting info: "Section 6.14"_Section_howto.html#howto_14,
-"pair_style gayberne"_pair_gayberne.html, "pair_style
-resquared"_pair_resquared.html,
-"doc/PDF/pair_gayberne_extra.pdf"_PDF/pair_gayberne_extra.pdf,
-"doc/PDF/pair_resquared_extra.pdf"_PDF/pair_resquared_extra.pdf,
-examples/ASPHERE, examples/ellipse
+src/ASPHERE: filenames -> commands
+"Section 6.14"_Section_howto.html#howto_14
+"pair_style gayberne"_pair_gayberne.html
+"pair_style resquared"_pair_resquared.html
+"doc/PDF/pair_gayberne_extra.pdf"_PDF/pair_gayberne_extra.pdf
+"doc/PDF/pair_resquared_extra.pdf"_PDF/pair_resquared_extra.pdf
+examples/ASPHERE
+examples/ellipse
+http://lammps.sandia.gov/movies.html#line
+http://lammps.sandia.gov/movies.html#tri :ul
:line
-BODY package :link(BODY),h5
+BODY package :link(BODY),h4
+
+[Contents:]
-Contents: Support for body-style particles. Computes,
+Body-style particles with internal structure. Computes,
time-integration fixes, pair styles, as well as the body styles
themselves. See the "body"_body.html doc page for an overview.
-To install via make or Make.py:
+[Install or un-install:]
make yes-body
make machine :pre
-Make.py -p body -a machine :pre
-
-To un-install via make or Make.py:
-
make no-body
make machine :pre
-Make.py -p ^body -a machine :pre
+[Supporting info:]
-Supporting info: "atom_style body"_atom_style.html, "body"_body.html,
-"pair_style body"_pair_body.html, examples/body
+src/BODY: filenames -> commands
+"body"_body.html
+"atom_style body"_atom_style.html
+"fix nve/body"_fix_nve_body.html
+"pair_style body"_pair_body.html
+examples/body :ul
:line
-CLASS2 package :link(CLASS2),h5
+CLASS2 package :link(CLASS2),h4
-Contents: Bond, angle, dihedral, improper, and pair styles for the
-COMPASS CLASS2 molecular force field.
+[Contents:]
-To install via make or Make.py:
+Bond, angle, dihedral, improper, and pair styles for the COMPASS
+CLASS2 molecular force field.
+
+[Install or un-install:]
make yes-class2
make machine :pre
-Make.py -p class2 -a machine :pre
-
-To un-install via make or Make.py:
-
make no-class2
make machine :pre
-Make.py -p ^class2 -a machine :pre
+[Supporting info:]
-Supporting info: "bond_style class2"_bond_class2.html, "angle_style
-class2"_angle_class2.html, "dihedral_style
-class2"_dihedral_class2.html, "improper_style
-class2"_improper_class2.html, "pair_style lj/class2"_pair_class2.html
+src/CLASS2: filenames -> commands
+"bond_style class2"_bond_class2.html
+"angle_style class2"_angle_class2.html
+"dihedral_style class2"_dihedral_class2.html
+"improper_style class2"_improper_class2.html
+"pair_style lj/class2"_pair_class2.html :ul
:line
-COLLOID package :link(COLLOID),h5
+COLLOID package :link(COLLOID),h4
-Contents: Support for coarse-grained colloidal particles. Wall fix
-and pair styles that implement colloidal interaction models for
-finite-size particles. This includes the Fast Lubrication Dynamics
-method for hydrodynamic interactions, which is a simplified
-approximation to Stokesian dynamics.
+[Contents:]
-To install via make or Make.py:
+Coarse-grained finite-size colloidal particles. Pair styles and fix
+wall styles for colloidal interactions. Includes the Fast Lubrication
+Dynamics (FLD) method for hydrodynamic interactions, which is a
+simplified approximation to Stokesian dynamics.
-make yes-colloid
-make machine :pre
+[Authors:] This package includes Fast Lubrication Dynamics pair styles
+which were created by Amit Kumar and Michael Bybee from Jonathan
+Higdon's group at UIUC.
-Make.py -p colloid -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-colloid
+make machine :pre
make no-colloid
make machine :pre
-Make.py -p ^colloid -a machine :pre
+[Supporting info:]
-Supporting info: "fix wall/colloid"_fix_wall.html, "pair_style
-colloid"_pair_colloid.html, "pair_style
-yukawa/colloid"_pair_yukawa_colloid.html, "pair_style
-brownian"_pair_brownian.html, "pair_style
-lubricate"_pair_lubricate.html, "pair_style
-lubricateU"_pair_lubricateU.html, examples/colloid, examples/srd
+src/COLLOID: filenames -> commands
+"fix wall/colloid"_fix_wall.html
+"pair_style colloid"_pair_colloid.html
+"pair_style yukawa/colloid"_pair_yukawa_colloid.html
+"pair_style brownian"_pair_brownian.html
+"pair_style lubricate"_pair_lubricate.html
+"pair_style lubricateU"_pair_lubricateU.html
+examples/colloid
+examples/srd :ul
:line
-COMPRESS package :link(COMPRESS),h5
+COMPRESS package :link(COMPRESS),h4
-Contents: Support for compressed output of dump files via the zlib
-compression library, using dump styles with a "gz" in their style
-name.
+[Contents:]
-Building with the COMPRESS package assumes you have the zlib
-compression library available on your system. The build uses the
-lib/compress/Makefile.lammps file in the compile/link process. You
-should only need to edit this file if the LAMMPS build cannot find the
-zlib info it specifies.
+Compressed output of dump files via the zlib compression library,
+using dump styles with a "gz" in their style name.
-To install via make or Make.py:
+To use this package you must have the zlib compression library
+available on your system.
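+
+A quick, system-dependent way to check for zlib on many Linux systems
+is to look for its header and shared library, e.g.:
+
+ls /usr/include/zlib.h
+ldconfig -p | grep libz :pre
+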
-make yes-compress
-make machine :pre
+[Author:] Axel Kohlmeyer (Temple U).
-Make.py -p compress -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+Note that building with this package assumes you have the zlib
+compression library available on your system. The LAMMPS build uses
+the settings in the lib/compress/Makefile.lammps file in the
+compile/link process. You should only need to edit this file if the
+LAMMPS build fails on your system.
+
+make yes-compress
+make machine :pre
make no-compress
make machine :pre
-Make.py -p ^compress -a machine :pre
+[Supporting info:]
-Supporting info: src/COMPRESS/README, lib/compress/README, "dump
-atom/gz"_dump.html, "dump cfg/gz"_dump.html, "dump
-custom/gz"_dump.html, "dump xyz/gz"_dump.html
+src/COMPRESS: filenames -> commands
+src/COMPRESS/README
+lib/compress/README
+"dump atom/gz"_dump.html
+"dump cfg/gz"_dump.html
+"dump custom/gz"_dump.html
+"dump xyz/gz"_dump.html :ul
:line
-CORESHELL package :link(CORESHELL),h5
+CORESHELL package :link(CORESHELL),h4
-Contents: Compute and pair styles that implement the adiabatic
-core/shell model for polarizability. The compute temp/cs command
-measures the temperature of a system with core/shell particles. The
-pair styles augment Born, Buckingham, and Lennard-Jones styles with
-core/shell capabilities. See "Section 6.26"_Section_howto.html#howto_26
-for an overview of how to use the package.
+[Contents:]
-To install via make or Make.py:
+Compute and pair styles that implement the adiabatic core/shell model
+for polarizability. The pair styles augment Born, Buckingham, and
+Lennard-Jones styles with core/shell capabilities. The "compute
+temp/cs"_compute_temp_cs.html command calculates the temperature of a
+system with core/shell particles. See "Section
+6.26"_Section_howto.html#howto_26 for an overview of how to use this
+package.
-make yes-coreshell
-make machine :pre
+[Author:] Hendrik Heenen (Technical U of Munich).
-Make.py -p coreshell -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-coreshell
+make machine :pre
make no-coreshell
make machine :pre
-Make.py -p ^coreshell -a machine :pre
+[Supporting info:]
-Supporting info: "Section 6.26"_Section_howto.html#howto_26,
-"compute temp/cs"_compute_temp_cs.html,
-"pair_style born/coul/long/cs"_pair_cs.html, "pair_style
-buck/coul/long/cs"_pair_cs.html, pair_style
-lj/cut/coul/long/cs"_pair_lj.html, examples/coreshell
+src/CORESHELL: filenames -> commands
+"Section 6.26"_Section_howto.html#howto_26
+"Section 6.25"_Section_howto.html#howto_25
+"compute temp/cs"_compute_temp_cs.html
+"pair_style born/coul/long/cs"_pair_cs.html
+"pair_style buck/coul/long/cs"_pair_cs.html
+"pair_style lj/cut/coul/long/cs"_pair_lj.html
+examples/coreshell :ul
:line
-DIPOLE package :link(DIPOLE),h5
+DIPOLE package :link(DIPOLE),h4
-Contents: An atom style and several pair styles to support point
-dipole models with short-range or long-range interactions.
+[Contents:]
-To install via make or Make.py:
+An atom style and several pair styles for point dipole models with
+short-range or long-range interactions.
+
+[Install or un-install:]
make yes-dipole
make machine :pre
-Make.py -p dipole -a machine :pre
-
-To un-install via make or Make.py:
-
make no-dipole
make machine :pre
-Make.py -p ^dipole -a machine :pre
+[Supporting info:]
-Supporting info: "atom_style dipole"_atom_style.html, "pair_style
-lj/cut/dipole/cut"_pair_dipole.html, "pair_style
-lj/cut/dipole/long"_pair_dipole.html, "pair_style
-lj/long/dipole/long"_pair_dipole.html, examples/dipole
+src/DIPOLE: filenames -> commands
+"atom_style dipole"_atom_style.html
+"pair_style lj/cut/dipole/cut"_pair_dipole.html
+"pair_style lj/cut/dipole/long"_pair_dipole.html
+"pair_style lj/long/dipole/long"_pair_dipole.html
+examples/dipole :ul
:line
-GPU package :link(GPU),h5
+GPU package :link(GPU),h4
+
+[Contents:]
-Contents: Dozens of pair styles and a version of the PPPM long-range
-Coulombic solver for NVIDIA GPUs. All of them have a "gpu" in their
-style name. "Section 5.3.1"_accelerate_gpu.html gives
+Dozens of pair styles and a version of the PPPM long-range Coulombic
+solver optimized for NVIDIA GPUs. All such styles have a "gpu" as a
+suffix in their style name. "Section 5.3.1"_accelerate_gpu.html gives
details of what hardware and Cuda software is required on your system,
-and how to build and use this package. See the KOKKOS package, which
-also has GPU-enabled styles.
-
-Building LAMMPS with the GPU package requires first building the GPU
-library itself, which is a set of C and Cuda files in lib/gpu.
-Details of how to do this are in lib/gpu/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/gpu which should create a lib/reax/libgpu.a file.
-Makefile.linux.* and Makefile.xk7 are examples for different
-platforms. There are 3 important settings in the Makefile.machine you
-use:
+and details on how to build and use this package. Its styles can be
+invoked at run time via the "-sf gpu" or "-suffix gpu" "command-line
+switches"_Section_start.html#start_7. See also the "KOKKOS"_#KOKKOS
+package, which has GPU-enabled styles.
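+
+For example, assuming an executable named lmp_machine was built with
+this package (the executable and input script names below are
+placeholders), a run could enable the GPU styles on 1 GPU per node via:
+
+mpirun -np 4 lmp_machine -sf gpu -pk gpu 1 -in in.lj :pre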
+
+[Authors:] Mike Brown (Intel) while at Sandia and ORNL and Trung Nguyen
+(Northwestern U) while at ORNL.
+
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the GPU
+library in lib/gpu from a set of provided C and Cuda files. You can
+do this manually if you prefer; follow the instructions in
+lib/gpu/README. You can also do it in one step from the lammps/src
+dir, using one of the commands below, each of which invokes the
+lib/gpu/Install.py script with the specified args:
+
+make lib-gpu # print help message
+make lib-gpu args="-m" # build GPU library with default Makefile.linux
+make lib-gpu args="-i xk7 -p single -o xk7.single" # create new Makefile.xk7.single, altered for single-precision
+make lib-gpu args="-i xk7 -p single -o xk7.single -m" # ditto, also build GPU library
+
+Note that this procedure starts with one of the existing
+Makefile.machine files in lib/gpu. It allows you to alter 4 important
+settings in that Makefile, via the -h, -a, -p, -e switches,
+and save the new Makefile, if desired:
CUDA_HOME = where NVIDIA Cuda software is installed on your system
-CUDA_ARCH = appropriate to your GPU hardware
-CUDA_PREC = precision (double, mixed, single) you desire :ul
-
-See example Makefile.machine files in lib/gpu for the syntax of these
-settings. See lib/gpu/Makefile.linux.double for ARCH settings for
-various NVIDIA GPUs. The "make" also creates a
-lib/gpu/Makefile.lammps file. This file has settings that enable
-LAMMPS to link with Cuda libraries. If the settings in
-Makefile.lammps for your machine are not correct, the LAMMPS link will
-fail. Note that the Make.py script has a "-gpu" option to allow the
-GPU library (with several of its options) and LAMMPS to be built in
-one step, with Type "python src/Make.py -h -gpu" to see the details.
-
-To install via make or Make.py:
-
-cd ~/lammps/lib/gpu
-make -f Makefile.linux.mixed # for example
-cd ~/lammps/src
-make yes-gpu
-make machine :pre
+CUDA_ARCH = what GPU hardware you have (see help message for details)
+CUDA_PRECISION = precision (double, mixed, single)
+EXTRAMAKE = which Makefile.lammps.* file to copy to Makefile.lammps :ul
+
+If the library build is successful, 2 files should be created:
+lib/gpu/libgpu.a and lib/gpu/Makefile.lammps. The latter has settings
+that enable LAMMPS to link with Cuda libraries. If the settings in
+Makefile.lammps for your machine are not correct, the LAMMPS build
+will fail.
-Make.py -p gpu -gpu mode=mixed arch=35 -a machine :pre
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-To un-install via make or Make.py:
+make yes-gpu
+make machine :pre
make no-gpu
make machine :pre
-Make.py -p ^gpu -a machine :pre
+NOTE: If you re-build the GPU library in lib/gpu, you should always
+un-install the GPU package, then re-install it and re-build LAMMPS.
+This is because the compilation of files in the GPU package uses the
+library settings from the lib/gpu/Makefile.machine used to build the
+GPU library.
-Supporting info: src/GPU/README, lib/gpu/README,
-"Section 5.3"_Section_accelerate.html#acc_3,
-"Section 5.3.1"_accelerate_gpu.html,
-Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
-for any pair style listed with a (g),
-"kspace_style"_kspace_style.html, "package gpu"_package.html,
-examples/accelerate, bench/FERMI, bench/KEPLER
+[Supporting info:]
+
+src/GPU: filenames -> commands
+src/GPU/README
+lib/gpu/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.1"_accelerate_gpu.html
+"Section 2.7 -sf gpu"_Section_start.html#start_7
+"Section 2.7 -pk gpu"_Section_start.html#start_7
+"package gpu"_package.html
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5 for pair styles followed by (g)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-GRANULAR package :link(GRANULAR),h5
+GRANULAR package :link(GRANULAR),h4
-Contents: Fixes and pair styles that support models of finite-size
-granular particles, which interact with each other and boundaries via
-frictional and dissipative potentials.
+[Contents:]
-To install via make or Make.py:
+Pair styles and fixes for finite-size granular particles, which
+interact with each other and boundaries via frictional and dissipative
+potentials.
+
+[Install or un-install:]
make yes-granular
make machine :pre
-Make.py -p granular -a machine :pre
-
-To un-install via make or Make.py:
-
make no-granular
make machine :pre
-Make.py -p ^granular -a machine :pre
-
-Supporting info: "Section 6.6"_Section_howto.html#howto_6, "fix
-pour"_fix_pour.html, "fix wall/gran"_fix_wall_gran.html, "pair_style
-gran/hooke"_pair_gran.html, "pair_style
-gran/hertz/history"_pair_gran.html, examples/pour, bench/in.chute
+[Supporting info:]
+
+src/GRANULAR: filenames -> commands
+"Section 6.6"_Section_howto.html#howto_6,
+"fix pour"_fix_pour.html
+"fix wall/gran"_fix_wall_gran.html
+"pair_style gran/hooke"_pair_gran.html
+"pair_style gran/hertz/history"_pair_gran.html
+examples/granregion
+examples/pour
+bench/in.chute
+http://lammps.sandia.gov/pictures.html#jamming
+http://lammps.sandia.gov/movies.html#hopper
+http://lammps.sandia.gov/movies.html#dem
+http://lammps.sandia.gov/movies.html#brazil
+http://lammps.sandia.gov/movies.html#granregion :ul
:line
-KIM package :link(KIM),h5
+KIM package :link(KIM),h4
-Contents: A pair style that interfaces to the Knowledge Base for
-Interatomic Models (KIM) repository of interatomic potentials, so that
-KIM potentials can be used in a LAMMPS simulation.
+[Contents:]
-To build LAMMPS with the KIM package you must have previously
-installed the KIM API (library) on your system. The lib/kim/README
-file explains how to download and install KIM. Building with the KIM
-package also uses the lib/kim/Makefile.lammps file in the compile/link
-process. You should not need to edit this file.
+A "pair_style kim"_pair_kim.html command which is a wrapper on the
+Knowledge Base for Interatomic Models (KIM) repository of interatomic
+potentials, enabling any of them to be used in LAMMPS simulations.
-To install via make or Make.py:
+To use this package you must have the KIM library available on your
+system.
-make yes-kim
-make machine :pre
+Information about the KIM project can be found at its website:
+https://openkim.org. The KIM project is led by Ellad Tadmor and Ryan
+Elliott (U Minnesota) and James Sethna (Cornell U).
+
+[Authors:] Ryan Elliott (U Minnesota) is the main developer for the KIM
+API which the "pair_style kim"_pair_kim.html command uses. He
+developed the pair style in collaboration with Valeriu Smirichinski (U
+Minnesota).
+
+[Install or un-install:]
-Make.py -p kim -a machine :pre
+Using this package requires the KIM library and its models
+(interatomic potentials) to be downloaded and installed on your
+system. The library can be downloaded and built in lib/kim or
+elsewhere on your system. Details of the download, build, and install
+process for KIM are given in the lib/kim/README file.
-To un-install via make or Make.py:
+Once that process is complete, you can then install/un-install the
+package and build LAMMPS in the usual manner:
+
+make yes-kim
+make machine :pre
make no-kim
make machine :pre
-Make.py -p ^kim -a machine :pre
+[Supporting info:]
-Supporting info: src/KIM/README, lib/kim/README, "pair_style
-kim"_pair_kim.html, examples/kim
+src/KIM: filenames -> commands
+src/KIM/README
+lib/kim/README
+"pair_style kim"_pair_kim.html
+examples/kim :ul
:line
-KOKKOS package :link(KOKKOS),h5
+KOKKOS package :link(KOKKOS),h4
-Contents: Dozens of atom, pair, bond, angle, dihedral, improper styles
-which run with the Kokkos library to provide optimization for
-multicore CPUs (via OpenMP), NVIDIA GPUs, or the Intel Xeon Phi (in
-native mode). All of them have a "kk" in their style name. "Section
-5.3.3"_accelerate_kokkos.html gives details of what
-hardware and software is required on your system, and how to build and
-use this package. See the GPU, OPT, USER-INTEL, USER-OMP packages,
-which also provide optimizations for the same range of hardware.
+[Contents:]
-Building with the KOKKOS package requires choosing which of 3 hardware
-options you are optimizing for: CPU acceleration via OpenMP, GPU
-acceleration, or Intel Xeon Phi. (You can build multiple times to
-create LAMMPS executables for different hardware.) It also requires a
-C++11 compatible compiler. For GPUs, the NVIDIA "nvcc" compiler is
-used, and an appropriate KOKKOS_ARCH setting should be made in your
-Makefile.machine for your GPU hardware and NVIDIA software.
+Dozens of atom, pair, bond, angle, dihedral, improper, fix, compute
+styles adapted to compile using the Kokkos library which can convert
+them to OpenMP or Cuda code so that they run efficiently on multicore
+CPUs, KNLs, or GPUs. All the styles have a "kk" as a suffix in their
+style name. "Section 5.3.3"_accelerate_kokkos.html gives details of
+what hardware and software is required on your system, and how to
+build and use this package. Its styles can be invoked at run time via
+the "-sf kk" or "-suffix kk" "command-line
+switches"_Section_start.html#start_7. Also see the "GPU"_#GPU,
+"OPT"_#OPT, "USER-INTEL"_#USER-INTEL, and "USER-OMP"_#USER_OMP
+packages, which have styles optimized for CPUs, KNLs, and GPUs.
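+
+For example, assuming executables built for OpenMP and Cuda support
+(the executable and input script names below are placeholders), runs
+could look like this:
+
+lmp_kokkos_omp -k on t 4 -sf kk -in in.lj                # 4 OpenMP threads per MPI task
+mpirun -np 2 lmp_kokkos_cuda -k on g 2 -sf kk -in in.lj  # 2 MPI tasks, 2 GPUs per node :pre
+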
-The simplest way to do this is to use Makefile.kokkos_cuda or
-Makefile.kokkos_omp or Makefile.kokkos_phi in src/MAKE/OPTIONS, via
-"make kokkos_cuda" or "make kokkos_omp" or "make kokkos_phi". (Check
-the KOKKOS_ARCH setting in Makefile.kokkos_cuda), Or, as illustrated
-below, you can use the Make.py script with its "-kokkos" option to
-choose which hardware to build for. Type "python src/Make.py -h
--kokkos" to see the details. If these methods do not work on your
-system, you will need to read the "Section 5.3.3"_accelerate_kokkos.html
-doc page for details of what Makefile.machine settings are needed.
+You must have a C++11 compatible compiler to use this package.
-To install via make or Make.py for each of 3 hardware options:
+[Authors:] The KOKKOS package was created primarily by Christian Trott
+and Stan Moore (Sandia), with contributions from other folks as well.
+It uses the open-source "Kokkos library"_https://github.com/kokkos
+which was developed by Carter Edwards, Christian Trott, and others at
+Sandia, and which is included in the LAMMPS distribution in
+lib/kokkos.
-make yes-kokkos
-make kokkos_omp # for CPUs with OpenMP
-make kokkos_cuda # for GPUs, check the KOKKOS_ARCH setting in Makefile.kokkos_cuda
-make kokkos_phi # for Xeon Phis :pre
+[Install or un-install:]
+
+For the KOKKOS package, you have 3 choices when building. You can
+build with either CPU or KNL or GPU support. Each choice requires
+additional settings in your Makefile.machine for the KOKKOS_DEVICES
+and KOKKOS_ARCH settings. See the src/MAKE/OPTIONS/Makefile.kokkos*
+files for examples.
+
+For multicore CPUs using OpenMP:
+
+KOKKOS_DEVICES = OpenMP
+KOKKOS_ARCH = HSW # HSW = Haswell, SNB = SandyBridge, BDW = Broadwell, etc :pre
+
+For Intel KNLs using OpenMP:
+
+KOKKOS_DEVICES = OpenMP
+KOKKOS_ARCH = KNL :pre
+
+For NVIDIA GPUs using Cuda:
+
+KOKKOS_DEVICES = Cuda
+KOKKOS_ARCH = Pascal60,Power8 # P100 hosted by an IBM Power8, etc
+KOKKOS_ARCH = Kepler37,Power8 # K80 hosted by an IBM Power8, etc :pre
+
+For GPUs, you also need these 2 lines in your Makefile.machine before
+the CC line is defined, in this case for use with OpenMPI mpicxx. The
+2 lines define a nvcc wrapper compiler, which will use nvcc for
+compiling Cuda files or use a C++ compiler for non-Kokkos, non-Cuda
+files.
-Make.py -p kokkos -kokkos omp -a machine # for CPUs with OpenMP
-Make.py -p kokkos -kokkos cuda arch=35 -a machine # for GPUs of style arch
-Make.py -p kokkos -kokkos phi -a machine # for Xeon Phis
+KOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)
+export OMPI_CXX = $(KOKKOS_ABSOLUTE_PATH)/config/nvcc_wrapper
+CC = mpicxx :pre
-To un-install via make or Make.py:
+Once you have an appropriate Makefile.machine, you can
+install/un-install the package and build LAMMPS in the usual manner.
+Note that you cannot build one executable to run on multiple hardware
+targets (CPU or KNL or GPU). You need to build LAMMPS once for each
+hardware target, to produce a separate executable. Also note that we
+do not recommend building with other acceleration packages installed
+(GPU, OPT, USER-INTEL, USER-OMP) when also building with KOKKOS.
+make yes-kokkos
+make machine :pre
+
make no-kokkos
make machine :pre
-Make.py -p ^kokkos -a machine :pre
+[Supporting info:]
-Supporting info: src/KOKKOS/README, lib/kokkos/README,
-"Section 5.3"_Section_accelerate.html#acc_3,
-"Section 5.3.3"_accelerate_kokkos.html,
-Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
-for any pair style listed with a (k), "package kokkos"_package.html,
-examples/accelerate, bench/FERMI, bench/KEPLER
+src/KOKKOS: filenames -> commands
+src/KOKKOS/README
+lib/kokkos/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.3"_accelerate_kokkos.html
+"Section 2.7 -k on ..."_Section_start.html#start_7
+"Section 2.7 -sf kk"_Section_start.html#start_7
+"Section 2.7 -pk kokkos"_Section_start.html#start_7
+"package kokkos"_package.html
+Styles sections of "Section 3.5"_Section_commands.html#cmd_5 for styles followed by (k)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-KSPACE package :link(KSPACE),h5
+KSPACE package :link(KSPACE),h4
-Contents: A variety of long-range Coulombic solvers, and pair styles
-which compute the corresponding short-range portion of the pairwise
-Coulombic interactions. These include Ewald, particle-particle
-particle-mesh (PPPM), and multilevel summation method (MSM) solvers.
+[Contents:]
-Building with the KSPACE package requires a 1d FFT library be present
-on your system for use by the PPPM solvers. This can be the KISS FFT
-library provided with LAMMPS, or 3rd party libraries like FFTW or a
-vendor-supplied FFT library. See step 6 of "Section
-2.2.2"_Section_start.html#start_2_2 of the manual for details of how
-to select different FFT options in your machine Makefile. The Make.py
-tool has an "-fft" option which can insert these settings into your
-machine Makefile automatically. Type "python src/Make.py -h -fft" to
-see the details.
+A variety of long-range Coulombic solvers, as well as pair styles
+which compute the corresponding short-range pairwise Coulombic
+interactions. These include Ewald, particle-particle particle-mesh
+(PPPM), and multilevel summation method (MSM) solvers.
-To install via make or Make.py:
+[Install or un-install:]
+
+Building with this package requires a 1d FFT library be present on
+your system for use by the PPPM solvers. This can be the KISS FFT
+library provided with LAMMPS, 3rd party libraries like FFTW, or a
+vendor-supplied FFT library. See step 6 of "Section
+2.2.2"_Section_start.html#start_2_2 of the manual for details on how
+to select different FFT options in your machine Makefile.
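+
+For example, to use FFTW3 instead of the default KISS FFT, the FFT
+settings in your Makefile.machine might look like the lines below;
+the include and library paths are placeholders that depend on where
+FFTW is installed on your system:
+
+FFT_INC = -DFFT_FFTW3 -I/usr/local/include
+FFT_PATH = -L/usr/local/lib
+FFT_LIB = -lfftw3 :pre
+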
make yes-kspace
make machine :pre
-Make.py -p kspace -a machine :pre
-
-To un-install via make or Make.py:
-
make no-kspace
make machine :pre
-Make.py -p ^kspace -a machine :pre
+[Supporting info:]
-Supporting info: "kspace_style"_kspace_style.html,
-"doc/PDF/kspace.pdf"_PDF/kspace.pdf,
-"Section 6.7"_Section_howto.html#howto_7,
-"Section 6.8"_Section_howto.html#howto_8,
-"Section 6.9"_Section_howto.html#howto_9,
-"pair_style coul"_pair_coul.html, other pair style command doc pages
-which have "long" or "msm" in their style name,
-examples/peptide, bench/in.rhodo
+src/KSPACE: filenames -> commands
+"kspace_style"_kspace_style.html
+"doc/PDF/kspace.pdf"_PDF/kspace.pdf
+"Section 6.7"_Section_howto.html#howto_7
+"Section 6.8"_Section_howto.html#howto_8
+"Section 6.9"_Section_howto.html#howto_9
+"pair_style coul"_pair_coul.html
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5 with "long" or "msm" in pair style name
+examples/peptide
+bench/in.rhodo :ul
:line
-MANYBODY package :link(MANYBODY),h5
+MANYBODY package :link(MANYBODY),h4
+
+[Contents:]
-Contents: A variety of many-body and bond-order potentials. These
-include (AI)REBO, EAM, EIM, BOP, Stillinger-Weber, and Tersoff
-potentials. Do a directory listing, "ls src/MANYBODY", to see
-the full list.
+A variety of manybody and bond-order potentials. These include
+(AI)REBO, BOP, EAM, EIM, Stillinger-Weber, and Tersoff potentials.
-To install via make or Make.py:
+[Install or un-install:]
make yes-manybody
make machine :pre
-Make.py -p manybody -a machine :pre
-
-To un-install via make or Make.py:
-
make no-manybody
make machine :pre
-Make.py -p ^manybody -a machine :pre
-
-Supporting info:
+[Supporting info:]
-Examples: Pair Styles section of "Section
-3.5"_Section_commands.html#cmd_5, examples/comb, examples/eim,
-examples/nb3d, examples/vashishta
+src/MANYBODY: filenames -> commands
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5
+examples/comb
+examples/eim
+examples/nb3d
+examples/shear
+examples/streitz
+examples/vashishta
+bench/in.eam :ul
:line
-MC package :link(MC),h5
+MC package :link(MC),h4
+
+[Contents:]
-Contents: Several fixes and a pair style that have Monte Carlo (MC) or
-MC-like attributes. These include fixes for creating, breaking, and
-swapping bonds, and for performing atomic swaps and grand-canonical MC
-in conjuction with dynamics.
+Several fixes and a pair style that have Monte Carlo (MC) or MC-like
+attributes. These include fixes for creating, breaking, and swapping
+bonds, for performing atomic swaps, and performing grand-canonical MC
+(GCMC) in conjunction with dynamics.
-To install via make or Make.py:
+[Install or un-install:]
make yes-mc
make machine :pre
-Make.py -p mc -a machine :pre
-
-To un-install via make or Make.py:
-
make no-mc
make machine :pre
-Make.py -p ^mc -a machine :pre
+[Supporting info:]
-Supporting info: "fix atom/swap"_fix_atom_swap.html, "fix
-bond/break"_fix_bond_break.html, "fix
-bond/create"_fix_bond_create.html, "fix bond/swap"_fix_bond_swap.html,
-"fix gcmc"_fix_gcmc.html, "pair_style dsmc"_pair_dsmc.html
+src/MC: filenames -> commands
+"fix atom/swap"_fix_atom_swap.html
+"fix bond/break"_fix_bond_break.html
+"fix bond/create"_fix_bond_create.html
+"fix bond/swap"_fix_bond_swap.html
+"fix gcmc"_fix_gcmc.html
+"pair_style dsmc"_pair_dsmc.html
+http://lammps.sandia.gov/movies.html#gcmc :ul
:line
-MEAM package :link(MEAM),h5
+MEAM package :link(MEAM),h4
-Contents: A pair style for the modified embedded atom (MEAM)
-potential.
+[Contents:]
-Building LAMMPS with the MEAM package requires first building the MEAM
-library itself, which is a set of Fortran 95 files in lib/meam.
-Details of how to do this are in lib/meam/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/meam which should create a lib/meam/libmeam.a file.
-Makefile.gfortran and Makefile.ifort are examples for the GNU Fortran
-and Intel Fortran compilers. The "make" also copies a
-lib/meam/Makefile.lammps.machine file to lib/meam/Makefile.lammps.
-This file has settings that enable the C++ compiler used to build
-LAMMPS to link with a Fortran library (typically the 2 compilers to be
-consistent e.g. both Intel compilers, or both GNU compilers). If the
-settings in Makefile.lammps for your compilers and machine are not
-correct, the LAMMPS link will fail. Note that the Make.py script has
-a "-meam" option to allow the MEAM library and LAMMPS to be built in
-one step. Type "python src/Make.py -h -meam" to see the details.
+A pair style for the modified embedded atom (MEAM) potential.
-NOTE: The MEAM potential can run dramatically faster if built with the
-Intel Fortran compiler, rather than the GNU Fortran compiler.
+[Author:] Greg Wagner (Northwestern U) while at Sandia.
-To install via make or Make.py:
+[Install or un-install:]
-cd ~/lammps/lib/meam
-make -f Makefile.gfortran # for example
-cd ~/lammps/src
-make yes-meam
-make machine :pre
+Before building LAMMPS with this package, you must first build the
+MEAM library in lib/meam. You can do this manually if you prefer;
+follow the instructions in lib/meam/README. You can also do it in one
+step from the lammps/src dir, using one of the commands below, each of
+which invokes the lib/meam/Install.py script with the specified args:
+
+make lib-meam # print help message
+make lib-meam args="-m gfortran" # build with GNU Fortran compiler
+make lib-meam args="-m ifort" # build with Intel ifort compiler :pre
-Make.py -p meam -meam make=gfortran -a machine :pre
+The build should produce two files: lib/meam/libmeam.a and
+lib/meam/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to link C++ (LAMMPS) with
+Fortran (MEAM library). Typically the two compilers used for LAMMPS
+and the MEAM library need to be consistent (e.g. both Intel or both
+GNU compilers). If necessary, you can edit/create a new
+lib/meam/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
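+
+For example, a new lib/meam/Makefile.mymachine (a hypothetical name)
+for the GNU compilers could end with a line like this, so that the
+matching Makefile.lammps.* file is copied to Makefile.lammps:
+
+EXTRAMAKE = Makefile.lammps.gfortran :pre
+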
-To un-install via make or Make.py:
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-meam
+make machine :pre
make no-meam
make machine :pre
-Make.py -p ^meam -a machine :pre
+NOTE: You should test building the MEAM library with both the Intel
+and GNU compilers to see if a simulation runs faster with one versus
+the other on your system.
+
+[Supporting info:]
-Supporting info: lib/meam/README, "pair_style meam"_pair_meam.html,
-examples/meam
+src/MEAM: filenames -> commands
+src/MEAM/README
+lib/meam/README
+"pair_style meam"_pair_meam.html
+examples/meam :ul
:line
-MISC package :link(MISC),h5
+MISC package :link(MISC),h4
-Contents: A variety of computes, fixes, and pair styles that are not
-commonly used, but don't align with other packages. Do a directory
+[Contents:]
+
+A variety of compute, fix, pair, dump styles with specialized
+capabilities that don't align with other packages. Do a directory
listing, "ls src/MISC", to see the list of commands.
-To install via make or Make.py:
+[Install or un-install:]
make yes-misc
make machine :pre
-Make.py -p misc -a machine :pre
-
-To un-install via make or Make.py:
-
make no-misc
make machine :pre
-Make.py -p ^misc -a machine :pre
+[Supporting info:]
-Supporting info: "compute ti"_compute_ti.html, "fix
-evaporate"_fix_evaporate.html, "fix tmm"_fix_ttm.html, "fix
-viscosity"_fix_viscosity.html, examples/misc
+src/MISC: filenames -> commands
+"compute ti"_compute_ti.html
+"fix evaporate"_fix_evaporate.html
+"fix orient/fcc"_fix_orient.html
+"fix ttm"_fix_ttm.html
+"fix thermal/conductivity"_fix_thermal_conductivity.html
+"fix viscosity"_fix_viscosity.html
+examples/KAPPA
+examples/VISCOSITY
+http://lammps.sandia.gov/pictures.html#ttm
+http://lammps.sandia.gov/movies.html#evaporation :ul
:line
-MOLECULE package :link(MOLECULE),h5
+MOLECULE package :link(MOLECULE),h4
-Contents: A large number of atom, pair, bond, angle, dihedral,
-improper styles that are used to model molecular systems with fixed
-covalent bonds. The pair styles include terms for the Dreiding
-(hydrogen-bonding) and CHARMM force fields, and TIP4P water model.
+[Contents:]
-To install via make or Make.py:
+A large number of atom, pair, bond, angle, dihedral, improper styles
+that are used to model molecular systems with fixed covalent bonds.
+The pair styles include the Dreiding (hydrogen-bonding) and CHARMM
+force fields, and a TIP4P water model.
+
+[Install or un-install:]
make yes-molecule
make machine :pre
-Make.py -p molecule -a machine :pre
-
-To un-install via make or Make.py:
-
make no-molecule
make machine :pre
-Make.py -p ^molecule -a machine :pre
-
-Supporting info:"atom_style"_atom_style.html,
-"bond_style"_bond_style.html, "angle_style"_angle_style.html,
-"dihedral_style"_dihedral_style.html,
-"improper_style"_improper_style.html, "pair_style
-hbond/dreiding/lj"_pair_hbond_dreiding.html, "pair_style
-lj/charmm/coul/charmm"_pair_charmm.html,
-"Section 6.3"_Section_howto.html#howto_3,
-examples/micelle, examples/peptide, bench/in.chain, bench/in.rhodo
+[Supporting info:]
+
+src/MOLECULE: filenames -> commands
+"atom_style"_atom_style.html
+"bond_style"_bond_style.html
+"angle_style"_angle_style.html
+"dihedral_style"_dihedral_style.html
+"improper_style"_improper_style.html
+"pair_style hbond/dreiding/lj"_pair_hbond_dreiding.html
+"pair_style lj/charmm/coul/charmm"_pair_charmm.html
+"Section 6.3"_Section_howto.html#howto_3
+examples/cmap
+examples/dreiding
+examples/micelle
+examples/peptide
+bench/in.chain
+bench/in.rhodo :ul
:line
-MPIIO package :link(MPIIO),h5
+MPIIO package :link(MPIIO),h4
+
+[Contents:]
-Contents: Support for parallel output/input of dump and restart files
-via the MPIIO library, which is part of the standard message-passing
-interface (MPI) library. It adds "dump styles"_dump.html with a
-"mpiio" in their style name. Restart files with an ".mpiio" suffix
-are also written and read in parallel.
+Support for parallel output/input of dump and restart files via the
+MPIIO library. It adds "dump styles"_dump.html with a "mpiio" in
+their style name. Restart files with an ".mpiio" suffix are also
+written and read in parallel.
-To install via make or Make.py:
+[Install or un-install:]
+Note that MPIIO is part of the standard message-passing interface
+(MPI) library, so you should not need any additional compiler or link
+settings, beyond what LAMMPS normally uses for MPI on your system.
+
make yes-mpiio
make machine :pre
+
+make no-mpiio
+make machine :pre
+
+[Supporting info:]
-Make.py -p mpiio -a machine :pre
+src/MPIIO: filenames -> commands
+"dump"_dump.html
+"restart"_restart.html
+"write_restart"_write_restart.html
+"read_restart"_read_restart.html :ul
-To un-install via make or Make.py:
+:line
+
+MSCG package :link(MSCG),h4
-make no-mpiio
+[Contents:]
+
+A "fix mscg"_fix_mscg.html command which can parameterize a
+Multi-Scale Coarse-Graining (MSCG) model using the open-source "MS-CG
+library"_mscg.
+
+:link(mscg,https://github.com/uchicago-voth/MSCG-release)
+
+To use this package you must have the MS-CG library available on your
+system.
+
+[Authors:] The fix was written by Lauren Abbott (Sandia). The MS-CG
+library was developed by Jacob Wagner in Greg Voth's group at the
+University of Chicago.
+
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first download and
+build the MS-CG library. Building the MS-CG library and using it from
+LAMMPS requires a C++11 compatible compiler, and that LAPACK and GSL
+(GNU Scientific Library) libraries be installed on your machine. See
+the lib/mscg/README and MSCG/Install files for more details.
+
+Assuming these libraries are in place, you can do the download and
+build of MS-CG manually if you prefer; follow the instructions in
+lib/mscg/README. You can also do it in one step from the lammps/src
+dir, using one of the commands below, each of which invokes the
+lib/mscg/Install.py script with the specified args:
+
+make lib-mscg # print help message
+make lib-mscg args="-g -b -l" # download and build in default lib/mscg/MSCG-release-master
+make lib-mscg args="-h . MSCG -g -b -l" # download and build in lib/mscg/MSCG
+make lib-mscg args="-h ~ MSCG -g -b -l" # download and build in ~/mscg :pre
+
+Note that the final -l switch is to create 2 symbolic (soft) links,
+"includelink" and "liblink", in lib/mscg to point to the MS-CG src
+dir. When LAMMPS builds it will use these links. You should not need
+to edit the lib/mscg/Makefile.lammps file.
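+
+If the build succeeded, you can verify that the two links exist and
+point into the MS-CG source directory, e.g.:
+
+ls -l lib/mscg/includelink lib/mscg/liblink :pre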
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-mscg
+make machine :pre
+
+make no-mscg
make machine :pre
-Make.py -p ^mpiio -a machine :pre
+[Supporting info:]
-Supporting info: "dump"_dump.html, "restart"_restart.html,
-"write_restart"_write_restart.html, "read_restart"_read_restart.html
+src/MSCG: filenames -> commands
+src/MSCG/README
+lib/mscg/README
+examples/mscg :ul
:line
+
+OPT package :link(OPT),h4
-OPT package :link(OPT),h5
+[Contents:]
-Contents: A handful of pair styles with an "opt" in their style name
-which are optimized for improved CPU performance on single or multiple
-cores. These include EAM, LJ, CHARMM, and Morse potentials. "Section
-5.3.5"_accelerate_opt.html gives details of how to build and
-use this package. See the KOKKOS, USER-INTEL, and USER-OMP packages,
-which also have styles optimized for CPU performance.
+A handful of pair styles which are optimized for improved CPU
+performance on single or multiple cores. These include EAM, LJ,
+CHARMM, and Morse potentials. The styles have an "opt" suffix in
+their style name. "Section 5.3.5"_accelerate_opt.html gives details
+of how to build and use this package. Its styles can be invoked at
+run time via the "-sf opt" or "-suffix opt" "command-line
+switches"_Section_start.html#start_7. See also the "KOKKOS"_#KOKKOS,
+"USER-INTEL"_#USER-INTEL, and "USER-OMP"_#USER-OMP packages, which
+have styles optimized for CPU performance.
-Some C++ compilers, like the Intel compiler, require the compile flag
-"-restrict" to build LAMMPS with the OPT package. It should be added
-to the CCFLAGS line of your Makefile.machine. Or use Makefile.opt in
-src/MAKE/OPTIONS, via "make opt". For compilers that use the flag,
-the Make.py command adds it automatically to the Makefile.auto file it
-creates and uses.
+[Authors:] James Fischer (High Performance Technologies), David Richie,
+and Vincent Natoli (Stone Ridge Technology).
-To install via make or Make.py:
+[Install or un-install:]
make yes-opt
make machine :pre
-Make.py -p opt -a machine :pre
-
-To un-install via make or Make.py:
-
make no-opt
make machine :pre
-Make.py -p ^opt -a machine :pre
+NOTE: The compile flag "-restrict" must be used to build LAMMPS with
+the OPT package. It should be added to the CCFLAGS line of your
+Makefile.machine. See Makefile.opt in src/MAKE/OPTIONS for an
+example.
+
+CCFLAGS: add -restrict :ul
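+
+For example, with the Intel compiler the relevant line in your
+Makefile.machine might look like this; the other flags shown are only
+illustrative:
+
+CCFLAGS = -g -O3 -restrict :pre
+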
-Supporting info: "Section 5.3"_Section_accelerate.html#acc_3,
-"Section 5.3.5"_accelerate_opt.html, Pair Styles section of
-"Section 3.5"_Section_commands.html#cmd_5 for any pair style
-listed with an (t), examples/accelerate, bench/KEPLER
+[Supporting info:]
+
+src/OPT: filenames -> commands
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.5"_accelerate_opt.html
+"Section 2.7 -sf opt"_Section_start.html#start_7
+Pair Styles section of "Section 3.5"_Section_commands.html#cmd_5 for pair styles followed by (t)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-PERI package :link(PERI),h5
+PERI package :link(PERI),h4
-Contents: Support for the Peridynamics method, a particle-based
-meshless continuum model. The package includes an atom style, several
-computes which calculate diagnostics, and several Peridynamic pair
-styles which implement different materials models.
+[Contents:]
-To install via make or Make.py:
+An atom style, several pair styles which implement different
+Peridynamics materials models, and several computes which calculate
+diagnostics. Peridynamics is a particle-based meshless continuum
+model.
-make yes-peri
-make machine :pre
+[Authors:] The original package was created by Mike Parks (Sandia).
+Additional Peridynamics models were added by Rezwanur Rahman and John
+Foster (UTSA).
-Make.py -p peri -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-peri
+make machine :pre
make no-peri
make machine :pre
-Make.py -p ^peri -a machine :pre
+[Supporting info:]
-Supporting info:
-"doc/PDF/PDLammps_overview.pdf"_PDF/PDLammps_overview.pdf,
-"doc/PDF/PDLammps_EPS.pdf"_PDF/PDLammps_EPS.pdf,
-"doc/PDF/PDLammps_VES.pdf"_PDF/PDLammps_VES.pdf, "atom_style
-peri"_atom_style.html, "compute damage/atom"_compute_damage_atom.html,
-"pair_style peri/pmb"_pair_peri.html, examples/peri
+src/PERI: filenames -> commands
+"doc/PDF/PDLammps_overview.pdf"_PDF/PDLammps_overview.pdf
+"doc/PDF/PDLammps_EPS.pdf"_PDF/PDLammps_EPS.pdf
+"doc/PDF/PDLammps_VES.pdf"_PDF/PDLammps_VES.pdf
+"atom_style peri"_atom_style.html
+"pair_style peri/*"_pair_peri.html
+"compute damage/atom"_compute_damage_atom.html
+"compute plasticity/atom"_compute_plasticity_atom.html
+examples/peri
+http://lammps.sandia.gov/movies.html#peri :ul
:line
-POEMS package :link(POEMS),h5
+POEMS package :link(POEMS),h4
-Contents: A fix that wraps the Parallelizable Open source Efficient
-Multibody Software (POEMS) librar, which is able to simulate the
-dynamics of articulated body systems. These are systems with multiple
-rigid bodies (collections of atoms or particles) whose motion is
-coupled by connections at hinge points.
+[Contents:]
-Building LAMMPS with the POEMS package requires first building the
-POEMS library itself, which is a set of C++ files in lib/poems.
-Details of how to do this are in lib/poems/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/poems which should create a lib/meam/libpoems.a file.
-Makefile.g++ and Makefile.icc are examples for the GNU and Intel C++
-compilers. The "make" also creates a lib/poems/Makefile.lammps file
-which you should not need to change. Note the Make.py script has a
-"-poems" option to allow the POEMS library and LAMMPS to be built in
-one step. Type "python src/Make.py -h -poems" to see the details.
+A fix that wraps the Parallelizable Open source Efficient Multibody
+Software (POEMS) library, which is able to simulate the dynamics of
+articulated body systems. These are systems with multiple rigid
+bodies (collections of particles) whose motion is coupled by
+connections at hinge points.
-To install via make or Make.py:
+[Author:] Rudra Mukherjee (JPL) while at RPI.
-cd ~/lammps/lib/poems
-make -f Makefile.g++ # for example
-cd ~/lammps/src
-make yes-poems
-make machine :pre
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the
+POEMS library in lib/poems. You can do this manually if you prefer;
+follow the instructions in lib/poems/README. You can also do it in
+one step from the lammps/src dir, using one of the commands below,
+each of which invokes the lib/poems/Install.py script with the
+specified args:
+
+make lib-poems # print help message
+make lib-poems args="-m g++" # build with GNU g++ compiler
+make lib-poems args="-m icc" # build with Intel icc compiler :pre
-Make.py -p poems -poems make=g++ -a machine :pre
+The build should produce two files: lib/poems/libpoems.a and
+lib/poems/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+POEMS library (though typically the settings are just blank). If
+necessary, you can edit/create a new lib/poems/Makefile.machine file
+for your system, which should define an EXTRAMAKE variable to specify
+a corresponding Makefile.lammps.machine file.
-To un-install via make or Make.py:
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-poems
+make machine :pre
make no-poems
make machine :pre
-Make.py -p ^meam -a machine :pre
+[Supporting info:]
-Supporting info: src/POEMS/README, lib/poems/README,
-"fix poems"_fix_poems.html, examples/rigid
+src/POEMS: filenames -> commands
+src/POEMS/README
+lib/poems/README
+"fix poems"_fix_poems.html
+examples/rigid :ul
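+
+As a sketch of typical usage (not a complete input), the fix can be
+applied so that each molecule in a group becomes one rigid body of the
+articulated system; the group name "clumps" is a placeholder:
+
+fix 1 clumps poems molecule  # one rigid body per molecule ID :pre
+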
:line
-PYTHON package :link(PYTHON),h5
+PYTHON package :link(PYTHON),h4
-Contents: A "python"_python.html command which allow you to execute
-Python code from a LAMMPS input script. The code can be in a separate
-file or embedded in the input script itself. See "Section
-11.2"_Section_python.html#py_2 for an overview of using Python from
-LAMMPS and for other ways to use LAMMPS and Python together.
+[Contents:]
-Building with the PYTHON package assumes you have a Python shared
-library available on your system, which needs to be a Python 2
-version, 2.6 or later. Python 3 is not yet supported. The build uses
-the contents of the lib/python/Makefile.lammps file to find all the Python
-files required in the build/link process. See the lib/python/README
-file if the settings in that file do not work on your system. Note
-that the Make.py script has a "-python" option to allow an alternate
-lib/python/Makefile.lammps file to be specified and LAMMPS to be built
-in one step. Type "python src/Make.py -h -python" to see the details.
+A "python"_python.html command which allow you to execute Python code
+from a LAMMPS input script. The code can be in a separate file or
+embedded in the input script itself. See "Section
+11.2"_Section_python.html#py_2 for an overview of using Python from
+LAMMPS in this manner and the entire section for other ways to use
+LAMMPS and Python together.
-To install via make or Make.py:
+[Install or un-install:]
make yes-python
make machine :pre
-Make.py -p python -a machine :pre
-
-To un-install via make or Make.py:
-
make no-python
make machine :pre
-Make.py -p ^python -a machine :pre
+NOTE: Building with the PYTHON package assumes you have a Python
+shared library available on your system, which needs to be a Python 2
+version, 2.6 or later. Python 3 is not yet supported. See the
+lib/python/README for more details. Note that the build uses the
+lib/python/Makefile.lammps file in the compile/link process. You
+should only need to create a new Makefile.lammps.* file (and copy it
+to Makefile.lammps) if the LAMMPS build fails.
-Supporting info: examples/python
+[Supporting info:]
+
+src/PYTHON: filenames -> commands
+"Section 11"_Section_python.html
+lib/python/README
+examples/python :ul
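+
+For example, a short Python function can be defined and invoked
+directly from an input script, roughly as follows (a sketch only; see
+"python"_python.html for the full set of keywords and for passing
+arguments and return values):
+
+python hello here """
+def hello():
+  print "Hello from Python"
+"""
+python hello invoke :pre
+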
:line
-QEQ package :link(QEQ),h5
+QEQ package :link(QEQ),h4
+
+[Contents:]
-Contents: Several fixes for performing charge equilibration (QEq) via
-severeal different algorithms. These can be used with pair styles
-that use QEq as part of their formulation.
+Several fixes for performing charge equilibration (QEq) via different
+algorithms. These can be used with pair styles that perform QEq as
+part of their formulation.
-To install via make or Make.py:
+[Install or un-install:]
make yes-qeq
make machine :pre
-Make.py -p qeq -a machine :pre
-
-To un-install via make or Make.py:
-
make no-qeq
make machine :pre
-Make.py -p ^qeq -a machine :pre
+[Supporting info:]
-Supporting info: "fix qeq/*"_fix_qeq.html, examples/qeq
+src/QEQ: filenames -> commands
+"fix qeq/*"_fix_qeq.html
+examples/qeq
+examples/streitz :ul
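+
+A typical invocation is a single fix line, roughly like the following;
+the parameter file name is a placeholder and the appropriate qeq style
+and arguments depend on the pair style in use (see "fix
+qeq/*"_fix_qeq.html):
+
+fix 1 all qeq/point 1 10.0 1.0e-6 200 param.qeq1  # Nevery cutoff tol maxiter file :pre
+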
:line
-REAX package :link(REAX),h5
+REAX package :link(REAX),h4
-Contents: A pair style for the ReaxFF potential, a universal reactive
-force field, as well as a "fix reax/bonds"_fix_reax_bonds.html command
-for monitoring molecules as bonds are created and destroyed.
+[Contents:]
-Building LAMMPS with the REAX package requires first building the REAX
-library itself, which is a set of Fortran 95 files in lib/reax.
-Details of how to do this are in lib/reax/README. As illustrated
-below, perform a "make" using one of the Makefile.machine files in
-lib/reax which should create a lib/reax/libreax.a file.
-Makefile.gfortran and Makefile.ifort are examples for the GNU Fortran
-and Intel Fortran compilers. The "make" also copies a
-lib/reax/Makefile.lammps.machine file to lib/reax/Makefile.lammps.
-This file has settings that enable the C++ compiler used to build
-LAMMPS to link with a Fortran library (typically the 2 compilers to be
-consistent e.g. both Intel compilers, or both GNU compilers). If the
-settings in Makefile.lammps for your compilers and machine are not
-correct, the LAMMPS link will fail. Note that the Make.py script has
-a "-reax" option to allow the REAX library and LAMMPS to be built in
-one step. Type "python src/Make.py -h -reax" to see the details.
-
-To install via make or Make.py:
-
-cd ~/lammps/lib/reax
-make -f Makefile.gfortran # for example
-cd ~/lammps/src
-make yes-reax
-make machine :pre
+A pair style which wraps a Fortran library implementing the ReaxFF
+potential, a universal reactive force field. See the
+"USER-REAXC package"_#USER-REAXC for an alternate implementation in
+C/C++. Also a "fix reax/bonds"_fix_reax_bonds.html command for
+monitoring molecules as bonds are created and destroyed.
+
+[Author:] Aidan Thompson (Sandia).
-Make.py -p reax -reax make=gfortran -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+Before building LAMMPS with this package, you must first build the
+REAX library in lib/reax. You can do this manually if you prefer;
+follow the instructions in lib/reax/README. You can also do it in one
+step from the lammps/src dir, using a command like these, which simply
+invoke the lib/reax/Install.py script with the specified args:
+
+make lib-reax # print help message
+make lib-reax args="-m gfortran" # build with GNU Fortran compiler
+make lib-reax args="-m ifort" # build with Intel ifort compiler :pre
+
+The build should produce two files: lib/reax/libreax.a and
+lib/reax/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to link C++ (LAMMPS) with
+Fortran (REAX library). Typically the two compilers used for LAMMPS
+and the REAX library need to be consistent (e.g. both Intel or both
+GNU compilers). If necessary, you can edit/create a new
+lib/reax/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-reax
+make machine :pre
make no-reax
make machine :pre
-Make.py -p ^reax -a machine :pre
+[Supporting info:]
-Supporting info: lib/reax/README, "pair_style reax"_pair_reax.html,
-"fix reax/bonds"_fix_reax_bonds.html, examples/reax
+src/REAX: filenames -> commands
+lib/reax/README
+"pair_style reax"_pair_reax.html
+"fix reax/bonds"_fix_reax_bonds.html
+examples/reax :ul
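+
+As a minimal sketch (not a complete input), the pair style is used
+roughly as follows; the force-field file name and the mapping of
+LAMMPS atom types to elements in that file are placeholders, see
+"pair_style reax"_pair_reax.html:
+
+atom_style charge
+pair_style reax
+pair_coeff * * ffield.reax 1 2  # map atom types 1,2 to elements 1,2 in ffield.reax
+fix 1 all reax/bonds 100 bonds.reax  # optional bond-monitoring output :pre
+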
:line
-REPLICA package :link(REPLICA),h5
+REPLICA package :link(REPLICA),h4
-Contents: A collection of multi-replica methods that are used by
-invoking multiple instances (replicas) of LAMMPS
-simulations. Communication between individual replicas is performed in
-different ways by the different methods. See "Section
+[Contents:]
+
+A collection of multi-replica methods which can be used when running
+multiple LAMMPS simulations (replicas). See "Section
6.5"_Section_howto.html#howto_5 for an overview of how to run
-multi-replica simulations in LAMMPS. Multi-replica methods included
-in the package are nudged elastic band (NEB), parallel replica
-dynamics (PRD), temperature accelerated dynamics (TAD), parallel
-tempering, and a verlet/split algorithm for performing long-range
-Coulombics on one set of processors, and the remainder of the force
-field calculation on another set.
+multi-replica simulations in LAMMPS. Methods in the package include
+nudged elastic band (NEB), parallel replica dynamics (PRD),
+temperature accelerated dynamics (TAD), parallel tempering, and a
+verlet/split algorithm for performing long-range Coulombics on one set
+of processors, and the remainder of the force field calculation on
+another set.
-To install via make or Make.py:
+[Install or un-install:]
make yes-replica
make machine :pre
-Make.py -p replica -a machine :pre
-
-To un-install via make or Make.py:
-
make no-replica
make machine :pre
-Make.py -p ^replica -a machine :pre
+[Supporting info:]
-Supporting info: "Section 6.5"_Section_howto.html#howto_5,
-"neb"_neb.html, "prd"_prd.html, "tad"_tad.html, "temper"_temper.html,
-"run_style verlet/split"_run_style.html, examples/neb, examples/prd,
-examples/tad
+src/REPLICA: filenames -> commands
+"Section 6.5"_Section_howto.html#howto_5
+"neb"_neb.html
+"prd"_prd.html
+"tad"_tad.html
+"temper"_temper.html,
+"run_style verlet/split"_run_style.html
+examples/neb
+examples/prd
+examples/tad :ul
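+
+For example, parallel tempering is run on multiple processor
+partitions (e.g. "mpirun -np 4 lmp_machine -partition 4x1 -in
+in.temper"), with a world-style variable giving each replica its own
+temperature; a sketch, assuming 4 replicas and a thermostat fix named
+"myfix":
+
+variable t world 300.0 310.0 320.0 330.0
+fix myfix all nvt temp $t $t 100.0
+temper 100000 100 $t myfix 3847 58382 :pre
+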
:line
-RIGID package :link(RIGID),h5
+RIGID package :link(RIGID),h4
+
+[Contents:]
-Contents: A collection of computes and fixes which enforce rigid
-constraints on collections of atoms or particles. This includes SHAKE
-and RATTLE, as well as variants of rigid-body time integrators for a
-few large bodies or many small bodies.
+Fixes which enforce rigid constraints on collections of atoms or
+particles. This includes SHAKE and RATTLE, as well as various
+rigid-body integrators for a few large bodies or many small bodies.
+Also several computes which calculate properties of rigid bodies.
-To install via make or Make.py:
+[Install or un-install:]
make yes-rigid
make machine :pre
-Make.py -p rigid -a machine :pre
-
-To un-install via make or Make.py:
+
make no-rigid
make machine :pre
-Make.py -p ^rigid -a machine :pre
+[Supporting info:]
-Supporting info: "compute erotate/rigid"_compute_erotate_rigid.html,
-"fix shake"_fix_shake.html, "fix rattle"_fix_shake.html, "fix
-rigid/*"_fix_rigid.html, examples/ASPHERE, examples/rigid
+src/RIGID: filenames -> commands
+"compute erotate/rigid"_compute_erotate_rigid.html
+fix shake"_fix_shake.html
+"fix rattle"_fix_shake.html
+"fix rigid/*"_fix_rigid.html
+examples/ASPHERE
+examples/rigid
+bench/in.rhodo
+http://lammps.sandia.gov/movies.html#box
+http://lammps.sandia.gov/movies.html#star :ul
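+
+Typical usage is a single fix line per constraint, e.g. to constrain
+specific bond and angle types with SHAKE, or to treat each molecule in
+a group as a rigid body (the group name "clumps" is a placeholder):
+
+fix 1 all shake 0.0001 20 10 b 1 a 1
+fix 2 clumps rigid/small molecule :pre
+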
:line
-SHOCK package :link(SHOCK),h5
+SHOCK package :link(SHOCK),h4
+
+[Contents:]
-Contents: A small number of fixes useful for running impact
-simulations where a shock-wave passes through a material.
+Fixes for running impact simulations where a shock-wave passes through
+a material.
-To install via make or Make.py:
+[Install or un-install:]
make yes-shock
make machine :pre
-Make.py -p shock -a machine :pre
-
-To un-install via make or Make.py:
-
make no-shock
make machine :pre
-Make.py -p ^shock -a machine :pre
+[Supporting info:]
-Supporting info: "fix append/atoms"_fix_append_atoms.html, "fix
-msst"_fix_msst.html, "fix nphug"_fix_nphug.html, "fix
-wall/piston"_fix_wall_piston.html, examples/hugoniostat, examples/msst
+src/SHOCK: filenames -> commands
+"fix append/atoms"_fix_append_atoms.html
+"fix msst"_fix_msst.html
+"fix nphug"_fix_nphug.html
+"fix wall/piston"_fix_wall_piston.html
+examples/hugoniostat
+examples/msst :ul
:line
-SNAP package :link(SNAP),h5
+SNAP package :link(SNAP),h4
-Contents: A pair style for the spectral neighbor analysis potential
-(SNAP), which is an empirical potential which can be quantum accurate
-when fit to an archive of DFT data. Computes useful for analyzing
-properties of the potential are also included.
+[Contents:]
-To install via make or Make.py:
+A pair style for the spectral neighbor analysis potential (SNAP).
+SNAP is a methodology for deriving a highly accurate classical potential
+fit to a large archive of quantum mechanical (DFT) data. Also several
+computes which analyze attributes of the potential.
-make yes-snap
-make machine :pre
+[Author:] Aidan Thompson (Sandia).
-Make.py -p snap -a machine :pre
+[Install or un-install:]
-To un-install via make or Make.py:
+make yes-snap
+make machine :pre
make no-snap
make machine :pre
-Make.py -p ^snap -a machine :pre
+[Supporting info:]
-Supporting info: "pair snap"_pair_snap.html, "compute
-sna/atom"_compute_sna_atom.html, "compute snad/atom"_compute_sna_atom.html,
-"compute snav/atom"_compute_sna_atom.html, examples/snap
+src/SNAP: filenames -> commands
+"pair snap"_pair_snap.html
+"compute sna/atom"_compute_sna_atom.html
+"compute snad/atom"_compute_sna_atom.html
+"compute snav/atom"_compute_sna_atom.html
+examples/snap :ul
:line
-SRD package :link(SRD),h5
+SRD package :link(SRD),h4
-Contents: Two fixes which implement the Stochastic Rotation Dynamics
-(SRD) method for coarse-graining of a solvent, typically around large
-colloidal-scale particles.
+[Contents:]
-To install via make or Make.py:
+A pair of fixes which implement the Stochastic Rotation Dynamics (SRD)
+method for coarse-graining of a solvent, typically around large
+colloidal particles.
+
+[Install or un-install:]
make yes-srd
make machine :pre
-Make.py -p srd -a machine :pre
-
-To un-install via make or Make.py:
+
make no-srd
make machine :pre
-Make.py -p ^srd -a machine :pre
+[Supporting info:]
-Supporting info: "fix srd"_fix_srd.html, "fix
-wall/srd"_fix_wall_srd.html, examples/srd, examples/ASPHERE
+src/SRD: filenames -> commands
+"fix srd"_fix_srd.html
+"fix wall/srd"_fix_wall_srd.html
+examples/srd
+examples/ASPHERE
+http://lammps.sandia.gov/movies.html#tri
+http://lammps.sandia.gov/movies.html#line
+http://lammps.sandia.gov/movies.html#poly :ul
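+
+In a typical setup the small solvent particles are in one group and
+the large particles in another; a sketch (group names and parameters
+are placeholders, see "fix srd"_fix_srd.html for their meaning):
+
+fix 1 solvent srd 20 big 1.0 0.25 49894 shift yes 54979 :pre
+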
:line
-VORONOI package :link(VORONOI),h5
+VORONOI package :link(VORONOI),h4
-Contents: A "compute voronoi/atom"_compute_voronoi_atom.html command
-which computes the Voronoi tesselation of a collection of atoms or
-particles by wrapping the Voro++ lib
+[Contents:]
-To build LAMMPS with the KIM package you must have previously
-installed the KIM API (library) on your system. The lib/kim/README
-file explains how to download and install KIM. Building with the KIM
-package also uses the lib/kim/Makefile.lammps file in the compile/link
-process. You should not need to edit this file.
+A compute command which calculates the Voronoi tessellation of a
+collection of atoms by wrapping the "Voro++ library"_voronoi. This
+can be used to calculate the local volume of each atom or to identify
+its near neighbors.
+:link(voronoi,http://math.lbl.gov/voro++)
-To build LAMMPS with the VORONOI package you must have previously
-installed the Voro++ library on your system. The lib/voronoi/README
-file explains how to download and install Voro++. There is a
-lib/voronoi/install.py script which automates the process. Type
-"python install.py" to see instructions. The final step is to create
-soft links in the lib/voronoi directory for "includelink" and
-"liblink" which point to installed Voro++ directories. Building with
-the VORONOI package uses the contents of the
-lib/voronoi/Makefile.lammps file in the compile/link process. You
-should not need to edit this file. Note that the Make.py script has a
-"-voronoi" option to allow the Voro++ library to be downloaded and/or
-installed and LAMMPS to be built in one step. Type "python
-src/Make.py -h -voronoi" to see the details.
+To use this package you must have the Voro++ library available on your
+system.
-To install via make or Make.py:
+[Author:] Daniel Schwen (INL) while at LANL. The open-source Voro++
+library was written by Chris Rycroft (Harvard U) while at UC Berkeley
+and LBNL.
-cd ~/lammps/lib/voronoi
-python install.py -g -b -l # download Voro++, build in lib/voronoi, create links
-cd ~/lammps/src
-make yes-voronoi
-make machine :pre
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first download and
+build the Voro++ library. You can do this manually if you prefer;
+follow the instructions in lib/voronoi/README. You can also do it in
+one step from the lammps/src dir, using a command like these, which
+simply invoke the lib/voronoi/Install.py script with the specified
+args:
+
+make lib-voronoi # print help message
+make lib-voronoi args="-g -b -l" # download and build in default lib/voronoi/voro++-0.4.6
+make lib-voronoi args="-h . voro++ -g -b -l" # download and build in lib/voronoi/voro++
+make lib-voronoi args="-h ~ voro++ -g -b -l" # download and build in ~/voro++ :pre
+
+Note that the final -l switch is to create 2 symbolic (soft) links,
+"includelink" and "liblink", in lib/voronoi to point to the Voro++ src
+dir. When LAMMPS builds it will use these links. You should not need
+to edit the lib/voronoi/Makefile.lammps file.
-Make.py -p voronoi -voronoi install="-g -b -l" -a machine :pre
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-To un-install via make or Make.py:
+make yes-voronoi
+make machine :pre
make no-voronoi
make machine :pre
-Make.py -p ^voronoi -a machine :pre
-
-Supporting info: src/VORONOI/README, lib/voronoi/README, "compute
-voronoi/atom"_compute_voronoi_atom.html, examples/voronoi
+[Supporting info:]
+
+src/VORONOI: filenames -> commands
+src/VORONOI/README
+lib/voronoi/README
+"compute voronoi/atom"_compute_voronoi_atom.html
+examples/voronoi :ul
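+
+For example, the per-atom Voronoi cell volume and number of cell faces
+can be computed and written to a dump file roughly as follows:
+
+compute vor all voronoi/atom
+dump 1 all custom 100 dump.voro id c_vor[1] c_vor[2] :pre
+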
:line
+:line
+
+USER-ATC package :link(USER-ATC),h4
-4.2 User packages :h4,link(pkg_2)
+[Contents:]
-The current list of user-contributed packages is as follows:
+ATC stands for atoms-to-continuum. This package implements a "fix
+atc"_fix_atc.html command to either couple molecular dynamics with
+continuum finite element equations or perform on-the-fly conversion of
+atomic information to continuum fields.
-Package, Description, Author(s), Doc page, Example, Pic/movie, Library
-"USER-ATC"_#USER-ATC, atom-to-continuum coupling, Jones & Templeton & Zimmerman (1), "fix atc"_fix_atc.html, USER/atc, "atc"_atc, lib/atc
-"USER-AWPMD"_#USER-AWPMD, wave-packet MD, Ilya Valuev (JIHT), "pair_style awpmd/cut"_pair_awpmd.html, USER/awpmd, -, lib/awpmd
-"USER-CG-CMM"_#USER-CG-CMM, coarse-graining model, Axel Kohlmeyer (Temple U), "pair_style lj/sdk"_pair_sdk.html, USER/cg-cmm, "cg"_cg, -
-"USER-CGDNA"_#USER-CGDNA, coarse-grained DNA force fields, Oliver Henrich (U Strathclyde Glasgow), src/USER-CGDNA/README, USER/cgdna, -, -
-"USER-COLVARS"_#USER-COLVARS, collective variables, Fiorin & Henin & Kohlmeyer (2), "fix colvars"_fix_colvars.html, USER/colvars, "colvars"_colvars, lib/colvars
-"USER-DIFFRACTION"_#USER-DIFFRACTION, virutal x-ray and electron diffraction, Shawn Coleman (ARL),"compute xrd"_compute_xrd.html, USER/diffraction, -, -
-"USER-DPD"_#USER-DPD, reactive dissipative particle dynamics (DPD), Larentzos & Mattox & Brennan (5), src/USER-DPD/README, USER/dpd, -, -
-"USER-DRUDE"_#USER-DRUDE, Drude oscillators, Dequidt & Devemy & Padua (3), "tutorial"_tutorial_drude.html, USER/drude, -, -
-"USER-EFF"_#USER-EFF, electron force field, Andres Jaramillo-Botero (Caltech), "pair_style eff/cut"_pair_eff.html, USER/eff, "eff"_eff, -
-"USER-FEP"_#USER-FEP, free energy perturbation, Agilio Padua (U Blaise Pascal Clermont-Ferrand), "compute fep"_compute_fep.html, USER/fep, -, -
-"USER-H5MD"_#USER-H5MD, dump output via HDF5, Pierre de Buyl (KU Leuven), "dump h5md"_dump_h5md.html, -, -, lib/h5md
-"USER-INTEL"_#USER-INTEL, Vectorized CPU and Intel(R) coprocessor styles, W. Michael Brown (Intel), "Section 5.3.2"_accelerate_intel.html, examples/intel, -, -
-"USER-LB"_#USER-LB, Lattice Boltzmann fluid, Colin Denniston (U Western Ontario), "fix lb/fluid"_fix_lb_fluid.html, USER/lb, -, -
-"USER-MGPT"_#USER-MGPT, fast MGPT multi-ion potentials, Tomas Oppelstrup & John Moriarty (LLNL), "pair_style mgpt"_pair_mgpt.html, USER/mgpt, -, -
-"USER-MISC"_#USER-MISC, single-file contributions, USER-MISC/README, USER-MISC/README, -, -, -
-"USER-MANIFOLD"_#USER-MANIFOLD, motion on 2d surface, Stefan Paquay (Eindhoven U of Technology), "fix manifoldforce"_fix_manifoldforce.html, USER/manifold, "manifold"_manifold, -
-"USER-MOLFILE"_#USER-MOLFILE, "VMD"_VMD molfile plug-ins, Axel Kohlmeyer (Temple U), "dump molfile"_dump_molfile.html, -, -, VMD-MOLFILE
-"USER-NC-DUMP"_#USER-NC-DUMP, dump output via NetCDF, Lars Pastewka (Karlsruhe Institute of Technology, KIT), "dump nc / dump nc/mpiio"_dump_nc.html, -, -, lib/netcdf
-"USER-OMP"_#USER-OMP, OpenMP threaded styles, Axel Kohlmeyer (Temple U), "Section 5.3.4"_accelerate_omp.html, -, -, -
-"USER-PHONON"_#USER-PHONON, phonon dynamical matrix, Ling-Ti Kong (Shanghai Jiao Tong U), "fix phonon"_fix_phonon.html, USER/phonon, -, -
-"USER-QMMM"_#USER-QMMM, QM/MM coupling, Axel Kohlmeyer (Temple U), "fix qmmm"_fix_qmmm.html, USER/qmmm, -, lib/qmmm
-"USER-QTB"_#USER-QTB, quantum nuclear effects, Yuan Shen (Stanford), "fix qtb"_fix_qtb.html "fix qbmsst"_fix_qbmsst.html, qtb, -, -
-"USER-QUIP"_#USER-QUIP, QUIP/libatoms interface, Albert Bartok-Partay (U Cambridge), "pair_style quip"_pair_quip.html, USER/quip, -, lib/quip
-"USER-REAXC"_#USER-REAXC, C version of ReaxFF, Metin Aktulga (LBNL), "pair_style reaxc"_pair_reax_c.html, reax, -, -
-"USER-SMD"_#USER-SMD, smoothed Mach dynamics, Georg Ganzenmuller (EMI), "SMD User Guide"_PDF/SMD_LAMMPS_userguide.pdf, USER/smd, -, -
-"USER-SMTBQ"_#USER-SMTBQ, Second Moment Tight Binding - QEq potential, Salles & Maras & Politano & Tetot (4), "pair_style smtbq"_pair_smtbq.html, USER/smtbq, -, -
-"USER-SPH"_#USER-SPH, smoothed particle hydrodynamics, Georg Ganzenmuller (EMI), "SPH User Guide"_PDF/SPH_LAMMPS_userguide.pdf, USER/sph, "sph"_sph, -
-"USER-TALLY"_#USER-TALLY, Pairwise tallied computes, Axel Kohlmeyer (Temple U), "compute XXX/tally"_compute_tally.html, USER/tally, -, -
-"USER-VTK"_#USER-VTK, VTK-style dumps, Berger and Queteschiner (6), "compute custom/vtk"_dump_custom_vtk.html, -, -, lib/vtk
-:tb(ea=c)
+[Authors:] Reese Jones, Jeremy Templeton, Jon Zimmerman (Sandia).
-:link(atc,http://lammps.sandia.gov/pictures.html#atc)
-:link(cg,http://lammps.sandia.gov/pictures.html#cg)
-:link(eff,http://lammps.sandia.gov/movies.html#eff)
-:link(manifold,http://lammps.sandia.gov/movies.html#manifold)
-:link(sph,http://lammps.sandia.gov/movies.html#sph)
-:link(VMD,http://www.ks.uiuc.edu/Research/vmd)
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the ATC
+library in lib/atc. You can do this manually if you prefer; follow
+the instructions in lib/atc/README. You can also do it in one step
+from the lammps/src dir, using a command like these, which simply
+invoke the lib/atc/Install.py script with the specified args:
-The "Authors" column lists a name(s) if a specific person is
-responsible for creating and maintaining the package.
+make lib-atc # print help message
+make lib-atc args="-m g++" # build with GNU g++ compiler
+make lib-atc args="-m icc" # build with Intel icc compiler :pre
-(1) The ATC package was created by Reese Jones, Jeremy Templeton, and
-Jon Zimmerman (Sandia).
+The build should produce two files: lib/atc/libatc.a and
+lib/atc/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the ATC
+library. If necessary, you can edit/create a new
+lib/atc/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
-(2) The COLVARS package was created by Axel Kohlmeyer (Temple U) using
-the colvars module library written by Giacomo Fiorin (Temple U) and
-Jerome Henin (LISM, Marseille, France).
+Note that the Makefile.lammps file has settings for the BLAS and
+LAPACK linear algebra libraries. As explained in lib/atc/README these
+can either exist on your system, or you can use the files provided in
+lib/linalg. In the latter case you also need to build the library
+in lib/linalg with a command like these:
-(3) The DRUDE package was created by Alain Dequidt (U Blaise Pascal
-Clermont-Ferrand) and co-authors Julien Devemy (CNRS) and Agilio Padua
-(U Blaise Pascal).
+make lib-linalg # print help message
+make lib-linalg args="-m gfortran" # build with GNU Fortran compiler :pre
-(4) The SMTBQ package was created by Nicolas Salles, Emile Maras,
-Olivier Politano, and Robert Tetot (LAAS-CNRS, France).
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-(5) The USER-DPD package was created by James Larentzos (ARL), Timothy
-Mattox (Engility), and John Brennan (ARL).
+make yes-user-atc
+make machine :pre
+
+make no-user-atc
+make machine :pre
+
+[Supporting info:]
-(6) The USER-VTK package was created by Richard Berger (JKU) and
-Daniel Queteschiner (DCS Computing).
+src/USER-ATC: filenames -> commands
+src/USER-ATC/README
+"fix atc"_fix_atc.html
+examples/USER/atc
+http://lammps.sandia.gov/pictures.html#atc :ul
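+
+A minimal sketch of usage; the group name and the material parameter
+file are placeholders, and a full setup also requires fix_modify
+commands to define the finite-element mesh (see "fix
+atc"_fix_atc.html):
+
+fix AtC internal atc thermal Ar_thermal.mat :pre
+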
-The "Doc page" column links to either a sub-section of the
-"Section 6"_Section_howto.html of the manual, or an input script
-command implemented as part of the package, or to additional
-documentation provided within the package.
+:line
-The "Example" column is a sub-directory in the examples directory of
-the distribution which has an input script that uses the package.
-E.g. "peptide" refers to the examples/peptide directory.
+USER-AWPMD package :link(USER-AWPMD),h4
-The "Library" column lists an external library which must be built
-first and which LAMMPS links to when it is built. If it is listed as
-lib/package, then the code for the library is under the lib directory
-of the LAMMPS distribution. See the lib/package/README file for info
-on how to build the library. If it is not listed as lib/package, then
-it is a third-party library not included in the LAMMPS distribution.
-See details on all of this below for individual packages.
+[Contents:]
-:line
+AWPMD stands for Antisymmetrized Wave Packet Molecular Dynamics. This
+package implements an atom, pair, and fix style which allows electrons
+to be treated as explicit particles in a classical molecular dynamics
+model.
-USER-ATC package :link(USER-ATC),h5
+[Author:] Ilya Valuev (JIHT, Russia).
-Contents: ATC stands for atoms-to-continuum. This package implements
-a "fix atc"_fix_atc.html command to either couple MD with continuum
-finite element equations or perform on-the-fly post-processing of
-atomic information to continuum fields. See src/USER-ATC/README for
-more details.
-
-To build LAMMPS with this package ...
-
-To install via make or Make.py:
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the
+AWPMD library in lib/awpmd. You can do this manually if you prefer;
+follow the instructions in lib/awpmd/README. You can also do it in
+one step from the lammps/src dir, using a command like these, which
+simply invoke the lib/awpmd/Install.py script with the specified args:
-make yes-user-atc
-make machine :pre
+make lib-awpmd # print help message
+make lib-awpmd args="-m g++" # build with GNU g++ compiler
+make lib-awpmd args="-m icc" # build with Intel icc compiler :pre
-Make.py -p atc -a machine :pre
+The build should produce two files: lib/awpmd/libawpmd.a and
+lib/awpmd/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+AWPMD library. If necessary, you can edit/create a new
+lib/awpmd/Makefile.machine file for your system, which should define
+an EXTRAMAKE variable to specify a corresponding
+Makefile.lammps.machine file.
-To un-install via make or Make.py:
+Note that the Makefile.lammps file has settings for the BLAS and
+LAPACK linear algebra libraries. As explained in lib/awpmd/README
+these can either exist on your system, or you can use the files
+provided in lib/linalg. In the latter case you also need to build the
+library in lib/linalg with a command like these:
-make no-user-atc
-make machine :pre
+make lib-linalg # print help message
+make lib-linalg args="-m gfortran" # build with GNU Fortran compiler :pre
-Make.py -p ^atc -a machine :pre
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-Supporting info:src/USER-ATC/README, "fix atc"_fix_atc.html,
-examples/USER/atc
+make yes-user-awpmd
+make machine :pre
+
+make no-user-awpmd
+make machine :pre
+
+[Supporting info:]
-Authors: Reese Jones (rjones at sandia.gov), Jeremy Templeton (jatempl
-at sandia.gov) and Jon Zimmerman (jzimmer at sandia.gov) at Sandia.
-Contact them directly if you have questions.
+src/USER-AWPMD: filenames -> commands
+src/USER-AWPMD/README
+"pair awpmd/cut"_pair_awpmd.html
+"fix nve/awpmd"_fix_nve_awpmd.html
+examples/USER/awpmd :ul
:line
-USER-AWPMD package :link(USER-AWPMD),h5
+USER-CGDNA package :link(USER-CGDNA),h4
-Contents: AWPMD stands for Antisymmetrized Wave Packet Molecular
-Dynamics. This package implements an atom, pair, and fix style which
-allows electrons to be treated as explicit particles in an MD
-calculation. See src/USER-AWPMD/README for more details.
+[Contents:]
-To build LAMMPS with this package ...
+Several pair styles, a bond style, and integration fixes for
+coarse-grained models of single- and double-stranded DNA based on the
+oxDNA model of Doye, Louis and Ouldridge at the University of Oxford.
+This includes Langevin-type rigid-body integrators with improved
+stability.
-Supporting info: src/USER-AWPMD/README, "fix
-awpmd/cut"_pair_awpmd.html, examples/USER/awpmd
+[Author:] Oliver Henrich (University of Edinburgh).
-Author: Ilya Valuev at the JIHT in Russia (valuev at
-physik.hu-berlin.de). Contact him directly if you have questions.
+[Install or un-install:]
+
+make yes-user-cgdna
+make machine :pre
+
+make no-user-cgdna
+make machine :pre
+
+[Supporting info:]
+
+src/USER-CGDNA: filenames -> commands
+src/USER-CGDNA/README
+"pair_style oxdna/*"_pair_oxdna.html
+"pair_style oxdna2/*"_pair_oxdna2.html
+"bond_style oxdna/*"_bond_oxdna.html
+"bond_style oxdna2/*"_bond_oxdna2.html
+"fix nve/dotc/langevin"_fix_nve_dotc_langevin.html :ul
:line
-USER-CG-CMM package :link(USER-CG-CMM),h5
+USER-CGSDK package :link(USER-CGSDK),h4
-Contents: CG-CMM stands for coarse-grained ??. This package
-implements several pair styles and an angle style using the coarse
-grained parametrization of Shinoda, DeVane, Klein, Mol Sim, 33, 27
-(2007) (SDK), with extensions to simulate ionic liquids, electrolytes,
-lipids and charged amino acids. See src/USER-CG-CMM/README for more
-details.
+[Contents:]
-Supporting info: src/USER-CG-CMM/README, "pair lj/sdk"_pair_sdk.html,
-"pair lj/sdk/coul/long"_pair_sdk.html, "angle sdk"_angle_sdk.html,
-examples/USER/cg-cmm
+Several pair styles and an angle style which implement the
+coarse-grained SDK model of Shinoda, DeVane, and Klein which enables
+simulation of ionic liquids, electrolytes, lipids and charged amino
+acids.
-Author: Axel Kohlmeyer at Temple U (akohlmey at gmail.com). Contact
-him directly if you have questions.
-
-:line
+[Author:] Axel Kohlmeyer (Temple U).
-USER-CGDNA package :link(USER-CGDNA),h5
-
-Contents: The CGDNA package implements coarse-grained force fields for
-single- and double-stranded DNA. These are at the moment mainly the
-oxDNA and oxDNA2 models, developed by Doye, Louis and Ouldridge at the University
-of Oxford. The package also contains Langevin-type rigid-body
-integrators with improved stability.
+[Install or un-install:]
+
+make yes-user-cgsdk
+make machine :pre
+
+make no-user-cgsdk
+make machine :pre
+
+[Supporting info:]
-See these doc pages to get started:
+src/USER-CGSDK: filenames -> commands
+src/USER-CGSDK/README
+"pair_style lj/sdk/*"_pair_sdk.html
+"angle_style sdk"_angle_sdk.html
+examples/USER/cgsdk
+http://lammps.sandia.gov/pictures.html#cg :ul
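+
+A rough sketch of usage; the functional form keyword and the epsilon
+and sigma values per type pair come from the SDK parametrization and
+are placeholders here:
+
+pair_style lj/sdk 15.0
+pair_coeff 1 1 lj9_6 0.40 3.60  # placeholder epsilon/sigma for this type pair :pre
+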
-"bond_style oxdna/fene"_bond_oxdna.html
-"bond_style oxdna2/fene"_bond_oxdna.html
-"pair_style oxdna/..."_pair_oxdna.html
-"pair_style oxdna2/..."_pair_oxdna2.html
-"fix nve/dotc/langevin"_fix_nve_dotc_langevin.html :ul
+:line
-Supporting info: /src/USER-CGDNA/README, "bond_style
-oxdna/fene"_bond_oxdna.html, "bond_style
-oxdna2/fene"_bond_oxdna.html, "pair_style
-oxdna/..."_pair_oxdna.html, "pair_style
-oxdna2/..."_pair_oxdna2.html, "fix
-nve/dotc/langevin"_fix_nve_dotc_langevin.html
+USER-COLVARS package :link(USER-COLVARS),h4
+
+[Contents:]
+
+COLVARS stands for collective variables, which can be used to
+implement various enhanced sampling methods, including Adaptive
+Biasing Force, Metadynamics, Steered MD, Umbrella Sampling and
+Restraints. A "fix colvars"_fix_colvars.html command is implemented
+which wraps the COLVARS library that implements these methods.
+
+[Authors:] Axel Kohlmeyer (Temple U). The COLVARS library was written
+by Giacomo Fiorin (ICMS, Temple University, Philadelphia, PA, USA) and
+Jerome Henin (LISM, CNRS, Marseille, France).
+
+[Install or un-install:]
+
+Before building LAMMPS with this package, you must first build the
+COLVARS library in lib/colvars. You can do this manually if you
+prefer; follow the instructions in lib/colvars/README. You can also
+do it in one step from the lammps/src dir, using a command like these,
+which simply invoke the lib/colvars/Install.py script with the
+specified args:
+
+make lib-colvars # print help message
+make lib-colvars args="-m g++" # build with GNU g++ compiler :pre
+
+The build should produce two files: lib/colvars/libcolvars.a and
+lib/colvars/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+COLVARS library (though typically the settings are just blank). If
+necessary, you can edit/create a new lib/colvars/Makefile.machine file
+for your system, which should define an EXTRAMAKE variable to specify
+a corresponding Makefile.lammps.machine file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-colvars
+make machine :pre
+
+make no-user-colvars
+make machine :pre
+
+[Supporting info:]
-Author: Oliver Henrich at the University of Strathclyde, Glasgow
-(oliver.henrich at strath.ac.uk, also ohenrich at ph.ed.ac.uk).
-Contact him directly if you have any questions.
+src/USER-COLVARS: filenames -> commands
+"doc/PDF/colvars-refman-lammps.pdf"_PDF/colvars-refman-lammps.pdf
+src/USER-COLVARS/README
+lib/colvars/README
+"fix colvars"_fix_colvars.html
+examples/USER/colvars :ul
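+
+Typical usage is a single fix that points to a colvars config file in
+which the collective variables and biases are defined; the file name
+and output prefix here are placeholders:
+
+fix cv all colvars colvars.inp output run1 :pre
+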
:line
-USER-COLVARS package :link(USER-COLVARS),h5
+USER-DIFFRACTION package :link(USER-DIFFRACTION),h4
-Contents: COLVARS stands for collective variables which can be used to
-implement Adaptive Biasing Force, Metadynamics, Steered MD, Umbrella
-Sampling and Restraints. This package implements a "fix
-colvars"_fix_colvars.html command which wraps a COLVARS library which
-can perform those kinds of simulations. See src/USER-COLVARS/README
-for more details.
+[Contents:]
-Supporting info:
-"doc/PDF/colvars-refman-lammps.pdf"_PDF/colvars-refman-lammps.pdf,
-src/USER-COLVARS/README, lib/colvars/README, "fix
-colvars"_fix_colvars.html, examples/USER/colvars
+Two computes and a fix for calculating x-ray and electron diffraction
+intensities based on kinematic diffraction theory.
-Authors: Axel Kohlmeyer at Temple U (akohlmey at gmail.com) wrote the
-fix. The COLVARS library itself is written and maintained by Giacomo
-Fiorin (ICMS, Temple University, Philadelphia, PA, USA) and Jerome
-Henin (LISM, CNRS, Marseille, France). Contact them directly if you
-have questions.
+[Author:] Shawn Coleman while at the University of Arkansas.
-:line
+[Install or un-install:]
+
+make yes-user-diffraction
+make machine :pre
+
+make no-user-diffraction
+make machine :pre
+
+[Supporting info:]
-USER-DIFFRACTION package :link(USER-DIFFRACTION),h5
+src/USER-DIFFRACTION: filenames -> commands
+"compute saed"_compute_saed.html
+"compute xrd"_compute_xrd.html
+"fix saed/vtk"_fix_saed_vtk.html
+examples/USER/diffraction :ul
-Contents: This packages implements two computes and a fix for
-calculating x-ray and electron diffraction intensities based on
-kinematic diffraction theory. See src/USER-DIFFRACTION/README for
-more details.
+:line
-Supporting info: "compute saed"_compute_saed.html, "compute
-xrd"_compute_xrd.html, "fix saed/vtk"_fix_saed_vtk.html,
-examples/USER/diffraction
+USER-DPD package :link(USER-DPD),h4
-Author: Shawn P. Coleman (shawn.p.coleman8.ctr at mail.mil) while at
-the University of Arkansas. Contact him directly if you have
-questions.
+[Contents:]
-:line
+DPD stands for dissipative particle dynamics. This package implements
+coarse-grained DPD-based models for energetic, reactive molecular
+crystalline materials. It includes many pair styles specific to these
+systems, including for reactive DPD, where each particle has internal
+state for multiple species and a coupled set of chemical reaction ODEs
+are integrated each timestep. Highly accurate time integrators for
+isothermal, isoenergetic, isobaric and isenthalpic conditions are
+included. These enable long timesteps via the Shardlow splitting
+algorithm.
-USER-DPD package :link(USER-DPD),h5
+[Authors:] Jim Larentzos (ARL), Tim Mattox (Engility Corp), and John
+Brennan (ARL).
-Contents: DPD stands for dissipative particle dynamics, This package
-implements DPD for isothermal, isoenergetic, isobaric and isenthalpic
-conditions. It also has extensions for performing reactive DPD, where
-each particle has internal state for multiple species and a coupled
-set of chemical reaction ODEs are integrated each timestep. The DPD
-equations of motion are integrated efficiently through the Shardlow
-splitting algorithm. See src/USER-DPD/README for more details.
+[Install or un-install:]
+
+make yes-user-dpd
+make machine :pre
+
+make no-user-dpd
+make machine :pre
+
+[Supporting info:]
-Supporting info: /src/USER-DPD/README, "compute dpd"_compute_dpd.html
+src/USER-DPD: filenames -> commands
+src/USER-DPD/README
+"compute dpd"_compute_dpd.html
"compute dpd/atom"_compute_dpd_atom.html
-"fix eos/cv"_fix_eos_table.html "fix eos/table"_fix_eos_table.html
-"fix eos/table/rx"_fix_eos_table_rx.html "fix shardlow"_fix_shardlow.html
-"fix rx"_fix_rx.html "pair table/rx"_pair_table_rx.html
-"pair dpd/fdt"_pair_dpd_fdt.html "pair dpd/fdt/energy"_pair_dpd_fdt.html
-"pair exp6/rx"_pair_exp6_rx.html "pair multi/lucy"_pair_multi_lucy.html
-"pair multi/lucy/rx"_pair_multi_lucy_rx.html, examples/USER/dpd
-
-Authors: James Larentzos (ARL) (james.p.larentzos.civ at mail.mil),
-Timothy Mattox (Engility Corp) (Timothy.Mattox at engilitycorp.com)
-and John Brennan (ARL) (john.k.brennan.civ at mail.mil). Contact them
-directly if you have questions.
+"fix eos/cv"_fix_eos_table.html
+"fix eos/table"_fix_eos_table.html
+"fix eos/table/rx"_fix_eos_table_rx.html
+"fix shardlow"_fix_shardlow.html
+"fix rx"_fix_rx.html
+"pair table/rx"_pair_table_rx.html
+"pair dpd/fdt"_pair_dpd_fdt.html
+"pair dpd/fdt/energy"_pair_dpd_fdt.html
+"pair exp6/rx"_pair_exp6_rx.html
+"pair multi/lucy"_pair_multi_lucy.html
+"pair multi/lucy/rx"_pair_multi_lucy_rx.html
+examples/USER/dpd :ul
:line
-USER-DRUDE package :link(USER-DRUDE),h5
+USER-DRUDE package :link(USER-DRUDE),h4
-Contents: This package contains methods for simulating polarizable
-systems using thermalized Drude oscillators. It has computes, fixes,
-and pair styles for this purpose. See "Section
+[Contents:]
+
+Fixes, pair styles, and a compute to simulate thermalized Drude
+oscillators as a model of polarization. See "Section
6.27"_Section_howto.html#howto_27 for an overview of how to use the
-package. See src/USER-DRUDE/README for additional details. There are
-auxiliary tools for using this package in tools/drude.
+package. There are auxiliary tools for using this package in
+tools/drude.
-Supporting info: "Section 6.27"_Section_howto.html#howto_27,
-src/USER-DRUDE/README, "fix drude"_fix_drude.html, "fix
-drude/transform/*"_fix_drude_transform.html, "compute
-temp/drude"_compute_temp_drude.html, "pair thole"_pair_thole.html,
-"pair lj/cut/thole/long"_pair_thole.html, examples/USER/drude,
-tools/drude
+[Authors:] Alain Dequidt (U Blaise Pascal Clermont-Ferrand), Julien
+Devemy (CNRS), and Agilio Padua (U Blaise Pascal).
-Authors: Alain Dequidt at Universite Blaise Pascal Clermont-Ferrand
-(alain.dequidt at univ-bpclermont.fr); co-authors: Julien Devemy,
-Agilio Padua. Contact them directly if you have questions.
+[Install or un-install:]
+
+make yes-user-drude
+make machine :pre
+
+make no-user-drude
+make machine :pre
+
+[Supporting info:]
+
+src/USER-DRUDE: filenames -> commands
+"Section 6.27"_Section_howto.html#howto_27
+"Section 6.25"_Section_howto.html#howto_25
+src/USER-DRUDE/README
+"fix drude"_fix_drude.html
+"fix drude/transform/*"_fix_drude_transform.html
+"compute temp/drude"_compute_temp_drude.html
+"pair thole"_pair_thole.html
+"pair lj/cut/thole/long"_pair_thole.html
+examples/USER/drude
+tools/drude :ul
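+
+As a sketch, each atom type must be flagged as a Drude core (C), a
+Drude particle (D), or non-polarizable (N), e.g. for a hypothetical
+system with 5 atom types:
+
+fix drd all drude C C N D D :pre
+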
:line
-USER-EFF package :link(USER-EFF),h5
+USER-EFF package :link(USER-EFF),h4
-Contents: EFF stands for electron force field. This package contains
-atom, pair, fix and compute styles which implement the eFF as
+[Contents:]
+
+EFF stands for electron force field, which allows a classical MD code
+to model electrons as particles of variable radius. This package
+contains atom, pair, fix and compute styles which implement the eFF as
described in A. Jaramillo-Botero, J. Su, Q. An, and W.A. Goddard III,
-JCC, 2010. The eFF potential was first introduced by Su and Goddard,
-in 2007. See src/USER-EFF/README for more details. There are
-auxiliary tools for using this package in tools/eff; see its README
-file.
+JCC, 2010. The eFF potential was first introduced by Su and Goddard,
+in 2007. There are auxiliary tools for using this package in
+tools/eff; see its README file.
-Supporting info:
+[Author:] Andres Jaramillo-Botero (CalTech).
-Author: Andres Jaramillo-Botero at CalTech (ajaramil at
-wag.caltech.edu). Contact him directly if you have questions.
+[Install or un-install:]
+
+make yes-user-eff
+make machine :pre
+
+make no-user-eff
+make machine :pre
+
+[Supporting info:]
+
+src/USER-EFF: filenames -> commands
+src/USER-EFF/README
+"atom_style electron"_atom_style.html
+"fix nve/eff"_fix_nve_eff.html
+"fix nvt/eff"_fix_nvt_eff.html
+"fix npt/eff"_fix_npt_eff.html
+"fix langevin/eff"_fix_langevin_eff.html
+"compute temp/eff"_compute_temp_eff.html
+"pair eff/cut"_pair_eff.html
+"pair eff/inline"_pair_eff.html
+examples/USER/eff
+tools/eff/README
+tools/eff
+http://lammps.sandia.gov/movies.html#eff :ul
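+
+A minimal sketch of the style combination used in eFF simulations (the
+cutoff value is a placeholder):
+
+atom_style electron
+pair_style eff/cut 20.0
+fix 1 all nve/eff
+compute effT all temp/eff :pre
+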
:line
-USER-FEP package :link(USER-FEP),h5
+USER-FEP package :link(USER-FEP),h4
+
+[Contents:]
-Contents: FEP stands for free energy perturbation. This package
-provides methods for performing FEP simulations by using a "fix
+FEP stands for free energy perturbation. This package provides
+methods for performing FEP simulations by using a "fix
adapt/fep"_fix_adapt_fep.html command with soft-core pair potentials,
-which have a "soft" in their style name. See src/USER-FEP/README for
-more details. There are auxiliary tools for using this package in
-tools/fep; see its README file.
+which have a "soft" in their style name. There are auxiliary tools
+for using this package in tools/fep; see its README file.
-Supporting info: src/USER-FEP/README, "fix
-adapt/fep"_fix_adapt_fep.html, "compute fep"_compute_fep.html,
-"pair_style */soft"_pair_lj_soft.html, examples/USER/fep
+[Author:] Agilio Padua (Universite Blaise Pascal Clermont-Ferrand).
-Author: Agilio Padua at Universite Blaise Pascal Clermont-Ferrand
-(agilio.padua at univ-bpclermont.fr). Contact him directly if you have
-questions.
+[Install or un-install:]
+
+make yes-user-fep
+make machine :pre
+
+make no-user-fep
+make machine :pre
+
+[Supporting info:]
+
+src/USER-FEP: filenames -> commands
+src/USER-FEP/README
+"fix adapt/fep"_fix_adapt_fep.html
+"compute fep"_compute_fep.html
+"pair_style */soft"_pair_lj_soft.html
+examples/USER/fep
+tools/fep/README
+tools/fep :ul
:line
-USER-H5MD package :link(USER-H5MD),h5
+USER-H5MD package :link(USER-H5MD),h4
-Contents: H5MD stands for HDF5 for MD. "HDF5"_HDF5 is a binary,
-portable, self-describing file format, used by many scientific
-simulations. H5MD is a format for molecular simulations, built on top
-of HDF5. This package implements a "dump h5md"_dump_h5md.html command
-to output LAMMPS snapshots in this format. See src/USER-H5MD/README
-for more details.
+[Contents:]
-:link(HDF5,http://www.hdfgroup.org/HDF5/)
+H5MD stands for HDF5 for MD. "HDF5"_HDF5 is a portable, binary,
+self-describing file format, used by many scientific simulations.
+H5MD is a format for molecular simulations, built on top of HDF5.
+This package implements a "dump h5md"_dump_h5md.html command to output
+LAMMPS snapshots in this format.
-Supporting info: src/USER-H5MD/README, lib/h5md/README, "dump
-h5md"_dump_h5md.html
+:link(HDF5,http://www.hdfgroup.org/HDF5)
-Author: Pierre de Buyl at KU Leuven (see http://pdebuyl.be) created
-this package as well as the H5MD format and library. Contact him
-directly if you have questions.
+To use this package you must have the HDF5 library available on your
+system.
-:line
-
-USER-INTEL package :link(USER-INTEL),h5
+[Author:] Pierre de Buyl (KU Leuven) created both the package and the
+H5MD format.
-Contents: Dozens of pair, bond, angle, dihedral, and improper styles
-that are optimized for Intel CPUs and the Intel Xeon Phi (in offload
-mode). All of them have an "intel" in their style name. "Section
-5.3.2"_accelerate_intel.html gives details of what hardware
-and compilers are required on your system, and how to build and use
-this package. Also see src/USER-INTEL/README for more details. See
-the KOKKOS, OPT, and USER-OMP packages, which also have CPU and
-Phi-enabled styles.
+[Install or un-install:]
-Supporting info: examples/accelerate, src/USER-INTEL/TEST
+Note that to compile and link the CH5MD library you need the standard
+HDF5 software package installed on your system, which should include
+the h5cc compiler and the HDF5 library.
-"Section 5.3"_Section_accelerate.html#acc_3
+Before building LAMMPS with this package, you must first build the
+CH5MD library in lib/h5md. You can do this manually if you prefer;
+follow the instructions in lib/h5md/README. You can also do it in one
+step from the lammps/src dir, using a command like these, which simply
+invoke the lib/h5md/Install.py script with the specified args:
-Author: Mike Brown at Intel (michael.w.brown at intel.com). Contact
-him directly if you have questions.
-
-For the USER-INTEL package, you have 2 choices when building. You can
-build with CPU or Phi support. The latter uses Xeon Phi chips in
-"offload" mode. Each of these modes requires additional settings in
-your Makefile.machine for CCFLAGS and LINKFLAGS.
+make lib-h5md # print help message
+make lib-h5md args="-m h5cc" # build with h5cc compiler :pre
-For CPU mode (if using an Intel compiler):
+The build should produce two files: lib/h5md/libch5md.a and
+lib/h5md/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+system HDF5 library. If necessary, you can edit/create a new
+lib/h5md/Makefile.machine file for your system, which should define an
+EXTRAMAKE variable to specify a corresponding Makefile.lammps.machine
+file.
-CCFLAGS: add -fopenmp, -DLAMMPS_MEMALIGN=64, -restrict, -xHost, -fno-alias, -ansi-alias, -override-limits
-LINKFLAGS: add -fopenmp :ul
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-h5md
+make machine :pre
+
+make no-user-h5md
+make machine :pre
+
+[Supporting info:]
-For Phi mode add the following in addition to the CPU mode flags:
+src/USER-H5MD: filenames -> commands
+src/USER-H5MD/README
+lib/h5md/README
+"dump h5md"_dump_h5md.html :ul
-CCFLAGS: add -DLMP_INTEL_OFFLOAD and
-LINKFLAGS: add -offload :ul
+:line
-And also add this to CCFLAGS:
+USER-INTEL package :link(USER-INTEL),h4
--offload-option,mic,compiler,"-fp-model fast=2 -mGLOB_default_function_attrs=\"gather_scatter_loop_unroll=4\"" :pre
+[Contents:]
-Examples:
+Dozens of pair, fix, bond, angle, dihedral, improper, and kspace
+styles which are optimized for Intel CPUs and KNLs (Knights Landing).
+All of them have an "intel" in their style name. "Section
+5.3.2"_accelerate_intel.html gives details of what hardware and
+compilers are required on your system, and how to build and use this
+package. Its styles can be invoked at run time via the "-sf intel" or
+"-suffix intel" "command-line switches"_Section_start.html#start_7.
+Also see the "KOKKOS"_#KOKKOS, "OPT"_#OPT, and "USER-OMP"_#USER-OMP
+packages, which have styles optimized for CPUs and KNLs.
-:line
+You need to have an Intel compiler, version 14 or higher, to take
+full advantage of this package.
-USER-LB package :link(USER-LB),h5
+[Author:] Mike Brown (Intel).
-Supporting info:
+[Install or un-install:]
-This package contains a LAMMPS implementation of a background
-Lattice-Boltzmann fluid, which can be used to model MD particles
-influenced by hydrodynamic forces.
+For the USER-INTEL package, you have 2 choices when building. You can
+build with either CPU or KNL support. Each choice requires additional
+settings in your Makefile.machine for CCFLAGS and LINKFLAGS and
+optimized malloc libraries. See the
+src/MAKE/OPTIONS/Makefile.intel_cpu and src/MAKE/OPTIONS/Makefile.knl
+files for examples.
+
+For CPUs:
+
+OPTFLAGS = -xHost -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
+CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -no-offload \
+ -fno-alias -ansi-alias -restrict $(OPTFLAGS)
+LINKFLAGS = -g -qopenmp $(OPTFLAGS)
+LIB = -ltbbmalloc -ltbbmalloc_proxy :pre
+
+For KNLs:
+
+OPTFLAGS = -xMIC-AVX512 -O2 -fp-model fast=2 -no-prec-div -qoverride-limits
+CCFLAGS = -g -qopenmp -DLAMMPS_MEMALIGN=64 -no-offload \
+ -fno-alias -ansi-alias -restrict $(OPTFLAGS)
+LINKFLAGS = -g -qopenmp $(OPTFLAGS)
+LIB = -ltbbmalloc :pre
+
+Once you have an appropriate Makefile.machine, you can
+install/un-install the package and build LAMMPS in the usual manner.
+Note that you cannot build one executable to run on multiple hardware
+targets (Intel CPUs or KNL). You need to build LAMMPS once for each
+hardware target, to produce a separate executable.
+
+You should also typically install the USER-OMP package, as it can be
+used in tandem with the USER-INTEL package to good effect, as
+explained in "Section 5.3.2"_accelerate_intel.html.
+
+make yes-user-intel yes-user-omp
+make machine :pre
+
+make no-user-intel no-user-omp
+make machine :pre
-See this doc page and its related commands to get started:
+[Supporting info:]
-"fix lb/fluid"_fix_lb_fluid.html
+src/USER-INTEL: filenames -> commands
+src/USER-INTEL/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.2"_accelerate_gpu.html
+"Section 2.7 -sf intel"_Section_start.html#start_7
+"Section 2.7 -pk intel"_Section_start.html#start_7
+"package intel"_package.html
+Styles sections of "Section 3.5"_Section_commands.html#cmd_5 for styles followed by (i)
+src/USER-INTEL/TEST
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
-The people who created this package are Frances Mackay (fmackay at
-uwo.ca) and Colin (cdennist at uwo.ca) Denniston, University of
-Western Ontario. Contact them directly if you have questions.
+:line
-Examples: examples/USER/lb
+USER-LB package :link(USER-LB),h4
-:line
+[Contents:]
-USER-MGPT package :link(USER-MGPT),h5
+Fixes which implement a background Lattice-Boltzmann (LB) fluid, which
+can be used to model MD particles influenced by hydrodynamic forces.
-Supporting info:
+[Authors:] Frances Mackay and Colin Denniston (University of Western
+Ontario).
-This package contains a fast implementation for LAMMPS of
-quantum-based MGPT multi-ion potentials. The MGPT or model GPT method
-derives from first-principles DFT-based generalized pseudopotential
-theory (GPT) through a series of systematic approximations valid for
-mid-period transition metals with nearly half-filled d bands. The
-MGPT method was originally developed by John Moriarty at Lawrence
-Livermore National Lab (LLNL).
+[Install or un-install:]
+
+make yes-user-lb
+make machine :pre
+
+make no-user-lb
+make machine :pre
+
+[Supporting info:]
-In the general matrix representation of MGPT, which can also be
-applied to f-band actinide metals, the multi-ion potentials are
-evaluated on the fly during a simulation through d- or f-state matrix
-multiplication, and the forces that move the ions are determined
-analytically. The {mgpt} pair style in this package calculates forces
-and energies using an optimized matrix-MGPT algorithm due to Tomas
-Oppelstrup at LLNL.
+src/USER-LB: filenames -> commands
+src/USER-LB/README
+"fix lb/fluid"_fix_lb_fluid.html
+"fix lb/momentum"_fix_lb_momentum.html
+"fix lb/viscous"_fix_lb_viscous.html
+examples/USER/lb :ul
-See this doc page to get started:
+:line
-"pair_style mgpt"_pair_mgpt.html
+USER-MGPT package :link(USER-MGPT),h4
-The persons who created the USER-MGPT package are Tomas Oppelstrup
-(oppelstrup2@llnl.gov) and John Moriarty (moriarty2@llnl.gov)
-Contact them directly if you have any questions.
+[Contents:]
-Examples: examples/USER/mgpt
+A pair style which provides a fast implementation of the quantum-based
+MGPT multi-ion potentials. The MGPT or model GPT method derives from
+first-principles DFT-based generalized pseudopotential theory (GPT)
+through a series of systematic approximations valid for mid-period
+transition metals with nearly half-filled d bands. The MGPT method
+was originally developed by John Moriarty at LLNL. The pair style in
+this package calculates forces and energies using an optimized
+matrix-MGPT algorithm due to Tomas Oppelstrup at LLNL.
-:line
+[Authors:] Tomas Oppelstrup and John Moriarty (LLNL).
-USER-MISC package :link(USER-MISC),h5
+[Install or un-install:]
+
+make yes-user-mgpt
+make machine :pre
+
+make no-user-mgpt
+make machine :pre
+
+[Supporting info:]
-Supporting info:
+src/USER-MGPT: filenames -> commands
+src/USER-MGPT/README
+"pair_style mgpt"_pair_mgpt.html
+examples/USER/mgpt :ul
-The files in this package are a potpourri of (mostly) unrelated
-features contributed to LAMMPS by users. Each feature is a single
-pair of files (*.cpp and *.h).
+:line
-More information about each feature can be found by reading its doc
-page in the LAMMPS doc directory. The doc page which lists all LAMMPS
-input script commands is as follows:
+USER-MISC package :link(USER-MISC),h4
-"Section 3.5"_Section_commands.html#cmd_5
+[Contents:]
-User-contributed features are listed at the bottom of the fix,
-compute, pair, etc sections.
+A potpourri of (mostly) unrelated features contributed to LAMMPS by
+users. Each feature is a single fix, compute, pair, bond, angle,
+dihedral, improper, or command style.
-The list of features and author of each is given in the
+[Authors:] The author for each style in the package is listed in the
src/USER-MISC/README file.
-You should contact the author directly if you have specific questions
-about the feature or its coding.
+[Install or un-install:]
+
+make yes-user-misc
+make machine :pre
+
+make no-user-misc
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/misc
+src/USER-MISC: filenames -> commands
+src/USER-MISC/README
+one doc page per individual command listed in src/USER-MISC/README
+examples/USER/misc :ul
:line
-USER-MANIFOLD package :link(USER-MANIFOLD),h5
+USER-MANIFOLD package :link(USER-MANIFOLD),h4
-Supporting info:
+[Contents:]
-This package contains a dump molfile command which uses molfile
-plugins that are bundled with the
-"VMD"_http://www.ks.uiuc.edu/Research/vmd molecular visualization and
-analysis program, to enable LAMMPS to dump its information in formats
-compatible with various molecular simulation tools.
+Several fixes and a "manifold" class which enable simulations of
+particles constrained to a manifold (a 2D surface within the 3D
+simulation box). This is done by applying the RATTLE constraint
+algorithm to single-particle constraint functions
+g(xi,yi,zi) = 0 and their derivative (i.e. the normal of the manifold)
+n = grad(g).
-This package allows LAMMPS to perform MD simulations of particles
-constrained on a manifold (i.e., a 2D subspace of the 3D simulation
-box). It achieves this using the RATTLE constraint algorithm applied
-to single-particle constraint functions g(xi,yi,zi) = 0 and their
-derivative (i.e. the normal of the manifold) n = grad(g).
+[Author:] Stefan Paquay (Eindhoven University of Technology (TU/e),
+The Netherlands).
-See this doc page to get started:
+[Install or un-install:]
+
+make yes-user-manifold
+make machine :pre
+
+make no-user-manifold
+make machine :pre
+
+[Supporting info:]
+src/USER-MANIFOLD: filenames -> commands
+src/USER-MANIFOLD/README
+"doc/manifolds"_manifolds.html
"fix manifoldforce"_fix_manifoldforce.html
-
-The person who created this package is Stefan Paquay, at the Eindhoven
-University of Technology (TU/e), The Netherlands (s.paquay at tue.nl).
-Contact him directly if you have questions.
+"fix nve/manifold/rattle"_fix_nve_manifold/rattle.html
+"fix nvt/manifold/rattle"_fix_nvt_manifold/rattle.html
+examples/USER/manifold
+http://lammps.sandia.gov/movies.html#manifold :ul
:line
-USER-MOLFILE package :link(USER-MOLFILE),h5
+USER-MOLFILE package :link(USER-MOLFILE),h4
+
+[Contents:]
+
+A "dump molfile"_dump_molfile.html command which uses molfile plugins
+that are bundled with the "VMD"_http://www.ks.uiuc.edu/Research/vmd
+molecular visualization and analysis program, to enable LAMMPS to dump
+snapshots in formats compatible with various molecular simulation
+tools.
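+
+For example (a sketch, assuming the VMD "dcd" plugin can be found in
+the current directory, given as the last argument), a DCD trajectory
+could be written with:
+
+dump 1 all molfile 100 traj.dcd dcd . :pre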
+
+To use this package you must have the desired VMD plugins available on
+your system.
+
+Note that this package only provides the interface code, not the
+plugins themselves, which will be accessed when requesting a specific
+plugin via the "dump molfile"_dump_molfile.html command. Plugins can
+be obtained from a VMD installation which has to match the platform
+that you are using to compile LAMMPS for. By adding plugins to VMD,
+support for new file formats can be added to LAMMPS (or VMD or other
+programs that use them) without having to recompile the application
+itself. More information about the VMD molfile plugins can be found
+at
+"http://www.ks.uiuc.edu/Research/vmd/plugins/molfile"_http://www.ks.uiuc.edu/Research/vmd/plugins/molfile.
+
+[Author:] Axel Kohlmeyer (Temple U).
+
+[Install or un-install:]
+
+Note that the lib/molfile/Makefile.lammps file has a setting for a
+dynamic loading library libdl.a that is typically present on
+all systems, which is required for LAMMPS to link with this package.
+If the setting is not valid for your system, you will need to edit the
+Makefile.lammps file. See lib/molfile/README and
+lib/molfile/Makefile.lammps for details.
+
+make yes-user-molfile
+make machine :pre
+
+make no-user-molfile
+make machine :pre
+
+[Supporting info:]
+
+src/USER-MOLFILE: filenames -> commands
+src/USER-MOLFILE/README
+lib/molfile/README
+"dump molfile"_dump_molfile.html :ul
-Supporting info:
+:line
-This package contains a dump molfile command which uses molfile
-plugins that are bundled with the
-"VMD"_http://www.ks.uiuc.edu/Research/vmd molecular visualization and
-analysis program, to enable LAMMPS to dump its information in formats
-compatible with various molecular simulation tools.
+USER-NETCDF package :link(USER-NETCDF),h4
-The package only provides the interface code, not the plugins. These
-can be obtained from a VMD installation which has to match the
-platform that you are using to compile LAMMPS for. By adding plugins
-to VMD, support for new file formats can be added to LAMMPS (or VMD or
-other programs that use them) without having to recompile the
-application itself.
+[Contents:]
-See this doc page to get started:
+Dump styles for writing NetCDF formatted dump files. NetCDF is a
+portable, binary, self-describing file format developed on top of
+HDF5. The file contents follow the AMBER NetCDF trajectory conventions
+(http://ambermd.org/netcdf/nctraj.xhtml), but include extensions.
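+
+For example (a sketch; the per-atom attributes are placeholders), a
+NetCDF trajectory could be written with:
+
+dump 1 all netcdf 100 traj.nc id type x y z vx vy vz :pre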
-"dump molfile"_dump_molfile.html
+To use this package you must have the NetCDF library available on your
+system.
-The person who created this package is Axel Kohlmeyer at Temple U
-(akohlmey at gmail.com). Contact him directly if you have questions.
+Note that NetCDF files can be directly visualized with the following
+tools:
-:line
+"Ovito"_ovito (Ovito supports the AMBER convention and the extensions mentioned above)
+"VMD"_vmd
+"AtomEye"_atomeye (the libAtoms version of AtomEye contains a NetCDF reader not present in the standard distribution) :ul
+
+:link(ovito,http://www.ovito.org)
+:link(atomeye,http://www.libatoms.org)
-USER-NC-DUMP package :link(USER-NC-DUMP),h5
+[Author:] Lars Pastewka (Karlsruhe Institute of Technology).
-Contents: Dump styles for writing NetCDF format files. NetCDF is a binary,
-portable, self-describing file format on top of HDF5. The file format
-contents follow the AMBER NetCDF trajectory conventions
-(http://ambermd.org/netcdf/nctraj.xhtml), but include extensions to this
-convention. This package implements a "dump nc"_dump_nc.html command
-and a "dump nc/mpiio"_dump_nc.html command to output LAMMPS snapshots
-in this format. See src/USER-NC-DUMP/README for more details.
+[Install or un-install:]
+
+Note that to follow these steps, you need the standard NetCDF software
+package installed on your system. The lib/netcdf/Makefile.lammps file
+has settings for NetCDF include and library files that LAMMPS needs to
+compile and link with this package.  If the settings are not valid
+for your system, you will need to edit the Makefile.lammps file. See
+lib/netcdf/README for details.
-NetCDF files can be directly visualized with the following tools:
+make yes-user-netcdf
+make machine :pre
+
+make no-user-netcdf
+make machine :pre
-Ovito (http://www.ovito.org/). Ovito supports the AMBER convention
-and all of the above extensions. :ulb,l
-VMD (http://www.ks.uiuc.edu/Research/vmd/) :l
-AtomEye (http://www.libatoms.org/). The libAtoms version of AtomEye contains
-a NetCDF reader that is not present in the standard distribution of AtomEye :l,ule
+[Supporting info:]
-The person who created these files is Lars Pastewka at
-Karlsruhe Institute of Technology (lars.pastewka at kit.edu).
-Contact him directly if you have questions.
+src/USER-NETCDF: filenames -> commands
+src/USER-NETCDF/README
+lib/netcdf/README
+"dump netcdf"_dump_netcdf.html :ul
:line
-USER-OMP package :link(USER-OMP),h5
+USER-OMP package :link(USER-OMP),h4
-Supporting info:
+[Contents:]
-This package provides OpenMP multi-threading support and
-other optimizations of various LAMMPS pair styles, dihedral
-styles, and fix styles.
+Hundreds of pair, fix, compute, bond, angle, dihedral, improper, and
+kspace styles which are altered to enable threading on many-core CPUs
+via OpenMP directives. All of them have an "omp" in their style name.
+"Section 5.3.4"_accelerate_omp.html gives details of what hardware and
+compilers are required on your system, and how to build and use this
+package. Its styles can be invoked at run time via the "-sf omp" or
+"-suffix omp" "command-line switches"_Section_start.html#start_7.
+Also see the "KOKKOS"_#KOKKOS, "OPT"_#OPT, and
+"USER-INTEL"_#USER-INTEL packages, which have styles optimized for
+CPUs.
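+
+For example (a sketch; in.script is a placeholder input file), a run
+using 4 MPI tasks with 2 OpenMP threads each could be launched as:
+
+mpirun -np 4 lmp_machine -sf omp -pk omp 2 -in in.script :pre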
-See this section of the manual to get started:
+[Author:] Axel Kohlmeyer (Temple U).
-"Section 5.3"_Section_accelerate.html#acc_3
+NOTE: The compile flags "-restrict" and "-fopenmp" must be used to
+build LAMMPS with the USER-OMP package, as well as the link flag
+"-fopenmp". They should be added to the CCFLAGS and LINKFLAGS lines
+of your Makefile.machine. See src/MAKE/OPTIONS/Makefile.omp for an
+example.
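+
+For example (a sketch; keep your other flags, and note that -restrict
+is specific to Intel compilers), the relevant Makefile.machine lines
+might look like:
+
+CCFLAGS = -g -O3 -restrict -fopenmp
+LINKFLAGS = -g -O3 -fopenmp :pre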
-The person who created this package is Axel Kohlmeyer at Temple U
-(akohlmey at gmail.com). Contact him directly if you have questions.
+Once you have an appropriate Makefile.machine, you can
+install/un-install the package and build LAMMPS in the usual manner:
-For the USER-OMP package, your Makefile.machine needs additional
-settings for CCFLAGS and LINKFLAGS.
+[Install or un-install:]
+
+make yes-user-omp
+make machine :pre
+
+make no-user-omp
+make machine :pre
CCFLAGS: add -fopenmp and -restrict
LINKFLAGS: add -fopenmp :ul
-Examples: examples/accelerate, bench/KEPLER
+[Supporting info:]
+
+src/USER-OMP: filenames -> commands
+src/USER-OMP/README
+"Section 5.3"_Section_accelerate.html#acc_3
+"Section 5.3.4"_accelerate_omp.html
+"Section 2.7 -sf omp"_Section_start.html#start_7
+"Section 2.7 -pk omp"_Section_start.html#start_7
+"package omp"_package.html
+Styles sections of "Section 3.5"_Section_commands.html#cmd_5 for styles followed by (o)
+"Benchmarks page"_http://lammps.sandia.gov/bench.html of web site :ul
:line
-USER-PHONON package :link(USER-PHONON),h5
+USER-PHONON package :link(USER-PHONON),h4
+
+[Contents:]
-This package contains a fix phonon command that calculates dynamical
+A "fix phonon"_fix_phonon.html command that calculates dynamical
matrices, which can then be used to compute phonon dispersion
relations, directly from molecular dynamics simulations.
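+
+For example (a sketch; the map file and output prefix are
+placeholders, and the numeric arguments are roughly the sampling
+interval, output interval, and equilibration wait), a usage could be:
+
+fix 1 all phonon 20 5000 200000 map.in phonon :pre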
-See this doc page to get started:
-
-"fix phonon"_fix_phonon.html
+[Author:] Ling-Ti Kong (Shanghai Jiao Tong University).
-The person who created this package is Ling-Ti Kong (konglt at
-sjtu.edu.cn) at Shanghai Jiao Tong University. Contact him directly
-if you have questions.
+[Install or un-install:]
+
+make yes-user-phonon
+make machine :pre
+
+make no-user-phonon
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/phonon
+src/USER-PHONON: filenames -> commands
+src/USER-PHONON/README
+"fix phonon"_fix_phonon.html
+examples/USER/phonon :ul
:line
-USER-QMMM package :link(USER-QMMM),h5
+USER-QMMM package :link(USER-QMMM),h4
-Supporting info:
+[Contents:]
-This package provides a fix qmmm command which allows LAMMPS to be
-used in a QM/MM simulation, currently only in combination with pw.x
-code from the "Quantum ESPRESSO"_espresso package.
+A "fix qmmm"_fix_qmmm.html command which allows LAMMPS to be used in a
+QM/MM simulation, currently only in combination with the "Quantum
+ESPRESSO"_espresso package.
:link(espresso,http://www.quantum-espresso.org)
+To use this package you must have Quantum ESPRESSO available on your
+system.
+
The current implementation only supports an ONIOM style mechanical
coupling to the Quantum ESPRESSO plane wave DFT package.
Electrostatic coupling is in preparation and the interface has been
written in a manner that coupling to other QM codes should be possible
without changes to LAMMPS itself.
-See this doc page to get started:
+[Author:] Axel Kohlmeyer (Temple U).
+
+[Install or un-install:]
-"fix qmmm"_fix_qmmm.html
+Before building LAMMPS with this package, you must first build the
+QMMM library in lib/qmmm. You can do this manually if you prefer;
+follow the first two steps explained in lib/qmmm/README.  You can
+also do it in one step from the lammps/src dir, using a command like
+these, which simply invoke the lib/qmmm/Install.py script with the
+specified args:
-as well as the lib/qmmm/README file.
+make lib-qmmm # print help message
+make lib-qmmm args="-m gfortran" # build with GNU Fortran compiler :pre
-The person who created this package is Axel Kohlmeyer at Temple U
-(akohlmey at gmail.com). Contact him directly if you have questions.
+The build should produce two files: lib/qmmm/libqmmm.a and
+lib/qmmm/Makefile.lammps. The latter is copied from an existing
+Makefile.lammps.* and has settings needed to build LAMMPS with the
+QMMM library (though typically the settings are just blank). If
+necessary, you can edit/create a new lib/qmmm/Makefile.machine file
+for your system, which should define an EXTRAMAKE variable to specify
+a corresponding Makefile.lammps.machine file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-qmmm
+make machine :pre
+
+make no-user-qmmm
+make machine :pre
+
+NOTE: The LAMMPS executable these steps produce is not yet functional
+for a QM/MM simulation. You must also build Quantum ESPRESSO and
+create a new executable which links LAMMPS and Quantum ESPRESSO
+together. These are steps 3 and 4 described in the lib/qmmm/README
+file.
+
+[Supporting info:]
+
+src/USER-QMMM: filenames -> commands
+src/USER-QMMM/README
+lib/qmmm/README
+"fix phonon"_fix_phonon.html
+lib/qmmm/example-ec/README
+lib/qmmm/example-mc/README :ul
:line
-USER-QTB package :link(USER-QTB),h5
+USER-QTB package :link(USER-QTB),h4
-Supporting info:
+[Contents:]
-This package provides a self-consistent quantum treatment of the
+Two fixes which provide a self-consistent quantum treatment of
vibrational modes in a classical molecular dynamics simulation. By
coupling the MD simulation to a colored thermostat, it introduces zero
-point energy into the system, alter the energy power spectrum and the
-heat capacity towards their quantum nature. This package could be of
-interest if one wants to model systems at temperatures lower than
-their classical limits or when temperatures ramp up across the
-classical limits in the simulation.
+point energy into the system, altering the energy power spectrum and
+the heat capacity to account for their quantum nature. This is useful
+when modeling systems at temperatures lower than their classical
+limits or when temperatures ramp across the classical limits in a
+simulation.
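+
+For example (a sketch; the argument values are placeholders), the
+colored-noise thermostat is used together with a regular time
+integrator:
+
+fix 1 all nve
+fix 2 all qtb temp 110 damp 200 seed 35082 f_max 0.3 N_f 100 :pre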
-See these two doc pages to get started:
+[Author:] Yuan Shen (Stanford U).
+
+[Install or un-install:]
+
+make yes-user-qtb
+make machine :pre
+
+make no-user-qtb
+make machine :pre
+
+[Supporting info:]
-"fix qtb"_fix_qtb.html provides quantum nulcear correction through a
-colored thermostat and can be used with other time integration schemes
-like "fix nve"_fix_nve.html or "fix nph"_fix_nh.html.
+src/USER-QTB: filenames -> commands
+src/USER-QTB/README
+"fix qtb"_fix_qtb.html
+"fix qbmsst"_fix_qbmsst.html
+examples/USER/qtb :ul
-"fix qbmsst"_fix_qbmsst.html enables quantum nuclear correction of a
-multi-scale shock technique simulation by coupling the quantum thermal
-bath with the shocked system.
+:line
-The person who created this package is Yuan Shen (sy0302 at
-stanford.edu) at Stanford University. Contact him directly if you
-have questions.
+USER-QUIP package :link(USER-QUIP),h4
-Examples: examples/USER/qtb
+[Contents:]
-:line
+A "pair_style quip"_pair_quip.html command which wraps the "QUIP
+libAtoms library"_quip, which includes a variety of interatomic
+potentials, including Gaussian Approximation Potential (GAP) models
+developed by the Cambridge University group.
-USER-QUIP package :link(USER-QUIP),h5
+:link(quip,https://github.com/libAtoms/QUIP)
-Supporting info:
+To use this package you must have the QUIP libAtoms library available
+on your system.
-Examples: examples/USER/quip
+[Author:] Albert Bartok (Cambridge University)
-:line
+[Install or un-install:]
-USER-REAXC package :link(USER-REAXC),h5
+Note that to follow these steps to compile and link to the QUIP
+library, you must first download and build QUIP on your system.  It
+can be obtained from GitHub. See step 1 and step 1.1 in the
+lib/quip/README file for details on how to do this. Note that it
+requires setting two environment variables, QUIP_ROOT and QUIP_ARCH,
+which will be accessed by the lib/quip/Makefile.lammps file which is
+used when you compile and link LAMMPS with this package. You should
+only need to edit this file if the LAMMPS build can not use its
+settings to successfully build on your system.
-Supporting info:
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-This package contains a implementation for LAMMPS of the ReaxFF force
-field. ReaxFF uses distance-dependent bond-order functions to
-represent the contributions of chemical bonding to the potential
-energy. It was originally developed by Adri van Duin and the Goddard
-group at CalTech.
+make yes-user-quip
+make machine :pre
+
+make no-user-quip
+make machine :pre
+
+[Supporting info:]
-The USER-REAXC version of ReaxFF (pair_style reax/c), implemented in
-C, should give identical or very similar results to pair_style reax,
-which is a ReaxFF implementation on top of a Fortran library, a
-version of which library was originally authored by Adri van Duin.
+src/USER-QUIP: filenames -> commands
+src/USER-QUIP/README
+"pair_style quip"_pair_quip.html
+examples/USER/quip :ul
-The reax/c version should be somewhat faster and more scalable,
-particularly with respect to the charge equilibration calculation. It
-should also be easier to build and use since there are no complicating
-issues with Fortran memory allocation or linking to a Fortran library.
+:line
-For technical details about this implementation of ReaxFF, see
-this paper:
+USER-REAXC package :link(USER-REAXC),h4
-Parallel and Scalable Reactive Molecular Dynamics: Numerical Methods
-and Algorithmic Techniques, H. M. Aktulga, J. C. Fogarty,
-S. A. Pandit, A. Y. Grama, Parallel Computing, in press (2011).
+[Contents:]
-See the doc page for the pair_style reax/c command for details
-of how to use it in LAMMPS.
+A pair style which implements the ReaxFF potential in C/C++ (in
+contrast to the "REAX package"_#REAX and its Fortran library). ReaxFF
+is a universal reactive force field.  See the src/USER-REAXC/README file
+for more info on differences between the two packages. Also two fixes
+for monitoring molecules as bonds are created and destroyed.
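+
+For example (a sketch; the force-field file and element list are
+placeholders patterned after examples/reax), a typical setup combines
+the pair style with charge equilibration:
+
+pair_style reax/c NULL
+pair_coeff * * ffield.reax C H O N
+fix 1 all qeq/reax 1 0.0 10.0 1.0e-6 reax/c :pre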
-The person who created this package is Hasan Metin Aktulga (hmaktulga
-at lbl.gov), while at Purdue University. Contact him directly, or
-Aidan Thompson at Sandia (athomps at sandia.gov), if you have
-questions.
+[Author:] Hasan Metin Aktulga (MSU) while at Purdue University.
-Examples: examples/reax
+[Install or un-install:]
+
+make yes-user-reaxc
+make machine :pre
+
+make no-user-reaxc
+make machine :pre
+
+[Supporting info:]
+
+src/USER-REAXC: filenames -> commands
+src/USER-REAXC/README
+"pair_style reax/c"_pair_reaxc.html
+"fix reax/c/bonds"_fix_reax_bonds.html
+"fix reax/c/species"_fix_reaxc_species.html
+examples/reax :ul
:line
-USER-SMD package :link(USER-SMD),h5
+USER-SMD package :link(USER-SMD),h4
-Supporting info:
+[Contents:]
-This package implements smoothed Mach dynamics (SMD) in
-LAMMPS. Currently, the package has the following features:
+An atom style, fixes, computes, and several pair styles which
+implement smoothed Mach dynamics (SMD) for solids, which is a model
+related to smoothed particle hydrodynamics (SPH) for liquids (see the
+"USER-SPH package"_#USER-SPH).
-* Does liquids via traditional Smooth Particle Hydrodynamics (SPH)
+This package solves solid mechanics problems via a state-of-the-art
+stabilized meshless method with hourglass control. It can specify
+hydrostatic interactions independently from material strength models,
+i.e. pressure and deviatoric stresses are separated. It provides many
+material models (Johnson-Cook, plasticity with hardening,
+Mie-Grueneisen, Polynomial EOS) and allows new material models to be
+added. It implements rigid boundary conditions (walls) which can be
+specified as surface geometries from *.STL files.
-* Also solves solids mechanics problems via a state of the art
- stabilized meshless method with hourglass control.
+[Author:] Georg Ganzenmuller (Fraunhofer-Institute for High-Speed
+Dynamics, Ernst Mach Institute, Germany).
-* Can specify hydrostatic interactions independently from material
- strength models, i.e. pressure and deviatoric stresses are separated.
+[Install or un-install:]
-* Many material models available (Johnson-Cook, plasticity with
- hardening, Mie-Grueneisen, Polynomial EOS). Easy to add new
- material models.
+Before building LAMMPS with this package, you must first download the
+Eigen library. Eigen is a template library, so you do not need to
+build it, just download it. You can do this manually if you prefer;
+follow the instructions in lib/smd/README. You can also do it in one
+step from the lammps/src dir, using a command like these, which simply
+invoke the lib/smd/Install.py script with the specified args:
-* Rigid boundary conditions (walls) can be loaded as surface geometries
- from *.STL files.
+make lib-smd # print help message
+make lib-smd args="-g -l" # download in default lib/smd/eigen-eigen-*
+make lib-smd args="-h . eigen -g -l" # download in lib/smd/eigen
+make lib-smd args="-h ~ eigen -g -l" # download and build in ~/eigen :pre
-See the file doc/PDF/SMD_LAMMPS_userguide.pdf to get started.
+Note that the final -l switch is to create a symbolic (soft) link
+named "includelink" in lib/smd to point to the Eigen dir. When LAMMPS
+builds it will use this link. You should not need to edit the
+lib/smd/Makefile.lammps file.
-There are example scripts for using this package in examples/USER/smd.
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
-The person who created this package is Georg Ganzenmuller at the
-Fraunhofer-Institute for High-Speed Dynamics, Ernst Mach Institute in
-Germany (georg.ganzenmueller at emi.fhg.de). Contact him directly if
-you have questions.
+make yes-user-smd
+make machine :pre
+
+make no-user-smd
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/smd
+src/USER-SMD: filenames -> commands
+src/USER-SMD/README
+doc/PDF/SMD_LAMMPS_userguide.pdf
+examples/USER/smd
+http://lammps.sandia.gov/movies.html#smd :ul
:line
-USER-SMTBQ package :link(USER-SMTBQ),h5
+USER-SMTBQ package :link(USER-SMTBQ),h4
-Supporting info:
+[Contents:]
-This package implements the Second Moment Tight Binding - QEq (SMTB-Q)
-potential for the description of ionocovalent bonds in oxides.
+A pair style which implements a Second Moment Tight Binding model with
+QEq charge equilibration (SMTBQ) potential for the description of
+ionocovalent bonds in oxides.
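+
+For example (a sketch; the force-field file name and element order are
+placeholders patterned after examples/USER/smtbq):
+
+pair_style smtbq
+pair_coeff * * ffield.smtbq.Al2O3 O Al :pre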
-There are example scripts for using this package in
-examples/USER/smtbq.
+[Authors:] Nicolas Salles, Emile Maras, Olivier Politano, and Robert
+Tetot (LAAS-CNRS, France).
-See this doc page to get started:
+[Install or un-install:]
+
+make yes-user-smtbq
+make machine :pre
+
+make no-user-smtbq
+make machine :pre
+
+[Supporting info:]
+src/USER-SMTBQ: filenames -> commands
+src/USER-SMTBQ/README
"pair_style smtbq"_pair_smtbq.html
+examples/USER/smtbq :ul
-The persons who created the USER-SMTBQ package are Nicolas Salles,
-Emile Maras, Olivier Politano, Robert Tetot, who can be contacted at
-these email addresses: lammps@u-bourgogne.fr, nsalles@laas.fr. Contact
-them directly if you have any questions.
+:line
-Examples: examples/USER/smtbq
+USER-SPH package :link(USER-SPH),h4
-:line
+[Contents:]
-USER-SPH package :link(USER-SPH),h5
+An atom style, fixes, computes, and several pair styles which
+implement smoothed particle hydrodynamics (SPH) for liquids.  See the
+related "USER-SMD package"_#USER-SMD for smoothed Mach dynamics
+(SMD) for solids.
-Supporting info:
+This package provides ideal gas, Lennard-Jones, and Tait equations of
+state, as well as full support for complete (i.e. internal-energy
+dependent) equations of state.  It allows plain or Monaghan's XSPH integration
+of the equations of motion. It has options for density continuity or
+density summation to propagate the density field. It has
+"set"_set.html command options to set the internal energy and density
+of particles from the input script and allows the same quantities to
+be output with thermodynamic output or to dump files via the "compute
+property/atom"_compute_property_atom.html command.
-This package implements smoothed particle hydrodynamics (SPH) in
-LAMMPS. Currently, the package has the following features:
+[Author:] Georg Ganzenmuller (Fraunhofer-Institute for High-Speed
+Dynamics, Ernst Mach Institute, Germany).
-* Tait, ideal gas, Lennard-Jones equation of states, full support for
- complete (i.e. internal-energy dependent) equations of state
+[Install or un-install:]
+
+make yes-user-sph
+make machine :pre
+
+make no-user-sph
+make machine :pre
+
+[Supporting info:]
-* Plain or Monaghans XSPH integration of the equations of motion
+src/USER-SPH: filenames -> commands
+src/USER-SPH/README
+doc/PDF/SPH_LAMMPS_userguide.pdf
+examples/USER/sph
+http://lammps.sandia.gov/movies.html#sph :ul
-* Density continuity or density summation to propagate the density field
+:line
-* Commands to set internal energy and density of particles from the
- input script
+USER-TALLY package :link(USER-TALLY),h4
-* Output commands to access internal energy and density for dumping and
- thermo output
+[Contents:]
-See the file doc/PDF/SPH_LAMMPS_userguide.pdf to get started.
+Several compute styles that can be called when pairwise interactions
+are calculated to tally information (forces, heat flux, energy,
+stress, etc) about individual interactions.
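+
+For example (a sketch), tallying pairwise energy and stress
+contributions between a group and itself:
+
+compute 1 all pe/tally all
+compute 2 all stress/tally all :pre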
-There are example scripts for using this package in examples/USER/sph.
+[Author:] Axel Kohlmeyer (Temple U).
-The person who created this package is Georg Ganzenmuller at the
-Fraunhofer-Institute for High-Speed Dynamics, Ernst Mach Institute in
-Germany (georg.ganzenmueller at emi.fhg.de). Contact him directly if
-you have questions.
+[Install or un-install:]
+
+make yes-user-tally
+make machine :pre
+
+make no-user-tally
+make machine :pre
+
+[Supporting info:]
-Examples: examples/USER/sph
+src/USER-TALLY: filenames -> commands
+src/USER-TALLY/README
+"compute */tally"_compute_tally.html
+examples/USER/tally :ul
:line
-USER-TALLY package :link(USER-TALLY),h5
+USER-VTK package :link(USER-VTK),h4
-Supporting info:
+[Contents:]
-Examples: examples/USER/tally
+A "dump custom/vtk"_dump_custom_vtk.html command which outputs
+snapshot info in the "VTK format"_vtk, enabling visualization by
+"Paraview"_paraview or other visuzlization packages.
-:line
+:link(vtk,http://www.vtk.org)
+:link(paraview,http://www.paraview.org)
+
+To use this package you must have the VTK library available on your
+system.
+
+[Authors:] Richard Berger (JKU) and Daniel Queteschiner (DCS Computing).
-USER-VTK package :link(USER-VTK),h5
+[Install or un-install:]
+
+The lib/vtk/Makefile.lammps file has settings for accessing VTK files
+and its library, which are required for LAMMPS to build and link with
+this package. If the settings are not valid for your system, check if
+one of the other lib/vtk/Makefile.lammps.* files is compatible and
+copy it to Makefile.lammps. If none of the provided files work, you
+will need to edit the Makefile.lammps file.
+
+You can then install/un-install the package and build LAMMPS in the
+usual manner:
+
+make yes-user-vtk
+make machine :pre
+
+make no-user-vtk
+make machine :pre
+
+[Supporting info:]
+src/USER-VTK: filenames -> commands
+src/USER-VTK/README
+lib/vtk/README
+"dump custom/vtk"_dump_custom_vtk.html :ul
diff --git a/doc/src/Section_start.txt b/doc/src/Section_start.txt
index 5a5de9ac9..0a7209765 100644
--- a/doc/src/Section_start.txt
+++ b/doc/src/Section_start.txt
@@ -1,1905 +1,1771 @@
"Previous Section"_Section_intro.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_commands.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
2. Getting Started :h3
This section describes how to build and run LAMMPS, for both new and
experienced users.
2.1 "What's in the LAMMPS distribution"_#start_1
2.2 "Making LAMMPS"_#start_2
2.3 "Making LAMMPS with optional packages"_#start_3
-2.4 "Building LAMMPS via the Make.py script"_#start_4
-2.5 "Building LAMMPS as a library"_#start_5
-2.6 "Running LAMMPS"_#start_6
-2.7 "Command-line options"_#start_7
-2.8 "Screen output"_#start_8
-2.9 "Tips for users of previous versions"_#start_9 :all(b)
+2.5 "Building LAMMPS as a library"_#start_4
+2.6 "Running LAMMPS"_#start_5
+2.7 "Command-line options"_#start_6
+2.8 "Screen output"_#start_7
+2.9 "Tips for users of previous versions"_#start_8 :all(b)
:line
2.1 What's in the LAMMPS distribution :h4,link(start_1)
When you download a LAMMPS tarball you will need to unzip and untar
the downloaded file with the following commands, after placing the
tarball in an appropriate directory.
tar -xzvf lammps*.tar.gz :pre
This will create a LAMMPS directory containing two files and several
sub-directories:
README: text file
LICENSE: the GNU General Public License (GPL)
bench: benchmark problems
doc: documentation
examples: simple test problems
potentials: embedded atom method (EAM) potential files
src: source files
tools: pre- and post-processing tools :tb(s=:)
Note that the "download page"_download also has links to download
pre-built Windows installers, as well as pre-built packages for
several widely used Linux distributions. It also has instructions
for how to download/install LAMMPS for Macs (via Homebrew), and to
download and update LAMMPS from SVN and Git repositories, which gives
you access to the up-to-date sources that are used by the LAMMPS
core developers.
:link(download,http://lammps.sandia.gov/download.html)
The Windows and Linux packages for serial or parallel include
only selected packages and bug-fixes/upgrades listed on "this
page"_http://lammps.sandia.gov/bug.html up to a certain date, as
stated on the download page. If you want an executable with
non-included packages or that is more current, then you'll need to
build LAMMPS yourself, as discussed in the next section.
Skip to the "Running LAMMPS"_#start_6 sections for info on how to
launch a LAMMPS Windows executable on a Windows box.
:line
2.2 Making LAMMPS :h4,link(start_2)
This section has the following sub-sections:
2.2.1 "Read this first"_#start_2_1
2.2.1 "Steps to build a LAMMPS executable"_#start_2_2
2.2.3 "Common errors that can occur when making LAMMPS"_#start_2_3
2.2.4 "Additional build tips"_#start_2_4
2.2.5 "Building for a Mac"_#start_2_5
2.2.6 "Building for Windows"_#start_2_6 :all(b)
:line
Read this first :h5,link(start_2_1)
-If you want to avoid building LAMMPS yourself, read the preceding
+If you want to avoid building LAMMPS yourself, read the preceding
section about options available for downloading and installing
executables. Details are discussed on the "download"_download page.
Building LAMMPS can be simple or not-so-simple. If all you need are
the default packages installed in LAMMPS, and MPI is already installed
on your machine, or you just want to run LAMMPS in serial, then you
can typically use the Makefile.mpi or Makefile.serial files in
src/MAKE by typing one of these lines (from the src dir):
make mpi
make serial :pre
Note that on a facility supercomputer, there are often "modules"
loaded in your environment that provide the compilers and MPI you
should use. In this case, the "mpicxx" compile/link command in
-Makefile.mpi should just work by accessing those modules.
+Makefile.mpi should simply work by accessing those modules.
It may be the case that one of the other Makefile.machine files in the
src/MAKE sub-directories is a better match to your system (type "make"
to see a list), you can use it as-is by typing (for example):
make stampede :pre
If any of these builds (with an existing Makefile.machine) works on
your system, then you're done!
+If you need to install an optional package with a LAMMPS command you
+want to use, and the package does not depend on an extra library, you
+can simply type
+
+make name :pre
+
+before invoking (or re-invoking) the above steps. "Name" is the
+lower-case name of the package, e.g. replica or user-misc.
+
If you want to do one of the following:
-use optional LAMMPS features that require additional libraries
-use optional packages that require additional libraries
-use optional accelerator packages that require special compiler/linker settings
-run on a specialized platform that has its own compilers, settings, or other libs to use :ul
+use a LAMMPS command that requires an extra library (e.g. "dump image"_dump_image.html)
+build with a package that requires an extra library
+build with an accelerator package that requires special compiler/linker settings
+run on a machine that has its own compilers, settings, or libraries :ul
then building LAMMPS is more complicated. You may need to find where
-auxiliary libraries exist on your machine or install them if they
-don't. You may need to build additional libraries that are part of
-the LAMMPS package, before building LAMMPS. You may need to edit a
+extra libraries exist on your machine or install them if they don't.
+You may need to build extra libraries that are included in the LAMMPS
+distribution, before building LAMMPS itself. You may need to edit a
Makefile.machine file to make it compatible with your system.
-Note that there is a Make.py tool in the src directory that automates
-several of these steps, but you still have to know what you are doing.
-"Section 2.4"_#start_4 below describes the tool. It is a convenient
-way to work with installing/un-installing various packages, the
-Makefile.machine changes required by some packages, and the auxiliary
-libraries some of them use.
-
Please read the following sections carefully. If you are not
comfortable with makefiles, or building codes on a Unix platform, or
running an MPI job on your machine, please find a local expert to help
-you. Many compilation, linking, and run problems that users have are
-often not really LAMMPS issues - they are peculiar to the user's
-system, compilers, libraries, etc. Such questions are better answered
-by a local expert.
+you. Many compilation, linking, and run problems users experience are
+often not LAMMPS issues - they are peculiar to the user's system,
+compilers, libraries, etc. Such questions are better answered by a
+local expert.
If you have a build problem that you are convinced is a LAMMPS issue
(e.g. the compiler complains about a line of LAMMPS source code), then
please post the issue to the "LAMMPS mail
list"_http://lammps.sandia.gov/mail.html.
If you succeed in building LAMMPS on a new kind of machine, for which
there isn't a similar machine Makefile included in the
src/MAKE/MACHINES directory, then send it to the developers and we can
include it in the LAMMPS distribution.
:line
Steps to build a LAMMPS executable :h5,link(start_2_2)
Step 0 :h6
The src directory contains the C++ source and header files for LAMMPS.
It also contains a top-level Makefile and a MAKE sub-directory with
low-level Makefile.* files for many systems and machines. See the
src/MAKE/README file for a quick overview of what files are available
and what sub-directories they are in.
The src/MAKE dir has a few files that should work as-is on many
platforms. The src/MAKE/OPTIONS dir has more that invoke additional
compiler, MPI, and other setting options commonly used by LAMMPS, to
illustrate their syntax. The src/MAKE/MACHINES dir has many more that
have been tweaked or optimized for specific machines. These files are
all good starting points if you find you need to change them for your
machine. Put any file you edit into the src/MAKE/MINE directory and
it will never be touched by any LAMMPS updates.
>From within the src directory, type "make" or "gmake". You should see
a list of available choices from src/MAKE and all of its
sub-directories. If one of those has the options you want or is the
machine you want, you can type a command like:
make mpi :pre
or
make serial :pre
or
gmake mac :pre
Note that the corresponding Makefile.machine can exist in src/MAKE or
any of its sub-directories. If a file with the same name appears in
multiple places (not a good idea), the order they are used is as
follows: src/MAKE/MINE, src/MAKE, src/MAKE/OPTIONS, src/MAKE/MACHINES.
This gives preference to a file you have created/edited and put in
src/MAKE/MINE.
Note that on a multi-processor or multi-core platform you can launch a
parallel make, by using the "-j" switch with the make command, which
will build LAMMPS more quickly.
If you get no errors and an executable like [lmp_mpi] or [lmp_serial]
or [lmp_mac] is produced, then you're done; it's your lucky day.
Note that by default only a few of LAMMPS optional packages are
installed. To build LAMMPS with optional packages, see "this
section"_#start_3 below.
Step 1 :h6
If Step 0 did not work, you will need to create a low-level Makefile
for your machine, like Makefile.foo. You should make a copy of an
existing Makefile.* in src/MAKE or one of its sub-directories as a
starting point. The only portions of the file you need to edit are
the first line, the "compiler/linker settings" section, and the
"LAMMPS-specific settings" section. When it works, put the edited
file in src/MAKE/MINE and it will not be altered by any future LAMMPS
updates.
Step 2 :h6
Change the first line of Makefile.foo to list the word "foo" after the
"#", and whatever other options it will set. This is the line you
will see if you just type "make".
Step 3 :h6
The "compiler/linker settings" section lists compiler and linker
settings for your C++ compiler, including optimization flags. You can
use g++, the open-source GNU compiler, which is available on all Unix
systems. You can also use mpicxx which will typically be available if
MPI is installed on your system, though you should check which actual
compiler it wraps. Vendor compilers often produce faster code. On
boxes with Intel CPUs, we suggest using the Intel icc compiler, which
can be downloaded from "Intel's compiler site"_intel.
:link(intel,http://www.intel.com/software/products/noncom)
If building a C++ code on your machine requires additional libraries,
then you should list them as part of the LIB variable. You should
not need to do this if you use mpicxx.
The DEPFLAGS setting is what triggers the C++ compiler to create a
dependency list for a source file. This speeds re-compilation when
source (*.cpp) or header (*.h) files are edited. Some compilers do
not support dependency file creation, or may use a different switch
than -D. GNU g++ and Intel icc work with -D. If your compiler can't
create dependency files, then you'll need to create a Makefile.foo
patterned after Makefile.storm, which uses different rules that do not
involve dependency files. Note that when you build LAMMPS for the
first time on a new platform, a long list of *.d files will be printed
out rapidly. This is not an error; it is the Makefile doing its
normal creation of dependencies.
Step 4 :h6
The "system-specific settings" section has several parts. Note that
if you change any -D setting in this section, you should do a full
re-compile, after typing "make clean" (which will describe different
clean options).
The LMP_INC variable is used to include options that turn on ifdefs
-within the LAMMPS code. The options that are currently recognized are:
+within the LAMMPS code (a sample LMP_INC line is sketched after this
+list). The options that are currently recognized are:
-DLAMMPS_GZIP
-DLAMMPS_JPEG
-DLAMMPS_PNG
-DLAMMPS_FFMPEG
-DLAMMPS_MEMALIGN
-DLAMMPS_XDR
-DLAMMPS_SMALLBIG
-DLAMMPS_BIGBIG
-DLAMMPS_SMALLSMALL
-DLAMMPS_LONGLONG_TO_LONG
-DLAMMPS_EXCEPTIONS
-DPACK_ARRAY
-DPACK_POINTER
-DPACK_MEMCPY :ul
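For example (a sketch, assuming gzipped I/O, JPEG output, and 64-byte
memory alignment are wanted), the LMP_INC line in Makefile.foo could
read:
LMP_INC = -DLAMMPS_GZIP -DLAMMPS_JPEG -DLAMMPS_MEMALIGN=64 :pre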
The read_data and dump commands will read/write gzipped files if you
compile with -DLAMMPS_GZIP. It requires that your machine supports
the "popen()" function in the standard runtime library and that a gzip
executable can be found by LAMMPS during a run.
NOTE: on some clusters with high-speed networks, using the fork()
library calls (required by popen()) can interfere with the fast
communication library and lead to simulations using compressed output
or input to hang or crash. For selected operations, compressed file
I/O is also available using a compression library instead, which are
provided in the COMPRESS package. From more details about compiling
LAMMPS with packages, please see below.
If you use -DLAMMPS_JPEG, the "dump image"_dump_image.html command
will be able to write out JPEG image files. For JPEG files, you must
also link LAMMPS with a JPEG library, as described below. If you use
-DLAMMPS_PNG, the "dump image"_dump.html command will be able to write
out PNG image files. For PNG files, you must also link LAMMPS with a
PNG library, as described below. If neither of those two defines are
used, LAMMPS will only be able to write out uncompressed PPM image
files.
If you use -DLAMMPS_FFMPEG, the "dump movie"_dump_image.html command
will be available to support on-the-fly generation of rendered movies
without the need to store intermediate image files. It requires that your
machine supports the "popen" function in the standard runtime library
and that an FFmpeg executable can be found by LAMMPS during the run.
NOTE: Similar to the note above, this option can conflict with
high-speed networks, because it uses popen().
Using -DLAMMPS_MEMALIGN=<bytes> enables the use of the
posix_memalign() call instead of malloc() when large chunks of memory
are allocated by LAMMPS. This can help to make more efficient use of
vector instructions of modern CPUS, since dynamically allocated memory
has to be aligned on larger than default byte boundaries (e.g. 16
bytes instead of 8 bytes on x86 type platforms) for optimal
performance.
If you use -DLAMMPS_XDR, the build will include XDR compatibility
files for doing particle dumps in XTC format. This is only necessary
if your platform does not have its own XDR files available. See the
Restrictions section of the "dump"_dump.html command for details.
Use at most one of the -DLAMMPS_SMALLBIG, -DLAMMPS_BIGBIG,
-DLAMMPS_SMALLSMALL settings. The default is -DLAMMPS_SMALLBIG. These
settings refer to use of 4-byte (small) vs 8-byte (big) integers
within LAMMPS, as specified in src/lmptype.h. The only reason to use
the BIGBIG setting is to enable simulation of huge molecular systems
(which store bond topology info) with more than 2 billion atoms, or to
track the image flags of moving atoms that wrap around a periodic box
more than 512 times. Normally, the only reason to use SMALLSMALL is
if your machine does not support 64-bit integers, though you can use
SMALLSMALL setting if you are running in serial or on a desktop
machine or small cluster where you will never run large systems or for
long time (more than 2 billion atoms, more than 2 billion timesteps).
See the "Additional build tips"_#start_2_4 section below for more
details on these settings.
Note that the USER-ATC package is not currently compatible with
-DLAMMPS_BIGBIG. Also the GPU package requires the lib/gpu library to
be compiled with the same setting, or the link will fail.
The -DLAMMPS_LONGLONG_TO_LONG setting may be needed if your system or
MPI version does not recognize "long long" data types. In this case a
"long" data type is likely already 64-bits, in which case this setting
will convert to that data type.
The -DLAMMPS_EXCEPTIONS setting can be used to activate alternative
versions of error handling inside of LAMMPS. This is useful when
external codes drive LAMMPS as a library. Using this option, LAMMPS
errors do not kill the caller. Instead, the call stack is unwound and
control returns to the caller. The library interface provides the
lammps_has_error() and lammps_get_last_error_message() functions to
detect and find out more about a LAMMPS error.
Using one of the -DPACK_ARRAY, -DPACK_POINTER, and -DPACK_MEMCPY
options can make for faster parallel FFTs (in the PPPM solver) on some
platforms. The -DPACK_ARRAY setting is the default. See the
"kspace_style"_kspace_style.html command for info about PPPM. See
Step 6 below for info about building LAMMPS with an FFT library.
Step 5 :h6
The 3 MPI variables are used to specify an MPI library to build LAMMPS
with. Note that you do not need to set these if you use the MPI
compiler mpicxx for your CC and LINK setting in the section above.
The MPI wrapper knows where to find the needed files.
If you want LAMMPS to run in parallel, you must have an MPI library
installed on your platform. If MPI is installed on your system in the
usual place (under /usr/local), you also may not need to specify these
3 variables, assuming /usr/local is in your path. On some large
parallel machines which use "modules" for their compile/link
-environments, you may simply need to include the correct module in
+environments, you may simply need to include the correct module in
your build environment, before building LAMMPS. Or the parallel
machine may have a vendor-provided MPI which the compiler has no
trouble finding.
Failing this, these 3 variables can be used to specify where the mpi.h
file (MPI_INC) and the MPI library file (MPI_PATH) are found and the
name of the library file (MPI_LIB).
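For example (a sketch for an MPICH installed under /usr/local; paths
and library names vary by installation), the settings could look like:
MPI_INC = -I/usr/local/include
MPI_PATH = -L/usr/local/lib
MPI_LIB = -lmpich :pre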
If you are installing MPI yourself, we recommend Argonne's MPICH2
or OpenMPI. MPICH can be downloaded from the "Argonne MPI
site"_http://www.mcs.anl.gov/research/projects/mpich2/. OpenMPI can
be downloaded from the "OpenMPI site"_http://www.open-mpi.org.
Other MPI packages should also work. If you are running on a big
parallel platform, your system people or the vendor should have
already installed a version of MPI, which is likely to be faster
than a self-installed MPICH or OpenMPI, so find out how to build
and link with it. If you use MPICH or OpenMPI, you will have to
configure and build it for your platform. The MPI configure script
should have compiler options to enable you to use the same compiler
you are using for the LAMMPS build, which can avoid problems that can
arise when linking LAMMPS to the MPI library.
If you just want to run LAMMPS on a single processor, you can use the
dummy MPI library provided in src/STUBS, since you don't need a true
MPI library installed on your system. See src/MAKE/Makefile.serial
for how to specify the 3 MPI variables in this case. You will also
need to build the STUBS library for your platform before making LAMMPS
itself. Note that if you are building with src/MAKE/Makefile.serial,
e.g. by typing "make serial", then the STUBS library is built for you.
To build the STUBS library from the src directory, type "make
mpi-stubs", or from the src/STUBS dir, type "make". This should
create a libmpi_stubs.a file suitable for linking to LAMMPS. If the
build fails, you will need to edit the STUBS/Makefile for your
platform.
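For reference, a serial build against the STUBS library uses settings
roughly like these (see src/MAKE/Makefile.serial):
MPI_INC = -I../STUBS
MPI_PATH = -L../STUBS
MPI_LIB = -lmpi_stubs :pre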
The file STUBS/mpi.c provides a CPU timer function called MPI_Wtime()
that calls gettimeofday() . If your system doesn't support
gettimeofday() , you'll need to insert code to call another timer.
Note that the ANSI-standard function clock() rolls over after an hour
or so, and is therefore insufficient for timing long LAMMPS
simulations.
Step 6 :h6
The 3 FFT variables allow you to specify an FFT library which LAMMPS
uses (for performing 1d FFTs) when running the particle-particle
particle-mesh (PPPM) option for long-range Coulombics via the
"kspace_style"_kspace_style.html command.
LAMMPS supports common open-source or vendor-supplied FFT libraries
for this purpose. If you leave these 3 variables blank, LAMMPS will
use the open-source "KISS FFT library"_http://kissfft.sf.net, which is
included in the LAMMPS distribution. This library is portable to all
platforms and for typical LAMMPS simulations is almost as fast as FFTW
or vendor optimized libraries. If you are not including the KSPACE
package in your build, you can also leave the 3 variables blank.
Otherwise, select which kinds of FFTs to use as part of the FFT_INC
setting by a switch of the form -DFFT_XXX. Recommended values for XXX
are: MKL or FFTW3. FFTW2 and NONE are supported as legacy options.
Selecting -DFFT_FFTW will use the FFTW3 library and -DFFT_NONE will
use the KISS library described above.
You may also need to set the FFT_INC, FFT_PATH, and FFT_LIB variables,
so the compiler and linker can find the needed FFT header and library
files. Note that on some large parallel machines which use "modules"
-for their compile/link environments, you may simply need to include
+for their compile/link environments, you may simply need to include
the correct module in your build environment. Or the parallel machine
may have a vendor-provided FFT library which the compiler has no
trouble finding.
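For example (a sketch for an FFTW3 installed under /usr/local; paths
vary), the settings could look like:
FFT_INC = -DFFT_FFTW3
FFT_PATH = -L/usr/local/lib
FFT_LIB = -lfftw3 :pre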
FFTW is a fast, portable library that should also work on any
platform. You can download it from
"www.fftw.org"_http://www.fftw.org. Both the legacy version 2.1.X and
the newer 3.X versions are supported as -DFFT_FFTW2 or -DFFT_FFTW3.
Building FFTW for your box should be as simple as ./configure; make.
Note that on some platforms FFTW2 has been pre-installed, and uses
renamed files indicating the precision it was compiled with,
e.g. sfftw.h, or dfftw.h instead of fftw.h. In this case, you can
specify an additional define variable for FFT_INC called -DFFTW_SIZE,
which will select the correct include file. In this case, for FFT_LIB
you must also manually specify the correct library, namely -lsfftw or
-ldfftw.
The FFT_INC variable also allows for a -DFFT_SINGLE setting that will
use single-precision FFTs with PPPM, which can speed-up long-range
-calculations, particularly in parallel or on GPUs. Fourier transform
+calculations, particularly in parallel or on GPUs.  Fourier transform
and related PPPM operations are somewhat insensitive to floating point
truncation errors and thus do not always need to be performed in
double precision. Using the -DFFT_SINGLE setting trades off a little
accuracy for reduced memory use and parallel communication costs for
-transposing 3d FFT data.
+transposing 3d FFT data. Note that single precision FFTs have only
+been tested with the FFTW3, FFTW2, MKL, and KISS FFT options.
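For example (a sketch, assuming a single-precision FFTW3 build is
installed as libfftw3f), the settings could look like:
FFT_INC = -DFFT_FFTW3 -DFFT_SINGLE
FFT_LIB = -lfftw3f :pre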
Step 7 :h6
The 3 JPG variables allow you to specify a JPEG and/or PNG library
which LAMMPS uses when writing out JPEG or PNG files via the "dump
image"_dump_image.html command. These can be left blank if you do not
use the -DLAMMPS_JPEG or -DLAMMPS_PNG switches discussed above in Step
4, since in that case JPEG/PNG output will be disabled.
A standard JPEG library usually goes by the name libjpeg.a or
libjpeg.so and has an associated header file jpeglib.h. Whichever
JPEG library you have on your platform, you'll need to set the
appropriate JPG_INC, JPG_PATH, and JPG_LIB variables, so that the
compiler and linker can find it.
A standard PNG library usually goes by the name libpng.a or libpng.so
and has an associated header file png.h. Whichever PNG library you
have on your platform, you'll need to set the appropriate JPG_INC,
JPG_PATH, and JPG_LIB variables, so that the compiler and linker can
find it.
As before, if these header and library files are in the usual place on
your machine, you may not need to set these variables.
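For example (a sketch, assuming both libraries are installed under
/usr/local; libpng typically also needs zlib), the settings could look
like:
JPG_INC = -I/usr/local/include
JPG_PATH = -L/usr/local/lib
JPG_LIB = -ljpeg -lpng -lz :pre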
Step 8 :h6
Note that by default only a few of LAMMPS optional packages are
installed. To build LAMMPS with optional packages, see "this
section"_#start_3 below, before proceeding to Step 9.
Step 9 :h6
That's it. Once you have a correct Makefile.foo, and you have
pre-built any other needed libraries (e.g. MPI, FFT, etc) all you need
to do from the src directory is type something like this:
make foo
make -j N foo
gmake foo
gmake -j N foo :pre
The -j or -j N switches perform a parallel build which can be much
faster, depending on how many cores your compilation machine has. N
is the number of cores the build runs on.
You should get the executable lmp_foo when the build is complete.
:line
Errors that can occur when making LAMMPS :h5,link(start_2_3)
-NOTE: If an error occurs when building LAMMPS, the compiler or linker
-will state very explicitly what the problem is. The error message
-should give you a hint as to which of the steps above has failed, and
-what you need to do in order to fix it. Building a code with a
-Makefile is a very logical process. The compiler and linker need to
-find the appropriate files and those files need to be compatible with
-LAMMPS source files. When a make fails, there is usually a very
+If an error occurs when building LAMMPS, the compiler or linker will
+state very explicitly what the problem is. The error message should
+give you a hint as to which of the steps above has failed, and what
+you need to do in order to fix it. Building a code with a Makefile is
+a very logical process. The compiler and linker need to find the
+appropriate files and those files need to be compatible with LAMMPS
+settings and source files. When a make fails, there is usually a very
simple reason, which you or a local expert will need to fix.
Here are two non-obvious errors that can occur:
(1) If the make command breaks immediately with errors that indicate
it can't find files with a "*" in their names, this can be because
your machine's native make doesn't support wildcard expansion in a
makefile. Try gmake instead of make. If that doesn't work, try using
a -f switch with your make command to use a pre-generated
Makefile.list which explicitly lists all the needed files, e.g.
make makelist
make -f Makefile.list linux
gmake -f Makefile.list mac :pre
The first "make" command will create a current Makefile.list with all
the file names in your src dir. The 2nd "make" command (make or
gmake) will use it to build LAMMPS. Note that you should
include/exclude any desired optional packages before using the "make
makelist" command.
(2) If you get an error that says something like 'identifier "atoll"
is undefined', then your machine does not support "long long"
integers. Try using the -DLAMMPS_LONGLONG_TO_LONG setting described
above in Step 4.
:line
Additional build tips :h5,link(start_2_4)
Building LAMMPS for multiple platforms. :h6
You can make LAMMPS for multiple platforms from the same src
directory. Each target creates its own object sub-directory called
Obj_target where it stores the system-specific *.o files.
Cleaning up. :h6
Typing "make clean-all" or "make clean-machine" will delete *.o object
files created when LAMMPS is built, for either all builds or for a
particular machine.
-Changing the LAMMPS size limits via -DLAMMPS_SMALLBIG or -DLAMMPS_BIGBIG or -DLAMMPS_SMALLSMALL :h6
+Changing the LAMMPS size limits via -DLAMMPS_SMALLBIG or
+-DLAMMPS_BIGBIG or -DLAMMPS_SMALLSMALL :h6
As explained above, any of these 3 settings can be specified on the
LMP_INC line in your low-level src/MAKE/Makefile.foo.
The default is -DLAMMPS_SMALLBIG which allows for systems with up to
2^63 atoms and 2^63 timesteps (about 9e18). The atom limit is for
atomic systems which do not store bond topology info and thus do not
require atom IDs. If you use atom IDs for atomic systems (which is
the default) or if you use a molecular model, which stores bond
topology info and thus requires atom IDs, the limit is 2^31 atoms
(about 2 billion). This is because the IDs are stored in 32-bit
integers.
Likewise, with this setting, the 3 image flags for each atom (see the
"dump"_dump.html doc page for a discussion) are stored in a 32-bit
integer, which means the atoms can only wrap around a periodic box (in
each dimension) at most 512 times. If atoms move through the periodic
box more than this many times, the image flags will "roll over",
e.g. from 511 to -512, which can cause diagnostics like the
mean-squared displacement, as calculated by the "compute
msd"_compute_msd.html command, to be faulty.
To allow for larger atomic systems with atom IDs or larger molecular
systems or larger image flags, compile with -DLAMMPS_BIGBIG. This
stores atom IDs and image flags in 64-bit integers. This enables
atomic or molecular systems with atom IDS of up to 2^63 atoms (about
9e18). And image flags will not "roll over" until they reach 2^20 =
1048576.
If your system does not support 8-byte integers, you will need to
compile with the -DLAMMPS_SMALLSMALL setting. This will restrict the
total number of atoms (for atomic or molecular systems) and timesteps
to 2^31 (about 2 billion). Image flags will roll over at 2^9 = 512.
Note that in src/lmptype.h there are definitions of all these data
types as well as the MPI data types associated with them. The MPI
types need to be consistent with the associated C data types, or else
LAMMPS will generate a run-time error. As far as we know, the
settings defined in src/lmptype.h are portable and work on every
current system.
In all cases, the size of problem that can be run on a per-processor
basis is limited by 4-byte integer storage to 2^31 atoms per processor
(about 2 billion). This should not normally be a limitation since such
a problem would have a huge per-processor memory footprint due to
neighbor lists and would run very slowly in terms of CPU secs/timestep.
:line
Building for a Mac :h5,link(start_2_5)
OS X is a derivative of BSD Unix, so it should just work. See the
src/MAKE/MACHINES/Makefile.mac and Makefile.mac_mpi files.
:line
Building for Windows :h5,link(start_2_6)
If you want to build a Windows version of LAMMPS, you can build it
yourself, but it may require some effort. LAMMPS expects a Unix-like
build environment for the default build procedure. This can be done
using either Cygwin or MinGW; the latter also exists as a ready-to-use
Linux-to-Windows cross-compiler in several Linux distributions. In
these cases, you can do the installation after installing several
unix-style commands like make, grep, sed and bash with some shell
utilities.
For Cygwin and the MinGW cross-compilers, suitable makefiles are
provided in src/MAKE/MACHINES. When using other compilers, like
Visual C++ or Intel compilers for Windows, you may have to implement
your own build system. Since none of the current LAMMPS core developers
has significant experience building executables on Windows, we are
happy to distribute contributed instructions and modifications, but
we cannot provide support for those.
With the so-called "Anniversary Update" to Windows 10, there is a
Ubuntu Linux subsystem available for Windows, that can be installed
and then used to compile/install LAMMPS as if you are running on a
Ubuntu Linux system instead of Windows.
As an alternative, you can download "daily builds" (and some older
versions) of the installer packages from
"rpm.lammps.org/windows.html"_http://rpm.lammps.org/windows.html.
These executables are built with most optional packages and the
download includes documentation, potential files, some tools and
many examples, but no source code.
:line
2.3 Making LAMMPS with optional packages :h4,link(start_3)
This section has the following sub-sections:
2.3.1 "Package basics"_#start_3_1
2.3.2 "Including/excluding packages"_#start_3_2
2.3.3 "Packages that require extra libraries"_#start_3_3
2.3.4 "Packages that require Makefile.machine settings"_#start_3_4 :all(b)
-Note that the following "Section 2.4"_#start_4 describes the Make.py
-tool which can be used to install/un-install packages and build the
-auxiliary libraries which some of them use. It can also auto-edit a
-Makefile.machine to add settings needed by some packages.
-
:line
Package basics: :h5,link(start_3_1)
The source code for LAMMPS is structured as a set of core files which
are always included, plus optional packages. Packages are groups of
files that enable a specific set of features. For example, force
fields for molecular systems or granular systems are in packages.
-"Section 4"_Section_packages.html in the manual has details
-about all the packages, including specific instructions for building
-LAMMPS with each package, which are covered in a more general manner
+"Section 4"_Section_packages.html in the manual has details about all
+the packages, which come in two flavors: [standard] and [user]
+packages. It also has specific instructions for building LAMMPS with
+any package which requires an extra library. General instructions are
below.
You can see the list of all packages by typing "make package" from
-within the src directory of the LAMMPS distribution. This also lists
-various make commands that can be used to manipulate packages.
+within the src directory of the LAMMPS distribution. It will also
+list various make commands that can be used to manage packages.
If you use a command in a LAMMPS input script that is part of a
package, you must have built LAMMPS with that package, else you will
get an error that the style is invalid or the command is unknown.
-Every command's doc page specifies if it is part of a package. You can
-also type
+Every command's doc page specifies if it is part of a package. You can
+type
lmp_machine -h :pre
to run your executable with the optional "-h command-line
-switch"_#start_7 for "help", which will simply list the styles and
-commands known to your executable, and immediately exit.
-
-There are two kinds of packages in LAMMPS, standard and user packages.
-More information about the contents of standard and user packages is
-given in "Section 4"_Section_packages.html of the manual. The
-difference between standard and user packages is as follows:
-
-Standard packages, such as molecule or kspace, are supported by the
-LAMMPS developers and are written in a syntax and style consistent
-with the rest of LAMMPS. This means we will answer questions about
-them, debug and fix them if necessary, and keep them compatible with
-future changes to LAMMPS.
-
-User packages, such as user-atc or user-omp, have been contributed by
-users, and always begin with the user prefix. If they are a single
-command (single file), they are typically in the user-misc package.
-Otherwise, they are a set of files grouped together which add a
-specific functionality to the code.
-
-User packages don't necessarily meet the requirements of the standard
-packages. If you have problems using a feature provided in a user
-package, you may need to contact the contributor directly to get help.
-Information on how to submit additions you make to LAMMPS as single
-files or either a standard or user-contributed package are given in
-"this section"_Section_modify.html#mod_15 of the documentation.
+switch"_#start_7 for "help", which will list the styles and commands
+known to your executable, and immediately exit.
:line
Including/excluding packages :h5,link(start_3_2)
-To use (or not use) a package you must include it (or exclude it)
-before building LAMMPS. From the src directory, this is typically as
-simple as:
+To use (or not use) a package you must install it (or un-install it)
+before building LAMMPS. From the src directory, this is as simple as:
make yes-colloid
make mpi :pre
or
-make no-manybody
+make no-user-omp
make mpi :pre
-NOTE: You should NOT include/exclude packages and build LAMMPS in a
+NOTE: You should NOT install/un-install packages and build LAMMPS in a
single make command using multiple targets, e.g. make yes-colloid mpi.
This is because the make procedure creates a list of source files that
will be out-of-date for the build if the package configuration changes
within the same command.
-Some packages have individual files that depend on other packages
-being included. LAMMPS checks for this and does the right thing.
-I.e. individual files are only included if their dependencies are
-already included. Likewise, if a package is excluded, other files
+Any package can be installed or not in a LAMMPS build, independent of
+all other packages. However, some packages include files derived from
+files in other packages. LAMMPS checks for this and does the right
+thing. I.e. individual files are only included if their dependencies
+are already included. Likewise, if a package is excluded, other files
dependent on that package are also excluded.
+NOTE: The one exception is that we do not recommend building with both
+the KOKKOS package installed and any of the other acceleration
+packages (GPU, OPT, USER-INTEL, USER-OMP) also installed. This is
+because of how Kokkos sometimes builds using a wrapper compiler which
+can make it difficult to invoke all the compile/link flags correctly
+for both Kokkos and non-Kokkos files.
+
If you will never run simulations that use the features in a
particular package, there is no reason to include it in your build.
-For some packages, this will keep you from having to build auxiliary
-libraries (see below), and will also produce a smaller executable
-which may run a bit faster.
-
-When you download a LAMMPS tarball, these packages are pre-installed
-in the src directory: KSPACE, MANYBODY,MOLECULE, because they are so
-commonly used. When you download LAMMPS source files from the SVN or
-Git repositories, no packages are pre-installed.
-
-Packages are included or excluded by typing "make yes-name" or "make
-no-name", where "name" is the name of the package in lower-case, e.g.
-name = kspace for the KSPACE package or name = user-atc for the
-USER-ATC package. You can also type "make yes-standard", "make
-no-standard", "make yes-std", "make no-std", "make yes-user", "make
-no-user", "make yes-lib", "make no-lib", "make yes-all", or "make
-no-all" to include/exclude various sets of packages. Type "make
-package" to see all of the package-related make options.
-
-NOTE: Inclusion/exclusion of a package works by simply moving files
-back and forth between the main src directory and sub-directories with
-the package name (e.g. src/KSPACE, src/USER-ATC), so that the files
-are seen or not seen when LAMMPS is built. After you have included or
-excluded a package, you must re-build LAMMPS.
-
-Additional package-related make options exist to help manage LAMMPS
-files that exist in both the src directory and in package
-sub-directories. You do not normally need to use these commands
-unless you are editing LAMMPS files or have downloaded a patch from
-the LAMMPS WWW site.
-
-Typing "make package-update" or "make pu" will overwrite src files
-with files from the package sub-directories if the package has been
-included. It should be used after a patch is installed, since patches
-only update the files in the package sub-directory, but not the src
-files. Typing "make package-overwrite" will overwrite files in the
-package sub-directories with src files.
+For some packages, this will keep you from having to build extra
+libraries, and will also produce a smaller executable which may run a
+bit faster.
+
+When you download a LAMMPS tarball, three packages are pre-installed
+in the src directory -- KSPACE, MANYBODY, MOLECULE -- because they are
+so commonly used. When you download LAMMPS source files from the SVN
+or Git repositories, no packages are pre-installed.
+
+Packages are installed or un-installed by typing
+
+make yes-name
+make no-name :pre
+
+where "name" is the name of the package in lower-case, e.g. name =
+kspace for the KSPACE package or name = user-atc for the USER-ATC
+package. You can also type any of these commands:
+
+make yes-all | install all packages
+make no-all | un-install all packages
+make yes-standard or make yes-std | install standard packages
+make no-standard or make no-std | un-install standard packages
+make yes-user | install user packages
+make no-user | un-install user packages
+make yes-lib | install packages that require extra libraries
+make no-lib | un-install packages that require extra libraries
+make yes-ext | install packages that require external libraries
+make no-ext | un-install packages that require external libraries :tb(s=|)
+
+which install/un-install various sets of packages. Typing "make
+package" will list all the these commands.
+
+NOTE: Installing or un-installing a package works by simply moving
+files back and forth between the main src directory and
+sub-directories with the package name (e.g. src/KSPACE, src/USER-ATC),
+so that the files are included or excluded when LAMMPS is built.
+After you have installed or un-installed a package, you must re-build
+LAMMPS for the action to take effect.
+
+The following make commands help manage files that exist in both the
+src directory and in package sub-directories. You do not normally
+need to use these commands unless you are editing LAMMPS files or have
+downloaded a patch from the LAMMPS web site.
Typing "make package-status" or "make ps" will show which packages are
-currently included. For those that are included, it will list any
+currently installed. For those that are installed, it will list any
files that are different in the src directory and package
-sub-directory. Typing "make package-diff" lists all differences
-between these files. Again, type "make package" to see all of the
-package-related make options.
+sub-directory.
-:line
+Typing "make package-update" or "make pu" will overwrite src files
+with files from the package sub-directories if the package is
+installed. It should be used after a patch has been applied, since
+patches only update the files in the package sub-directory, but not
+the src files.
-Packages that require extra libraries :h5,link(start_3_3)
+Typing "make package-overwrite" will overwrite files in the package
+sub-directories with src files.
-A few of the standard and user packages require additional auxiliary
-libraries. Many of them are provided with LAMMPS, in which case they
-must be compiled first, before LAMMPS is built, if you wish to include
-that package. If you get a LAMMPS build error about a missing
-library, this is likely the reason. See the
-"Section 4"_Section_packages.html doc page for a list of
-packages that have these kinds of auxiliary libraries.
-
-The lib directory in the distribution has sub-directories with package
-names that correspond to the needed auxiliary libs, e.g. lib/gpu.
-Each sub-directory has a README file that gives more details. Code
-for most of the auxiliary libraries is included in that directory.
-Examples are the USER-ATC and MEAM packages.
-
-A few of the lib sub-directories do not include code, but do include
-instructions (and sometimes scripts) that automate the process of
-downloading the auxiliary library and installing it so LAMMPS can link
-to it. Examples are the KIM, VORONOI, USER-MOLFILE, and USER-SMD
-packages.
-
-The lib/python directory (for the PYTHON package) contains only a
-choice of Makefile.lammps.* files. This is because no auxiliary code
-or libraries are needed, only the Python library and other system libs
-that should already available on your system. However, the
-Makefile.lammps file is needed to tell LAMMPS which libs to use and
-where to find them.
-
-For libraries with provided code, the sub-directory README file
-(e.g. lib/atc/README) has instructions on how to build that library.
-This information is also summarized in "Section
-4"_Section_packages.html. Typically this is done by typing
-something like:
+Typing "make package-diff" lists all differences between these files.
-make -f Makefile.g++ :pre
-
-If one of the provided Makefiles is not appropriate for your system
-you will need to edit or add one. Note that all the Makefiles have a
-setting for EXTRAMAKE at the top that specifies a Makefile.lammps.*
-file.
-
-If the library build is successful, it will produce 2 files in the lib
-directory:
-
-libpackage.a
-Makefile.lammps :pre
-
-The Makefile.lammps file will typically be a copy of one of the
-Makefile.lammps.* files in the library directory.
-
-Note that you must insure that the settings in Makefile.lammps are
-appropriate for your system. If they are not, the LAMMPS build may
-fail. To fix this, you can edit or create a new Makefile.lammps.*
-file for your system, and copy it to Makefile.lammps.
-
-As explained in the lib/package/README files, the settings in
-Makefile.lammps are used to specify additional system libraries and
-their locations so that LAMMPS can build with the auxiliary library.
-For example, if the MEAM package is used, the auxiliary library
-consists of F90 code, built with a Fortran complier. To link that
-library with LAMMPS (a C++ code) via whatever C++ compiler LAMMPS is
-built with, typically requires additional Fortran-to-C libraries be
-included in the link. Another example are the BLAS and LAPACK
-libraries needed to use the USER-ATC or USER-AWPMD packages.
-
-For libraries without provided code, the sub-directory README file has
-information on where to download the library and how to build it,
-e.g. lib/voronoi/README and lib/smd/README. The README files also
-describe how you must either (a) create soft links, via the "ln"
-command, in those directories to point to where you built or installed
-the packages, or (b) check or edit the Makefile.lammps file in the
-same directory to provide that information.
-
-Some of the sub-directories, e.g. lib/voronoi, also have an install.py
-script which can be used to automate the process of
-downloading/building/installing the auxiliary library, and setting the
-needed soft links. Type "python install.py" for further instructions.
-
-As with the sub-directories containing library code, if the soft links
-or settings in the lib/package/Makefile.lammps files are not correct,
-the LAMMPS build will typically fail.
+Again, just type "make package" to see all of the package-related make
+options.
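+
+As a typical usage sketch, after applying a patch you might re-sync
+the src files and re-build like this:
+
+make ps                 # check which packages are installed
+make package-update     # copy updated package files into src
+make mpi :pre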
:line
-Packages that require Makefile.machine settings :h5,link(start_3_4)
-
-A few packages require specific settings in Makefile.machine, to
-either build or use the package effectively. These are the
-USER-INTEL, KOKKOS, USER-OMP, and OPT packages, used for accelerating
-code performance on CPUs or other hardware, as discussed in "Section
-5.3"_Section_accelerate.html#acc_3.
+Packages that require extra libraries :h5,link(start_3_3)
-A summary of what Makefile.machine changes are needed for each of
-these packages is given in "Section 4"_Section_packages.html.
-The details are given on the doc pages that describe each of these
-accelerator packages in detail:
+A few of the standard and user packages require extra libraries. See
+"Section 4"_Section_packages.html for two tables of packages which
+indicate which ones require libraries. For each such package, the
+Section 4 doc page gives details on how to build the extra library,
+including how to download it if necessary. The basic ideas are
+summarized here.
+
+[System libraries:]
+
+Packages in the tables "Section 4"_Section_packages.html with a "sys"
+in the last column link to system libraries that typically already
+exist on your machine. E.g. the python package links to a system
+Python library. If your machine does not have the required library,
+you will have to download and install it on your machine, in either
+the system or user space.
+
+[Internal libraries:]
+
+Packages in the tables "Section 4"_Section_packages.html with an "int"
+in the last column link to internal libraries whose source code is
+included with LAMMPS, in the lib/name directory where name is the
+package name. You must first build the library in that directory
+before building LAMMPS with that package installed. E.g. the gpu
+package links to a library you build in the lib/gpu dir. You can
+often do the build in one step by typing "make lib-name args=..."
+from the src dir, with appropriate arguments. You can leave off the
+args to see a help message. See "Section 4"_Section_packages.html for
+details for each package.
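+
+For example, for the gpu package (the exact arguments depend on your
+compiler and GPU hardware, so check the help message and "Section
+4"_Section_packages.html):
+
+make lib-gpu            # with no args, prints a help message
+make lib-gpu args="..." # build lib/gpu with your chosen settings :pre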
+
+[External libraries:]
+
+Packages in the tables "Section 4"_Section_packages.html with an "ext"
+in the last column link to external libraries whose source code is not
+included with LAMMPS. You must first download and install the library
+before building LAMMPS with that package installed. E.g. the voronoi
+package links to the freely available "Voro++ library"_voronoi. You
+can often do the download/build in one step by typing "make lib-name
+args=..." from the src dir, with appropriate arguments. You can leave
+off the args to see a help message. See "Section
+4"_Section_packages.html for details for each package.
+
+:link(voronoi,http://math.lbl.gov/voro++)
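+
+For example, for the voronoi package (again, the valid arguments are
+listed in the help message and in "Section 4"_Section_packages.html):
+
+make lib-voronoi            # with no args, prints a help message
+make lib-voronoi args="..." # download and build Voro++ :pre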
+
+[Possible errors:]
+
+There are various common errors which can occur when building extra
+libraries or when building LAMMPS with packages that require the extra
+libraries.
+
+If you cannot build the extra library itself successfully, you may
+need to edit or create an appropriate Makefile for your machine, e.g.
+with appropriate compiler or system settings. Provided makefiles are
+typically in the lib/name directory. E.g. see the Makefile.* files in
+lib/gpu.
+
+The LAMMPS build often uses settings in a lib/name/Makefile.lammps
+file which either exists in the LAMMPS distribution or is created or
+copied from a lib/name/Makefile.lammps.* file when the library is
+built. If those settings are not correct for your machine you will
+need to edit or create an appropriate Makefile.lammps file.
+
+Package-specific details for these steps are given in "Section
+4"_Section_packages.html an in README files in the lib/name
+directories.
+
+[Compiler options needed for accelerator packages:]
+
+Several packages contain code that is optimized for specific hardware,
+e.g. CPU, KNL, or GPU. These are the OPT, GPU, KOKKOS, USER-INTEL,
+and USER-OMP packages. Compiling and linking the source files in
+these accelerator packages for optimal performance requires specific
+settings in the Makefile.machine file you use.
+
+A summary of the Makefile.machine settings needed for each of these
+packages is given in "Section 4"_Section_packages.html. More info is
+given on the doc pages that describe each package in detail:
5.3.1 "USER-INTEL package"_accelerate_intel.html
+5.3.2 "GPU package"_accelerate_intel.html
5.3.3 "KOKKOS package"_accelerate_kokkos.html
5.3.4 "USER-OMP package"_accelerate_omp.html
5.3.5 "OPT package"_accelerate_opt.html :all(b)
-You can also look at the following machine Makefiles in
-src/MAKE/OPTIONS, which include the changes. Note that the USER-INTEL
-and KOKKOS packages allow for settings that build LAMMPS for different
-hardware. The USER-INTEL package builds for CPU and the Xeon Phi, the
-KOKKOS package builds for OpenMP, GPUs (Cuda), and the Xeon Phi.
+You can also use or examine the following machine Makefiles in
+src/MAKE/OPTIONS, which include the settings. Note that the
+USER-INTEL and KOKKOS packages can use settings that build LAMMPS for
+different hardware. The USER-INTEL package can be compiled for Intel
+CPUs and KNLs; the KOKKOS package builds for CPUs (OpenMP), GPUs
+(Cuda), and Intel KNLs.
Makefile.intel_cpu
Makefile.intel_phi
Makefile.kokkos_omp
Makefile.kokkos_cuda
Makefile.kokkos_phi
Makefile.omp
Makefile.opt :ul
-Also note that the Make.py tool, described in the next "Section
-2.4"_#start_4 can automatically add the needed info to an existing
-machine Makefile, using simple command-line arguments.
-
-:line
-
-2.4 Building LAMMPS via the Make.py tool :h4,link(start_4)
-
-The src directory includes a Make.py script, written in Python, which
-can be used to automate various steps of the build process. It is
-particularly useful for working with the accelerator packages, as well
-as other packages which require auxiliary libraries to be built.
-
-The goal of the Make.py tool is to allow any complex multi-step LAMMPS
-build to be performed as a single Make.py command. And you can
-archive the commands, so they can be re-invoked later via the -r
-(redo) switch. If you find some LAMMPS build procedure that can't be
-done in a single Make.py command, let the developers know, and we'll
-see if we can augment the tool.
-
-You can run Make.py from the src directory by typing either:
-
-Make.py -h
-python Make.py -h :pre
-
-which will give you help info about the tool. For the former to work,
-you may need to edit the first line of Make.py to point to your local
-Python. And you may need to insure the script is executable:
-
-chmod +x Make.py :pre
-
-Here are examples of build tasks you can perform with Make.py:
-
-Install/uninstall packages: Make.py -p no-lib kokkos omp intel
-Build specific auxiliary libs: Make.py -a lib-atc lib-meam
-Build libs for all installed packages: Make.py -p cuda gpu -gpu mode=double arch=31 -a lib-all
-Create a Makefile from scratch with compiler and MPI settings: Make.py -m none -cc g++ -mpi mpich -a file
-Augment Makefile.serial with settings for installed packages: Make.py -p intel -intel cpu -m serial -a file
-Add JPG and FFTW support to Makefile.mpi: Make.py -m mpi -jpg -fft fftw -a file
-Build LAMMPS with a parallel make using Makefile.mpi: Make.py -j 16 -m mpi -a exe
-Build LAMMPS and libs it needs using Makefile.serial with accelerator settings: Make.py -p gpu intel -intel cpu -a lib-all file serial :tb(s=:)
-
-The bench and examples directories give Make.py commands that can be
-used to build LAMMPS with the various packages and options needed to
-run all the benchmark and example input scripts. See these files for
-more details:
-
-bench/README
-bench/FERMI/README
-bench/KEPLER/README
-bench/PHI/README
-examples/README
-examples/accelerate/README
-examples/accelerate/make.list :ul
-
-All of the Make.py options and syntax help can be accessed by using
-the "-h" switch.
-
-E.g. typing "Make.py -h" gives
-
-Syntax: Make.py switch args ...
- switches can be listed in any order
- help switch:
- -h prints help and syntax for all other specified switches
- switch for actions:
- -a lib-all, lib-dir, clean, file, exe or machine
- list one or more actions, in any order
- machine is a Makefile.machine suffix, must be last if used
- one-letter switches:
- -d (dir), -j (jmake), -m (makefile), -o (output),
- -p (packages), -r (redo), -s (settings), -v (verbose)
- switches for libs:
- -atc, -awpmd, -colvars, -cuda
- -gpu, -meam, -poems, -qmmm, -reax
- switches for build and makefile options:
- -intel, -kokkos, -cc, -mpi, -fft, -jpg, -png :pre
-
-Using the "-h" switch with other switches and actions gives additional
-info on all the other specified switches or actions. The "-h" can be
-anywhere in the command-line and the other switches do not need their
-arguments. E.g. type "Make.py -h -d -atc -intel" will print:
-
--d dir
- dir = LAMMPS home dir
- if -d not specified, working dir must be lammps/src :pre
-
--atc make=suffix lammps=suffix2
- all args are optional and can be in any order
- make = use Makefile.suffix (def = g++)
- lammps = use Makefile.lammps.suffix2 (def = EXTRAMAKE in makefile) :pre
-
--intel mode
- mode = cpu or phi (def = cpu)
- build Intel package for CPU or Xeon Phi :pre
-
-Note that Make.py never overwrites an existing Makefile.machine.
-Instead, it creates src/MAKE/MINE/Makefile.auto, which you can save or
-rename if desired. Likewise it creates an executable named
-src/lmp_auto, which you can rename using the -o switch if desired.
-
-The most recently executed Make.py command is saved in
-src/Make.py.last. You can use the "-r" switch (for redo) to re-invoke
-the last command, or you can save a sequence of one or more Make.py
-commands to a file and invoke the file of commands using "-r". You
-can also label the commands in the file and invoke one or more of them
-by name.
-
-A typical use of Make.py is to start with a valid Makefile.machine for
-your system, that works for a vanilla LAMMPS build, i.e. when optional
-packages are not installed. You can then use Make.py to add various
-settings (FFT, JPG, PNG) to the Makefile.machine as well as change its
-compiler and MPI options. You can also add additional packages to the
-build, as well as build the needed supporting libraries.
-
-You can also use Make.py to create a new Makefile.machine from
-scratch, using the "-m none" switch, if you also specify what compiler
-and MPI options to use, via the "-cc" and "-mpi" switches.
-
:line
-2.5 Building LAMMPS as a library :h4,link(start_5)
+2.4 Building LAMMPS as a library :h4,link(start_4)
LAMMPS can be built as either a static or shared library, which can
then be called from another application or a scripting language. See
"this section"_Section_howto.html#howto_10 for more info on coupling
LAMMPS to other codes. See "this section"_Section_python.html for
more info on wrapping and running LAMMPS from Python.
Static library :h5
To build LAMMPS as a static library (*.a file on Linux), type
make foo mode=lib :pre
where foo is the machine name. This kind of library is typically used
to statically link a driver application to LAMMPS, so that you can
insure all dependencies are satisfied at compile time. This will use
the ARCHIVE and ARFLAGS settings in src/MAKE/Makefile.foo. The build
will create the file liblammps_foo.a which another application can
link to. It will also create a soft link liblammps.a, which will
point to the most recently built static library.
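A minimal sketch of linking a (hypothetical) C++ driver against this
static library might look like the following; a real link line
typically also needs any extra libraries listed in the Makefile.lammps
files of installed packages (e.g. FFT or JPEG libraries):
mpicxx -I/path/to/lammps/src -c driver.cpp
mpicxx driver.o -L/path/to/lammps/src -llammps -o driver :pre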
Shared library :h5
To build LAMMPS as a shared library (*.so file on Linux), which can be
dynamically loaded, e.g. from Python, type
make foo mode=shlib :pre
where foo is the machine name. This kind of library is required when
wrapping LAMMPS with Python; see "Section 11"_Section_python.html
for details. This will use the SHFLAGS and SHLIBFLAGS settings in
src/MAKE/Makefile.foo and perform the build in the directory
Obj_shared_foo. This is so that each file can be compiled with the
-fPIC flag which is required for inclusion in a shared library. The
build will create the file liblammps_foo.so which another application
-can link to dynamically. It will also create a soft link liblammps.so,
+can link to dynamically. It will also create a soft link liblammps.so,
which will point to the most recently built shared library. This is
the file the Python wrapper loads by default.
Note that for a shared library to be usable by a calling program, all
the auxiliary libraries it depends on must also exist as shared
libraries. This will be the case for libraries included with LAMMPS,
such as the dummy MPI library in src/STUBS or any package libraries in
lib/packages, since they are always built as shared libraries using
the -fPIC switch. However, if a library like MPI or FFTW does not
exist as a shared library, the shared library build will generate an
error. This means you will need to install a shared library version
of the auxiliary library. The build instructions for the library
should tell you how to do this.
Here is an example of such errors when the system FFTW or provided
lib/colvars library have not been built as shared libraries:
/usr/bin/ld: /usr/local/lib/libfftw3.a(mapflags.o): relocation
R_X86_64_32 against '.rodata' can not be used when making a shared
object; recompile with -fPIC
/usr/local/lib/libfftw3.a: could not read symbols: Bad value :pre
/usr/bin/ld: ../../lib/colvars/libcolvars.a(colvarmodule.o):
relocation R_X86_64_32 against '__pthread_key_create' can not be used
when making a shared object; recompile with -fPIC
../../lib/colvars/libcolvars.a: error adding symbols: Bad value :pre
As an example, here is how to build and install the "MPICH
library"_mpich, a popular open-source version of MPI, distributed by
Argonne National Labs, as a shared library in the default
/usr/local/lib location:
:link(mpich,http://www-unix.mcs.anl.gov/mpi)
./configure --enable-shared
make
make install :pre
You may need to use "sudo make install" in place of the last line if
you do not have write privileges for /usr/local/lib. The end result
should be the file /usr/local/lib/libmpich.so.
[Additional requirement for using a shared library:] :h5
The operating system finds shared libraries to load at run-time using
the environment variable LD_LIBRARY_PATH. So you may wish to copy the
file src/liblammps.so or src/liblammps_g++.so (for example) to a place
the system can find it by default, such as /usr/local/lib, or you may
wish to add the LAMMPS src directory to LD_LIBRARY_PATH, so that the
current version of the shared library is always available to programs
that use it.
For the csh or tcsh shells, you would add something like this to your
~/.cshrc file:
setenv LD_LIBRARY_PATH $\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre
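For the bash shell, the equivalent addition to your ~/.bashrc file
would be:
export LD_LIBRARY_PATH=$\{LD_LIBRARY_PATH\}:/home/sjplimp/lammps/src :pre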
Calling the LAMMPS library :h5
Either flavor of library (static or shared) allows one or more LAMMPS
objects to be instantiated from the calling program.
When used from a C++ program, all of LAMMPS is wrapped in a LAMMPS_NS
namespace; you can safely use any of its classes and methods from
within the calling code, as needed.
When used from a C or Fortran program or a scripting language like
Python, the library has a simple function-style interface, provided in
src/library.cpp and src/library.h.
See the sample codes in examples/COUPLE/simple for examples of C++,
C, and Fortran codes that invoke LAMMPS through its library interface.
There are other examples as well in the COUPLE directory which are
discussed in "Section 6.10"_Section_howto.html#howto_10 of the
manual. See "Section 11"_Section_python.html of the manual for a
description of the Python wrapper provided with LAMMPS that operates
through the LAMMPS library interface.
The files src/library.cpp and library.h define the C-style API for
using LAMMPS as a library. See "Section
6.19"_Section_howto.html#howto_19 of the manual for a description of the
interface and how to extend it for your needs.
:line
-2.6 Running LAMMPS :h4,link(start_6)
+2.5 Running LAMMPS :h4,link(start_5)
By default, LAMMPS runs by reading commands from standard input. Thus
if you run the LAMMPS executable by itself, e.g.
lmp_linux :pre
it will simply wait, expecting commands from the keyboard. Typically
you should put commands in an input script and use I/O redirection,
e.g.
lmp_linux < in.file :pre
For parallel environments this should also work. If it does not, use
the '-in' command-line switch, e.g.
lmp_linux -in in.file :pre
"This section"_Section_commands.html describes how input scripts are
structured and what commands they contain.
You can test LAMMPS on any of the sample inputs provided in the
examples or bench directory. Input scripts are named in.* and sample
outputs are named log.*.name.P where name is a machine and P is the
number of processors it was run on.
Here is how you might run a standard Lennard-Jones benchmark on a
Linux box, using mpirun to launch a parallel job:
cd src
make linux
cp lmp_linux ../bench
cd ../bench
mpirun -np 4 lmp_linux -in in.lj :pre
See "this page"_bench for timings for this and the other benchmarks on
various platforms. Note that some of the example scripts require
LAMMPS to be built with one or more of its optional packages.
:link(bench,http://lammps.sandia.gov/bench.html)
:line
On a Windows box, you can skip making LAMMPS and simply download an
installer package from "here"_http://rpm.lammps.org/windows.html
For running the non-MPI executable, follow these steps:
Get a command prompt by going to Start->Run... ,
then typing "cmd". :ulb,l
Move to the directory where you have your input, e.g. a copy of
the [in.lj] input from the bench folder. (e.g. by typing: cd "Documents"). :l
At the command prompt, type "lmp_serial -in in.lj", replacing [in.lj]
with the name of your LAMMPS input script. :l
:ule
For the MPI version, which allows you to run LAMMPS under Windows on
multiple processors, follow these steps:
Download and install
"MPICH2"_http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads
for Windows. :ulb,l
The LAMMPS Windows installer packages will automatically adjust your
path for the default location of this MPI package. After the installation
of the MPICH software, it needs to be integrated into the system.
For this you need to start a Command Prompt in {Administrator Mode}
(right click on the icon and select it). Change into the MPICH2
installation directory, then into the subdirectory [bin] and execute
[smpd.exe -install]. Exit the command window.
Get a new, regular command prompt by going to Start->Run... ,
then typing "cmd". :l
Move to the directory where you have your input file
(e.g. by typing: cd "Documents"). :l
Then type something like this:
mpiexec -localonly 4 lmp_mpi -in in.lj :pre
or
mpiexec -np 4 lmp_mpi -in in.lj :pre
replacing in.lj with the name of your LAMMPS input script. For the latter
case, you may be prompted to enter your password. :l
In this mode, output may not immediately show up on the screen, so if
your input script takes a long time to execute, you may need to be
patient before the output shows up. :l
The parallel executable can also run on a single processor by typing
something like:
lmp_mpi -in in.lj :pre
:ule
:line
The screen output from LAMMPS is described in a section below. As it
runs, LAMMPS also writes a log.lammps file with the same information.
Note that this sequence of commands copies the LAMMPS executable
(lmp_linux) to the directory with the input files. This may not be
necessary, but some versions of MPI reset the working directory to
where the executable is, rather than leave it as the directory where
you launch mpirun from (if you launch lmp_linux on its own and not
under mpirun). If that happens, LAMMPS will look for additional input
files and write its output files to the executable directory, rather
than your working directory, which is probably not what you want.
If LAMMPS encounters errors in the input script or while running a
simulation it will print an ERROR message and stop or a WARNING
message and continue. See "Section 12"_Section_errors.html for a
discussion of the various kinds of errors LAMMPS can or can't detect,
a list of all ERROR and WARNING messages, and what to do about them.
LAMMPS can run a problem on any number of processors, including a
single processor. In theory you should get identical answers on any
number of processors and on any machine. In practice, numerical
round-off can cause slight differences and eventual divergence of
molecular dynamics phase space trajectories.
LAMMPS can run as large a problem as will fit in the physical memory
of one or more processors. If you run out of memory, you must run on
more processors or setup a smaller problem.
:line
-2.7 Command-line options :h4,link(start_7)
+2.6 Command-line options :h4,link(start_6)
At run time, LAMMPS recognizes several optional command-line switches
which may be used in any order. Either the full word or a one-or-two
letter abbreviation can be used:
-e or -echo
-h or -help
-i or -in
-k or -kokkos
-l or -log
-nc or -nocite
-pk or -package
-p or -partition
-pl or -plog
-ps or -pscreen
-r or -restart
-ro or -reorder
-sc or -screen
-sf or -suffix
-v or -var :ul
For example, lmp_ibm might be launched as follows:
mpirun -np 16 lmp_ibm -v f tmp.out -l my.log -sc none -in in.alloy
mpirun -np 16 lmp_ibm -var f tmp.out -log my.log -screen none -in in.alloy :pre
Here are the details on the options:
-echo style :pre
Set the style of command echoing. The style can be {none} or {screen}
or {log} or {both}. Depending on the style, each command read from
the input script will be echoed to the screen and/or logfile. This
can be useful to figure out which line of your script is causing an
input error. The default value is {log}. The echo style can also be
set by using the "echo"_echo.html command in the input script itself.
-help :pre
Print a brief help summary and a list of options compiled into this
executable for each LAMMPS style (atom_style, fix, compute,
pair_style, bond_style, etc). This can tell you if the command you
want to use was included via the appropriate package at compile time.
LAMMPS will print the info and immediately exit if this switch is
used.
-in file :pre
Specify a file to use as an input script. This is an optional switch
when running LAMMPS in one-partition mode. If it is not specified,
LAMMPS reads its script from standard input, typically from a script
via I/O redirection; e.g. lmp_linux < in.run. I/O redirection should
also work in parallel, but if it does not (in the unlikely case that
an MPI implementation does not support it), then use the -in flag.
Note that this is a required switch when running LAMMPS in
multi-partition mode, since multiple processors cannot all read from
stdin.
-kokkos on/off keyword/value ... :pre
Explicitly enable or disable KOKKOS support, as provided by the KOKKOS
package. Even if LAMMPS is built with this package, as described
above in "Section 2.3"_#start_3, this switch must be set to enable
running with the KOKKOS-enabled styles the package provides. If the
switch is not set (the default), LAMMPS will operate as if the KOKKOS
package were not installed; i.e. you can run standard LAMMPS or with
the GPU or USER-OMP packages, for testing or benchmarking purposes.
Additional optional keyword/value pairs can be specified which
determine how Kokkos will use the underlying hardware on your
platform. These settings apply to each MPI task you launch via the
"mpirun" or "mpiexec" command. You may choose to run one or more MPI
tasks per physical node. Note that if you are running on a desktop
machine, you typically have one physical node. On a cluster or
supercomputer there may be dozens or 1000s of physical nodes.
Either the full word or an abbreviation can be used for the keywords.
Note that the keywords do not use a leading minus sign. I.e. the
keyword is "t", not "-t". Also note that each of the keywords has a
default setting. Examples of when to use these options and what
settings to use on different platforms are given in "Section
5.3"_Section_accelerate.html#acc_3.
d or device
g or gpus
t or threads
n or numa :ul
device Nd :pre
This option is only relevant if you built LAMMPS with CUDA=yes, you
have more than one GPU per node, and if you are running with only one
MPI task per node. The Nd setting is the ID of the GPU on the node to
run on. By default Nd = 0. If you have multiple GPUs per node, they
have consecutive IDs numbered as 0,1,2,etc. This setting allows you
to launch multiple independent jobs on the node, each with a single
MPI task per node, and assign each job to run on a different GPU.
gpus Ng Ns :pre
This option is only relevant if you built LAMMPS with CUDA=yes, you
have more than one GPU per node, and you are running with multiple MPI
tasks per node (up to one per GPU). The Ng setting is how many GPUs
you will use. The Ns setting is optional. If set, it is the ID of a
GPU to skip when assigning MPI tasks to GPUs. This may be useful if
your desktop system reserves one GPU to drive the screen and the rest
are intended for computational work like running LAMMPS. By default
Ng = 1 and Ns is not set.
Depending on which flavor of MPI you are running, LAMMPS will look for
one of these 3 environment variables
SLURM_LOCALID (various MPI variants compiled with SLURM support)
MV2_COMM_WORLD_LOCAL_RANK (Mvapich)
OMPI_COMM_WORLD_LOCAL_RANK (OpenMPI) :pre
which are initialized by the "srun", "mpirun" or "mpiexec" commands.
The environment variable setting for each MPI rank is used to assign a
unique GPU ID to the MPI task.
threads Nt :pre
This option assigns Nt number of threads to each MPI task for
performing work when Kokkos is executing in OpenMP or pthreads mode.
The default is Nt = 1, which essentially runs in MPI-only mode. If
there are Np MPI tasks per physical node, you generally want Np*Nt =
the number of physical cores per node, to use your available hardware
optimally. This also sets the number of threads used by the host when
LAMMPS is compiled with CUDA=yes.
numa Nm :pre
This option is only relevant when using pthreads with hwloc support.
-In this case Nm defines the number of NUMA regions (typically sockets)
-on a node which will be utilized by a single MPI rank. By default Nm
+In this case Nm defines the number of NUMA regions (typically sockets)
+on a node which will be utilized by a single MPI rank. By default Nm
= 1. If this option is used the total number of worker-threads per
MPI rank is threads*numa. Currently it is almost always better to
assign at least one MPI rank per NUMA region, and leave numa set to
its default value of 1. This is because letting a single process span
multiple NUMA regions induces a significant amount of cross NUMA data
traffic which is slow.
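For example, the following (hypothetical) command lines launch 2 MPI
tasks with 8 OpenMP threads each, or 2 MPI tasks driving 2 GPUs,
using executables built from the src/MAKE/OPTIONS makefiles listed
earlier (the -sf switch is described below):
mpirun -np 2 lmp_kokkos_omp -k on t 8 -sf kk -in in.lj
mpirun -np 2 lmp_kokkos_cuda -k on g 2 -sf kk -in in.lj :pre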
-log file :pre
Specify a log file for LAMMPS to write status information to. In
one-partition mode, if the switch is not used, LAMMPS writes to the
file log.lammps. If this switch is used, LAMMPS writes to the
specified file. In multi-partition mode, if the switch is not used, a
log.lammps file is created with hi-level status information. Each
partition also writes to a log.lammps.N file where N is the partition
ID. If the switch is specified in multi-partition mode, the hi-level
logfile is named "file" and each partition also logs information to a
file.N. For both one-partition and multi-partition mode, if the
specified file is "none", then no log files are created. Using a
"log"_log.html command in the input script will override this setting.
Option -plog will override the name of the partition log files file.N.
-nocite :pre
Disable writing the log.cite file which is normally written to list
references for specific cite-able features used during a LAMMPS run.
See the "citation page"_http://lammps.sandia.gov/cite.html for more
details.
-package style args .... :pre
Invoke the "package"_package.html command with style and args. The
syntax is the same as if the command appeared at the top of the input
script. For example "-package gpu 2" or "-pk gpu 2" is the same as
"package gpu 2"_package.html in the input script. The possible styles
and args are documented on the "package"_package.html doc page. This
switch can be used multiple times, e.g. to set options for the
USER-INTEL and USER-OMP packages which can be used together.
Along with the "-suffix" command-line switch, this is a convenient
mechanism for invoking accelerator packages and their options without
having to edit an input script.
-partition 8x2 4 5 ... :pre
Invoke LAMMPS in multi-partition mode. When LAMMPS is run on P
processors and this switch is not used, LAMMPS runs in one partition,
i.e. all P processors run a single simulation. If this switch is
used, the P processors are split into separate partitions and each
partition runs its own simulation. The arguments to the switch
specify the number of processors in each partition. Arguments of the
form MxN mean M partitions, each with N processors. Arguments of the
form N mean a single partition with N processors. The sum of
processors in all partitions must equal P. Thus the command
"-partition 8x2 4 5" has 10 partitions and runs on a total of 25
processors.
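For example, that partitioning could be launched (hypothetically) as:
mpirun -np 25 lmp_mpi -partition 8x2 4 5 -in in.file :pre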
Running with multiple partitions can be useful for running
"multi-replica simulations"_Section_howto.html#howto_5, where each
replica runs on one or a few processors. Note that with MPI
installed on a machine (e.g. your desktop), you can run on more
(virtual) processors than you have physical processors.
-To run multiple independent simulations from one input script, using
+To run multiple independent simulations from one input script, using
multiple partitions, see "Section 6.4"_Section_howto.html#howto_4
of the manual. World- and universe-style "variables"_variable.html
are useful in this context.
-plog file :pre
Specify the base name for the partition log files, so partition N
writes log information to file.N. If file is none, then no partition
log files are created. This overrides the filename specified in the
-log command-line option. This option is useful when working with
large numbers of partitions, allowing the partition log files to be
suppressed (-plog none) or placed in a sub-directory (-plog
replica_files/log.lammps). If this option is not used, the log file for
partition N is log.lammps.N or whatever is specified by the -log
command-line option.
-pscreen file :pre
Specify the base name for the partition screen file, so partition N
writes screen information to file.N. If file is none, then no
partition screen files are created. This overrides the filename
specified in the -screen command-line option. This option is useful
when working with large numbers of partitions, allowing the partition
screen files to be suppressed (-pscreen none) or placed in a
sub-directory (-pscreen replica_files/screen). If this option is not
used the screen file for partition N is screen.N or whatever is
specified by the -screen command-line option.
-restart restartfile {remap} datafile keyword value ... :pre
Convert the restart file into a data file and immediately exit. This
is the same operation as if the following 2-line input script were
run:
read_restart restartfile {remap}
write_data datafile keyword value ... :pre
Note that the specified restartfile and datafile can have wild-card
characters ("*" or "%") as described by the
"read_restart"_read_restart.html and "write_data"_write_data.html
commands. But a filename such as file.* will need to be enclosed in
quotes to avoid shell expansion of the "*" character.
Note that following restartfile, the optional flag {remap} can be
used. This has the same effect as adding it to the
"read_restart"_read_restart.html command, as explained on its doc
page. This is only useful if the reading of the restart file triggers
an error that atoms have been lost. In that case, use of the remap
flag should allow the data file to still be produced.
Also note that following datafile, the same optional keyword/value
pairs can be listed as used by the "write_data"_write_data.html
command.
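For example, a (hypothetical) conversion of a restart file named
restart.equil into a data file named data.equil would be:
lmp_serial -restart restart.equil data.equil :pre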
-reorder nth N
-reorder custom filename :pre
Reorder the processors in the MPI communicator used to instantiate
LAMMPS, in one of several ways. The original MPI communicator ranks
all P processors from 0 to P-1. The mapping of these ranks to
physical processors is done by MPI before LAMMPS begins. It may be
useful in some cases to alter the rank order. E.g. to insure that
cores within each node are ranked in a desired order. Or when using
the "run_style verlet/split"_run_style.html command with 2 partitions
to insure that a specific Kspace processor (in the 2nd partition) is
matched up with a specific set of processors in the 1st partition.
See the "Section 5"_Section_accelerate.html doc pages for
more details.
If the keyword {nth} is used with a setting {N}, then it means every
Nth processor will be moved to the end of the ranking. This is useful
when using the "run_style verlet/split"_run_style.html command with 2
partitions via the -partition command-line switch. The first set of
processors will be in the first partition, the 2nd set in the 2nd
partition. The -reorder command-line switch can alter this so that
the 1st N procs in the 1st partition and one proc in the 2nd partition
will be ordered consecutively, e.g. as the cores on one physical node.
This can boost performance. For example, if you use "-reorder nth 4"
and "-partition 9 3" and you are running on 12 processors, the
processors will be reordered from
0 1 2 3 4 5 6 7 8 9 10 11 :pre
to
0 1 2 4 5 6 8 9 10 3 7 11 :pre
so that the processors in each partition will be
0 1 2 4 5 6 8 9 10
3 7 11 :pre
See the "processors" command for how to insure processors from each
partition could then be grouped optimally for quad-core nodes.
If the keyword is {custom}, then a file that specifies a permutation
of the processor ranks is also specified. The format of the reorder
file is as follows. Any number of initial blank or comment lines
(starting with a "#" character) can be present. These should be
followed by P lines of the form:
I J :pre
where P is the number of processors LAMMPS was launched with. Note
that if running in multi-partition mode (see the -partition switch
above) P is the total number of processors in all partitions. The I
and J values describe a permutation of the P processors. Every I and
J should be values from 0 to P-1 inclusive. In the set of P I values,
every proc ID should appear exactly once. Ditto for the set of P J
values. A single I,J pairing means that the physical processor with
rank I in the original MPI communicator will have rank J in the
reordered communicator.
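For example, a (hypothetical) reorder file for P = 4 processors that
swaps the ranks of processors 1 and 2 would be:
# swap MPI ranks 1 and 2
0 0
1 2
2 1
3 3 :pre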
Note that rank ordering can also be specified by many MPI
implementations, either by environment variables that specify how to
order physical processors, or by config files that specify what
physical processors to assign to each MPI rank. The -reorder switch
simply gives you a portable way to do this without relying on MPI
itself. See the "processors out"_processors.html command for how
to output info on the final assignment of physical processors to
the LAMMPS simulation domain.
-screen file :pre
Specify a file for LAMMPS to write its screen information to. In
one-partition mode, if the switch is not used, LAMMPS writes to the
screen. If this switch is used, LAMMPS writes to the specified file
instead and you will see no screen output. In multi-partition mode,
if the switch is not used, hi-level status information is written to
the screen. Each partition also writes to a screen.N file where N is
the partition ID. If the switch is specified in multi-partition mode,
the hi-level screen dump is named "file" and each partition also
writes screen information to a file.N. For both one-partition and
multi-partition mode, if the specified file is "none", then no screen
output is performed. Option -pscreen will override the name of the
partition screen files file.N.
-suffix style args :pre
Use variants of various styles if they exist. The specified style can
be {cuda}, {gpu}, {intel}, {kk}, {omp}, {opt}, or {hybrid}. These
refer to optional packages that LAMMPS can be built with, as described
above in "Section 2.3"_#start_3. The "gpu" style corresponds to the
GPU package, the "intel" style to the USER-INTEL package, the "kk"
style to the KOKKOS package, the "opt" style to the OPT package, and
the "omp" style to the USER-OMP package. The hybrid style is the only
style that accepts arguments. It allows for two packages to be
specified. The first package specified is the default and will be used
if it is available. If no style is available for the first package,
the style for the second package will be used if available. For
example, "-suffix hybrid intel omp" will use styles from the
USER-INTEL package if they are installed and available, but styles for
the USER-OMP package otherwise.
Along with the "-package" command-line switch, this is a convenient
mechanism for invoking accelerator packages and their options without
having to edit an input script.
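For instance, the following (hypothetical) command line runs with
USER-OMP styles and 4 OpenMP threads per MPI task:
mpirun -np 4 lmp_mpi -sf omp -pk omp 4 -in in.lj :pre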
As an example, all of the packages provide a "pair_style
lj/cut"_pair_lj.html variant, with style names lj/cut/gpu,
lj/cut/intel, lj/cut/kk, lj/cut/omp, and lj/cut/opt. A variant style
can be specified explicitly in your input script, e.g. pair_style
lj/cut/gpu. If the -suffix switch is used the specified suffix
(gpu,intel,kk,omp,opt) is automatically appended whenever your input
script command creates a new "atom"_atom_style.html,
"pair"_pair_style.html, "fix"_fix.html, "compute"_compute.html, or
"run"_run_style.html style. If the variant version does not exist,
the standard version is created.
For the GPU package, using this command-line switch also invokes the
default GPU settings, as if the command "package gpu 1" were used at
the top of your input script. These settings can be changed by using
the "-package gpu" command-line switch or the "package
gpu"_package.html command in your script.
For the USER-INTEL package, using this command-line switch also
invokes the default USER-INTEL settings, as if the command "package
intel 1" were used at the top of your input script. These settings
can be changed by using the "-package intel" command-line switch or
the "package intel"_package.html command in your script. If the
USER-OMP package is also installed, the hybrid style with "intel omp"
arguments can be used to make the omp suffix a second choice, if a
requested style is not available in the USER-INTEL package. It will
also invoke the default USER-OMP settings, as if the command "package
omp 0" were used at the top of your input script. These settings can
be changed by using the "-package omp" command-line switch or the
"package omp"_package.html command in your script.
For the KOKKOS package, using this command-line switch also invokes
the default KOKKOS settings, as if the command "package kokkos" were
used at the top of your input script. These settings can be changed
by using the "-package kokkos" command-line switch or the "package
kokkos"_package.html command in your script.
For the USER-OMP package, using this command-line switch also invokes
the default USER-OMP settings, as if the command "package omp 0" were used at
the top of your input script. These settings can be changed by using
the "-package omp" command-line switch or the "package
omp"_package.html command in your script.
The "suffix"_suffix.html command can also be used within an input
script to set a suffix, or to turn off or back on any suffix setting
made via the command line.
-var name value1 value2 ... :pre
Specify a variable that will be defined for substitution purposes when
the input script is read. This switch can be used multiple times to
define multiple variables. "Name" is the variable name which can be a
single character (referenced as $x in the input script) or a full
string (referenced as $\{abc\}). An "index-style
variable"_variable.html will be created and populated with the
subsequent values, e.g. a set of filenames. Using this command-line
option is equivalent to putting the line "variable name index value1
value2 ..." at the beginning of the input script. Defining an index
variable as a command-line argument overrides any setting for the same
index variable in the input script, since index variables cannot be
re-defined. See the "variable"_variable.html command for more info on
defining index and other kinds of variables and "this
section"_Section_commands.html#cmd_2 for more info on using variables
in input scripts.
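For example, this (hypothetical) invocation defines an index variable
named f with three filenames, which the script in.loop can reference
as $f:
lmp_mpi -var f file1.data file2.data file3.data -in in.loop :pre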
NOTE: Currently, the command-line parser looks for arguments that
start with "-" to indicate new switches. Thus you cannot specify
multiple variable values if any of them start with a "-", e.g. a
negative numeric value. It is OK if the first value1 starts with a
"-", since it is automatically skipped.
:line
-2.8 LAMMPS screen output :h4,link(start_8)
+2.7 LAMMPS screen output :h4,link(start_7)
As LAMMPS reads an input script, it prints information to both the
screen and a log file about significant actions it takes to setup a
simulation. When the simulation is ready to begin, LAMMPS performs
various initializations and prints the amount of memory (in MBytes per
processor) that the simulation requires. It also prints details of
the initial thermodynamic state of the system. During the run itself,
thermodynamic information is printed periodically, every few
timesteps. When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:
Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 1.9808 | 2.0134 | 2.0318 | 1.4 | 71.60
Bond | 0.0021894 | 0.0060319 | 0.010058 | 4.7 | 0.21
Kspace | 0.3207 | 0.3366 | 0.36616 | 3.1 | 11.97
Neigh | 0.28411 | 0.28464 | 0.28516 | 0.1 | 10.12
Comm | 0.075732 | 0.077018 | 0.07883 | 0.4 | 2.74
Output | 0.00030518 | 0.00042665 | 0.00078821 | 1.0 | 0.02
Modify | 0.086606 | 0.086631 | 0.086668 | 0.0 | 3.08
Other | | 0.007178 | | | 0.26 :pre
Nlocal: 501 ave 508 max 490 min
Histogram: 1 0 0 0 0 0 1 1 0 1
Nghost: 6586.25 ave 6628 max 6548 min
Histogram: 1 0 1 0 0 0 1 0 0 1
Neighs: 177007 ave 180562 max 170212 min
Histogram: 1 0 0 0 0 0 0 1 1 1 :pre
Total # of neighbors = 708028
Ave neighs/atom = 353.307
Ave special neighs/atom = 2.34032
Neighbor list builds = 26
Dangerous builds = 0 :pre
The first section provides a global loop timing summary. The {loop time}
is the total wall time for the section. The {Performance} line is
provided for convenience to help predict the number of loop
continuations required and to compare performance with other,
-similar MD codes. The {CPU use} line provides the CPU utilzation per
+similar MD codes. The {CPU use} line provides the CPU utilization per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1 if no OpenMP). Lower numbers correspond to delays due
to file I/O or insufficient thread utilization.
The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:
{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
{Kspace} stands for reciprocal space interactions: Ewald, PPPM, MSM
{Neigh} stands for neighbor list construction
{Comm} stands for communicating atoms and their properties
{Output} stands for writing dumps and thermo output
{Modify} stands for fixes and computes called by them
{Other} is the remaining time :ul
For each category, there is a breakdown of the least, average and most
amount of wall time a processor spent on this section, as well as the
variation from the average time. Together these numbers allow you to
gauge the amount of load imbalance in this segment of the calculation.
Ideally the difference between minimum, maximum and average is small
and thus the variation from the average is close to zero. The final
column shows the percentage of the total loop time spent in this
section.
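As a rough illustration, the min/avg/max columns can be condensed into
a single imbalance number, e.g. with the following Python sketch (a
simple measure of imbalance, not necessarily the exact %varavg formula
used by LAMMPS):
def imbalance(tmin, tavg, tmax):
    # largest deviation from the average time, as a percentage of the average
    return 100.0 * max(tavg - tmin, tmax - tavg) / tavg
print("%.1f%%" % imbalance(1.9808, 2.0134, 2.0318))  # Pair row above, ~1.6% -> well balanced :pre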
When using the "timer full"_timer.html setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when both {timer full} and the "package omp"_package.html
command are active, a similar timing summary of the time spent in
threaded regions is provided to monitor thread utilization and load
balance. A new entry is the {Reduce} section, which lists the time
spent in reducing the per-thread data elements to the storage for
non-threaded computation. These thread timings are taken from the
first MPI rank only; since the breakdown over the categories can
change from MPI rank to MPI rank, this summary can be quite different
for individual ranks. Here is an example output for this section:
Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.5127 | 0.5147 | 0.5167 | 0.3 | 75.18
Bond | 0.0043139 | 0.0046779 | 0.0050418 | 0.5 | 0.68
Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55 :pre
The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin
histogram showing the distribution. The total number of histogram counts
is equal to the number of processors.
The last section gives aggregate statistics for pair-wise neighbors
and special neighbors that LAMMPS keeps track of (see the
"special_bonds"_special_bonds.html command). The number of times
neighbor lists were rebuilt during the run is given as well as the
number of potentially "dangerous" rebuilds. If atom movement
triggered neighbor list rebuilding (see the
"neigh_modify"_neigh_modify.html command), then dangerous
reneighborings are those that were triggered on the first timestep
atom movement was checked for. If this count is non-zero you may wish
to reduce the delay factor to ensure no force interactions are missed
by atoms moving beyond the neighbor skin distance before a rebuild
takes place.
If an energy minimization was performed via the
"minimize"_minimize.html command, additional information is printed,
e.g.
Minimization stats:
Stopping criterion = linesearch alpha is zero
Energy initial, next-to-last, final =
-6372.3765206 -8328.46998942 -8328.46998942
Force two-norm initial, final = 1059.36 5.36874
Force max component initial, final = 58.6026 1.46872
Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
Iterations, force evaluations = 701 1516 :pre
The first line prints the criterion that determined the minimization
to be completed. The third line lists the initial and final energy,
as well as the energy on the next-to-last iteration. The next 2 lines
give a measure of the gradient of the energy (force on all atoms).
The 2-norm is the "length" of this force vector; the inf-norm is the
largest component. Then comes information about the line search and
statistics on how many iterations and force evaluations the minimizer
required. Multiple force evaluations are typically done at each
iteration to perform a 1d line minimization in the search direction.
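For clarity, the two force norms can be written out in a few lines of
Python (the force components below are made-up values, not taken from
the run above):
import math
forces = [58.6026, -3.2, 0.7, 12.5, -0.01]           # made-up per-atom force components
two_norm = math.sqrt(sum(f * f for f in forces))      # "length" of the force vector
inf_norm = max(abs(f) for f in forces)                # largest single component
print(two_norm, inf_norm) :pre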
If a "kspace_style"_kspace_style.html long-range Coulombics solve was
performed during the run (PPPM, Ewald), then additional information is
printed, e.g.
FFT time (% of Kspce) = 0.200313 (8.34477)
FFT Gflps 3d 1d-only = 2.31074 9.19989 :pre
The first line gives the time spent doing 3d FFTs (4 per timestep) and
the fraction it represents of the total KSpace time (listed above).
Each 3d FFT requires computation (3 sets of 1d FFTs) and communication
(transposes). The total flops performed is 5Nlog_2(N), where N is the
number of points in the 3d grid. The FFTs are timed with and without
the communication and a Gflop rate is computed. The 3d rate is with
communication; the 1d rate is without (just the 1d FFTs). Thus you
can estimate what fraction of your FFT time was spent in
communication, roughly 75% in the example above.
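The arithmetic behind that estimate can be sketched in a few lines of
Python; the grid size N is an assumed value, while the two Gflop rates
are taken from the example output above:
import math
N = 96**3                                   # assumed number of points in the 3d FFT grid
flops = 5.0 * N * math.log2(N)              # estimated flops for one 3d FFT (5Nlog_2(N))
gflops_3d = 2.31074                         # rate with communication (transposes)
gflops_1d = 9.19989                         # rate of the 1d FFTs only (no communication)
comm_fraction = 1.0 - gflops_3d / gflops_1d
print("%.0f%% of the FFT time spent in communication" % (100.0 * comm_fraction))  # ~75% :pre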
:line
-2.9 Tips for users of previous LAMMPS versions :h4,link(start_9)
+2.8 Tips for users of previous LAMMPS versions :h4,link(start_8)
The current C++ began with a complete rewrite of LAMMPS 2001, which
was written in F90. Features of earlier versions of LAMMPS are listed
in "Section 13"_Section_history.html. The F90 and F77 versions
(2001 and 99) are also freely distributed as open-source codes; check
the "LAMMPS WWW Site"_lws for distribution information if you prefer
those versions. The 99 and 2001 versions are no longer under active
development; they do not have all the features of C++ LAMMPS.
If you are a previous user of LAMMPS 2001, these are the most
significant changes you will notice in C++ LAMMPS:
(1) The names and arguments of many input script commands have
changed. All commands are now a single word (e.g. read_data instead
of read data).
(2) All the functionality of LAMMPS 2001 is included in C++ LAMMPS,
but you may need to specify the relevant commands in different ways.
(3) The format of the data file can be streamlined for some problems.
See the "read_data"_read_data.html command for details. The data file
section "Nonbond Coeff" has been renamed to "Pair Coeff" in C++ LAMMPS.
(4) Binary restart files written by LAMMPS 2001 cannot be read by C++
LAMMPS with a "read_restart"_read_restart.html command. This is
because they were output by F90 which writes in a different binary
format than C or C++ writes or reads. Use the {restart2data} tool
provided with LAMMPS 2001 to convert the 2001 restart file to a text
data file. Then edit the data file as necessary before using the C++
LAMMPS "read_data"_read_data.html command to read it in.
(5) There are numerous small numerical changes in C++ LAMMPS that mean
you will not get identical answers when comparing to a 2001 run.
However, your initial thermodynamic energy and MD trajectory should be
close if you have set up the problem the same way for both codes.
diff --git a/doc/src/Section_tools.txt b/doc/src/Section_tools.txt
index 03611c7cd..d95c4f0cd 100644
--- a/doc/src/Section_tools.txt
+++ b/doc/src/Section_tools.txt
@@ -1,497 +1,500 @@
"Previous Section"_Section_perf.html - "LAMMPS WWW Site"_lws - "LAMMPS
Documentation"_ld - "LAMMPS Commands"_lc - "Next
Section"_Section_modify.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
9. Additional tools :h3
LAMMPS is designed to be a computational kernel for performing
molecular dynamics computations. Additional pre- and post-processing
steps are often necessary to set up and analyze a simulation. A
list of such tools can be found on the LAMMPS home page
at "http://lammps.sandia.gov/prepost.html"_http://lammps.sandia.gov/prepost.html
A few additional tools are provided with the LAMMPS distribution
and are described in this section.
Our group has also written and released a separate toolkit called
"Pizza.py"_pizza which provides tools for doing setup, analysis,
plotting, and visualization for LAMMPS simulations. Pizza.py is
written in "Python"_python and is available for download from "the
Pizza.py WWW site"_pizza.
:link(pizza,http://www.sandia.gov/~sjplimp/pizza.html)
:link(python,http://www.python.org)
Note that many users write their own setup or analysis tools or use
other existing codes and convert their output to a LAMMPS input format
or vice versa. The tools listed here are included in the LAMMPS
distribution as examples of auxiliary tools. Some of them are not
actively supported by Sandia, as they were contributed by LAMMPS
users. If you have problems using them, we can direct you to the
authors.
The source code for each of these codes is in the tools sub-directory
of the LAMMPS distribution. There is a Makefile (which you may need
to edit for your platform) that will build several of the tools
residing in that directory. Most of them are larger packages in their
own sub-directories with their own Makefiles and/or README files.
"amber2lmp"_#amber
"binary2txt"_#binary
"ch2lmp"_#charmm
"chain"_#chain
"colvars"_#colvars
"createatoms"_#createatoms
"drude"_#drude
"eam database"_#eamdb
"eam generate"_#eamgn
"eff"_#eff
"emacs"_#emacs
"fep"_#fep
"i-pi"_#ipi
"ipp"_#ipp
"kate"_#kate
"lmp2arc"_#arc
"lmp2cfg"_#cfg
"matlab"_#matlab
"micelle2d"_#micelle
"moltemplate"_#moltemplate
"msi2lmp"_#msi
"phonon"_#phonon
"polybond"_#polybond
"pymol_asphere"_#pymol
"python"_#pythontools
"reax"_#reax_tool
"smd"_#smd
"vim"_#vim
"xmgrace"_#xmgrace
:line
amber2lmp tool :h4,link(amber)
The amber2lmp sub-directory contains two Python scripts for converting
files back-and-forth between the AMBER MD code and LAMMPS. See the
README file in amber2lmp for more information.
These tools were written by Keir Novik while he was at Queen Mary
University of London. Keir is no longer there and cannot support
these tools which are out-of-date with respect to the current LAMMPS
version (and maybe with respect to AMBER as well). Since we don't use
these tools at Sandia, you'll need to experiment with them and make
necessary modifications yourself.
:line
binary2txt tool :h4,link(binary)
The file binary2txt.cpp converts one or more binary LAMMPS dump files
into ASCII text files. The syntax for running the tool is
binary2txt file1 file2 ... :pre
which creates file1.txt, file2.txt, etc. This tool must be compiled
on a platform that can read the binary file created by a LAMMPS run,
since binary files are not compatible across all platforms.
:line
ch2lmp tool :h4,link(charmm)
The ch2lmp sub-directory contains tools for converting files
back-and-forth between the CHARMM MD code and LAMMPS.
They are intended to make it easy to use CHARMM as a builder and as a
post-processor for LAMMPS. Using charmm2lammps.pl, you can convert a
PDB file with associated CHARMM info, including CHARMM force field
data, into its LAMMPS equivalent. Using lammps2pdb.pl you can convert
LAMMPS atom dumps into PDB files.
See the README file in the ch2lmp sub-directory for more information.
These tools were created by Pieter in't Veld (pjintve at sandia.gov)
and Paul Crozier (pscrozi at sandia.gov) at Sandia.
:line
chain tool :h4,link(chain)
The file chain.f creates a LAMMPS data file containing bead-spring
polymer chains and/or monomer solvent atoms. It uses a text file
containing chain definition parameters as an input. The created
chains and solvent atoms can strongly overlap, so LAMMPS needs to run
the system initially with a "soft" pair potential to un-overlap it.
The syntax for running the tool is
chain < def.chain > data.file :pre
See the def.chain or def.chain.ab files in the tools directory for
examples of definition files. This tool was used to create the
system for the "chain benchmark"_Section_perf.html.
:line
colvars tools :h4,link(colvars)
The colvars directory contains a collection of tools for postprocessing
data produced by the colvars collective variable library.
To compile the tools, edit the makefile for your system and run "make".
Please report problems and issues with the colvars library and its tools
at: https://github.com/colvars/colvars/issues
abf_integrate:
MC-based integration of multidimensional free energy gradient
Version 20110511
Syntax: ./abf_integrate < filename > \[-n < nsteps >\] \[-t < temp >\] \[-m \[0|1\] (metadynamics)\] \[-h < hill_height >\] \[-f < variable_hill_factor >\] :pre
The LAMMPS interface to the colvars collective variable library, as
well as these tools, were created by Axel Kohlmeyer (akohlmey at
gmail.com) at ICTP, Italy.
:line
createatoms tool :h4,link(createatoms)
The tools/createatoms directory contains a Fortran program called
createAtoms.f which can generate a variety of interesting crystal
structures and geometries and output the resulting list of atom
coordinates in LAMMPS or other formats.
See the included Manual.pdf for details.
The tool is authored by Xiaowang Zhou (Sandia), xzhou at sandia.gov.
:line
drude tool :h4,link(drude)
The tools/drude directory contains a Python script called
polarizer.py which can add Drude oscillators to a LAMMPS
data file in the required format.
See the header of the polarizer.py file for details.
The tool is authored by Agilio Padua and Alain Dequidt: agilio.padua
at univ-bpclermont.fr, alain.dequidt at univ-bpclermont.fr
:line
eam database tool :h4,link(eamdb)
The tools/eam_database directory contains a Fortran program that will
generate EAM alloy setfl potential files for any combination of 16
elements: Cu, Ag, Au, Ni, Pd, Pt, Al, Pb, Fe, Mo, Ta, W, Mg, Co, Ti,
Zr. The files can then be used with the "pair_style
eam/alloy"_pair_eam.html command.
The tool is authored by Xiaowang Zhou (Sandia), xzhou at sandia.gov,
and is based on his paper:
X. W. Zhou, R. A. Johnson, and H. N. G. Wadley, Phys. Rev. B, 69,
144113 (2004).
:line
eam generate tool :h4,link(eamgn)
The tools/eam_generate directory contains several one-file C programs
that convert an analytic formula into a tabulated "embedded atom
method (EAM)"_pair_eam.html setfl potential file. The potentials they
produce are in the potentials directory, and can be used with the
"pair_style eam/alloy"_pair_eam.html command.
The source files and potentials were provided by Gerolf Ziegenhain
(gerolf at ziegenhain.com).
:line
eff tool :h4,link(eff)
The tools/eff directory contains various scripts for generating
structures and post-processing output for simulations using the
electron force field (eFF).
These tools were provided by Andres Jaramillo-Botero at CalTech
(ajaramil at wag.caltech.edu).
:line
emacs tool :h4,link(emacs)
The tools/emacs directory contains a Lisp add-on file for Emacs that
enables a lammps-mode for editing input scripts when using Emacs,
with various highlighting options set up.
These tools were provided by Aidan Thompson at Sandia
(athomps at sandia.gov).
:line
fep tool :h4,link(fep)
The tools/fep directory contains Python scripts useful for
post-processing results from performing free-energy perturbation
simulations using the USER-FEP package.
The scripts were contributed by Agilio Padua (Universite Blaise
Pascal Clermont-Ferrand), agilio.padua at univ-bpclermont.fr.
See README file in the tools/fep directory.
:line
i-pi tool :h4,link(ipi)
The tools/i-pi directory contains a version of the i-PI package, with
all the LAMMPS-unrelated files removed. It is provided so that it can
be used with the "fix ipi"_fix_ipi.html command to perform
path-integral molecular dynamics (PIMD).
The i-PI package was created and is maintained by Michele Ceriotti,
michele.ceriotti at gmail.com, to interface to a variety of molecular
dynamics codes.
See the tools/i-pi/manual.pdf file for an overview of i-PI, and the
"fix ipi"_fix_ipi.html doc page for further details on running PIMD
calculations with LAMMPS.
:line
ipp tool :h4,link(ipp)
The tools/ipp directory contains a Perl script ipp which can be used
to facilitate the creation of a complicated file (say, a lammps input
script or tools/createatoms input file) using a template file.
ipp was created and is maintained by Reese Jones (Sandia), rjones at
sandia.gov.
See two examples in the tools/ipp directory. One of them is for the
tools/createatoms tool's input file.
:line
kate tool :h4,link(kate)
The file in the tools/kate directory is an add-on to the Kate editor
in the KDE suite that allows syntax highlighting of LAMMPS input
scripts. See the README.txt file for details.
The file was provided by Alessandro Luigi Sellerio
(alessandro.sellerio at ieni.cnr.it).
:line
lmp2arc tool :h4,link(arc)
The lmp2arc sub-directory contains a tool for converting LAMMPS output
files to the format for Accelrys' Insight MD code (formerly
MSI/Biosym and its Discover MD code). See the README file for more
information.
This tool was written by John Carpenter (Cray), Michael Peachey
(Cray), and Steve Lustig (Dupont). John is now at the Mayo Clinic
(jec at mayo.edu), but still fields questions about the tool.
This tool was updated for the current LAMMPS C++ version by Jeff
Greathouse at Sandia (jagreat at sandia.gov).
:line
lmp2cfg tool :h4,link(cfg)
The lmp2cfg sub-directory contains a tool for converting LAMMPS output
files into a series of *.cfg files which can be read into the
"AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A visualizer. See
the README file for more information.
This tool was written by Ara Kooser at Sandia (askoose at sandia.gov).
:line
matlab tool :h4,link(matlab)
The matlab sub-directory contains several "MATLAB"_matlabhome scripts for
post-processing LAMMPS output. The scripts include readers for log
and dump files, a reader for EAM potential files, and a converter that
reads LAMMPS dump files and produces CFG files that can be visualized
with the "AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A
visualizer.
See the README.pdf file for more information.
These scripts were written by Arun Subramaniyan at Purdue Univ
(asubrama at purdue.edu).
:link(matlabhome,http://www.mathworks.com)
:line
micelle2d tool :h4,link(micelle)
The file micelle2d.f creates a LAMMPS data file containing short lipid
chains in a monomer solution. It uses a text file containing lipid
definition parameters as an input. The created molecules and solvent
atoms can strongly overlap, so LAMMPS needs to run the system
initially with a "soft" pair potential to un-overlap it. The syntax
for running the tool is
micelle2d < def.micelle2d > data.file :pre
See the def.micelle2d file in the tools directory for an example of a
definition file. This tool was used to create the system for the
"micelle example"_Section_example.html.
:line
moltemplate tool :h4,link(moltemplate)
The moltemplate sub-directory contains a Python-based tool for
building molecular systems based on a text-file description, and
creating LAMMPS data files that encode their molecular topology as
lists of bonds, angles, dihedrals, etc. See the README.TXT file for
more information.
This tool was written by Andrew Jewett (jewett.aij at gmail.com), who
supports it. It has its own WWW page at
"http://moltemplate.org"_http://moltemplate.org.
:line
msi2lmp tool :h4,link(msi)
-The msi2lmp sub-directory contains a tool for creating LAMMPS input
-data files from BIOVIA's Materias Studio files (formerly Accelrys'
+The msi2lmp sub-directory contains a tool for creating LAMMPS template
+input and data files from BIOVIA's Materials Studio files (formerly Accelrys'
Insight MD code, formerly MSI/Biosym and its Discover MD code).
This tool was written by John Carpenter (Cray), Michael Peachey
(Cray), and Steve Lustig (Dupont). Several people contributed changes
to remove bugs and adapt its output to changes in LAMMPS.
-See the README file for more information.
+This tool has several known limitations and is no longer under active
+development, so apart from the occasional bugfix no further changes are made.
+
+See the README file in the tools/msi2lmp folder for more information.
:line
phonon tool :h4,link(phonon)
The phonon sub-directory contains a post-processing tool useful for
analyzing the output of the "fix phonon"_fix_phonon.html command in
the USER-PHONON package.
See the README file for instruction on building the tool and what
library it needs. And see the examples/USER/phonon directory
for example problems that can be post-processed with this tool.
This tool was written by Ling-Ti Kong at Shanghai Jiao Tong
University.
:line
polybond tool :h4,link(polybond)
The polybond sub-directory contains a Python-based tool useful for
performing "programmable polymer bonding". The Python file
lmpsdata.py provides a "Lmpsdata" class with various methods which can
be invoked by a user-written Python script to create data files with
complex bonding topologies.
See the Manual.pdf for details and example scripts.
This tool was written by Zachary Kraus at Georgia Tech.
:line
pymol_asphere tool :h4,link(pymol)
The pymol_asphere sub-directory contains a tool for converting a
LAMMPS dump file that contains orientation info for ellipsoidal
particles into an input file for the "PyMol visualization
package"_pymolhome or its "open source variant"_pymolopen.
:link(pymolhome,http://www.pymol.org)
:link(pymolopen,http://sourceforge.net/scm/?type=svn&group_id=4546)
Specifically, the tool triangulates the ellipsoids so they can be
viewed as true ellipsoidal particles within PyMol. See the README and
examples directory within pymol_asphere for more information.
This tool was written by Mike Brown at Sandia.
:line
python tool :h4,link(pythontools)
The python sub-directory contains several Python scripts
that perform common LAMMPS post-processing tasks, such as:
extract thermodynamic info from a log file as columns of numbers
plot two columns of thermodynamic info from a log file using GnuPlot
sort the snapshots in a dump file by atom ID
convert multiple "NEB"_neb.html dump files into one dump file for viz
convert dump files into XYZ, CFG, or PDB format for viz by other packages :ul
These are simple scripts built on "Pizza.py"_pizza modules. See the
README for more info on Pizza.py and how to use these scripts.
:line
reax tool :h4,link(reax_tool)
The reax sub-directory contains stand-alone codes that can
post-process the output of the "fix reax/bonds"_fix_reax_bonds.html
command from a LAMMPS simulation using "ReaxFF"_pair_reax.html. See
the README.txt file for more info.
These tools were written by Aidan Thompson at Sandia.
:line
smd tool :h4,link(smd)
The smd sub-directory contains a C++ file dump2vtk_tris.cpp and
Makefile which can be compiled and used to convert triangle output
files created by the Smooth-Mach Dynamics (USER-SMD) package into a
VTK-compatible unstructured grid file. It could then be read in and
visualized by VTK.
See the header of dump2vtk_tris.cpp for more details.
This tool was written by the USER-SMD package author, Georg
Ganzenmuller at the Fraunhofer-Institute for High-Speed Dynamics,
Ernst Mach Institute in Germany (georg.ganzenmueller at emi.fhg.de).
:line
vim tool :h4,link(vim)
The files in the tools/vim directory are add-ons to the VIM editor
that allow easier editing of LAMMPS input scripts. See the README.txt
file for details.
These files were provided by Gerolf Ziegenhain (gerolf at
ziegenhain.com)
:line
xmgrace tool :h4,link(xmgrace)
The files in the tools/xmgrace directory can be used to plot the
thermodynamic data in LAMMPS log files via the xmgrace plotting
package. There are several tools in the directory that can be used in
post-processing mode. The lammpsplot.cpp file can be compiled and
used to create plots from the current state of a running LAMMPS
simulation.
See the README file for details.
These files were provided by Vikas Varshney (vv0210 at gmail.com)
diff --git a/doc/src/angle_sdk.txt b/doc/src/angle_sdk.txt
index 785585f84..0cc535e54 100644
--- a/doc/src/angle_sdk.txt
+++ b/doc/src/angle_sdk.txt
@@ -1,58 +1,58 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
angle_style sdk command :h3
[Syntax:]
angle_style sdk :pre
angle_style sdk/omp :pre
[Examples:]
angle_style sdk
angle_coeff 1 300.0 107.0 :pre
[Description:]
The {sdk} angle style is a combination of the harmonic angle potential,
:c,image(Eqs/angle_harmonic.jpg)
where theta0 is the equilibrium value of the angle and K a prefactor,
with the {repulsive} part of the non-bonded {lj/sdk} pair style
between the atoms 1 and 3. This angle potential is intended for
coarse grained MD simulations with the CMM parametrization using the
"pair_style lj/sdk"_pair_sdk.html. Relative to the pair_style
{lj/sdk}, however, the energy is shifted by {epsilon}, to avoid sudden
jumps. Note that the usual 1/2 factor is included in K.
The following coefficients must be defined for each angle type via the
"angle_coeff"_angle_coeff.html command as in the example above:
K (energy/radian^2)
theta0 (degrees) :ul
Theta0 is specified in degrees, but LAMMPS converts it to radians
internally; hence the units of K are in energy/radian^2.
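As an illustration of the unit handling, here is a minimal Python
sketch that evaluates only the harmonic part of the {sdk} angle energy
(it omits the added {lj/sdk} repulsion between atoms 1 and 3), using
the angle_coeff values from the example above:
import math
def harmonic_angle_energy(K, theta0_deg, theta_deg):
    # E = K*(theta - theta0)^2 with theta in radians; the usual 1/2 factor is folded into K
    dtheta = math.radians(theta_deg) - math.radians(theta0_deg)
    return K * dtheta * dtheta
print(harmonic_angle_energy(300.0, 107.0, 110.0)) :pre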
The additionally required {lj/sdk} parameters are extracted
automatically from the pair_style.
[Restrictions:]
This angle style can only be used if LAMMPS was built with the
-USER-CG-CMM package. See the "Making
+USER-CGSDK package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info on packages.
[Related commands:]
"angle_coeff"_angle_coeff.html, "angle_style
harmonic"_angle_harmonic.html, "pair_style lj/sdk"_pair_sdk.html,
"pair_style lj/sdk/coul/long"_pair_sdk.html
[Default:] none
diff --git a/doc/src/bonds.txt b/doc/src/bonds.txt
index 3b50f6482..169d56ecb 100644
--- a/doc/src/bonds.txt
+++ b/doc/src/bonds.txt
@@ -1,24 +1,23 @@
Bond Styles :h1
<!-- RST
.. toctree::
:maxdepth: 1
bond_class2
bond_fene
bond_fene_expand
bond_harmonic
bond_harmonic_shift
bond_harmonic_shift_cut
bond_hybrid
bond_morse
bond_none
bond_nonlinear
bond_oxdna
- bond_oxdna2
bond_quartic
bond_table
bond_zero
END_RST -->
diff --git a/doc/src/compute_sna_atom.txt b/doc/src/compute_sna_atom.txt
index e2df70647..f82df0d81 100644
--- a/doc/src/compute_sna_atom.txt
+++ b/doc/src/compute_sna_atom.txt
@@ -1,250 +1,274 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
compute sna/atom command :h3
compute snad/atom command :h3
compute snav/atom command :h3
[Syntax:]
compute ID group-ID sna/atom rcutfac rfac0 twojmax R_1 R_2 ... w_1 w_2 ... keyword values ...
compute ID group-ID snad/atom rcutfac rfac0 twojmax R_1 R_2 ... w_1 w_2 ... keyword values ...
compute ID group-ID snav/atom rcutfac rfac0 twojmax R_1 R_2 ... w_1 w_2 ... keyword values ... :pre
ID, group-ID are documented in "compute"_compute.html command :ulb,l
sna/atom = style name of this compute command :l
rcutfac = scale factor applied to all cutoff radii (positive real) :l
rfac0 = parameter in distance to angle conversion (0 < rfac0 < 1) :l
twojmax = band limit for bispectrum components (non-negative integer) :l
R_1, R_2,... = list of cutoff radii, one for each type (distance units) :l
w_1, w_2,... = list of neighbor weights, one for each type :l
zero or more keyword/value pairs may be appended :l
-keyword = {diagonal} or {rmin0} or {switchflag} or {bzeroflag} :l
+keyword = {diagonal} or {rmin0} or {switchflag} or {bzeroflag} or {quadraticflag} :l
{diagonal} value = {0} or {1} or {2} or {3}
{0} = all j1, j2, j <= twojmax, j2 <= j1
{1} = subset satisfying j1 == j2
{2} = subset satisfying j1 == j2 == j3
{3} = subset satisfying j2 <= j1 <= j
{rmin0} value = parameter in distance to angle conversion (distance units)
{switchflag} value = {0} or {1}
{0} = do not use switching function
{1} = use switching function
{bzeroflag} value = {0} or {1}
{0} = do not subtract B0
- {1} = subtract B0 :pre
+ {1} = subtract B0
+ {quadraticflag} value = {0} or {1}
+ {0} = do not generate quadratic terms
+ {1} = generate quadratic terms :pre
:ule
[Examples:]
compute b all sna/atom 1.4 0.99363 6 2.0 2.4 0.75 1.0 diagonal 3 rmin0 0.0
compute db all snad/atom 1.4 0.95 6 2.0 1.0
compute vb all snav/atom 1.4 0.95 6 2.0 1.0 :pre
[Description:]
Define a computation that calculates a set of bispectrum components
for each atom in a group.
Bispectrum components of an atom are order parameters characterizing
the radial and angular distribution of neighbor atoms. The detailed
mathematical definition is given in the paper by Thompson et
al. "(Thompson)"_#Thompson20141
The position of a neighbor atom {i'} relative to a central atom {i} is
a point within the 3D ball of radius {R_ii' = rcutfac*(R_i + R_i')}.
Bartok et al. "(Bartok)"_#Bartok20101 proposed mapping this 3D ball
onto the 3-sphere, the surface of the unit ball in a four-dimensional
space. The radial distance {r} within {R_ii'} is mapped on to a third
polar angle {theta0} defined by,
:c,image(Eqs/compute_sna_atom1.jpg)
In this way, all possible neighbor positions are mapped on to a subset
of the 3-sphere. Points south of the latitude {theta0max=rfac0*Pi}
are excluded.
The natural basis for functions on the 3-sphere is formed by the 4D
hyperspherical harmonics {U^j_m,m'(theta, phi, theta0).} These
functions are better known as {D^j_m,m',} the elements of the Wigner
{D}-matrices "(Meremianin"_#Meremianin2006,
"Varshalovich)"_#Varshalovich1987.
The density of neighbors on the 3-sphere can be written as a sum of
Dirac-delta functions, one for each neighbor, weighted by species and
radial distance. Expanding this density function as a generalized
Fourier series in the basis functions, we can write each Fourier
coefficient as
:c,image(Eqs/compute_sna_atom2.jpg)
The {w_i'} neighbor weights are dimensionless numbers that are chosen
to distinguish atoms of different types, while the central atom is
arbitrarily assigned a unit weight. The function {fc(r)} ensures that
the contribution of each neighbor atom goes smoothly to zero at
{R_ii'}:
:c,image(Eqs/compute_sna_atom4.jpg)
The expansion coefficients {u^j_m,m'} are complex-valued and they are
not directly useful as descriptors, because they are not invariant
under rotation of the polar coordinate frame. However, the following
scalar triple products of expansion coefficients can be shown to be
real-valued and invariant under rotation "(Bartok)"_#Bartok20101.
:c,image(Eqs/compute_sna_atom3.jpg)
The constants {H^jmm'_j1m1m1'_j2m2m2'} are coupling coefficients,
analogous to Clebsch-Gordan coefficients for rotations on the
2-sphere. These invariants are the components of the bispectrum and
these are the quantities calculated by the compute {sna/atom}. They
characterize the strength of density correlations at three points on
the 3-sphere. The j2=0 subset forms the power spectrum, which
characterizes the correlations of two points. The lowest-order
components describe the coarsest features of the density function,
while higher-order components reflect finer detail. Note that the
central atom is included in the expansion, so three point-correlations
can be either due to three neighbors, or two neighbors and the central
atom.
Compute {snad/atom} calculates the derivative of the bispectrum components
summed separately for each atom type:
:c,image(Eqs/compute_sna_atom5.jpg)
The sum is over all atoms {i'} of atom type {I}. For each atom {i},
this compute evaluates the above expression for each direction, each
atom type, and each bispectrum component. See section below on output
for a detailed explanation.
Compute {snav/atom} calculates the virial contribution due to the
derivatives:
:c,image(Eqs/compute_sna_atom6.jpg)
Again, the sum is over all atoms {i'} of atom type {I}. For each atom
{i}, this compute evaluates the above expression for each of the six
virial components, each atom type, and each bispectrum component. See
section below on output for a detailed explanation.
The value of all bispectrum components will be zero for atoms not in
the group. Neighbor atoms not in the group do not contribute to the
bispectrum of atoms in the group.
The neighbor list needed to compute this quantity is constructed each
time the calculation is performed (i.e. each time a snapshot of atoms
is dumped). Thus it can be inefficient to compute/dump this quantity
too frequently.
The argument {rcutfac} is a scale factor that controls the ratio of
atomic radius to radial cutoff distance.
The argument {rfac0} and the optional keyword {rmin0} define the
linear mapping from radial distance to polar angle {theta0} on the
3-sphere.
The argument {twojmax} and the keyword {diagonal} define which
bispectrum components are generated. See section below on output for a
detailed explanation of the number of bispectrum components and the
-ordered in which they are listed
+order in which they are listed.
The keyword {switchflag} can be used to turn off the switching
function.
The keyword {bzeroflag} determines whether or not {B0}, the bispectrum
components of an atom with no neighbors, are subtracted from
the calculated bispectrum components. This optional keyword is only
available for compute {sna/atom}, as {snad/atom} and {snav/atom}
are unaffected by the removal of constant terms.
+The keyword {quadraticflag} determines whether or not the
+quadratic analogs to the bispectrum quantities are generated.
+These are formed by taking the outer product of the vector
+of bispectrum components with itself.
+See section below on output for a
+detailed explanation of the number of quadratic terms and the
+order in which they are listed.
+
NOTE: If you have a bonded system, then the settings of
"special_bonds"_special_bonds.html command can remove pairwise
interactions between atoms in the same bond, angle, or dihedral. This
is the default setting for the "special_bonds"_special_bonds.html
command, and means those pairwise interactions do not appear in the
neighbor list. Because this fix uses the neighbor list, it also means
those pairs will not be included in the calculation. One way to get
around this, is to write a dump file, and use the "rerun"_rerun.html
command to compute the bispectrum components for snapshots in the dump
file. The rerun script can use a "special_bonds"_special_bonds.html
command that includes all pairs in the neighbor list.
:line
[Output info:]
Compute {sna/atom} calculates a per-atom array, each column
corresponding to a particular bispectrum component. The total number
-of columns and the identities of the bispectrum component contained in
+of columns and the identity of the bispectrum component contained in
each column depend on the values of {twojmax} and {diagonal}, as
described by the following piece of python code:
for j1 in range(0,twojmax+1):
if(diagonal==2):
print j1/2.,j1/2.,j1/2.
elif(diagonal==1):
for j in range(0,min(twojmax,2*j1)+1,2):
print j1/2.,j1/2.,j/2.
elif(diagonal==0):
for j2 in range(0,j1+1):
for j in range(j1-j2,min(twojmax,j1+j2)+1,2):
print j1/2.,j2/2.,j/2.
elif(diagonal==3):
for j2 in range(0,j1+1):
for j in range(j1-j2,min(twojmax,j1+j2)+1,2):
if (j>=j1): print j1/2.,j2/2.,j/2. :pre
Compute {snad/atom} evaluates a per-atom array. The columns are
arranged into {ntypes} blocks, listed in order of atom type {I}. Each
block contains three sub-blocks corresponding to the {x}, {y}, and {z}
components of the atom position. Each of these sub-blocks contains
one column for each bispectrum component, the same as for compute
{sna/atom}.
Compute {snav/atom} evaluates a per-atom array. The columns are
arranged into {ntypes} blocks, listed in order of atom type {I}. Each
block contains six sub-blocks corresponding to the {xx}, {yy}, {zz},
{yz}, {xz}, and {xy} components of the virial tensor in Voigt
notation. Each of these sub-blocks contains one column for each
bispectrum component, the same as for compute {sna/atom}.
+For example, if {K}=30 and ntypes=1, the numbers of columns in the per-atom
+arrays generated by {sna/atom}, {snad/atom}, and {snav/atom}
+are 30, 90, and 180, respectively. With {quadraticflag}=1,
+the numbers of columns are 930, 2790, and 5580, respectively.
+
+If the {quadraticflag} keyword value is set to 1, then additional
+columns are appended to each per-atom array, corresponding to
+a matrix of quantities that are products of two bispectrum components. If the
+number of bispectrum components is {K}, then the number of matrix elements
+is {K}^2. These are output in subblocks of {K}^2 columns, using the same
+ordering of columns and sub-blocks as was used for the bispectrum
+components.
+
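The column counts quoted in the example above follow from a simple
rule, shown here as a short Python sketch; {K} is the number of
bispectrum components, which itself depends on {twojmax} and
{diagonal}:
def ncols(K, ntypes, quadratic=False):
    # per-atom columns for sna/atom, snad/atom, and snav/atom
    nvals = K + K * K if quadratic else K    # quadraticflag appends K^2 product columns
    return nvals, 3 * ntypes * nvals, 6 * ntypes * nvals
print(ncols(30, 1))         # (30, 90, 180)
print(ncols(30, 1, True))   # (930, 2790, 5580) :pre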
These values can be accessed by any command that uses per-atom values
from a compute as input. See "Section
6.15"_Section_howto.html#howto_15 for an overview of LAMMPS output
options.
[Restrictions:]
These computes are part of the SNAP package. They are only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"pair_style snap"_pair_snap.html
[Default:]
The optional keyword defaults are {diagonal} = 0, {rmin0} = 0,
-{switchflag} = 1, {bzeroflag} = 0.
+{switchflag} = 1, {bzeroflag} = 1, {quadraticflag} = 0.
:line
:link(Thompson20141)
[(Thompson)] Thompson, Swiler, Trott, Foiles, Tucker, under review, preprint
available at "arXiv:1409.3880"_http://arxiv.org/abs/1409.3880
:link(Bartok20101)
[(Bartok)] Bartok, Payne, Risi, Csanyi, Phys Rev Lett, 104, 136403 (2010).
:link(Meremianin2006)
[(Meremianin)] Meremianin, J. Phys. A, 39, 3099 (2006).
:link(Varshalovich1987)
[(Varshalovich)] Varshalovich, Moskalev, Khersonskii, Quantum Theory
of Angular Momentum, World Scientific, Singapore (1987).
diff --git a/doc/src/dump.txt b/doc/src/dump.txt
index cb9a5ba74..69a00eb47 100644
--- a/doc/src/dump.txt
+++ b/doc/src/dump.txt
@@ -1,678 +1,676 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
dump command :h3
-"dump custom/vtk"_dump_custom_vtk.html command :h3
+"dump vtk"_dump_vtk.html command :h3
"dump h5md"_dump_h5md.html command :h3
+"dump molfile"_dump_molfile.html command :h3
+"dump netcdf"_dump_netcdf.html command :h3
"dump image"_dump_image.html command :h3
"dump movie"_dump_image.html command :h3
-"dump molfile"_dump_molfile.html command :h3
-"dump nc"_dump_nc.html command :h3
[Syntax:]
dump ID group-ID style N file args :pre
ID = user-assigned name for the dump :ulb,l
group-ID = ID of the group of atoms to be dumped :l
-style = {atom} or {atom/gz} or {atom/mpiio} or {cfg} or {cfg/gz} or {cfg/mpiio} or {dcd} or {xtc} or {xyz} or {xyz/gz} or {xyz/mpiio} or {h5md} or {image} or {movie} or {molfile} or {local} or {custom} or {custom/gz} or {custom/mpiio} :l
+style = {atom} or {atom/gz} or {atom/mpiio} or {cfg} or {cfg/gz} or {cfg/mpiio} or {custom} or {custom/gz} or {custom/mpiio} or {dcd} or {h5md} or {image} or {local} or {molfile} or {movie} or {netcdf} or {netcdf/mpiio} or {vtk} or {xtc} or {xyz} or {xyz/gz} or {xyz/mpiio} :l
N = dump every this many timesteps :l
file = name of file to write dump info to :l
args = list of arguments for a particular style :l
{atom} args = none
{atom/gz} args = none
{atom/mpiio} args = none
{cfg} args = same as {custom} args, see below
{cfg/gz} args = same as {custom} args, see below
{cfg/mpiio} args = same as {custom} args, see below
+ {custom}, {custom/gz}, {custom/mpiio} args = see below
{dcd} args = none
+ {h5md} args = discussed on "dump h5md"_dump_h5md.html doc page
+ {image} args = discussed on "dump image"_dump_image.html doc page
+ {local} args = see below
+ {molfile} args = discussed on "dump molfile"_dump_molfile.html doc page
+ {movie} args = discussed on "dump image"_dump_image.html doc page
+ {netcdf} args = discussed on "dump netcdf"_dump_netcdf.html doc page
+ {netcdf/mpiio} args = discussed on "dump netcdf"_dump_netcdf.html doc page
+ {vtk} args = same as {custom} args, see below, also "dump vtk"_dump_vtk.html doc page
{xtc} args = none
- {xyz} args = none :pre
- {xyz/gz} args = none :pre
+ {xyz} args = none
+ {xyz/gz} args = none
{xyz/mpiio} args = none :pre
- {custom/vtk} args = similar to custom args below, discussed on "dump custom/vtk"_dump_custom_vtk.html doc page :pre
-
- {h5md} args = discussed on "dump h5md"_dump_h5md.html doc page :pre
-
- {image} args = discussed on "dump image"_dump_image.html doc page :pre
-
- {movie} args = discussed on "dump image"_dump_image.html doc page :pre
-
- {molfile} args = discussed on "dump molfile"_dump_molfile.html doc page
-
- {nc} args = discussed on "dump nc"_dump_nc.html doc page :pre
-
- {local} args = list of local attributes
- possible attributes = index, c_ID, c_ID\[I\], f_ID, f_ID\[I\]
- index = enumeration of local values
- c_ID = local vector calculated by a compute with ID
- c_ID\[I\] = Ith column of local array calculated by a compute with ID, I can include wildcard (see below)
- f_ID = local vector calculated by a fix with ID
- f_ID\[I\] = Ith column of local array calculated by a fix with ID, I can include wildcard (see below) :pre
-
- {custom} or {custom/gz} or {custom/mpiio} args = list of atom attributes
+{custom} or {custom/gz} or {custom/mpiio} args = list of atom attributes :l
possible attributes = id, mol, proc, procp1, type, element, mass,
x, y, z, xs, ys, zs, xu, yu, zu,
xsu, ysu, zsu, ix, iy, iz,
vx, vy, vz, fx, fy, fz,
q, mux, muy, muz, mu,
radius, diameter, omegax, omegay, omegaz,
angmomx, angmomy, angmomz, tqx, tqy, tqz,
c_ID, c_ID\[N\], f_ID, f_ID\[N\], v_name :pre
id = atom ID
mol = molecule ID
proc = ID of processor that owns atom
procp1 = ID+1 of processor that owns atom
type = atom type
element = name of atom element, as defined by "dump_modify"_dump_modify.html command
mass = atom mass
x,y,z = unscaled atom coordinates
xs,ys,zs = scaled atom coordinates
xu,yu,zu = unwrapped atom coordinates
xsu,ysu,zsu = scaled unwrapped atom coordinates
ix,iy,iz = box image that the atom is in
vx,vy,vz = atom velocities
fx,fy,fz = forces on atoms
q = atom charge
mux,muy,muz = orientation of dipole moment of atom
mu = magnitude of dipole moment of atom
radius,diameter = radius,diameter of spherical particle
omegax,omegay,omegaz = angular velocity of spherical particle
angmomx,angmomy,angmomz = angular momentum of aspherical particle
tqx,tqy,tqz = torque on finite-size particles
c_ID = per-atom vector calculated by a compute with ID
c_ID\[I\] = Ith column of per-atom array calculated by a compute with ID, I can include wildcard (see below)
f_ID = per-atom vector calculated by a fix with ID
f_ID\[I\] = Ith column of per-atom array calculated by a fix with ID, I can include wildcard (see below)
v_name = per-atom vector calculated by an atom-style variable with name
d_name = per-atom floating point vector with name, managed by fix property/atom
i_name = per-atom integer vector with name, managed by fix property/atom :pre
+
+{local} args = list of local attributes :l
+ possible attributes = index, c_ID, c_ID\[I\], f_ID, f_ID\[I\]
+ index = enumeration of local values
+ c_ID = local vector calculated by a compute with ID
+ c_ID\[I\] = Ith column of local array calculated by a compute with ID, I can include wildcard (see below)
+ f_ID = local vector calculated by a fix with ID
+ f_ID\[I\] = Ith column of local array calculated by a fix with ID, I can include wildcard (see below) :pre
+
:ule
[Examples:]
dump myDump all atom 100 dump.atom
dump myDump all atom/mpiio 100 dump.atom.mpiio
dump myDump all atom/gz 100 dump.atom.gz
dump 2 subgroup atom 50 dump.run.bin
dump 2 subgroup atom 50 dump.run.mpiio.bin
dump 4a all custom 100 dump.myforce.* id type x y vx fx
dump 4b flow custom 100 dump.%.myforce id type c_myF\[3\] v_ke
dump 4b flow custom 100 dump.%.myforce id type c_myF\[*\] v_ke
dump 2 inner cfg 10 dump.snap.*.cfg mass type xs ys zs vx vy vz
dump snap all cfg 100 dump.config.*.cfg mass type xs ys zs id type c_Stress\[2\]
dump 1 all xtc 1000 file.xtc :pre
[Description:]
Dump a snapshot of atom quantities to one or more files every N
timesteps in one of several styles. The {image} and {movie} styles are
the exception: the {image} style renders a JPG, PNG, or PPM image file
of the atom configuration every N timesteps while the {movie} style
combines and compresses them into a movie file; both are discussed in
detail on the "dump image"_dump_image.html doc page. The timesteps on
which dump output is written can also be controlled by a variable.
See the "dump_modify every"_dump_modify.html command.
Only information for atoms in the specified group is dumped. The
"dump_modify thresh and region"_dump_modify.html commands can also
alter what atoms are included. Not all styles support all these
options; see details below.
As described below, the filename determines the kind of output (text
or binary or gzipped, one big file or one per timestep, one big file
or multiple smaller files).
NOTE: Because periodic boundary conditions are enforced only on
timesteps when neighbor lists are rebuilt, the coordinates of an atom
written to a dump file may be slightly outside the simulation box.
Re-neighbor timesteps will not typically coincide with the timesteps
dump snapshots are written. See the "dump_modify
pbc"_dump_modify.html command if you with to force coordinates to be
strictly inside the simulation box.
NOTE: Unless the "dump_modify sort"_dump_modify.html option is
invoked, the lines of atom information written to dump files
(typically one line per atom) will be in an indeterminate order for
each snapshot. This is even true when running on a single processor,
if the "atom_modify sort"_atom_modify.html option is on, which it is
by default. In this case atoms are re-ordered periodically during a
simulation, due to spatial sorting. It is also true when running in
parallel, because data for a single snapshot is collected from
multiple processors, each of which owns a subset of the atoms.
For the {atom}, {custom}, {cfg}, and {local} styles, sorting is off by
default. For the {dcd}, {xtc}, {xyz}, and {molfile} styles, sorting by
atom ID is on by default. See the "dump_modify"_dump_modify.html doc
page for details.
The {atom/gz}, {cfg/gz}, {custom/gz}, and {xyz/gz} styles are identical
in command syntax to the corresponding styles without "gz", however,
they generate compressed files using the zlib library. Thus the filename
suffix ".gz" is mandatory. This is an alternative approach to writing
compressed files via a pipe, as done by the regular dump styles, which
may be required on clusters where the interface to the high-speed network
disallows using the fork() library call (which is needed for a pipe).
For the remainder of this doc page, you should thus consider the {atom}
and {atom/gz} styles (etc) to be inter-changeable, with the exception
of the required filename suffix.
As explained below, the {atom/mpiio}, {cfg/mpiio}, {custom/mpiio}, and
{xyz/mpiio} styles are identical in command syntax and in the format
of the dump files they create, to the corresponding styles without
"mpiio", except the single dump file they produce is written in
parallel via the MPI-IO library. For the remainder of this doc page,
you should thus consider the {atom} and {atom/mpiio} styles (etc) to
be inter-changeable. The one exception is how the filename is
specified for the MPI-IO styles, as explained below.
The precision of values output to text-based dump files can be
controlled by the "dump_modify format"_dump_modify.html command and
its options.
:line
The {style} keyword determines what atom quantities are written to the
file and in what format. Settings made via the
"dump_modify"_dump_modify.html command can also alter the format of
individual values and the file itself.
The {atom}, {local}, and {custom} styles create files in a simple text
format that is self-explanatory when viewing a dump file. Many of the
LAMMPS "post-processing tools"_Section_tools.html, including
"Pizza.py"_http://www.sandia.gov/~sjplimp/pizza.html, work with this
format, as does the "rerun"_rerun.html command.
For post-processing purposes the {atom}, {local}, and {custom} text
files are self-describing in the following sense.
The dimensions of the simulation box are included in each snapshot.
For an orthogonal simulation box this information is formatted as:
ITEM: BOX BOUNDS xx yy zz
xlo xhi
ylo yhi
zlo zhi :pre
where xlo,xhi are the maximum extents of the simulation box in the
x-dimension, and similarly for y and z. The "xx yy zz" represent 6
characters that encode the style of boundary for each of the 6
simulation box boundaries (xlo,xhi and ylo,yhi and zlo,zhi). Each of
the 6 characters is either p = periodic, f = fixed, s = shrink wrap,
or m = shrink wrapped with a minimum value. See the
"boundary"_boundary.html command for details.
For triclinic simulation boxes (non-orthogonal), an orthogonal
bounding box which encloses the triclinic simulation box is output,
along with the 3 tilt factors (xy, xz, yz) of the triclinic box,
formatted as follows:
ITEM: BOX BOUNDS xy xz yz xx yy zz
xlo_bound xhi_bound xy
ylo_bound yhi_bound xz
zlo_bound zhi_bound yz :pre
The presence of the text "xy xz yz" in the ITEM line indicates that
the 3 tilt factors will be included on each of the 3 following lines.
This bounding box is convenient for many visualization programs. The
meaning of the 6 character flags for "xx yy zz" is the same as above.
Note that the first two numbers on each line are now xlo_bound instead
of xlo, etc, since they represent a bounding box. See "this
section"_Section_howto.html#howto_12 of the doc pages for a geometric
description of triclinic boxes, as defined by LAMMPS, simple formulas
for how the 6 bounding box extents (xlo_bound,xhi_bound,etc) are
calculated from the triclinic parameters, and how to transform those
parameters to and from other commonly used triclinic representations.
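As a convenience, here is a Python sketch of the bounding-box
formulas referenced above; the expressions reproduce the formulas from
that howto section and should be checked there for the authoritative
version:
def bounding_box(xlo, xhi, ylo, yhi, zlo, zhi, xy, xz, yz):
    # orthogonal bounding box enclosing a triclinic LAMMPS box
    xlo_bound = xlo + min(0.0, xy, xz, xy + xz)
    xhi_bound = xhi + max(0.0, xy, xz, xy + xz)
    ylo_bound = ylo + min(0.0, yz)
    yhi_bound = yhi + max(0.0, yz)
    return xlo_bound, xhi_bound, ylo_bound, yhi_bound, zlo, zhi :pre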
The "ITEM: ATOMS" line in each snapshot lists column descriptors for
the per-atom lines that follow. For example, the descriptors would be
"id type xs ys zs" for the default {atom} style, and would be the atom
attributes you specify in the dump command for the {custom} style.
For style {atom}, atom coordinates are written to the file, along with
the atom ID and atom type. By default, atom coords are written in a
scaled format (from 0 to 1). I.e. an x value of 0.25 means the atom
is at a location 1/4 of the distance from xlo to xhi of the box
boundaries. The format can be changed to unscaled coords via the
"dump_modify"_dump_modify.html settings. Image flags can also be
added for each atom via dump_modify.
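For instance, recovering an unscaled coordinate from a scaled one (for
an orthogonal box) is a simple linear map, as in this minimal Python
sketch:
def unscale(xs, xlo, xhi):
    # convert a scaled coordinate (0 to 1) to an unscaled coordinate
    return xlo + xs * (xhi - xlo)
print(unscale(0.25, 0.0, 40.0))   # 10.0, i.e. 1/4 of the way from xlo to xhi :pre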
Style {custom} allows you to specify a list of atom attributes to be
written to the dump file for each atom. Possible attributes are
listed above and will appear in the order specified. You cannot
specify a quantity that is not defined for a particular simulation -
such as {q} for atom style {bond}, since that atom style doesn't
assign charges. Dumps occur at the very end of a timestep, so atom
attributes will include effects due to fixes that are applied during
the timestep. An explanation of the possible dump custom attributes
is given below.
For style {local}, local output generated by "computes"_compute.html
and "fixes"_fix.html is used to generate lines of output that is
written to the dump file. This local data is typically calculated by
each processor based on the atoms it owns, but there may be zero or
more entities per atom, e.g. a list of bond distances. An explanation
of the possible dump local attributes is given below. Note that by
using input from the "compute
property/local"_compute_property_local.html command with dump local,
it is possible to generate information on bonds, angles, etc that can
be cut and pasted directly into a data file read by the
"read_data"_read_data.html command.
Style {cfg} has the same command syntax as style {custom} and writes
extended CFG format files, as used by the
"AtomEye"_http://mt.seas.upenn.edu/Archive/Graphics/A visualization
package. Since the extended CFG format uses a single snapshot of the
system per file, a wildcard "*" must be included in the filename, as
discussed below. The list of atom attributes for style {cfg} must
begin with either "mass type xs ys zs" or "mass type xsu ysu zsu"
since these quantities are needed to write the CFG files in the
appropriate format (though the "mass" and "type" fields do not appear
explicitly in the file). Any remaining attributes will be stored as
"auxiliary properties" in the CFG files. Note that you will typically
want to use the "dump_modify element"_dump_modify.html command with
CFG-formatted files, to associate element names with atom types, so
that AtomEye can render atoms appropriately. When unwrapped
coordinates {xsu}, {ysu}, and {zsu} are requested, the nominal AtomEye
periodic cell dimensions are expanded by a large factor UNWRAPEXPAND =
10.0, which ensures that atoms are displayed correctly for up to
UNWRAPEXPAND/2 periodic boundary crossings in any direction. Beyond
this, AtomEye will rewrap the unwrapped coordinates. The expansion
causes the atoms to be drawn farther away from the viewer, but it is
easy to zoom the atoms closer, and the interatomic distances are
unaffected.
The {dcd} style writes DCD files, a standard atomic trajectory format
used by the CHARMM, NAMD, and XPlor molecular dynamics packages. DCD
files are binary and thus may not be portable to different machines.
The number of atoms per snapshot cannot change with the {dcd} style.
The {unwrap} option of the "dump_modify"_dump_modify.html command
allows DCD coordinates to be written "unwrapped" by the image flags
for each atom. Unwrapped means that if the atom has passed through
a periodic boundary one or more times, the value is printed for what
the coordinate would be if it had not been wrapped back into the
periodic box. Note that these coordinates may thus be far outside
the box size stored with the snapshot.
The {xtc} style writes XTC files, a compressed trajectory format used
by the GROMACS molecular dynamics package, and described
"here"_http://manual.gromacs.org/current/online/xtc.html.
The precision used in XTC files can be adjusted via the
"dump_modify"_dump_modify.html command. The default value of 1000
means that coordinates are stored to 1/1000 nanometer accuracy. XTC
files are portable binary files written in the NFS XDR data format,
so that any machine which supports XDR should be able to read them.
The number of atoms per snapshot cannot change with the {xtc} style.
The {unwrap} option of the "dump_modify"_dump_modify.html command allows
XTC coordinates to be written "unwrapped" by the image flags for each
atom. Unwrapped means that if the atom has passed through a periodic
boundary one or more times, the value is printed for what the
coordinate would be if it had not been wrapped back into the periodic
box. Note that these coordinates may thus be far outside the box size
stored with the snapshot.
The {xyz} style writes XYZ files, which is a simple text-based
coordinate format that many codes can read. Specifically it has
a line with the number of atoms, then a comment line that is
usually ignored followed by one line per atom with the atom type
and the x-, y-, and z-coordinate of that atom. You can use the
"dump_modify element"_dump_modify.html option to change the output
from using the (numerical) atom type to an element name (or some
other label). This will help many visualization programs to guess
bonds and colors.
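As a minimal illustration of that layout, the following Python sketch
writes a single XYZ frame; the element names and coordinates are
made-up values:
atoms = [("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.0)]   # made-up example data
with open("frame.xyz", "w") as f:
    f.write("%d\n" % len(atoms))                        # line 1: number of atoms
    f.write("comment line, usually ignored\n")          # line 2: comment
    for elem, x, y, z in atoms:
        f.write("%s %g %g %g\n" % (elem, x, y, z)) :pre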
Note that {atom}, {custom}, {dcd}, {xtc}, and {xyz} style dump files
can be read directly by "VMD"_http://www.ks.uiuc.edu/Research/vmd, a
popular molecular viewing program.
:line
Dumps are performed on timesteps that are a multiple of N (including
timestep 0) and on the last timestep of a minimization if the
minimization converges. Note that this means a dump will not be
performed on the initial timestep after the dump command is invoked,
if the current timestep is not a multiple of N. This behavior can be
changed via the "dump_modify first"_dump_modify.html command, which
can also be useful if the dump command is invoked after a minimization
ended on an arbitrary timestep. N can be changed between runs by
using the "dump_modify every"_dump_modify.html command (not allowed
for {dcd} style). The "dump_modify every"_dump_modify.html command
also allows a variable to be used to determine the sequence of
timesteps on which dump files are written. In this mode a dump on the
first timestep of a run will also not be written unless the
"dump_modify first"_dump_modify.html command is used.
The specified filename determines how the dump file(s) is written.
The default is to write one large text file, which is opened when the
dump command is invoked and closed when an "undump"_undump.html
command is used or when LAMMPS exits. For the {dcd} and {xtc} styles,
this is a single large binary file.
Dump filenames can contain two wildcard characters. If a "*"
character appears in the filename, then one file per snapshot is
written and the "*" character is replaced with the timestep value.
For example, tmp.dump.* becomes tmp.dump.0, tmp.dump.10000,
tmp.dump.20000, etc. This option is not available for the {dcd} and
{xtc} styles. Note that the "dump_modify pad"_dump_modify.html
command can be used to ensure all timestep numbers are the same length
(e.g. 00010), which can make it easier to read a series of dump files
in order with some post-processing tools.
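For example, the following writes one zero-padded file per snapshot
(tmp.dump.000000, tmp.dump.010000, ...):
dump 1 all atom 10000 tmp.dump.*
dump_modify 1 pad 6 :pre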
If a "%" character appears in the filename, then each of P processors
writes a portion of the dump file, and the "%" character is replaced
with the processor ID from 0 to P-1. For example, tmp.dump.% becomes
tmp.dump.0, tmp.dump.1, ... tmp.dump.P-1, etc. This creates smaller
files and can be a fast mode of output on parallel machines that
support parallel I/O for output. This option is not available for the
{dcd}, {xtc}, and {xyz} styles.
By default, P = the number of processors, meaning one file per
processor, but P can be set to a smaller value via the {nfile} or
{fileper} keywords of the "dump_modify"_dump_modify.html command.
These options can be the most efficient way of writing out dump files
when running on large numbers of processors.
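For example, on a large parallel run you might limit the per-snapshot
output to 8 files regardless of the processor count (the value 8 is
illustrative):
dump 1 all custom 1000 tmp.dump.% id type x y z
dump_modify 1 nfile 8 :pre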
Note that using the "*" and "%" characters together can produce a
large number of small dump files!
For the {atom/mpiio}, {cfg/mpiio}, {custom/mpiio}, and {xyz/mpiio}
styles, a single dump file is written in parallel via the MPI-IO
library, which is part of the MPI standard for versions 2.0 and above.
Using MPI-IO requires two steps. First, build LAMMPS with its MPIIO
package installed, e.g.
make yes-mpiio # installs the MPIIO package
make mpi # build LAMMPS for your platform :pre
Second, use a dump filename which contains ".mpiio". Note that it
does not have to end in ".mpiio", just contain those characters.
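For example:
dump 1 all custom/mpiio 1000 tmp.dump.mpiio id type x y z :pre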
Unlike MPI-IO restart files, which must be both written and read using
MPI-IO, the dump files produced by these MPI-IO styles are identical
in format to the files produced by their non-MPI-IO style
counterparts. This means you can write a dump file using MPI-IO and
use the "read_dump"_read_dump.html command or perform other
post-processing, just as if the dump file was not written using
MPI-IO.
Note that MPI-IO dump files are one large file which all processors
write to. You thus cannot use the "%" wildcard character described
above in the filename since that specifies generation of multiple
files. You can use the ".bin" suffix described below in an MPI-IO
dump file; again this file will be written in parallel and have the
same binary format as if it were written without MPI-IO.
If the filename ends with ".bin", the dump file (or files, if "*" or
"%" is also used) is written in binary format. A binary dump file
will be about the same size as a text version, but will typically
write out much faster. Of course, when post-processing, you will need
to convert it back to text format (see the "binary2txt
tool"_Section_tools.html#binary) or write your own code to read the
binary file. The format of the binary file can be understood by
looking at the tools/binary2txt.cpp file. This option is only
available for the {atom} and {custom} styles.
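For example, the following writes one binary file per snapshot:
dump 1 all custom 1000 tmp.dump.*.bin id type x y z vx vy vz :pre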
If the filename ends with ".gz", the dump file (or files, if "*" or "%"
is also used) is written in gzipped format. A gzipped dump file will
be about 3x smaller than the text version, but will also take longer
to write. This option is not available for the {dcd} and {xtc}
styles.
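For example (this requires gzip support; see the restrictions below):
dump 1 all atom 1000 dump.atom.gz :pre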
:line
Note that in the discussion which follows, for styles which can
reference values from a compute or fix, like the {custom}, {cfg}, or
{local} styles, the bracketed index I can be specified using a
wildcard asterisk with the index to effectively specify multiple
values. This takes the form "*" or "*n" or "n*" or "m*n". If N = the
size of the vector (for {mode} = scalar) or the number of columns in
the array (for {mode} = vector), then an asterisk with no numeric
values means all indices from 1 to N. A leading asterisk means all
indices from 1 to n (inclusive). A trailing asterisk means all
indices from n to N (inclusive). A middle asterisk means all indices
from m to n (inclusive).
Using a wildcard is the same as if the individual columns of the array
had been listed one by one. E.g. these 2 dump commands are
equivalent, since the "compute stress/atom"_compute_stress_atom.html
command creates a per-atom array with 6 columns:
compute myPress all stress/atom NULL
dump 2 all custom 100 tmp.dump id myPress\[*\]
dump 2 all custom 100 tmp.dump id myPress\[1\] myPress\[2\] myPress\[3\] &
myPress\[4\] myPress\[5\] myPress\[6\] :pre
:line
This section explains the local attributes that can be specified as
part of the {local} style.
The {index} attribute can be used to generate an index number from 1
to N for each line written into the dump file, where N is the total
number of local datums from all processors, or lines of output that
will appear in the snapshot. Note that because data from different
processors depend on what atoms they currently own, and atoms migrate
between processors, there is no guarantee that the same index will be
used for the same info (e.g. a particular bond) in successive
snapshots.
The {c_ID} and {c_ID\[I\]} attributes allow local vectors or arrays
calculated by a "compute"_compute.html to be output. The ID in the
attribute should be replaced by the actual ID of the compute that has
been defined previously in the input script. See the
"compute"_compute.html command for details. There are computes for
calculating local information such as indices, types, and energies for
bonds and angles.
Note that computes which calculate global or per-atom quantities, as
opposed to local quantities, cannot be output in a dump local command.
Instead, global quantities can be output by the "thermo_style
custom"_thermo_style.html command, and per-atom quantities can be
output by the dump custom command.
If {c_ID} is used as an attribute, then the local vector calculated by
the compute is printed. If {c_ID\[I\]} is used, then I must be in the
range from 1-M, which will print the Ith column of the local array
with M columns calculated by the compute. See the discussion above
for how I can be specified with a wildcard asterisk to effectively
specify multiple values.
The {f_ID} and {f_ID\[I\]} attributes allow local vectors or arrays
calculated by a "fix"_fix.html to be output. The ID in the attribute
should be replaced by the actual ID of the fix that has been defined
previously in the input script.
If {f_ID} is used as an attribute, then the local vector calculated by
the fix is printed. If {f_ID\[I\]} is used, then I must be in the
range from 1-M, which will print the Ith column of the local array with M
columns calculated by the fix. See the discussion above for how I can
be specified with a wildcard asterisk to effectively specify multiple
values.
Here is an example of how to dump bond info for a system, including
the distance and energy of each bond:
compute 1 all property/local batom1 batom2 btype
compute 2 all bond/local dist eng
dump 1 all local 1000 tmp.dump index c_1\[1\] c_1\[2\] c_1\[3\] c_2\[1\] c_2\[2\] :pre
:line
This section explains the atom attributes that can be specified as
part of the {custom} and {cfg} styles.
The {id}, {mol}, {proc}, {procp1}, {type}, {element}, {mass}, {vx},
{vy}, {vz}, {fx}, {fy}, {fz}, {q} attributes are self-explanatory.
{Id} is the atom ID. {Mol} is the molecule ID, included in the data
file for molecular systems. {Proc} is the ID of the processor (0 to
Nprocs-1) that currently owns the atom. {Procp1} is the proc ID+1,
which can be convenient in place of a {type} attribute (1 to Ntypes)
for coloring atoms in a visualization program. {Type} is the atom
type (1 to Ntypes). {Element} is typically the chemical name of an
element, which you must assign to each type via the "dump_modify
element"_dump_modify.html command. More generally, it can be any
string you wish to associate with an atom type. {Mass} is the atom
mass. {Vx}, {vy}, {vz}, {fx}, {fy}, {fz}, and {q} are components of
atom velocity and force and atomic charge.
There are several options for outputting atom coordinates. The {x},
{y}, {z} attributes write atom coordinates "unscaled", in the
appropriate distance "units"_units.html (Angstroms, sigma, etc). Use
{xs}, {ys}, {zs} if you want the coordinates "scaled" to the box size,
so that each value is 0.0 to 1.0. If the simulation box is triclinic
(tilted), then all atom coords will still be between 0.0 and 1.0.
I.e. actual unscaled (x,y,z) = xs*A + ys*B + zs*C, where (A,B,C) are
the non-orthogonal vectors of the simulation box edges, as discussed
in "Section 6.12"_Section_howto.html#howto_12.
Use {xu}, {yu}, {zu} if you want the coordinates "unwrapped" by the
image flags for each atom. Unwrapped means that if the atom has
passed through a periodic boundary one or more times, the value is
printed for what the coordinate would be if it had not been wrapped
back into the periodic box. Note that using {xu}, {yu}, {zu} means
that the coordinate values may be far outside the box bounds printed
with the snapshot. Using {xsu}, {ysu}, {zsu} is similar to using
{xu}, {yu}, {zu}, except that the unwrapped coordinates are scaled by
the box size. Atoms that have passed through a periodic boundary will
have the corresponding coordinate increased or decreased by 1.0.
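For example, to write both wrapped and unwrapped coordinates, e.g. for
later displacement analysis (an illustrative choice of attributes):
dump 1 all custom 1000 dump.unwrap id type x y z xu yu zu :pre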
The image flags can be printed directly using the {ix}, {iy}, {iz}
attributes. For periodic dimensions, they specify which image of the
simulation box the atom is considered to be in. An image of 0 means
it is inside the box as defined. A value of 2 means add 2 box lengths
to get the true value. A value of -1 means subtract 1 box length to
get the true value. LAMMPS updates these flags as atoms cross
periodic boundaries during the simulation.
The {mux}, {muy}, {muz} attributes are specific to dipolar systems
defined with an atom style of {dipole}. They give the orientation of
the atom's point dipole moment. The {mu} attribute gives the
magnitude of the atom's dipole moment.
The {radius} and {diameter} attributes are specific to spherical
particles that have a finite size, such as those defined with an atom
style of {sphere}.
The {omegax}, {omegay}, and {omegaz} attributes are specific to
finite-size spherical particles that have an angular velocity. Only
certain atom styles, such as {sphere}, define this quantity.
The {angmomx}, {angmomy}, and {angmomz} attributes are specific to
finite-size aspherical particles that have an angular momentum. Only
the {ellipsoid} atom style defines this quantity.
The {tqx}, {tqy}, {tqz} attributes are for finite-size particles that
can sustain a rotational torque due to interactions with other
particles.
The {c_ID} and {c_ID\[I\]} attributes allow per-atom vectors or arrays
calculated by a "compute"_compute.html to be output. The ID in the
attribute should be replaced by the actual ID of the compute that has
been defined previously in the input script. See the
"compute"_compute.html command for details. There are computes for
calculating the per-atom energy, stress, centro-symmetry parameter,
and coordination number of individual atoms.
Note that computes which calculate global or local quantities, as
opposed to per-atom quantities, cannot be output in a dump custom
command. Instead, global quantities can be output by the
"thermo_style custom"_thermo_style.html command, and local quantities
can be output by the dump local command.
If {c_ID} is used as an attribute, then the per-atom vector calculated
by the compute is printed. If {c_ID\[I\]} is used, then I must be in
the range from 1-M, which will print the Ith column of the per-atom
array with M columns calculated by the compute. See the discussion
above for how I can be specified with a wildcard asterisk to
effectively specify multiple values.
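For example, the per-atom kinetic energy calculated by "compute
ke/atom"_compute_ke_atom.html could be dumped like this (a minimal
sketch):
compute myKE all ke/atom
dump 1 all custom 100 dump.ke id type c_myKE :pre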
The {f_ID} and {f_ID\[I\]} attributes allow vector or array per-atom
quantities calculated by a "fix"_fix.html to be output. The ID in the
attribute should be replaced by the actual ID of the fix that has been
defined previously in the input script. The "fix
ave/atom"_fix_ave_atom.html command is one that calculates per-atom
quantities. Since it can time-average per-atom quantities produced by
any "compute"_compute.html, "fix"_fix.html, or atom-style
"variable"_variable.html, this allows those time-averaged results to
be written to a dump file.
If {f_ID} is used as an attribute, then the per-atom vector calculated
by the fix is printed. If {f_ID\[I\]} is used, then I must be in the
range from 1-M, which will print the Ith column of the per-atom array
with M columns calculated by the fix. See the discussion above for
how I can be specified with a wildcard asterisk to effectively specify
multiple values.
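For example, time-averaged velocities from "fix
ave/atom"_fix_ave_atom.html could be dumped like this (the averaging
intervals are illustrative):
fix 1 all ave/atom 10 10 100 vx vy vz
dump 1 all custom 100 dump.vel id type f_1\[1\] f_1\[2\] f_1\[3\] :pre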
The {v_name} attribute allows per-atom vectors calculated by a
"variable"_variable.html to be output. The name in the attribute
should be replaced by the actual name of the variable that has been
defined previously in the input script. Only an atom-style variable
can be referenced, since it is the only style that generates per-atom
values. Variables of style {atom} can reference individual atom
attributes, per-atom atom attributes, thermodynamic keywords, or
invoke other computes, fixes, or variables when they are evaluated, so
this is a very general means of creating quantities to output to a
dump file.
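For example, a per-atom kinetic energy defined via an atom-style
variable could be written to a dump file like this:
variable ke atom 0.5*mass*(vx^2+vy^2+vz^2)
dump 1 all custom 100 dump.myke id type v_ke :pre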
The {d_name} and {i_name} attributes allow custom per-atom
floating-point or integer properties managed by
"fix property/atom"_fix_property_atom.html to be output.
See "Section 10"_Section_modify.html of the manual for information
on how to add new compute and fix styles to LAMMPS to calculate
per-atom quantities which could then be output into dump files.
:line
[Restrictions:]
To write gzipped dump files, you must either compile LAMMPS with the
-DLAMMPS_GZIP option or use the styles from the COMPRESS package
- see the "Making LAMMPS"_Section_start.html#start_2 section of
the documentation.
The {atom/gz}, {cfg/gz}, {custom/gz}, and {xyz/gz} styles are part
of the COMPRESS package. They are only enabled if LAMMPS was built
with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The {atom/mpiio}, {cfg/mpiio}, {custom/mpiio}, and {xyz/mpiio} styles
are part of the MPIIO package. They are only enabled if LAMMPS was
built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The {xtc} style is part of the MISC package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info. This is
because some machines may not support the low-level XDR data format
that XTC files are written with, which will result in a compile-time
error when a low-level include file is not found. Putting this style
in a package makes it easy to exclude from a LAMMPS build for those
machines. However, the MISC package also includes two compatibility
header files and associated functions, which should be a suitable
substitute on machines that do not have the appropriate native header
files. This option can be invoked at build time by adding
-DLAMMPS_XDR to the CCFLAGS variable in the appropriate low-level
Makefile, e.g. src/MAKE/Makefile.foo. This compatibility mode has
been tested successfully on Cray XT3/XT4/XT5 and IBM BlueGene/L
machines and should also work on IBM BG/P, and Windows XP/Vista/7
machines.
[Related commands:]
"dump h5md"_dump_h5md.html, "dump image"_dump_image.html,
"dump molfile"_dump_molfile.html, "dump_modify"_dump_modify.html,
"undump"_undump.html
[Default:]
The defaults for the {image} and {movie} styles are listed on the
"dump image"_dump_image.html doc page.
diff --git a/doc/src/dump_custom_vtk.txt b/doc/src/dump_custom_vtk.txt
deleted file mode 100644
index d4c16193d..000000000
--- a/doc/src/dump_custom_vtk.txt
+++ /dev/null
@@ -1,347 +0,0 @@
- "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
-
-:link(lws,http://lammps.sandia.gov)
-:link(ld,Manual.html)
-:link(lc,Section_commands.html#comm)
-
-:line
-
-dump custom/vtk command :h3
-
-[Syntax:]
-
-dump ID group-ID style N file args :pre
-
-ID = user-assigned name for the dump :ulb,l
-group-ID = ID of the group of atoms to be dumped :l
-style = {custom/vtk} :l
-N = dump every this many timesteps :l
-file = name of file to write dump info to :l
-args = list of arguments for a particular style :l
- {custom/vtk} args = list of atom attributes
- possible attributes = id, mol, proc, procp1, type, element, mass,
- x, y, z, xs, ys, zs, xu, yu, zu,
- xsu, ysu, zsu, ix, iy, iz,
- vx, vy, vz, fx, fy, fz,
- q, mux, muy, muz, mu,
- radius, diameter, omegax, omegay, omegaz,
- angmomx, angmomy, angmomz, tqx, tqy, tqz,
- c_ID, c_ID\[N\], f_ID, f_ID\[N\], v_name :pre
-
- id = atom ID
- mol = molecule ID
- proc = ID of processor that owns atom
- procp1 = ID+1 of processor that owns atom
- type = atom type
- element = name of atom element, as defined by "dump_modify"_dump_modify.html command
- mass = atom mass
- x,y,z = unscaled atom coordinates
- xs,ys,zs = scaled atom coordinates
- xu,yu,zu = unwrapped atom coordinates
- xsu,ysu,zsu = scaled unwrapped atom coordinates
- ix,iy,iz = box image that the atom is in
- vx,vy,vz = atom velocities
- fx,fy,fz = forces on atoms
- q = atom charge
- mux,muy,muz = orientation of dipole moment of atom
- mu = magnitude of dipole moment of atom
- radius,diameter = radius,diameter of spherical particle
- omegax,omegay,omegaz = angular velocity of spherical particle
- angmomx,angmomy,angmomz = angular momentum of aspherical particle
- tqx,tqy,tqz = torque on finite-size particles
- c_ID = per-atom vector calculated by a compute with ID
- c_ID\[I\] = Ith column of per-atom array calculated by a compute with ID, I can include wildcard (see below)
- f_ID = per-atom vector calculated by a fix with ID
- f_ID\[I\] = Ith column of per-atom array calculated by a fix with ID, I can include wildcard (see below)
- v_name = per-atom vector calculated by an atom-style variable with name
- d_name = per-atom floating point vector with name, managed by fix property/atom
- i_name = per-atom integer vector with name, managed by fix property/atom :pre
-:ule
-
-[Examples:]
-
-dump dmpvtk all custom/vtk 100 dump*.myforce.vtk id type vx fx
-dump dmpvtp flow custom/vtk 100 dump*.%.displace.vtp id type c_myD\[1\] c_myD\[2\] c_myD\[3\] v_ke :pre
-
-The style {custom/vtk} is similar to the "custom"_dump.html style but
-uses the VTK library to write data to VTK simple legacy or XML format
-depending on the filename extension specified. This can be either
-{*.vtk} for the legacy format or {*.vtp} and {*.vtu}, respectively,
-for the XML format; see the "VTK
-homepage"_http://www.vtk.org/VTK/img/file-formats.pdf for a detailed
-description of these formats. Since this naming convention conflicts
-with the way binary output is usually specified (see below),
-"dump_modify binary"_dump_modify.html allows to set the binary
-flag for this dump style explicitly.
-
-[Description:]
-
-Dump a snapshot of atom quantities to one or more files every N
-timesteps in a format readable by the "VTK visualization
-toolkit"_http://www.vtk.org or other visualization tools that use it,
-e.g. "ParaView"_http://www.paraview.org. The timesteps on which dump
-output is written can also be controlled by a variable; see the
-"dump_modify every"_dump_modify.html command for details.
-
-Only information for atoms in the specified group is dumped. The
-"dump_modify thresh and region"_dump_modify.html commands can also
-alter what atoms are included; see details below.
-
-As described below, special characters ("*", "%") in the filename
-determine the kind of output.
-
-IMPORTANT NOTE: Because periodic boundary conditions are enforced only
-on timesteps when neighbor lists are rebuilt, the coordinates of an
-atom written to a dump file may be slightly outside the simulation
-box.
-
-IMPORTANT NOTE: Unless the "dump_modify sort"_dump_modify.html
-option is invoked, the lines of atom information written to dump files
-will be in an indeterminate order for each snapshot. This is even
-true when running on a single processor, if the "atom_modify
-sort"_atom_modify.html option is on, which it is by default. In this
-case atoms are re-ordered periodically during a simulation, due to
-spatial sorting. It is also true when running in parallel, because
-data for a single snapshot is collected from multiple processors, each
-of which owns a subset of the atoms.
-
-For the {custom/vtk} style, sorting is off by default. See the
-"dump_modify"_dump_modify.html doc page for details.
-
-:line
-
-The dimensions of the simulation box are written to a separate file
-for each snapshot (either in legacy VTK or XML format depending on
-the format of the main dump file) with the suffix {_boundingBox}
-appended to the given dump filename.
-
-For an orthogonal simulation box this information is saved as a
-rectilinear grid (legacy .vtk or .vtr XML format).
-
-Triclinic simulation boxes (non-orthogonal) are saved as
-hexahedrons in either legacy .vtk or .vtu XML format.
-
-Style {custom/vtk} allows you to specify a list of atom attributes
-to be written to the dump file for each atom. Possible attributes
-are listed above. In contrast to the {custom} style, the attributes
-are rearranged to ensure correct ordering of vector components
-(except for computes and fixes - these have to be given in the right
-order) and duplicate entries are removed.
-
-You cannot specify a quantity that is not defined for a particular
-simulation - such as {q} for atom style {bond}, since that atom style
-doesn't assign charges. Dumps occur at the very end of a timestep,
-so atom attributes will include effects due to fixes that are applied
-during the timestep. An explanation of the possible dump custom/vtk attributes
-is given below. Since position data is required to write VTK files "x y z"
-do not have to be specified explicitly.
-
-The VTK format uses a single snapshot of the system per file, thus
-a wildcard "*" must be included in the filename, as discussed below.
-Otherwise the dump files will get overwritten with the new snapshot
-each time.
-
-:line
-
-Dumps are performed on timesteps that are a multiple of N (including
-timestep 0) and on the last timestep of a minimization if the
-minimization converges. Note that this means a dump will not be
-performed on the initial timestep after the dump command is invoked,
-if the current timestep is not a multiple of N. This behavior can be
-changed via the "dump_modify first"_dump_modify.html command, which
-can also be useful if the dump command is invoked after a minimization
-ended on an arbitrary timestep. N can be changed between runs by
-using the "dump_modify every"_dump_modify.html command.
-The "dump_modify every"_dump_modify.html command
-also allows a variable to be used to determine the sequence of
-timesteps on which dump files are written. In this mode a dump on the
-first timestep of a run will also not be written unless the
-"dump_modify first"_dump_modify.html command is used.
-
-Dump filenames can contain two wildcard characters. If a "*"
-character appears in the filename, then one file per snapshot is
-written and the "*" character is replaced with the timestep value.
-For example, tmp.dump*.vtk becomes tmp.dump0.vtk, tmp.dump10000.vtk,
-tmp.dump20000.vtk, etc. Note that the "dump_modify pad"_dump_modify.html
-command can be used to insure all timestep numbers are the same length
-(e.g. 00010), which can make it easier to read a series of dump files
-in order with some post-processing tools.
-
-If a "%" character appears in the filename, then each of P processors
-writes a portion of the dump file, and the "%" character is replaced
-with the processor ID from 0 to P-1 preceded by an underscore character.
-For example, tmp.dump%.vtp becomes tmp.dump_0.vtp, tmp.dump_1.vtp, ...
-tmp.dump_P-1.vtp, etc. This creates smaller files and can be a fast
-mode of output on parallel machines that support parallel I/O for output.
-
-By default, P = the number of processors meaning one file per
-processor, but P can be set to a smaller value via the {nfile} or
-{fileper} keywords of the "dump_modify"_dump_modify.html command.
-These options can be the most efficient way of writing out dump files
-when running on large numbers of processors.
-
-For the legacy VTK format "%" is ignored and P = 1, i.e., only
-processor 0 does write files.
-
-Note that using the "*" and "%" characters together can produce a
-large number of small dump files!
-
-If {dump_modify binary} is used, the dump file (or files, if "*" or
-"%" is also used) is written in binary format. A binary dump file
-will be about the same size as a text version, but will typically
-write out much faster.
-
-:line
-
-This section explains the atom attributes that can be specified as
-part of the {custom/vtk} style.
-
-The {id}, {mol}, {proc}, {procp1}, {type}, {element}, {mass}, {vx},
-{vy}, {vz}, {fx}, {fy}, {fz}, {q} attributes are self-explanatory.
-
-{Id} is the atom ID. {Mol} is the molecule ID, included in the data
-file for molecular systems. {Proc} is the ID of the processor (0 to
-Nprocs-1) that currently owns the atom. {Procp1} is the proc ID+1,
-which can be convenient in place of a {type} attribute (1 to Ntypes)
-for coloring atoms in a visualization program. {Type} is the atom
-type (1 to Ntypes). {Element} is typically the chemical name of an
-element, which you must assign to each type via the "dump_modify
-element"_dump_modify.html command. More generally, it can be any
-string you wish to associated with an atom type. {Mass} is the atom
-mass. {Vx}, {vy}, {vz}, {fx}, {fy}, {fz}, and {q} are components of
-atom velocity and force and atomic charge.
-
-There are several options for outputting atom coordinates. The {x},
-{y}, {z} attributes write atom coordinates "unscaled", in the
-appropriate distance "units"_units.html (Angstroms, sigma, etc). Use
-{xs}, {ys}, {zs} if you want the coordinates "scaled" to the box size,
-so that each value is 0.0 to 1.0. If the simulation box is triclinic
-(tilted), then all atom coords will still be between 0.0 and 1.0.
-I.e. actual unscaled (x,y,z) = xs*A + ys*B + zs*C, where (A,B,C) are
-the non-orthogonal vectors of the simulation box edges, as discussed
-in "Section 6.12"_Section_howto.html#howto_12.
-
-Use {xu}, {yu}, {zu} if you want the coordinates "unwrapped" by the
-image flags for each atom. Unwrapped means that if the atom has
-passed thru a periodic boundary one or more times, the value is
-printed for what the coordinate would be if it had not been wrapped
-back into the periodic box. Note that using {xu}, {yu}, {zu} means
-that the coordinate values may be far outside the box bounds printed
-with the snapshot. Using {xsu}, {ysu}, {zsu} is similar to using
-{xu}, {yu}, {zu}, except that the unwrapped coordinates are scaled by
-the box size. Atoms that have passed through a periodic boundary will
-have the corresponding coordinate increased or decreased by 1.0.
-
-The image flags can be printed directly using the {ix}, {iy}, {iz}
-attributes. For periodic dimensions, they specify which image of the
-simulation box the atom is considered to be in. An image of 0 means
-it is inside the box as defined. A value of 2 means add 2 box lengths
-to get the true value. A value of -1 means subtract 1 box length to
-get the true value. LAMMPS updates these flags as atoms cross
-periodic boundaries during the simulation.
-
-The {mux}, {muy}, {muz} attributes are specific to dipolar systems
-defined with an atom style of {dipole}. They give the orientation of
-the atom's point dipole moment. The {mu} attribute gives the
-magnitude of the atom's dipole moment.
-
-The {radius} and {diameter} attributes are specific to spherical
-particles that have a finite size, such as those defined with an atom
-style of {sphere}.
-
-The {omegax}, {omegay}, and {omegaz} attributes are specific to
-finite-size spherical particles that have an angular velocity. Only
-certain atom styles, such as {sphere} define this quantity.
-
-The {angmomx}, {angmomy}, and {angmomz} attributes are specific to
-finite-size aspherical particles that have an angular momentum. Only
-the {ellipsoid} atom style defines this quantity.
-
-The {tqx}, {tqy}, {tqz} attributes are for finite-size particles that
-can sustain a rotational torque due to interactions with other
-particles.
-
-The {c_ID} and {c_ID\[I\]} attributes allow per-atom vectors or arrays
-calculated by a "compute"_compute.html to be output. The ID in the
-attribute should be replaced by the actual ID of the compute that has
-been defined previously in the input script. See the
-"compute"_compute.html command for details. There are computes for
-calculating the per-atom energy, stress, centro-symmetry parameter,
-and coordination number of individual atoms.
-
-Note that computes which calculate global or local quantities, as
-opposed to per-atom quantities, cannot be output in a dump custom/vtk
-command. Instead, global quantities can be output by the
-"thermo_style custom"_thermo_style.html command, and local quantities
-can be output by the dump local command.
-
-If {c_ID} is used as a attribute, then the per-atom vector calculated
-by the compute is printed. If {c_ID\[I\]} is used, then I must be in
-the range from 1-M, which will print the Ith column of the per-atom
-array with M columns calculated by the compute. See the discussion
-above for how I can be specified with a wildcard asterisk to
-effectively specify multiple values.
-
-The {f_ID} and {f_ID\[I\]} attributes allow vector or array per-atom
-quantities calculated by a "fix"_fix.html to be output. The ID in the
-attribute should be replaced by the actual ID of the fix that has been
-defined previously in the input script. The "fix
-ave/atom"_fix_ave_atom.html command is one that calculates per-atom
-quantities. Since it can time-average per-atom quantities produced by
-any "compute"_compute.html, "fix"_fix.html, or atom-style
-"variable"_variable.html, this allows those time-averaged results to
-be written to a dump file.
-
-If {f_ID} is used as a attribute, then the per-atom vector calculated
-by the fix is printed. If {f_ID\[I\]} is used, then I must be in the
-range from 1-M, which will print the Ith column of the per-atom array
-with M columns calculated by the fix. See the discussion above for
-how I can be specified with a wildcard asterisk to effectively specify
-multiple values.
-
-The {v_name} attribute allows per-atom vectors calculated by a
-"variable"_variable.html to be output. The name in the attribute
-should be replaced by the actual name of the variable that has been
-defined previously in the input script. Only an atom-style variable
-can be referenced, since it is the only style that generates per-atom
-values. Variables of style {atom} can reference individual atom
-attributes, per-atom atom attributes, thermodynamic keywords, or
-invoke other computes, fixes, or variables when they are evaluated, so
-this is a very general means of creating quantities to output to a
-dump file.
-
-The {d_name} and {i_name} attributes allow to output custom per atom
-floating point or integer properties that are managed by
-"fix property/atom"_fix_property_atom.html.
-
-See "Section 10"_Section_modify.html of the manual for information
-on how to add new compute and fix styles to LAMMPS to calculate
-per-atom quantities which could then be output into dump files.
-
-:line
-
-[Restrictions:]
-
-The {custom/vtk} style does not support writing of gzipped dump files.
-
-The {custom/vtk} dump style is part of the USER-VTK package. It is
-only enabled if LAMMPS was built with that package. See the "Making
-LAMMPS"_Section_start.html#start_3 section for more info.
-
-To use this dump style, you also must link to the VTK library. See
-the info in lib/vtk/README and insure the Makefile.lammps file in that
-directory is appropriate for your machine.
-
-The {custom/vtk} dump style neither supports buffering nor custom
-format strings.
-
-[Related commands:]
-
-"dump"_dump.html, "dump image"_dump_image.html,
-"dump_modify"_dump_modify.html, "undump"_undump.html
-
-[Default:]
-
-By default, files are written in ASCII format. If the file extension
-is not one of .vtk, .vtp or .vtu, the legacy VTK file format is used.
-
diff --git a/doc/src/dump_h5md.txt b/doc/src/dump_h5md.txt
index d797e633e..93c87d85b 100644
--- a/doc/src/dump_h5md.txt
+++ b/doc/src/dump_h5md.txt
@@ -1,123 +1,123 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
dump h5md command :h3
[Syntax:]
dump ID group-ID h5md N file.h5 args :pre
ID = user-assigned name for the dump :ulb,l
group-ID = ID of the group of atoms to be imaged :l
h5md = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page) :l
N = dump every this many timesteps :l
file.h5 = name of file to write to :l
-args = list of data elements to dump, with their dump "subintervals".
-At least one element must be given and image may only be present if
-position is specified first. :l
+args = list of data elements to dump, with their dump "subintervals"
position options
image
velocity options
force options
species options
file_from ID: do not open a new file, re-use the already opened file from dump ID
box value = {yes} or {no}
create_group value = {yes} or {no}
author value = quoted string :pre
+:ule
-For the elements {position}, {velocity}, {force} and {species}, one
-may specify a sub-interval to write the data only every N_element
-iterations of the dump (i.e. every N*N_element time steps). This is
-specified by the option
+Note that at least one element must be specified and image may only be
+present if position is specified first.
- every N_element :pre
+For the elements {position}, {velocity}, {force} and {species}, a
+sub-interval may be specified to write the data only every N_element
+iterations of the dump (i.e. every N*N_element time steps). This is
+specified by this option directly following the element declaration:
-that follows directly the element declaration.
+every N_element :pre
:ule
[Examples:]
dump h5md1 all h5md 100 dump_h5md.h5 position image
dump h5md1 all h5md 100 dump_h5md.h5 position velocity every 10
dump h5md1 all h5md 100 dump_h5md.h5 velocity author "John Doe" :pre
[Description:]
Dump a snapshot of atom coordinates every N timesteps in the
"HDF5"_HDF5_ws based "H5MD"_h5md file format "(de Buyl)"_#h5md_cpc.
HDF5 files are binary, portable and self-describing. This dump style
will write only one file, on the root node.
Several dumps may write to the same file, by using file_from and
referring to a previously defined dump. Several groups may also be
stored within the same file by defining several dumps. A dump that
refers (via {file_from}) to an already open dump ID and that concerns
another particle group must specify {create_group yes}.
:link(h5md,http://nongnu.org/h5md/)
Each data element is written every N*N_element steps. For {image}, no
subinterval is needed as it must be present at the same interval as
{position}. {image} must be given after {position} in any case. The
box information (edges in each dimension) is stored at the same
interval as the {position} element, if present. Otherwise it is stored
every N steps.
NOTE: Because periodic boundary conditions are enforced only on
timesteps when neighbor lists are rebuilt, the coordinates of an atom
written to a dump file may be slightly outside the simulation box.
[Use from write_dump:]
It is possible to use this dump style with the
"write_dump"_write_dump.html command. In this case, the subintervals
must not be set at all. The write_dump command can be used either to
create a new file or to add current data to an existing dump file by
using the {file_from} keyword.
Typically, the {species} data is fixed. The following two commands
store the position data every 100 timesteps, along with the image
data, and store the species data once in the same file.
dump h5md1 all h5md 100 dump.h5 position image
write_dump all h5md dump.h5 file_from h5md1 species :pre
:line
[Restrictions:]
The number of atoms per snapshot cannot change with the h5md style.
The position data is stored wrapped (box boundaries not enforced, see
note above). Only orthogonal domains are currently supported. This is
a limitation of the present dump h5md command and not of H5MD itself.
The {h5md} dump style is part of the USER-H5MD package. It is only
enabled if LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info. It also
requires (i) building the ch5md library provided with LAMMPS (See the
"Making LAMMPS"_Section_start.html#start_3 section for more info.) and
(ii) having the "HDF5"_HDF5_ws library installed (C bindings are
sufficient) on your system. The library ch5md is compiled with the
h5cc wrapper provided by the HDF5 library.
:link(HDF5_ws,http://www.hdfgroup.org/HDF5/)
:line
[Related commands:]
"dump"_dump.html, "dump_modify"_dump_modify.html, "undump"_undump.html
:line
:link(h5md_cpc)
[(de Buyl)] de Buyl, Colberg and Hofling, H5MD: A structured,
efficient, and portable file format for molecular data,
Comp. Phys. Comm. 185(6), 1546-1553 (2014) -
"\[arXiv:1308.6382\]"_http://arxiv.org/abs/1308.6382/.
diff --git a/doc/src/dump_nc.txt b/doc/src/dump_nc.txt
deleted file mode 100644
index 0b81ee6a3..000000000
--- a/doc/src/dump_nc.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
-
-:link(lws,http://lammps.sandia.gov)
-:link(ld,Manual.html)
-:link(lc,Section_commands.html#comm)
-
-:line
-
-dump nc command :h3
-dump nc/mpiio command :h3
-
-[Syntax:]
-
-dump ID group-ID nc N file.nc args
-dump ID group-ID nc/mpiio N file.nc args :pre
-
-ID = user-assigned name for the dump :ulb,l
-group-ID = ID of the group of atoms to be imaged :l
-{nc} or {nc/mpiio} = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page) :l
-N = dump every this many timesteps :l
-file.nc = name of file to write to :l
-args = list of per atom data elements to dump, same as for the 'custom' dump style. :l,ule
-
-[Examples:]
-
-dump 1 all nc 100 traj.nc type x y z vx vy vz
-dump_modify 1 append yes at -1 global c_thermo_pe c_thermo_temp c_thermo_press :pre
-
-dump 1 all nc/mpiio 1000 traj.nc id type x y z :pre
-
-[Description:]
-
-Dump a snapshot of atom coordinates every N timesteps in Amber-style
-NetCDF file format. NetCDF files are binary, portable and
-self-describing. This dump style will write only one file on the root
-node. The dump style {nc} uses the "standard NetCDF
-library"_netcdf-home all data is collected on one processor and then
-written to the dump file. Dump style {nc/mpiio} used the "parallel
-NetCDF library"_pnetcdf-home and MPI-IO; it has better performance on
-a larger number of processors. Note that 'nc' outputs all atoms sorted
-by atom tag while 'nc/mpiio' outputs in order of the MPI rank.
-
-In addition to per-atom data, also global (i.e. not per atom, but per
-frame) quantities can be included in the dump file. This can be
-variables, output from computes or fixes data prefixed with v_, c_ and
-f_, respectively. These properties are included via
-"dump_modify"_dump_modify.html {global}.
-
-:link(netcdf-home,http://www.unidata.ucar.edu/software/netcdf/)
-:link(pnetcdf-home,http://trac.mcs.anl.gov/projects/parallel-netcdf/)
-
-:line
-
-[Restrictions:]
-
-The {nc} and {nc/mpiio} dump styles are part of the USER-NC-DUMP
-package. It is only enabled if LAMMPS was built with that
-package. See the "Making LAMMPS"_Section_start.html#start_3 section
-for more info.
-
-:line
-
-[Related commands:]
-
-"dump"_dump.html, "dump_modify"_dump_modify.html, "undump"_undump.html
-
diff --git a/doc/src/dump_netcdf.txt b/doc/src/dump_netcdf.txt
new file mode 100644
index 000000000..4e8265669
--- /dev/null
+++ b/doc/src/dump_netcdf.txt
@@ -0,0 +1,82 @@
+"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
+
+:link(lws,http://lammps.sandia.gov)
+:link(ld,Manual.html)
+:link(lc,Section_commands.html#comm)
+
+:line
+
+dump netcdf command :h3
+dump netcdf/mpiio command :h3
+
+[Syntax:]
+
+dump ID group-ID netcdf N file args
+dump ID group-ID netcdf/mpiio N file args :pre
+
+ID = user-assigned name for the dump :ulb,l
+group-ID = ID of the group of atoms to be imaged :l
+{netcdf} or {netcdf/mpiio} = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page) :l
+N = dump every this many timesteps :l
+file = name of file to write dump info to :l
+args = list of atom attributes, same as for "dump_style custom"_dump.html :l,ule
+
+[Examples:]
+
+dump 1 all netcdf 100 traj.nc type x y z vx vy vz
+dump_modify 1 append yes at -1 global c_thermo_pe c_thermo_temp c_thermo_press
+dump 1 all netcdf/mpiio 1000 traj.nc id type x y z :pre
+
+[Description:]
+
+Dump a snapshot of atom coordinates every N timesteps in Amber-style
+NetCDF file format. NetCDF files are binary, portable and
+self-describing. This dump style will write only one file on the root
+node. The dump style {netcdf} uses the "standard NetCDF
+library"_netcdf-home. All data is collected on one processor and then
+written to the dump file. Dump style {netcdf/mpiio} uses the
+"parallel NetCDF library"_pnetcdf-home and MPI-IO to write to the dump
+file in parallel; it has better performance on a larger number of
+processors. Note that style {netcdf} outputs all atoms sorted by atom
+tag while style {netcdf/mpiio} outputs atoms in order of their MPI
+rank.
+
+NetCDF files can be directly visualized via the following tools:
+
+Ovito (http://www.ovito.org/). Ovito supports the AMBER convention and
+all of the above extensions. :ulb,l
+
+VMD (http://www.ks.uiuc.edu/Research/vmd/). :l
+
+AtomEye (http://www.libatoms.org/). The libAtoms version of AtomEye
+contains a NetCDF reader that is not present in the standard
+distribution of AtomEye. :l,ule
+
+In addition to per-atom data, global data can be included in the dump
+file, which are the kinds of values output by the
+"thermo_style"_thermo_style.html command . See "Section howto
+6.15"_Section_howto.html#howto_15 for an explanation of per-atom
+versus global data. The global output written into the dump file can
+be from computes, fixes, or variables, by prefixing the compute/fix ID
+or variable name with "c_" or "f_" or "v_" respectively, as in the
+example above. These global values are specified via the "dump_modify
+global"_dump_modify.html command.
+
+:link(netcdf-home,http://www.unidata.ucar.edu/software/netcdf/)
+:link(pnetcdf-home,http://trac.mcs.anl.gov/projects/parallel-netcdf/)
+
+:line
+
+[Restrictions:]
+
+The {netcdf} and {netcdf/mpiio} dump styles are part of the
+USER-NETCDF package. They are only enabled if LAMMPS was built with
+that package. See the "Making LAMMPS"_Section_start.html#start_3
+section for more info.
+
+:line
+
+[Related commands:]
+
+"dump"_dump.html, "dump_modify"_dump_modify.html, "undump"_undump.html
+
diff --git a/doc/src/dump_vtk.txt b/doc/src/dump_vtk.txt
new file mode 100644
index 000000000..21502e7f4
--- /dev/null
+++ b/doc/src/dump_vtk.txt
@@ -0,0 +1,179 @@
+ "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
+
+:link(lws,http://lammps.sandia.gov)
+:link(ld,Manual.html)
+:link(lc,Section_commands.html#comm)
+
+:line
+
+dump vtk command :h3
+
+[Syntax:]
+
+dump ID group-ID vtk N file args :pre
+
+ID = user-assigned name for the dump
+group-ID = ID of the group of atoms to be dumped
+vtk = style of dump command (other styles {atom} or {cfg} or {dcd} or {xtc} or {xyz} or {local} or {custom} are discussed on the "dump"_dump.html doc page)
+N = dump every this many timesteps
+file = name of file to write dump info to
+args = same as arguments for "dump_style custom"_dump.html :ul
+
+[Examples:]
+
+dump dmpvtk all vtk 100 dump*.myforce.vtk id type vx fx
+dump dmpvtp flow vtk 100 dump*.%.displace.vtp id type c_myD\[1\] c_myD\[2\] c_myD\[3\] v_ke :pre
+
+[Description:]
+
+Dump a snapshot of atom quantities to one or more files every N
+timesteps in a format readable by the "VTK visualization
+toolkit"_http://www.vtk.org or other visualization tools that use it,
+e.g. "ParaView"_http://www.paraview.org. The timesteps on which dump
+output is written can also be controlled by a variable; see the
+"dump_modify every"_dump_modify.html command for details.
+
+This dump style is similar to "dump_style custom"_dump.html but uses
+the VTK library to write data to VTK simple legacy or XML format
+depending on the filename extension specified for the dump file. This
+can be either {*.vtk} for the legacy format or {*.vtp} and {*.vtu},
+respectively, for XML format; see the "VTK
+homepage"_http://www.vtk.org/VTK/img/file-formats.pdf for a detailed
+description of these formats. Since this naming convention conflicts
+with the way binary output is usually specified (see below), the
+"dump_modify binary"_dump_modify.html command allows setting of a
+binary option for this dump style explicitly.
+
+Only information for atoms in the specified group is dumped. The
+"dump_modify thresh and region"_dump_modify.html commands can also
+alter what atoms are included; see details below.
+
+As described below, special characters ("*", "%") in the filename
+determine the kind of output.
+
+IMPORTANT NOTE: Because periodic boundary conditions are enforced only
+on timesteps when neighbor lists are rebuilt, the coordinates of an
+atom written to a dump file may be slightly outside the simulation
+box.
+
+IMPORTANT NOTE: Unless the "dump_modify sort"_dump_modify.html option
+is invoked, the lines of atom information written to dump files will
+be in an indeterminate order for each snapshot. This is even true
+when running on a single processor, if the "atom_modify
+sort"_atom_modify.html option is on, which it is by default. In this
+case atoms are re-ordered periodically during a simulation, due to
+spatial sorting. It is also true when running in parallel, because
+data for a single snapshot is collected from multiple processors, each
+of which owns a subset of the atoms.
+
+For the {vtk} style, sorting is off by default. See the
+"dump_modify"_dump_modify.html doc page for details.
+
+:line
+
+The dimensions of the simulation box are written to a separate file
+for each snapshot (either in legacy VTK or XML format depending on the
+format of the main dump file) with the suffix {_boundingBox} appended
+to the given dump filename.
+
+For an orthogonal simulation box this information is saved as a
+rectilinear grid (legacy .vtk or .vtr XML format).
+
+Triclinic simulation boxes (non-orthogonal) are saved as
+hexahedrons in either legacy .vtk or .vtu XML format.
+
+Style {vtk} allows you to specify a list of atom attributes to be
+written to the dump file for each atom. The list of possible attributes
+is the same as for the "dump_style custom"_dump.html command; see
+its doc page for a listing and an explanation of each attribute.
+
+NOTE: Since position data is required to write VTK files the atom
+attributes "x y z" do not have to be specified explicitly; they will
+be included in the dump file regardless. Also, in contrast to the
+{custom} style, the specified {vtk} attributes are rearranged to
+ensure correct ordering of vector components (except for computes and
+fixes - these have to be given in the right order) and duplicate
+entries are removed.
+
+The VTK format uses a single snapshot of the system per file, thus
+a wildcard "*" must be included in the filename, as discussed below.
+Otherwise the dump files will get overwritten with the new snapshot
+each time.
+
+:line
+
+Dumps are performed on timesteps that are a multiple of N (including
+timestep 0) and on the last timestep of a minimization if the
+minimization converges. Note that this means a dump will not be
+performed on the initial timestep after the dump command is invoked,
+if the current timestep is not a multiple of N. This behavior can be
+changed via the "dump_modify first"_dump_modify.html command, which
+can also be useful if the dump command is invoked after a minimization
+ended on an arbitrary timestep. N can be changed between runs by
+using the "dump_modify every"_dump_modify.html command.
+The "dump_modify every"_dump_modify.html command
+also allows a variable to be used to determine the sequence of
+timesteps on which dump files are written. In this mode a dump on the
+first timestep of a run will also not be written unless the
+"dump_modify first"_dump_modify.html command is used.
+
+Dump filenames can contain two wildcard characters. If a "*"
+character appears in the filename, then one file per snapshot is
+written and the "*" character is replaced with the timestep value.
+For example, tmp.dump*.vtk becomes tmp.dump0.vtk, tmp.dump10000.vtk,
+tmp.dump20000.vtk, etc. Note that the "dump_modify pad"_dump_modify.html
+command can be used to insure all timestep numbers are the same length
+(e.g. 00010), which can make it easier to read a series of dump files
+in order with some post-processing tools.
+
+If a "%" character appears in the filename, then each of P processors
+writes a portion of the dump file, and the "%" character is replaced
+with the processor ID from 0 to P-1 preceded by an underscore character.
+For example, tmp.dump%.vtp becomes tmp.dump_0.vtp, tmp.dump_1.vtp, ...
+tmp.dump_P-1.vtp, etc. This creates smaller files and can be a fast
+mode of output on parallel machines that support parallel I/O for output.
+
+By default, P = the number of processors meaning one file per
+processor, but P can be set to a smaller value via the {nfile} or
+{fileper} keywords of the "dump_modify"_dump_modify.html command.
+These options can be the most efficient way of writing out dump files
+when running on large numbers of processors.
+
+For the legacy VTK format "%" is ignored and P = 1, i.e., only
+processor 0 writes files.
+
+Note that using the "*" and "%" characters together can produce a
+large number of small dump files!
+
+If {dump_modify binary} is used, the dump file (or files, if "*" or
+"%" is also used) is written in binary format. A binary dump file
+will be about the same size as a text version, but will typically
+write out much faster.
+
+:line
+
+[Restrictions:]
+
+The {vtk} style does not support writing of gzipped dump files.
+
+The {vtk} dump style is part of the USER-VTK package. It is
+only enabled if LAMMPS was built with that package. See the "Making
+LAMMPS"_Section_start.html#start_3 section for more info.
+
+To use this dump style, you also must link to the VTK library. See
+the info in lib/vtk/README and ensure the Makefile.lammps file in that
+directory is appropriate for your machine.
+
+The {vtk} dump style supports neither buffering nor custom format
+strings.
+
+[Related commands:]
+
+"dump"_dump.html, "dump image"_dump_image.html,
+"dump_modify"_dump_modify.html, "undump"_undump.html
+
+[Default:]
+
+By default, files are written in ASCII format. If the file extension
+is not one of .vtk, .vtp or .vtu, the legacy VTK file format is used.
+
diff --git a/doc/src/fix_cmap.txt b/doc/src/fix_cmap.txt
index 5fcac589b..2b14a20c1 100644
--- a/doc/src/fix_cmap.txt
+++ b/doc/src/fix_cmap.txt
@@ -1,132 +1,135 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix cmap command :h3
[Syntax:]
fix ID group-ID cmap filename :pre
ID, group-ID are documented in "fix"_fix.html command
cmap = style name of this fix command
filename = force-field file with CMAP coefficients :ul
[Examples:]
fix myCMAP all cmap ../potentials/cmap36.data
read_data proteinX.data fix myCMAP crossterm CMAP
fix_modify myCMAP energy yes :pre
[Description:]
This command enables CMAP crossterms to be added to simulations which
use the CHARMM force field. These are relevant for any CHARMM model
of a peptide or protein sequence that is 3 or more amino-acid
residues long; see "(Buck)"_#Buck and "(Brooks)"_#Brooks2 for details,
including the analytic energy expressions for CMAP interactions. The
CMAP crossterms add additional potential energy contributions to pairs
of overlapping phi-psi dihedrals of amino-acids, which are important
to properly represent their conformational behavior.
The examples/cmap directory has a sample input script and data file
for a small peptide, which illustrate the use of the fix cmap command.
As in the example above, this fix should be used before reading a data
file that contains a listing of CMAP interactions. The {filename}
specified should contain the CMAP parameters for a particular version
of the CHARMM force field. Two such files are included in the
lammps/potentials directory: charmm22.cmap and charmm36.cmap.
The data file read by the "read_data"_read_data.html command must
contain the topology of all the CMAP interactions, similar to the
topology data for bonds, angles, dihedrals, etc. Specifically, it
should have a line like this in its header section:
N crossterms :pre
where N is the number of CMAP crossterms. It should also have a section
in the body of the data file like this with N lines:
CMAP :pre
1 1 8 10 12 18 20
2 5 18 20 22 25 27
\[...\]
N 3 314 315 317 318 330 :pre
The first column is an index from 1 to N to enumerate the CMAP terms;
it is ignored by LAMMPS. The 2nd column is the "type" of the
interaction; it is an index into the CMAP force field file. The
remaining 5 columns are the atom IDs of the atoms in the two 4-atom
dihedrals that overlap to create the CMAP 5-body interaction. Note
that the "crossterm" and "CMAP" keywords for the header and body
sections match those specified in the read_data command following the
data file name; see the "read_data"_read_data.html doc page for
more details.
A data file containing CMAP crossterms can be generated from a PDB
file using the charmm2lammps.pl script in the tools/ch2lmp directory
of the LAMMPS distribution. The script must be invoked with the
optional "-cmap" flag to do this; see the tools/ch2lmp/README file for
more information.
The potential energy associated with CMAP interactions can be output
as described below. It can also be included in the total potential
energy of the system, as output by the
"thermo_style"_thermo_style.html command, if the "fix_modify
energy"_fix_modify.html command is used, as in the example above. See
the note below about how to include the CMAP energy when performing an
"energy minimization"_minimize.html.
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
-No information about this fix is written to "binary restart
-files"_restart.html.
+This fix writes the list of CMAP crossterms to "binary restart
+files"_restart.html. See the "read_restart"_read_restart.html command
+for info on how to re-specify a fix in an input script that reads a
+restart file, so that the operation of the fix continues in an
+uninterrupted fashion.
The "fix_modify"_fix_modify.html {energy} option is supported by this
fix to add the potential "energy" of the CMAP interactions system's
potential energy as part of "thermodynamic output"_thermo_style.html.
This fix computes a global scalar which can be accessed by various
"output commands"_Section_howto.html#howto_15. The scalar is the
potential energy discussed above. The scalar value calculated by this
fix is "extensive".
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command.
The forces due to this fix are imposed during an energy minimization,
invoked by the "minimize"_minimize.html command.
NOTE: If you want the potential energy associated with the CMAP terms
to be included in the total potential energy of the system (the
quantity being minimized), you MUST enable the
"fix_modify"_fix_modify.html {energy} option for this fix.
[Restrictions:]
This fix can only be used if LAMMPS was built with the MOLECULE
package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info on packages.
[Related commands:]
"fix_modify"_fix_modify.html, "read_data"_read_data.html
[Default:] none
:line
:link(Buck)
[(Buck)] Buck, Bouguet-Bonnet, Pastor, MacKerell Jr., Biophys J, 90, L36
(2006).
:link(Brooks2)
[(Brooks)] Brooks, Brooks, MacKerell Jr., J Comput Chem, 30, 1545 (2009).
diff --git a/doc/src/fix_gcmc.txt b/doc/src/fix_gcmc.txt
index 53973cdfb..7ac607a2f 100644
--- a/doc/src/fix_gcmc.txt
+++ b/doc/src/fix_gcmc.txt
@@ -1,417 +1,417 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix gcmc command :h3
[Syntax:]
fix ID group-ID gcmc N X M type seed T mu displace keyword values ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
gcmc = style name of this fix command :l
N = invoke this fix every N steps :l
X = average number of GCMC exchanges to attempt every N steps :l
M = average number of MC moves to attempt every N steps :l
type = atom type for inserted atoms (must be 0 if mol keyword used) :l
seed = random # seed (positive integer) :l
T = temperature of the ideal gas reservoir (temperature units) :l
mu = chemical potential of the ideal gas reservoir (energy units) :l
displace = maximum Monte Carlo translation distance (length units) :l
zero or more keyword/value pairs may be appended to args :l
keyword = {mol}, {rigid}, {shake}, {region}, {maxangle}, {pressure}, {fugacity_coeff}, {full_energy}, {charge}, {group}, {grouptype}, {intra_energy}, {tfac_insert}, or {overlap_cutoff}
{mol} value = template-ID
template-ID = ID of molecule template specified in a separate "molecule"_molecule.html command
{rigid} value = fix-ID
fix-ID = ID of "fix rigid/small"_fix_rigid.html command
{shake} value = fix-ID
fix-ID = ID of "fix shake"_fix_shake.html command
{region} value = region-ID
region-ID = ID of region where MC moves are allowed
{maxangle} value = maximum molecular rotation angle (degrees)
{pressure} value = pressure of the gas reservoir (pressure units)
{fugacity_coeff} value = fugacity coefficient of the gas reservoir (unitless)
{full_energy} = compute the entire system energy when performing MC moves
{charge} value = charge of inserted atoms (charge units)
{group} value = group-ID
group-ID = group-ID for inserted atoms (string)
{grouptype} values = type group-ID
type = atom type (int)
group-ID = group-ID for inserted atoms (string)
{intra_energy} value = intramolecular energy (energy units)
{tfac_insert} value = scale up/down temperature of inserted atoms (unitless)
{overlap_cutoff} value = maximum pair distance for overlap rejection (distance units) :pre
:ule
[Examples:]
fix 2 gas gcmc 10 1000 1000 2 29494 298.0 -0.5 0.01
fix 3 water gcmc 10 100 100 0 3456543 3.0 -2.5 0.1 mol my_one_water maxangle 180 full_energy
fix 4 my_gas gcmc 1 10 10 1 123456543 300.0 -12.5 1.0 region disk :pre
[Description:]
This fix performs grand canonical Monte Carlo (GCMC) exchanges of
atoms or molecules of the given type with an imaginary ideal gas
reservoir at the specified T and chemical potential (mu) as discussed
in "(Frenkel)"_#Frenkel. If used with the "fix nvt"_fix_nh.html
command, simulations in the grand canonical ensemble (muVT, constant
chemical potential, constant volume, and constant temperature) can be
performed. Specific uses include computing isotherms in microporous
materials, or computing vapor-liquid coexistence curves.
Every N timesteps the fix attempts a number of GCMC exchanges
(insertions or deletions) of gas atoms or molecules of the given type
between the simulation cell and the imaginary reservoir. It also
attempts a number of Monte Carlo moves (translations and molecule
rotations) of gas of the given type within the simulation cell or
region. The average number of attempted GCMC exchanges is X. The
average number of attempted MC moves is M. M should typically be
chosen to be approximately equal to the expected number of gas atoms
or molecules of the given type within the simulation cell or region,
which will result in roughly one MC translation per atom or molecule
per MC cycle.
For MC moves of molecular gases, rotations and translations are each
attempted with 50% probability. For MC moves of atomic gases,
translations are attempted 100% of the time. For MC exchanges of
either molecular or atomic gases, deletions and insertions are each
attempted with 50% probability.
All inserted particles are always assigned to two groups: the default
group "all" and the group specified in the fix gcmc command (which can
also be "all"). In addition, particles are also added to any groups
specified by the {group} and {grouptype} keywords. If inserted
particles are individual atoms, they are assigned the atom type given
by the type argument. If they are molecules, the type argument has no
effect and must be set to zero. Instead, the type of each atom in the
inserted molecule is specified in the file read by the
"molecule"_molecule.html command.
This fix cannot be used to perform MC insertions of gas atoms or
molecules other than the exchanged type, but MC deletions,
translations, and rotations can be performed on any atom/molecule in
the fix group. All atoms in the simulation cell can be moved using
regular time integration translations, e.g. via "fix nvt"_fix_nh.html,
resulting in a hybrid GCMC+MD simulation. A smaller-than-usual
timestep size may be needed when running such a hybrid simulation,
especially if the inserted molecules are not well equilibrated.
This command may optionally use the {region} keyword to define an
exchange and move volume. The specified region must have been
previously defined with a "region"_region.html command. It must be
defined with side = {in}. Insertion attempts occur only within the
specified region. For non-rectangular regions, random trial points are
generated within the rectangular bounding box until a point is found
that lies inside the region. If no valid point is generated after 1000
trials, no insertion is performed, but it is counted as an attempted
insertion. Move and deletion attempt candidates are selected from gas
atoms or molecules within the region. If there are no candidates, no
move or deletion is performed, but it is counted as an attempted move
or deletion. If an attempted move places the atom or molecule
center-of-mass outside the specified region, a new attempted move is
generated. This process is repeated until the atom or molecule
center-of-mass is inside the specified region.
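As a hedged illustration only (the region ID, block bounds, group name,
and fix gcmc arguments below are placeholders, not recommended values):
region   mcbox block 0 20 0 20 10 30 side in
fix      gc gas gcmc 100 10 10 2 39082 300.0 -6.0 0.5 region mcbox :pre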
If used with "fix nvt"_fix_nh.html, the temperature of the imaginary
reservoir, T, should be set to be equivalent to the target temperature
used in fix nvt. Otherwise, the imaginary reservoir will not be in
thermal equilibrium with the simulation cell. Also, it is important
that the temperature used by fix nvt be dynamic/dof, which can be
achieved as follows:
compute mdtemp mdatoms temp
compute_modify mdtemp dynamic/dof yes
fix mdnvt mdatoms nvt temp 300.0 300.0 10.0
fix_modify mdnvt temp mdtemp :pre
Note that neighbor lists are re-built every timestep that this fix is
invoked, so you should not set N to be too small. However, periodic
rebuilds are necessary in order to avoid dangerous rebuilds and missed
interactions. Specifically, avoid performing so many MC translations
per timestep that atoms can move beyond the neighbor list skin
distance. See the "neighbor"_neighbor.html command for details.
When an atom or molecule is to be inserted, its coordinates are chosen
at a random position within the current simulation cell or region, and
new atom velocities are randomly chosen from the specified temperature
distribution given by T. The effective temperature for new atom
velocities can be increased or decreased using the optional keyword
{tfac_insert} (see below). Relative coordinates for atoms in a
molecule are taken from the template molecule provided by the
user. The center of mass of the molecule is placed at the insertion
point. The orientation of the molecule is chosen at random by rotating
about this point.
Individual atoms are inserted, unless the {mol} keyword is used. It
specifies a {template-ID} previously defined using the
"molecule"_molecule.html command, which reads a file that defines the
molecule. The coordinates, atom types, charges, etc, as well as any
bond/angle/etc and special neighbor information for the molecule can
be specified in the molecule file. See the "molecule"_molecule.html
command for details. The only settings required to be in this file
are the coordinates and types of atoms in the molecule.
When not using the {mol} keyword, you should ensure you do not delete
atoms that are bonded to other atoms, or LAMMPS will soon generate an
error when it tries to find bonded neighbors. LAMMPS will warn you if
any of the atoms eligible for deletion have a non-zero molecule ID,
but does not check for this at the time of deletion.
If you wish to insert molecules via the {mol} keyword that will be
treated as rigid bodies, use the {rigid} keyword, specifying as its
value the ID of a separate "fix rigid/small"_fix_rigid.html command
which also appears in your input script.
NOTE: If you wish the new rigid molecules (and other rigid molecules)
to be thermostatted correctly via "fix rigid/small/nvt"_fix_rigid.html
or "fix rigid/small/npt"_fix_rigid.html, then you need to use the
"fix_modify dynamic/dof yes" command for the rigid fix. This is to
inform that fix that the molecule count will vary dynamically.
If you wish to insert molecules via the {mol} keyword that will have
their bonds or angles constrained via SHAKE, use the {shake} keyword,
specifying as its value the ID of a separate "fix
shake"_fix_shake.html command which also appears in your input script.
Optionally, users may specify the maximum rotation angle for molecular
rotations using the {maxangle} keyword and specifying the angle in
degrees. Rotations are performed by generating a random point on the
unit sphere and a random rotation angle on the range
\[0,maxangle). The molecule is then rotated by that angle about an
axis passing through the molecule center of mass. The axis is parallel
to the unit vector defined by the point on the unit sphere. The same
procedure is used for randomly rotating molecules when they are
inserted, except that the maximum angle is 360 degrees.
Note that fix GCMC does not use configurational bias MC or any other
kind of sampling of intramolecular degrees of freedom. Inserted
molecules can have different orientations, but they will all have the
same intramolecular configuration, which was specified in the molecule
command input.
For atomic gases, inserted atoms have the specified atom type, but
deleted atoms are any atoms that have been inserted or that belong to
the user-specified fix group. For molecular gases, exchanged
molecules use the same atom types as in the template molecule supplied
by the user. In both cases, exchanged atoms/molecules are assigned to
two groups: the default group "all" and the group specified in the fix
gcmc command (which can also be "all").
The chemical potential is a user-specified input parameter defined
as:
:c,image(Eqs/fix_gcmc1.jpg)
The second term mu_ex is the excess chemical potential due to
energetic interactions and is formally zero for the fictitious gas
reservoir but is non-zero for interacting systems. So, while the
chemical potential of the reservoir and the simulation cell are equal,
mu_ex is not, and as a result, the densities of the two are generally
quite different. The first term mu_id is the ideal gas contribution
to the chemical potential. mu_id can be related to the density or
pressure of the fictitious gas reservoir by:
:c,image(Eqs/fix_gcmc2.jpg)
where k is Boltzmann's constant,
T is the user-specified temperature, rho is the number density,
P is the pressure, and phi is the fugacity coefficient.
The constant Lambda is required for dimensional consistency.
For all unit styles except {lj} it is defined as the thermal
de Broglie wavelength
:c,image(Eqs/fix_gcmc3.jpg)
where h is Planck's constant, and m is the mass of the exchanged atom
or molecule. For unit style {lj}, Lambda is simply set to
unity. Note that prior to March 2017, lambda for unit style {lj} was
calculated using the above formula with h set to the rather specific
value of 0.18292026. Chemical potential under the old definition can
be converted to an equivalent value under the new definition by
subtracting 3kTln(Lambda_old).
As an alternative to specifying mu directly, the ideal gas reservoir
can be defined by its pressure P using the {pressure} keyword, in
which case the user-specified chemical potential is ignored. The user
may also specify the fugacity coefficient phi using the
{fugacity_coeff} keyword, which defaults to unity.
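For example, a hedged sketch specifying the reservoir by its pressure
instead of mu (the mu value on the line is then ignored; all numbers
are placeholders):
fix gc gas gcmc 100 100 100 2 29494 298.0 0.0 0.5 pressure 1.0 fugacity_coeff 0.96 :pre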
The {full_energy} option means that fix GCMC will compute the total
potential energy of the entire simulated system. The total system
energy before and after the proposed GCMC move is then used in the
Metropolis criterion to determine whether or not to accept the
proposed GCMC move. By default, this option is off, in which case only
partial energies are computed to determine the difference in energy
that would be caused by the proposed GCMC move.
The {full_energy} option is needed for systems with complicated
potential energy calculations, including the following:
long-range electrostatics (kspace)
many-body pair styles
hybrid pair styles
eam pair styles
tail corrections
need to include potential energy contributions from other fixes :ul
In these cases, LAMMPS will automatically apply the {full_energy}
keyword and issue a warning message.
When the {mol} keyword is used, the {full_energy} option also includes
the intramolecular energy of inserted and deleted molecules. If this
is not desired, the {intra_energy} keyword can be used to define an
amount of energy that is subtracted from the final energy when a
molecule is inserted, and added to the initial energy when a molecule
is deleted. For molecules that have a non-zero intramolecular energy,
this will ensure roughly the same behavior whether or not the
{full_energy} option is used.
Inserted atoms and molecules are assigned random velocities based on
the specified temperature T. Because the relative velocity of all
atoms in the molecule is zero, this may result in inserted molecules
that are systematically too cold. In addition, the intramolecular
potential energy of the inserted molecule may cause the kinetic energy
of the molecule to quickly increase or decrease after insertion. The
{tfac_insert} keyword allows the user to counteract these effects by
changing the temperature used to assign velocities to inserted atoms
and molecules by a constant factor. For a particular application, some
experimentation may be required to find a value of {tfac_insert} that
results in inserted molecules that equilibrate quickly to the correct
temperature.
Some fixes have an associated potential energy. Examples of such fixes
include: "efield"_fix_efield.html, "gravity"_fix_gravity.html,
"addforce"_fix_addforce.html, "langevin"_fix_langevin.html,
"restrain"_fix_restrain.html,
"temp/berendsen"_fix_temp_berendsen.html,
"temp/rescale"_fix_temp_rescale.html, and "wall fixes"_fix_wall.html.
For that energy to be included in the total potential energy of the
system (the quantity used when performing GCMC moves), you MUST enable
the "fix_modify"_fix_modify.html {energy} option for that fix. The
doc pages for individual "fix"_fix.html commands specify if this
should be done.
Use the {charge} option to insert atoms with a user-specified point
charge. Note that doing so will cause the system to become
non-neutral. LAMMPS issues a warning when using long-range
electrostatics (kspace) with non-neutral systems. See the "compute
group/group"_compute_group_group.html documentation for more details
about simulating non-neutral systems with kspace on.
Use of this fix typically will cause the number of atoms to
fluctuate; therefore, you will want to use the
"compute_modify"_compute_modify.html command to ensure that the
current number of atoms is used as a normalizing factor each time
temperature is computed. Here is the necessary command:
compute_modify thermo_temp dynamic yes :pre
NOTE: If the density of the cell is initially very small or zero, and
increases to a much larger density after a period of equilibration,
then certain quantities that are only calculated once at the start
(kspace parameters, tail corrections) may no longer be accurate. The
solution is to start a new simulation after the equilibrium density
has been reached.
With some pair_styles, such as "Buckingham"_pair_buck.html,
-"Born-Mayer-Huggins"_pair_born.html and "ReaxFF"_pair_reax_c.html, two
+"Born-Mayer-Huggins"_pair_born.html and "ReaxFF"_pair_reaxc.html, two
atoms placed close to each other may have an arbitrary large, negative
potential energy due to the functional form of the potential. While
these unphysical configurations are inaccessible to typical dynamical
trajectories, they can be generated by Monte Carlo moves. The
{overlap_cutoff} keyword suppresses these moves by effectively
assigning an infinite positive energy to all new configurations that
place any pair of atoms closer than the specified overlap cutoff
distance.
If LJ units are used, note that a value of 0.18292026 is used by this
fix as the reduced value for Planck's constant. This value was
derived from LJ parameters for argon, where h* = h/sqrt(sigma^2 *
epsilon * mass), sigma = 3.429 angstroms, epsilon/k = 121.85 K, and
mass = 39.948 amu.
The {group} keyword assigns all inserted atoms to the
"group"_group.html of the group-ID value. The {grouptype} keyword
assigns all inserted atoms of the specified type to the
"group"_group.html of the group-ID value.
[Restart, fix_modify, output, run start/stop, minimize info:]
This fix writes the state of the fix to "binary restart
files"_restart.html. This includes information about the random
number generator seed, the next timestep for MC exchanges, etc. See
the "read_restart"_read_restart.html command for info on how to
re-specify a fix in an input script that reads a restart file, so that
the operation of the fix continues in an uninterrupted fashion.
None of the "fix_modify"_fix_modify.html options are relevant to this
fix.
This fix computes a global vector of length 8, which can be accessed
by various "output commands"_Section_howto.html#howto_15. The vector
values are the following global cumulative quantities:
1 = translation attempts
2 = translation successes
3 = insertion attempts
4 = insertion successes
5 = deletion attempts
6 = deletion successes
7 = rotation attempts
8 = rotation successes :ul
The vector values calculated by this fix are "extensive".
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command. This fix is not invoked during "energy
minimization"_minimize.html.
[Restrictions:]
This fix is part of the MC package. It is only enabled if LAMMPS was
built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
Do not set "neigh_modify once yes" or else this fix will never be
called. Reneighboring is required.
This fix can be run in parallel, but aspects of the GCMC part will not
scale well in parallel. It can only be used for 3D simulations.
Note that very lengthy simulations involving insertions/deletions of
billions of gas molecules may run out of atom or molecule IDs and
trigger an error, so it is better to run multiple shorter-duration
simulations. Likewise, very large molecules have not been tested and
may turn out to be problematic.
Use of multiple fix gcmc commands in the same input script can be
problematic if using a template molecule. The issue is that the
user-referenced template molecule in the second fix gcmc command may
no longer exist since it might have been deleted by the first fix gcmc
command. An existing template molecule will need to be referenced by
the user for each subsequent fix gcmc command.
[Related commands:]
"fix atom/swap"_fix_atom_swap.html,
"fix nvt"_fix_nh.html, "neighbor"_neighbor.html,
"fix deposit"_fix_deposit.html, "fix evaporate"_fix_evaporate.html,
"delete_atoms"_delete_atoms.html
[Default:]
The option defaults are mol = no, maxangle = 10, overlap_cutoff = 0.0,
fugacity_coeff = 1, and full_energy = no,
except for the situations where full_energy is required, as
listed above.
:line
:link(Frenkel)
[(Frenkel)] Frenkel and Smit, Understanding Molecular Simulation,
Academic Press, London, 2002.
diff --git a/doc/src/fix_gle.txt b/doc/src/fix_gle.txt
index ca7625e2d..b8d3cc9b3 100644
--- a/doc/src/fix_gle.txt
+++ b/doc/src/fix_gle.txt
@@ -1,155 +1,156 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix gle command :h3
[Syntax:]
fix ID group-ID gle Ns Tstart Tstop seed Amatrix \[noneq Cmatrix\] \[every stride\] :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
gle = style name of this fix command :l
Ns = number of additional fictitious momenta :l
Tstart, Tstop = temperature ramp during the run :l
seed = random number seed to use for generating noise (positive integer) :l
Amatrix = file to read the drift matrix A from :l
zero or more keyword/value pairs may be appended :l
keyword = {noneq} or {every}
{noneq} Cmatrix = file to read the non-equilibrium covariance matrix from
{every} stride = apply the GLE once every {stride} time steps. Reduces the accuracy
of the integration of the GLE, but has *no effect* on the accuracy of equilibrium
sampling. It might change sampling properties when used together with {noneq}. :pre
:ule
[Examples:]
fix 3 boundary gle 6 300 300 31415 smart.A
fix 1 all gle 6 300 300 31415 qt-300k.A noneq qt-300k.C :pre
[Description:]
Apply a Generalized Langevin Equation (GLE) thermostat as described
in "(Ceriotti)"_#Ceriotti. The formalism allows one to obtain a number
of different effects ranging from efficient sampling of all
vibrational modes in the system to inexpensive (approximate)
modelling of nuclear quantum effects. Contrary to
"fix langevin"_fix_langevin.html, this fix performs both
thermostatting and evolution of the Hamiltonian equations of motion, so it
should not be used together with "fix nve"_fix_nve.html -- at least not
on the same atom groups.
Each degree of freedom in the thermostatted group is supplemented
with Ns additional degrees of freedom s, and the equations of motion
become
dq/dt=p/m
d(p,s)/dt=(F,0) - A(p,s) + B dW/dt :pre
where F is the physical force, A is the drift matrix (that generalizes
the friction in Langevin dynamics), B is the diffusion term, and dW/dt
are un-correlated Gaussian random forces. The A matrix couples the physical
(q,p) dynamics with that of the additional degrees of freedom,
and makes it possible to obtain effectively a history-dependent
noise and friction kernel.
The drift matrix should be given as an external file {Afile},
as a (Ns+1 x Ns+1) matrix in inverse time units. Matrices that are
optimal for a given application and the system of choice can be
obtained from "(GLE4MD)"_#GLE4MD.
Equilibrium sampling at a temperature T is obtained by specifying the
target value as the {Tstart} and {Tstop} arguments, so that the diffusion
matrix that gives canonical sampling for a given A is computed automatically.
However, the GLE framework also allows for non-equilibrium sampling, which
can be used for instance to inexpensively model zero-point energy
-effects "(Ceriotti2)"_#Ceriotti2. This is achieved specifying the
-{noneq} keyword followed by the name of the file that contains the
-static covariance matrix for the non-equilibrium dynamics.
+effects "(Ceriotti2)"_#Ceriotti2. This is achieved specifying the {noneq}
+ keyword followed by the name of the file that contains the static covariance
+matrix for the non-equilibrium dynamics. Please note, that the covariance
+matrix is expected to be given in [temperature units].
Since integrating GLE dynamics can be costly when used together with
simple potentials, one can use the {every} optional keyword to
apply the Langevin terms only once every several MD steps, in a
multiple time-step fashion. This should be used with care when doing
non-equilibrium sampling, but should have no effect on equilibrium
averages when using canonical sampling.
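For instance, a hedged sketch that applies the GLE terms only once
every 4 MD steps, re-using the A-matrix file from the examples above:
fix 3 boundary gle 6 300 300 31415 smart.A every 4 :pre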
The random number {seed} must be a positive integer. A Marsaglia random
number generator is used. Each processor uses the input seed to
generate its own unique seed and its own stream of random numbers.
Thus the dynamics of the system will not be identical on two runs on
different numbers of processors.
Note also that the Generalized Langevin Dynamics scheme that is
implemented by the "fix gld"_fix_gld.html scheme is closely related
to the present one. In fact, it should always be possible to cast the
Prony series form of the memory kernel used by GLD into an appropriate
input matrix for "fix gle"_fix_gle.html. While the GLE scheme is more
general, the form used by "fix gld"_fix_gld.html can be more directly
related to the representation of an implicit solvent environment.
[Restart, fix_modify, output, run start/stop, minimize info:]
The instantaneous values of the extended variables are written to
"binary restart files"_restart.html. Because the state of the random
number generator is not saved in restart files, this means you cannot
do "exact" restarts with this fix, where the simulation continues on
the same as if no restart had taken place. However, in a statistical
sense, a restarted simulation should produce the same behavior.
Note however that you should use a different seed each time you
restart, otherwise the same sequence of random numbers will be used
each time, which might lead to stochastic synchronization and
subtle artefacts in the sampling.
This fix can ramp its target temperature over multiple runs, using the
{start} and {stop} keywords of the "run"_run.html command. See the
"run"_run.html command for details of how to do this.
The "fix_modify"_fix_modify.html {energy} option is supported by this
fix to add the energy change induced by Langevin thermostatting to the
system's potential energy as part of "thermodynamic
output"_thermo_style.html.
This fix computes a global scalar which can be accessed by various
"output commands"_Section_howto.html#howto_15. The scalar is the
cumulative energy change due to this fix. The scalar value
calculated by this fix is "extensive".
[Restrictions:]
The GLE thermostat in its current implementation should not be used
with rigid bodies, SHAKE or RATTLE. It is expected that all the
thermostatted degrees of freedom are fully flexible, and the sampled
ensemble will not be correct otherwise.
In order to perform constant-pressure simulations, please use
"fix press/berendsen"_fix_press_berendsen.html, rather than
"fix npt"_fix_nh.html, to avoid duplicate integration of the
equations of motion.
This fix is part of the USER-MISC package. It is only enabled if LAMMPS
was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"fix nvt"_fix_nh.html, "fix temp/rescale"_fix_temp_rescale.html, "fix
viscous"_fix_viscous.html, "fix nvt"_fix_nh.html, "pair_style
dpd/tstat"_pair_dpd.html, "fix gld"_fix_gld.html
:line
:link(Ceriotti)
[(Ceriotti)] Ceriotti, Bussi and Parrinello, J Chem Theory Comput 6,
1170-80 (2010)
:link(GLE4MD)
-[(GLE4MD)] "http://epfl-cosmo.github.io/gle4md/"_http://epfl-cosmo.github.io/gle4md/
+[(GLE4MD)] "http://gle4md.org/"_http://gle4md.org/
:link(Ceriotti2)
[(Ceriotti2)] Ceriotti, Bussi and Parrinello, Phys Rev Lett 103,
030603 (2009)
diff --git a/doc/src/fix_qeq.txt b/doc/src/fix_qeq.txt
index f9c8ecde6..22f476689 100644
--- a/doc/src/fix_qeq.txt
+++ b/doc/src/fix_qeq.txt
@@ -1,217 +1,217 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix qeq/point command :h3
fix qeq/shielded command :h3
fix qeq/slater command :h3
fix qeq/dynamic command :h3
fix qeq/fire command :h3
[Syntax:]
fix ID group-ID style Nevery cutoff tolerance maxiter qfile keyword ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
style = {qeq/point} or {qeq/shielded} or {qeq/slater} or {qeq/dynamic} or {qeq/fire} :l
Nevery = perform charge equilibration every this many steps :l
cutoff = global cutoff for charge-charge interactions (distance unit) :l
tolerance = precision to which charges will be equilibrated :l
maxiter = maximum iterations to perform charge equilibration :l
qfile = a filename with QEq parameters :l
zero or more keyword/value pairs may be appended :l
keyword = {alpha} or {qdamp} or {qstep} :l
{alpha} value = Slater type orbital exponent (qeq/slater only)
{qdamp} value = damping factor for damped dynamics charge solver (qeq/dynamic and qeq/fire only)
{qstep} value = time step size for damped dynamics charge solver (qeq/dynamic and qeq/fire only) :pre
:ule
[Examples:]
fix 1 all qeq/point 1 10 1.0e-6 200 param.qeq1
fix 1 qeq qeq/shielded 1 8 1.0e-6 100 param.qeq2
fix 1 all qeq/slater 5 10 1.0e-6 100 params alpha 0.2
fix 1 qeq qeq/dynamic 1 12 1.0e-3 100 my_qeq
fix 1 all qeq/fire 1 10 1.0e-3 100 my_qeq qdamp 0.2 qstep 0.1 :pre
[Description:]
Perform the charge equilibration (QEq) method as described in "(Rappe
and Goddard)"_#Rappe1 and formulated in "(Nakano)"_#Nakano1 (also known
as the matrix inversion method) and in "(Rick and Stuart)"_#Rick1 (also
known as the extended Lagrangian method) based on the
electronegativity equalization principle.
These fixes can be used with any "pair style"_pair_style.html in
LAMMPS, so long as per-atom charges are defined. The most typical
use-case is in conjunction with a "pair style"_pair_style.html that
performs charge equilibration periodically (e.g. every timestep), such
as the ReaxFF or Streitz-Mintmire potential.
But these fixes can also be used with
potentials that normally assume per-atom charges are fixed, e.g. a
"Buckingham"_pair_buck.html or "LJ/Coulombic"_pair_lj.html potential.
Because the charge equilibration calculation is effectively
independent of the pair style, these fixes can also be used to perform
a one-time assignment of charges to atoms. For example, you could
define the QEq fix, perform a zero-timestep run via the "run"_run.html
command without any pair style defined which would set per-atom
charges (based on the current atom configuration), then remove the fix
via the "unfix"_unfix.html command before performing further dynamics.
NOTE: Computing and using charge values different from published
values defined for a fixed-charge potential like Buckingham or CHARMM
or AMBER can have a strong effect on energies and forces, and
produces a different model than the published versions.
NOTE: The "fix qeq/comb"_fix_qeq_comb.html command must still be used
to perform charge equilibration with the "COMB
potential"_pair_comb.html. The "fix qeq/reax"_fix_qeq_reax.html
command can be used to perform charge equilibration with the "ReaxFF
-force field"_pair_reax_c.html, although fix qeq/shielded yields the
+force field"_pair_reaxc.html, although fix qeq/shielded yields the
same results as fix qeq/reax if {Nevery}, {cutoff}, and {tolerance}
are the same. Eventually the fix qeq/reax command will be deprecated.
The QEq method minimizes the electrostatic energy of the system (or
equalizes the derivative of energy with respect to charge of all the
atoms) by adjusting the partial charge on individual atoms based on
interactions with their neighbors within {cutoff}. It requires a few
parameters, in {metal} units, for each atom type, which are provided in
a file specified by {qfile}. The file has the following format:
1 chi eta gamma zeta qcore
2 chi eta gamma zeta qcore
...
Ntype chi eta gamma zeta qcore :pre
There is one line per atom type with the following parameters.
Only a subset of the parameters is used by each QEq style as described
below, thus the others can be set to 0.0 if desired.
{chi} = electronegativity in energy units
{eta} = self-Coulomb potential in energy units
{gamma} = shielded Coulomb constant defined by "ReaxFF force field"_#vanDuin in distance units
{zeta} = Slater type orbital exponent defined by the "Streitz-Mintmire"_#Streitz1 potential in reverse distance units
{qcore} = charge of the nucleus defined by the "Streitz-Mintmire"_#Streitz1 potential in charge units :ul
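For concreteness, a hedged sketch of a two-type {qfile}; the numbers
below are placeholders only, not validated QEq parameters:
1 5.25 10.00 0.80 1.00 0.00
2 8.74 13.40 1.00 1.20 0.00 :pre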
The {qeq/point} style describes partial charges on atoms as point
charges. Interaction between a pair of charged particles is 1/r,
which is the simplest description of the interaction between charges.
Only the {chi} and {eta} parameters from the {qfile} file are used.
Note that Coulomb catastrophe can occur if repulsion between the pair
of charged particles is too weak. This style solves partial charges
on atoms via the matrix inversion method. A tolerance of 1.0e-6 is
usually a good number.
The {qeq/shielded} style describes partial charges on atoms also as
point charges, but uses a shielded Coulomb potential to describe the
interaction between a pair of charged particles. Interaction through
the shielded Coulomb is given by equation (13) of the "ReaxFF force
field"_#vanDuin paper. The shielding accounts for charge overlap
between charged particles at small separation. This style is the same
as "fix qeq/reax"_fix_qeq_reax.html, and can be used with "pair_style
-reax/c"_pair_reax_c.html. Only the {chi}, {eta}, and {gamma}
+reax/c"_pair_reaxc.html. Only the {chi}, {eta}, and {gamma}
parameters from the {qfile} file are used. This style solves partial
charges on atoms via the matrix inversion method. A tolerance of
1.0e-6 is usually a good number.
The {qeq/slater} style describes partial charges on atoms as spherical
charge densities centered around atoms via the Slater 1{s} orbital, so
that the interaction between a pair of charged particles is the
product of two Slater 1{s} orbitals. The expression for the Slater
1{s} orbital is given under equation (6) of the
"Streitz-Mintmire"_#Streitz1 paper. Only the {chi}, {eta}, {zeta}, and
{qcore} parameters from the {qfile} file are used. This style solves
partial charges on atoms via the matrix inversion method. A tolerance
of 1.0e-6 is usually a good number. Keyword {alpha} can be used to
change the Slater type orbital exponent.
The {qeq/dynamic} style describes partial charges on atoms as point
charges that interact through 1/r, but the extended Lagrangian method
is used to solve partial charges on atoms. Only the {chi} and {eta}
parameters from the {qfile} file are used. Note that Coulomb
catastrophe can occur if repulsion between the pair of charged
particles is too weak. A tolerance of 1.0e-3 is usually a good
number. Keyword {qdamp} can be used to change the damping factor, while
keyword {qstep} can be used to change the time step size.
The "{qeq/fire}"_#Shan style describes the same charge model and charge
solver as the {qeq/dynamic} style, but employs a FIRE minimization
algorithm to solve for equilibrium charges.
Keyword {qdamp} can be used to change the damping factor, while
keyword {qstep} can be used to change the time step size.
Note that {qeq/point}, {qeq/shielded}, and {qeq/slater} describe
different charge models, whereas the matrix inversion method and the
extended Lagrangian method ({qeq/dynamic} and {qeq/fire}) are
different solvers.
Note that {qeq/point}, {qeq/dynamic} and {qeq/fire} styles all describe
charges as point charges that interact through a 1/r relationship, but
solve partial charges on atoms using different solvers. These three
styles should yield comparable results if
the QEq parameters and {Nevery}, {cutoff}, and {tolerance} are the
same. Style {qeq/point} is typically faster, {qeq/dynamic} scales
better on larger sizes, and {qeq/fire} is faster than {qeq/dynamic}.
NOTE: To avoid the evaluation of the derivative of charge with respect
to position, which is typically ill-defined, the system should have a
zero net charge.
NOTE: Developing QEq parameters (chi, eta, gamma, zeta, and qcore) is
non-trivial. Charges on atoms are not guaranteed to equilibrate with
arbitrary choices of these parameters. We do not develop these QEq
parameters. See the examples/qeq directory for some examples.
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about these fixes is written to "binary restart
files"_restart.html. No global scalar or vector or per-atom
quantities are stored by these fixes for access by various "output
commands"_Section_howto.html#howto_15. No parameter of these fixes
can be used with the {start/stop} keywords of the "run"_run.html
command.
These fixes are invoked during "energy minimization"_minimize.html.
[Restrictions:]
These fixes are part of the QEQ package. They are only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"fix qeq/reax"_fix_qeq_reax.html, "fix qeq/comb"_fix_qeq_comb.html
[Default:] none
:line
:link(Rappe1)
[(Rappe and Goddard)] A. K. Rappe and W. A. Goddard III, J Physical
Chemistry, 95, 3358-3363 (1991).
:link(Nakano1)
[(Nakano)] A. Nakano, Computer Physics Communications, 104, 59-69 (1997).
:link(Rick1)
[(Rick and Stuart)] S. W. Rick, S. J. Stuart, B. J. Berne, J Chemical Physics
101, 16141 (1994).
:link(Streitz1)
[(Streitz-Mintmire)] F. H. Streitz, J. W. Mintmire, Physical Review B, 50,
16, 11996 (1994)
:link(vanDuin)
[(ReaxFF)] A. C. T. van Duin, S. Dasgupta, F. Lorant, W. A. Goddard III, J
Physical Chemistry, 105, 9396-9409 (2001)
:link(Shan)
[(QEq/Fire)] T.-R. Shan, A. P. Thompson, S. J. Plimpton, in preparation
diff --git a/doc/src/fix_qeq_reax.txt b/doc/src/fix_qeq_reax.txt
index 76c95e111..aed043f6c 100644
--- a/doc/src/fix_qeq_reax.txt
+++ b/doc/src/fix_qeq_reax.txt
@@ -1,124 +1,124 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix qeq/reax command :h3
fix qeq/reax/kk command :h3
[Syntax:]
fix ID group-ID qeq/reax Nevery cutlo cuthi tolerance params :pre
ID, group-ID are documented in "fix"_fix.html command
qeq/reax = style name of this fix command
Nevery = perform QEq every this many steps
cutlo,cuthi = lo and hi cutoff for Taper radius
tolerance = precision to which charges will be equilibrated
params = reax/c or a filename :ul
[Examples:]
fix 1 all qeq/reax 1 0.0 10.0 1.0e-6 reax/c
fix 1 all qeq/reax 1 0.0 10.0 1.0e-6 param.qeq :pre
[Description:]
Perform the charge equilibration (QEq) method as described in "(Rappe
and Goddard)"_#Rappe2 and formulated in "(Nakano)"_#Nakano2. It is
typically used in conjunction with the ReaxFF force field model as
-implemented in the "pair_style reax/c"_pair_reax_c.html command, but
+implemented in the "pair_style reax/c"_pair_reaxc.html command, but
it can be used with any potential in LAMMPS, so long as it defines and
uses charges on each atom. The "fix qeq/comb"_fix_qeq_comb.html
command should be used to perform charge equilibration with the "COMB
potential"_pair_comb.html. For more technical details about the
charge equilibration performed by fix qeq/reax, see the
"(Aktulga)"_#qeq-Aktulga paper.
The QEq method minimizes the electrostatic energy of the system by
adjusting the partial charge on individual atoms based on interactions
with their neighbors. It requires some parameters for each atom type.
If the {params} setting above is the word "reax/c", then these are
-extracted from the "pair_style reax/c"_pair_reax_c.html command and
+extracted from the "pair_style reax/c"_pair_reaxc.html command and
the ReaxFF force field file it reads in. If a file name is specified
for {params}, then the parameters are taken from the specified file
and the file must contain one line for each atom type. The latter
form must be used when performing QEq with a non-ReaxFF potential.
Each line should be formatted as follows:
itype chi eta gamma :pre
where {itype} is the atom type from 1 to Ntypes, {chi} denotes the
electronegativity in eV, {eta} denotes the self-Coulomb
potential in eV, and {gamma} denotes the valence orbital
exponent. Note that these 3 quantities are also in the ReaxFF
potential file, except that eta is defined here as twice the eta value
in the ReaxFF file. Note that unlike the rest of LAMMPS, the units
of this fix are hard-coded to be A, eV, and electronic charge.
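When a file name is given for {params} instead of "reax/c", a hedged
sketch of its contents for a two-type system (the values are
placeholders, not validated parameters):
1 5.0 14.0 0.8
2 8.5 17.0 1.0 :pre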
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html. No global scalar or vector or per-atom
quantities are stored by this fix for access by various "output
commands"_Section_howto.html#howto_15. No parameter of this fix can
be used with the {start/stop} keywords of the "run"_run.html command.
This fix is invoked during "energy minimization"_minimize.html.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This fix is part of the USER-REAXC package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
This fix does not correctly handle interactions
involving multiple periodic images of the same atom. Hence, it should not
be used for periodic cell dimensions less than 10 angstroms.
[Related commands:]
-"pair_style reax/c"_pair_reax_c.html
+"pair_style reax/c"_pair_reaxc.html
[Default:] none
:line
:link(Rappe2)
[(Rappe)] Rappe and Goddard III, Journal of Physical Chemistry, 95,
3358-3363 (1991).
:link(Nakano2)
[(Nakano)] Nakano, Computer Physics Communications, 104, 59-69 (1997).
:link(qeq-Aktulga)
[(Aktulga)] Aktulga, Fogarty, Pandit, Grama, Parallel Computing, 38,
245-259 (2012).
diff --git a/doc/src/fix_reax_bonds.txt b/doc/src/fix_reax_bonds.txt
index 1fd1b3ca5..d3f108709 100644
--- a/doc/src/fix_reax_bonds.txt
+++ b/doc/src/fix_reax_bonds.txt
@@ -1,93 +1,93 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix reax/bonds command :h3
fix reax/c/bonds command :h3
fix reax/c/bonds/kk command :h3
[Syntax:]
fix ID group-ID reax/bonds Nevery filename :pre
ID, group-ID are documented in "fix"_fix.html command
reax/bonds = style name of this fix command
Nevery = output interval in timesteps
filename = name of output file :ul
[Examples:]
fix 1 all reax/bonds 100 bonds.tatb
fix 1 all reax/c/bonds 100 bonds.reaxc :pre
[Description:]
Write out the bond information computed by the ReaxFF potential
specified by "pair_style reax"_pair_reax.html or "pair_style
-reax/c"_pair_reax_c.html in the exact same format as the original
+reax/c"_pair_reaxc.html in the exact same format as the original
stand-alone ReaxFF code of Adri van Duin. The bond information is
written to {filename} on timesteps that are multiples of {Nevery},
including timestep 0. For time-averaged chemical species analysis,
please see the "fix reaxc/c/species"_fix_reaxc_species.html command.
The format of the output file should be self-explanatory.
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html. None of the "fix_modify"_fix_modify.html options
are relevant to this fix. No global or per-atom quantities are stored
by this fix for access by various "output
commands"_Section_howto.html#howto_15. No parameter of this fix can
be used with the {start/stop} keywords of the "run"_run.html command.
This fix is not invoked during "energy minimization"_minimize.html.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section_accelerate"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section_accelerate"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
The fix reax/bonds command requires that the "pair_style
reax"_pair_reax.html be invoked. This fix is part of the REAX
package. It is only enabled if LAMMPS was built with that package,
which also requires the REAX library be built and linked with LAMMPS.
The fix reax/c/bonds command requires that the "pair_style
-reax/c"_pair_reax_c.html be invoked. This fix is part of the
+reax/c"_pair_reaxc.html be invoked. This fix is part of the
USER-REAXC package. It is only enabled if LAMMPS was built with that
package. See the "Making LAMMPS"_Section_start.html#start_3 section
for more info.
[Related commands:]
"pair_style reax"_pair_reax.html, "pair_style
-reax/c"_pair_reax_c.html, "fix reax/c/species"_fix_reaxc_species.html
+reax/c"_pair_reaxc.html, "fix reax/c/species"_fix_reaxc_species.html
[Default:] none
diff --git a/doc/src/fix_reaxc_species.txt b/doc/src/fix_reaxc_species.txt
index 00db91900..d43a338a6 100644
--- a/doc/src/fix_reaxc_species.txt
+++ b/doc/src/fix_reaxc_species.txt
@@ -1,180 +1,180 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
fix reax/c/species command :h3
fix reax/c/species/kk command :h3
[Syntax:]
fix ID group-ID reax/c/species Nevery Nrepeat Nfreq filename keyword value ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
reax/c/species = style name of this command :l
Nevery = sample bond-order every this many timesteps :l
Nrepeat = # of bond-order samples used for calculating averages :l
Nfreq = calculate average bond-order every this many timesteps :l
filename = name of output file :l
zero or more keyword/value pairs may be appended :l
keyword = {cutoff} or {element} or {position} :l
{cutoff} value = I J Cutoff
I, J = atom types
Cutoff = Bond-order cutoff value for this pair of atom types
{element} value = Element1, Element2, ...
{position} value = posfreq filepos
posfreq = write position files every this many timesteps
filepos = name of position output file :pre
:ule
[Examples:]
fix 1 all reax/c/species 10 10 100 species.out
fix 1 all reax/c/species 1 2 20 species.out cutoff 1 1 0.40 cutoff 1 2 0.55
fix 1 all reax/c/species 1 100 100 species.out element Au O H position 1000 AuOH.pos :pre
[Description:]
Write out the chemical species information computed by the ReaxFF
-potential specified by "pair_style reax/c"_pair_reax_c.html.
+potential specified by "pair_style reax/c"_pair_reaxc.html.
Bond-order values (either averaged or instantaneous, depending on
value of {Nrepeat}) are used to determine chemical bonds. Every
{Nfreq} timesteps, chemical species information is written to
{filename} as a two line output. The first line is a header
containing labels. The second line consists of the following:
timestep, total number of molecules, total number of distinct species,
number of molecules of each species. In this context, "species" means
a unique molecule. The chemical formula of each species is given in
the first line.
Optional keyword {cutoff} can be assigned to change the minimum
bond-order values used in identifying chemical bonds between pairs of
atoms. Bond-order cutoffs should be carefully chosen, as bond-order
cutoffs that are too small may include too many bonds (which will
result in an error), while cutoffs that are too large will result in
fragmented molecules. The default cutoff of 0.3 usually gives good
results.
The optional keyword {element} can be used to specify the chemical
symbol printed for each LAMMPS atom type. The number of symbols must
match the number of LAMMPS atom types and each symbol must consist of
1 or 2 alphanumeric characters. Normally, these symbols should be
chosen to match the chemical identity of each LAMMPS atom type, as
-specified using the "reax/c pair_coeff"_pair_reax_c.html command and
+specified using the "reax/c pair_coeff"_pair_reaxc.html command and
the ReaxFF force field file.
The optional keyword {position} writes center-of-mass positions of
each identified molecule to file {filepos} every {posfreq} timesteps.
The first line contains information on timestep, total number of
molecules, total number of distinct species, and box dimensions. The
second line is a header containing labels. From the third line
downward, each molecule writes a line of output containing the
following information: molecule ID, number of atoms in this molecule,
chemical formula, total charge, and center-of-mass xyz positions of
this molecule. The xyz positions are in fractional coordinates
relative to the box dimensions.
For the keyword {position}, the {filepos} is the name of the output
file. It can contain the wildcard character "*". If the "*"
character appears in {filepos}, then one file per snapshot is written
at {posfreq} and the "*" character is replaced with the timestep
value. For example, AuO.pos.* becomes AuO.pos.0, AuO.pos.1000, etc.
:line
The {Nevery}, {Nrepeat}, and {Nfreq} arguments specify on what
timesteps the bond-order values are sampled to get the average bond
order. The species analysis is performed using the average bond-order
on timesteps that are a multiple of {Nfreq}. The average is over
{Nrepeat} bond-order samples, computed in the preceding portion of the
simulation every {Nevery} timesteps. {Nfreq} must be a multiple of
{Nevery} and {Nevery} must be non-zero even if {Nrepeat} is 1.
Also, the timesteps
contributing to the average bond-order cannot overlap,
i.e. Nrepeat*Nevery can not exceed Nfreq.
For example, if Nevery=2, Nrepeat=6, and Nfreq=100, then bond-order
values on timesteps 90,92,94,96,98,100 will be used to compute the
average bond-order for the species analysis output on timestep 100.
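Written as an input-script line, that sampling schedule would be (the
fix ID and output file name are placeholders):
fix 1 all reax/c/species 2 6 100 species.out :pre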
:line
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
files"_restart.html. None of the "fix_modify"_fix_modify.html options
are relevant to this fix.
This fix computes both a global vector of length 2 and a per-atom
vector, either of which can be accessed by various "output
commands"_Section_howto.html#howto_15. The values in the global
vector are "intensive".
The 2 values in the global vector are as follows:
1 = total number of molecules
2 = total number of distinct species :ul
The per-atom vector stores the molecule ID for each atom as identified
by the fix. If an atom is not in a molecule, its ID will be 0.
For atoms in the same molecule, the molecule ID for all of them
will be the same and will be equal to the smallest atom ID of
any atom in the molecule.
No parameter of this fix can be used with the {start/stop} keywords of
the "run"_run.html command. This fix is not invoked during "energy
minimization"_minimize.html.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section_accelerate"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section_accelerate"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
The fix reax/c/species command currently only works with
-"pair_style reax/c"_pair_reax_c.html and it requires that the "pair_style
-reax/c"_pair_reax_c.html be invoked. This fix is part of the
+"pair_style reax/c"_pair_reaxc.html and it requires that the "pair_style
+reax/c"_pair_reaxc.html be invoked. This fix is part of the
USER-REAXC package. It is only enabled if LAMMPS was built with that
package. See the "Making LAMMPS"_Section_start.html#start_3 section
for more info.
It should be possible to extend it to other reactive pair_styles (such as
"rebo"_pair_airebo.html, "airebo"_pair_airebo.html,
"comb"_pair_comb.html, and "bop"_pair_bop.html), but this has not yet been done.
[Related commands:]
-"pair_style reax/c"_pair_reax_c.html, "fix
+"pair_style reax/c"_pair_reaxc.html, "fix
reax/bonds"_fix_reax_bonds.html
[Default:]
The default values for bond-order cutoffs are 0.3 for all I-J pairs. The
default element symbols are C, H, O, N. Position files are not written
by default.
diff --git a/doc/src/improper_cossq.txt b/doc/src/improper_cossq.txt
index 513f0b315..e238063a8 100644
--- a/doc/src/improper_cossq.txt
+++ b/doc/src/improper_cossq.txt
@@ -1,89 +1,86 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
improper_style cossq command :h3
improper_style cossq/omp command :h3
[Syntax:]
improper_style cossq :pre
[Examples:]
improper_style cossq
improper_coeff 1 4.0 0.0 :pre
[Description:]
The {cossq} improper style uses the potential
:c,image(Eqs/improper_cossq.jpg)
where x is the improper angle, x0 is its equilibrium value, and K is a
prefactor.
If the 4 atoms in an improper quadruplet (listed in the data file read
by the "read_data"_read_data.html command) are ordered I,J,K,L then X
is the angle between the plane of I,J,K and the plane of J,K,L.
Alternatively, you can think of atoms J,K,L as being in a plane, and
atom I above the plane, and X as a measure of how far out-of-plane I
is with respect to the other 3 atoms.
Note that defining 4 atoms to interact in this way does not mean that
bonds necessarily exist between I-J, J-K, or K-L, as they would in a
linear dihedral. Normally, the bonds I-J, I-K, I-L would exist for an
improper to be defined between the 4 atoms.
The following coefficients must be defined for each improper type via
the "improper_coeff"_improper_coeff.html command as in the example
above, or in the data file or restart files read by the
"read_data"_read_data.html or "read_restart"_read_restart.html
commands:
-K (energy/radian^2)
+K (energy)
X0 (degrees) :ul
-X0 is specified in degrees, but LAMMPS converts it to radians
-internally; hence the units of K are in energy/radian^2.
-
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This improper style can only be used if LAMMPS was built with the
USER-MISC package. See the "Making LAMMPS"_Section_start.html#start_3
section for more info on packages.
[Related commands:]
"improper_coeff"_improper_coeff.html
[Default:] none
diff --git a/doc/src/improper_ring.txt b/doc/src/improper_ring.txt
index 705b1cf74..cba59399e 100644
--- a/doc/src/improper_ring.txt
+++ b/doc/src/improper_ring.txt
@@ -1,96 +1,93 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
improper_style ring command :h3
improper_style ring/omp command :h3
[Syntax:]
improper_style ring :pre
[Examples:]
improper_style ring
improper_coeff 1 8000 70.5 :pre
[Description:]
The {ring} improper style uses the potential
:c,image(Eqs/improper_ring.jpg)
where K is a prefactor, theta is the angle formed by the atoms
specified by the (i,j,k,l) indices and theta0 is its equilibrium value.
If the 4 atoms in an improper quadruplet (listed in the data file read
by the "read_data"_read_data.html command) are ordered i,j,k,l then
theta_{ijl} is the angle between atoms i,j and l, theta_{ijk} is the
angle between atoms i,j and k, theta_{kjl} is the angle between atoms
j,k, and l.
The "ring" improper style implements the improper potential introduced
by Destree et al., in Equation (9) of "(Destree)"_#Destree. This
potential does not affect small amplitude vibrations but is used in an
ad-hoc way to prevent the onset of accidentally large amplitude
fluctuations leading to the occurrence of a planar conformation of the
three bonds i-j, j-k and j-l, an intermediate conformation toward the
chiral inversion of a methine carbon. In the "Impropers" section of
the data file, four atoms i, j, k and l are specified, with i, j and l
lying on the backbone of the chain and k specifying the chirality of j.
The following coefficients must be defined for each improper type via
the "improper_coeff"_improper_coeff.html command as in the example
above, or in the data file or restart files read by the
"read_data"_read_data.html or "read_restart"_read_restart.html
commands:
-K (energy/radian^2)
+K (energy)
theta0 (degrees) :ul
-theta0 is specified in degrees, but LAMMPS converts it to radians
-internally; hence the units of K are in energy/radian^2.
-
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This improper style can only be used if LAMMPS was built with the
USER-MISC package. See the "Making LAMMPS"_Section_start.html#start_3
section for more info on packages.
[Related commands:]
"improper_coeff"_improper_coeff.html
:link(Destree)
[(Destree)] M. Destree, F. Laupretre, A. Lyulin, and J.-P. Ryckaert,
J Chem Phys, 112, 9632 (2000).
diff --git a/doc/src/lammps.book b/doc/src/lammps.book
index 6c68955bc..b2b42aa7e 100644
--- a/doc/src/lammps.book
+++ b/doc/src/lammps.book
@@ -1,643 +1,643 @@
#HTMLDOC 1.8.27
-t pdf14 -f "../Manual.pdf" --book --toclevels 4 --no-numbered --toctitle "Table of Contents" --title --textcolor #000000 --linkcolor #0000ff --linkstyle plain --bodycolor #ffffff --size Universal --left 1.00in --right 0.50in --top 0.50in --bottom 0.50in --header .t. --header1 ... --footer ..1 --nup 1 --tocheader .t. --tocfooter ..i --portrait --color --no-pscommands --no-xrxcomments --compression=1 --jpeg=0 --fontsize 11.0 --fontspacing 1.2 --headingfont helvetica --bodyfont times --headfootsize 11.0 --headfootfont helvetica --charset iso-8859-1 --links --embedfonts --pagemode document --pagelayout single --firstpage c1 --pageeffect none --pageduration 10 --effectduration 1.0 --no-encryption --permissions all --owner-password "" --user-password "" --browserwidth 680 --no-strict --no-overflow
Manual.html
Section_intro.html
Section_start.html
Section_commands.html
Section_packages.html
Section_accelerate.html
accelerate_gpu.html
accelerate_intel.html
accelerate_kokkos.html
accelerate_omp.html
accelerate_opt.html
Section_howto.html
Section_example.html
Section_perf.html
Section_tools.html
Section_modify.html
Section_python.html
Section_errors.html
Section_history.html
tutorial_drude.html
tutorial_github.html
tutorial_pylammps.html
body.html
manifolds.html
angle_coeff.html
angle_style.html
atom_modify.html
atom_style.html
balance.html
bond_coeff.html
bond_style.html
bond_write.html
boundary.html
box.html
change_box.html
clear.html
comm_modify.html
comm_style.html
compute.html
compute_modify.html
create_atoms.html
create_bonds.html
create_box.html
delete_atoms.html
delete_bonds.html
dielectric.html
dihedral_coeff.html
dihedral_style.html
dimension.html
displace_atoms.html
dump.html
dump_custom_vtk.html
dump_h5md.html
dump_image.html
dump_modify.html
dump_molfile.html
dump_nc.html
echo.html
fix.html
fix_modify.html
group.html
group2ndx.html
if.html
improper_coeff.html
improper_style.html
include.html
info.html
jump.html
kspace_modify.html
kspace_style.html
label.html
lattice.html
log.html
mass.html
min_modify.html
min_style.html
minimize.html
molecule.html
neb.html
neigh_modify.html
neighbor.html
newton.html
next.html
package.html
pair_coeff.html
pair_modify.html
pair_style.html
pair_write.html
partition.html
prd.html
print.html
processors.html
python.html
quit.html
read_data.html
read_dump.html
read_restart.html
region.html
replicate.html
rerun.html
reset_timestep.html
restart.html
run.html
run_style.html
set.html
shell.html
special_bonds.html
suffix.html
tad.html
temper.html
temper_grem.html
thermo.html
thermo_modify.html
thermo_style.html
timer.html
timestep.html
uncompute.html
undump.html
unfix.html
units.html
variable.html
velocity.html
write_coeff.html
write_data.html
write_dump.html
write_restart.html
fix_adapt.html
fix_adapt_fep.html
fix_addforce.html
fix_addtorque.html
fix_append_atoms.html
fix_atc.html
fix_atom_swap.html
fix_ave_atom.html
fix_ave_chunk.html
fix_ave_correlate.html
fix_ave_correlate_long.html
fix_ave_histo.html
fix_ave_time.html
fix_aveforce.html
fix_balance.html
fix_bond_break.html
fix_bond_create.html
fix_bond_swap.html
fix_box_relax.html
fix_cmap.html
fix_colvars.html
fix_controller.html
fix_deform.html
fix_deposit.html
fix_dpd_energy.html
fix_drag.html
fix_drude.html
fix_drude_transform.html
fix_dt_reset.html
fix_efield.html
fix_ehex.html
fix_enforce2d.html
fix_eos_cv.html
fix_eos_table.html
fix_eos_table_rx.html
fix_evaporate.html
fix_external.html
fix_filter_corotate.html
fix_flow_gauss.html
fix_freeze.html
fix_gcmc.html
fix_gld.html
fix_gle.html
fix_gravity.html
fix_grem.html
fix_halt.html
fix_heat.html
fix_imd.html
fix_indent.html
fix_ipi.html
fix_langevin.html
fix_langevin_drude.html
fix_langevin_eff.html
fix_lb_fluid.html
fix_lb_momentum.html
fix_lb_pc.html
fix_lb_rigid_pc_sphere.html
fix_lb_viscous.html
fix_lineforce.html
fix_manifoldforce.html
fix_meso.html
fix_meso_stationary.html
fix_momentum.html
fix_move.html
fix_mscg.html
fix_msst.html
fix_neb.html
fix_nh.html
fix_nh_eff.html
fix_nph_asphere.html
fix_nph_body.html
fix_nph_sphere.html
fix_nphug.html
fix_npt_asphere.html
fix_npt_body.html
fix_npt_sphere.html
fix_nve.html
fix_nve_asphere.html
fix_nve_asphere_noforce.html
fix_nve_body.html
fix_nve_dot.html
fix_nve_dotc_langevin.html
fix_nve_eff.html
fix_nve_limit.html
fix_nve_line.html
fix_nve_manifold_rattle.html
fix_nve_noforce.html
fix_nve_sphere.html
fix_nve_tri.html
fix_nvk.html
fix_nvt_asphere.html
fix_nvt_body.html
fix_nvt_manifold_rattle.html
fix_nvt_sllod.html
fix_nvt_sllod_eff.html
fix_nvt_sphere.html
fix_oneway.html
fix_orient.html
fix_phonon.html
fix_pimd.html
fix_planeforce.html
fix_poems.html
fix_pour.html
fix_press_berendsen.html
fix_print.html
fix_property_atom.html
fix_qbmsst.html
fix_qeq.html
fix_qeq_comb.html
fix_qeq_reax.html
fix_qmmm.html
fix_qtb.html
fix_reax_bonds.html
fix_reaxc_species.html
fix_recenter.html
fix_restrain.html
fix_rigid.html
fix_rx.html
fix_saed_vtk.html
fix_setforce.html
fix_shake.html
fix_shardlow.html
fix_smd.html
fix_smd_adjust_dt.html
fix_smd_integrate_tlsph.html
fix_smd_integrate_ulsph.html
fix_smd_move_triangulated_surface.html
fix_smd_setvel.html
fix_smd_wall_surface.html
fix_spring.html
fix_spring_chunk.html
fix_spring_rg.html
fix_spring_self.html
fix_srd.html
fix_store_force.html
fix_store_state.html
fix_temp_berendsen.html
fix_temp_csvr.html
fix_temp_rescale.html
fix_temp_rescale_eff.html
fix_tfmc.html
fix_thermal_conductivity.html
fix_ti_spring.html
fix_tmd.html
fix_ttm.html
fix_tune_kspace.html
fix_vector.html
fix_viscosity.html
fix_viscous.html
fix_wall.html
fix_wall_gran.html
fix_wall_gran_region.html
fix_wall_piston.html
fix_wall_reflect.html
fix_wall_region.html
fix_wall_srd.html
compute_ackland_atom.html
compute_angle.html
compute_angle_local.html
compute_angmom_chunk.html
compute_basal_atom.html
compute_body_local.html
compute_bond.html
compute_bond_local.html
compute_centro_atom.html
compute_chunk_atom.html
compute_cluster_atom.html
compute_cna_atom.html
compute_com.html
compute_com_chunk.html
compute_contact_atom.html
compute_coord_atom.html
compute_damage_atom.html
compute_dihedral.html
compute_dihedral_local.html
compute_dilatation_atom.html
compute_dipole_chunk.html
compute_displace_atom.html
compute_dpd.html
compute_dpd_atom.html
compute_erotate_asphere.html
compute_erotate_rigid.html
compute_erotate_sphere.html
compute_erotate_sphere_atom.html
compute_event_displace.html
compute_fep.html
compute_global_atom.html
compute_group_group.html
compute_gyration.html
compute_gyration_chunk.html
compute_heat_flux.html
compute_hexorder_atom.html
compute_improper.html
compute_improper_local.html
compute_inertia_chunk.html
compute_ke.html
compute_ke_atom.html
compute_ke_atom_eff.html
compute_ke_eff.html
compute_ke_rigid.html
compute_meso_e_atom.html
compute_meso_rho_atom.html
compute_meso_t_atom.html
compute_msd.html
compute_msd_chunk.html
compute_msd_nongauss.html
compute_omega_chunk.html
compute_orientorder_atom.html
compute_pair.html
compute_pair_local.html
compute_pe.html
compute_pe_atom.html
compute_plasticity_atom.html
compute_pressure.html
compute_property_atom.html
compute_property_chunk.html
compute_property_local.html
compute_rdf.html
compute_reduce.html
compute_rigid_local.html
compute_saed.html
compute_slice.html
compute_smd_contact_radius.html
compute_smd_damage.html
compute_smd_hourglass_error.html
compute_smd_internal_energy.html
compute_smd_plastic_strain.html
compute_smd_plastic_strain_rate.html
compute_smd_rho.html
compute_smd_tlsph_defgrad.html
compute_smd_tlsph_dt.html
compute_smd_tlsph_num_neighs.html
compute_smd_tlsph_shape.html
compute_smd_tlsph_strain.html
compute_smd_tlsph_strain_rate.html
compute_smd_tlsph_stress.html
compute_smd_triangle_mesh_vertices.html
compute_smd_ulsph_num_neighs.html
compute_smd_ulsph_strain.html
compute_smd_ulsph_strain_rate.html
compute_smd_ulsph_stress.html
compute_smd_vol.html
compute_sna_atom.html
compute_stress_atom.html
compute_tally.html
compute_temp.html
compute_temp_asphere.html
compute_temp_body.html
compute_temp_chunk.html
compute_temp_com.html
compute_temp_cs.html
compute_temp_deform.html
compute_temp_deform_eff.html
compute_temp_drude.html
compute_temp_eff.html
compute_temp_partial.html
compute_temp_profile.html
compute_temp_ramp.html
compute_temp_region.html
compute_temp_region_eff.html
compute_temp_rotate.html
compute_temp_sphere.html
compute_ti.html
compute_torque_chunk.html
compute_vacf.html
compute_vcm_chunk.html
compute_voronoi_atom.html
compute_xrd.html
pair_adp.html
pair_agni.html
pair_airebo.html
pair_awpmd.html
pair_beck.html
pair_body.html
pair_bop.html
pair_born.html
pair_brownian.html
pair_buck.html
pair_buck_long.html
pair_charmm.html
pair_class2.html
pair_colloid.html
pair_comb.html
pair_coul.html
pair_coul_diel.html
pair_cs.html
pair_dipole.html
pair_dpd.html
pair_dpd_fdt.html
pair_dsmc.html
pair_eam.html
pair_edip.html
pair_eff.html
pair_eim.html
pair_exp6_rx.html
pair_gauss.html
pair_gayberne.html
pair_gran.html
pair_gromacs.html
pair_hbond_dreiding.html
pair_hybrid.html
pair_kim.html
pair_kolmogorov_crespi_z.html
pair_lcbop.html
pair_line_lj.html
pair_list.html
pair_lj.html
pair_lj96.html
pair_lj_cubic.html
pair_lj_expand.html
pair_lj_long.html
pair_lj_sf.html
pair_lj_smooth.html
pair_lj_smooth_linear.html
pair_lj_soft.html
pair_lubricate.html
pair_lubricateU.html
pair_mdf.html
pair_meam.html
pair_meam_spline.html
pair_meam_sw_spline.html
pair_mgpt.html
pair_mie.html
pair_momb.html
pair_morse.html
pair_multi_lucy.html
pair_multi_lucy_rx.html
pair_nb3b_harmonic.html
pair_nm.html
pair_none.html
pair_oxdna.html
pair_oxdna2.html
pair_peri.html
pair_polymorphic.html
pair_quip.html
pair_reax.html
-pair_reax_c.html
+pair_reaxc.html
pair_resquared.html
pair_sdk.html
pair_smd_hertz.html
pair_smd_tlsph.html
pair_smd_triangulated_surface.html
pair_smd_ulsph.html
pair_smtbq.html
pair_snap.html
pair_soft.html
pair_sph_heatconduction.html
pair_sph_idealgas.html
pair_sph_lj.html
pair_sph_rhosum.html
pair_sph_taitwater.html
pair_sph_taitwater_morris.html
pair_srp.html
pair_sw.html
pair_table.html
pair_table_rx.html
pair_tersoff.html
pair_tersoff_mod.html
pair_tersoff_zbl.html
pair_thole.html
pair_tri_lj.html
pair_vashishta.html
pair_yukawa.html
pair_yukawa_colloid.html
pair_zbl.html
pair_zero.html
bond_class2.html
bond_fene.html
bond_fene_expand.html
bond_oxdna.html
bond_harmonic.html
bond_harmonic_shift.html
bond_harmonic_shift_cut.html
bond_hybrid.html
bond_morse.html
bond_none.html
bond_nonlinear.html
bond_quartic.html
bond_table.html
bond_zero.html
angle_charmm.html
angle_class2.html
angle_cosine.html
angle_cosine_delta.html
angle_cosine_periodic.html
angle_cosine_shift.html
angle_cosine_shift_exp.html
angle_cosine_squared.html
angle_dipole.html
angle_fourier.html
angle_fourier_simple.html
angle_harmonic.html
angle_hybrid.html
angle_none.html
angle_quartic.html
angle_sdk.html
angle_table.html
angle_zero.html
dihedral_charmm.html
dihedral_class2.html
dihedral_cosine_shift_exp.html
dihedral_fourier.html
dihedral_harmonic.html
dihedral_helix.html
dihedral_hybrid.html
dihedral_multi_harmonic.html
dihedral_nharmonic.html
dihedral_none.html
dihedral_opls.html
dihedral_quadratic.html
dihedral_spherical.html
dihedral_table.html
dihedral_zero.html
improper_class2.html
improper_cossq.html
improper_cvff.html
improper_distance.html
improper_fourier.html
improper_harmonic.html
improper_hybrid.html
improper_none.html
improper_ring.html
improper_umbrella.html
improper_zero.html
USER/atc/man_add_molecule.html
USER/atc/man_add_species.html
USER/atc/man_atom_element_map.html
USER/atc/man_atom_weight.html
USER/atc/man_atomic_charge.html
USER/atc/man_boundary.html
USER/atc/man_boundary_dynamics.html
USER/atc/man_boundary_faceset.html
USER/atc/man_boundary_integral.html
USER/atc/man_consistent_fe_initialization.html
USER/atc/man_contour_integral.html
USER/atc/man_control.html
USER/atc/man_control_momentum.html
USER/atc/man_control_thermal.html
USER/atc/man_control_thermal_correction_max_iterations.html
USER/atc/man_decomposition.html
USER/atc/man_electron_integration.html
USER/atc/man_equilibrium_start.html
USER/atc/man_extrinsic_exchange.html
USER/atc/man_fe_md_boundary.html
USER/atc/man_fem_mesh.html
USER/atc/man_filter_scale.html
USER/atc/man_filter_type.html
USER/atc/man_fix_atc.html
USER/atc/man_fix_flux.html
USER/atc/man_fix_nodes.html
USER/atc/man_hardy_computes.html
USER/atc/man_hardy_fields.html
USER/atc/man_hardy_gradients.html
USER/atc/man_hardy_kernel.html
USER/atc/man_hardy_on_the_fly.html
USER/atc/man_hardy_rates.html
USER/atc/man_initial.html
USER/atc/man_internal_atom_integrate.html
USER/atc/man_internal_element_set.html
USER/atc/man_internal_quadrature.html
USER/atc/man_kernel_function.html
USER/atc/man_localized_lambda.html
USER/atc/man_lumped_lambda_solve.html
USER/atc/man_mask_direction.html
USER/atc/man_mass_matrix.html
USER/atc/man_material.html
USER/atc/man_mesh_add_to_nodeset.html
USER/atc/man_mesh_create.html
USER/atc/man_mesh_create_elementset.html
USER/atc/man_mesh_create_faceset_box.html
USER/atc/man_mesh_create_faceset_plane.html
USER/atc/man_mesh_create_nodeset.html
USER/atc/man_mesh_delete_elements.html
USER/atc/man_mesh_nodeset_to_elementset.html
USER/atc/man_mesh_output.html
USER/atc/man_mesh_quadrature.html
USER/atc/man_mesh_read.html
USER/atc/man_mesh_write.html
USER/atc/man_momentum_time_integration.html
USER/atc/man_output.html
USER/atc/man_output_elementset.html
USER/atc/man_output_nodeset.html
USER/atc/man_pair_interactions.html
USER/atc/man_poisson_solver.html
USER/atc/man_read_restart.html
USER/atc/man_remove_molecule.html
USER/atc/man_remove_source.html
USER/atc/man_remove_species.html
USER/atc/man_reset_atomic_reference_positions.html
USER/atc/man_reset_time.html
USER/atc/man_sample_frequency.html
USER/atc/man_set.html
USER/atc/man_source.html
USER/atc/man_source_integration.html
USER/atc/man_temperature_definition.html
USER/atc/man_thermal_time_integration.html
USER/atc/man_time_filter.html
USER/atc/man_track_displacement.html
USER/atc/man_unfix_flux.html
USER/atc/man_unfix_nodes.html
USER/atc/man_write_atom_weights.html
USER/atc/man_write_restart.html
diff --git a/doc/src/pair_hybrid.txt b/doc/src/pair_hybrid.txt
index 7ef54e7f0..5166fe1f8 100644
--- a/doc/src/pair_hybrid.txt
+++ b/doc/src/pair_hybrid.txt
@@ -1,382 +1,388 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style hybrid command :h3
pair_style hybrid/omp command :h3
pair_style hybrid/overlay command :h3
pair_style hybrid/overlay/omp command :h3
[Syntax:]
pair_style hybrid style1 args style2 args ...
pair_style hybrid/overlay style1 args style2 args ... :pre
style1,style2 = list of one or more pair styles and their arguments :ul
[Examples:]
pair_style hybrid lj/cut/coul/cut 10.0 eam lj/cut 5.0
pair_coeff 1*2 1*2 eam niu3
pair_coeff 3 3 lj/cut/coul/cut 1.0 1.0
pair_coeff 1*2 3 lj/cut 0.5 1.2 :pre
pair_style hybrid/overlay lj/cut 2.5 coul/long 2.0
pair_coeff * * lj/cut 1.0 1.0
pair_coeff * * coul/long :pre
[Description:]
The {hybrid} and {hybrid/overlay} styles enable the use of multiple
pair styles in one simulation. With the {hybrid} style, exactly one
pair style is assigned to each pair of atom types. With the
{hybrid/overlay} style, one or more pair styles can be assigned to
each pair of atom types. The assignment of pair styles to type pairs
is made via the "pair_coeff"_pair_coeff.html command.
Here are two examples of hybrid simulations. The {hybrid} style could
be used for a simulation of a metal droplet on a LJ surface. The
metal atoms interact with each other via an {eam} potential, the
surface atoms interact with each other via a {lj/cut} potential, and
the metal/surface interaction is also computed via a {lj/cut}
potential. The {hybrid/overlay} style could be used as in the 2nd
example above, where multiple potentials are superposed in an additive
fashion to compute the interaction between atoms. In this example,
using {lj/cut} and {coul/long} together gives the same result as if
the {lj/cut/coul/long} potential were used by itself. In this case,
it would be more efficient to use the single combined potential, but
in general any combination of pair potentials can be used together
to produce an interaction that is not encoded in any single pair_style
file, e.g. adding Coulombic forces between granular particles.
All pair styles that will be used are listed as "sub-styles" following
the {hybrid} or {hybrid/overlay} keyword, in any order. Each
sub-style's name is followed by its usual arguments, as illustrated in
the example above. See the doc pages of individual pair styles for a
listing and explanation of the appropriate arguments.
Note that an individual pair style can be used multiple times as a
sub-style. For efficiency this should only be done if your model
requires it. E.g. if you have different regions of Si and C atoms and
wish to use a Tersoff potential for pure Si for one set of atoms, and
a Tersoff potential for pure C for the other set (presumably with some
3rd potential for Si-C interactions), then the sub-style {tersoff}
could be listed twice. But if you just want to use a Lennard-Jones or
other pairwise potential for several different atom type pairs in your
model, then you should just list the sub-style once and use the
pair_coeff command to assign parameters for the different type pairs.
NOTE: There are two exceptions to this option to list an individual
pair style multiple times. The first is for pair styles implemented
as Fortran libraries: "pair_style meam"_pair_meam.html and "pair_style
-reax"_pair_reax.html ("pair_style reax/c"_pair_reax_c.html is OK).
+reax"_pair_reax.html ("pair_style reax/c"_pair_reaxc.html is OK).
This is because, unlike a C++ class, they cannot be instantiated
multiple times, due to the manner in which they were coded in Fortran.
The second is for GPU-enabled pair styles in the GPU package. This is
because the GPU package also currently assumes that only one instance
of a pair style is being used.
In the pair_coeff commands, the name of a pair style must be added
after the I,J type specification, with the remaining coefficients
being those appropriate to that style. If the pair style is used
multiple times in the pair_style command, then an additional numeric
argument must also be specified which is a number from 1 to M where M
is the number of times the sub-style was listed in the pair style
command. The extra number indicates which instance of the sub-style
these coefficients apply to.
For example, consider a simulation with 3 atom types: types 1 and 2
are Ni atoms, type 3 are LJ atoms with charges. The following
commands would set up a hybrid simulation:
pair_style hybrid eam/alloy lj/cut/coul/cut 10.0 lj/cut 8.0
pair_coeff * * eam/alloy nialhjea Ni Ni NULL
pair_coeff 3 3 lj/cut/coul/cut 1.0 1.0
pair_coeff 1*2 3 lj/cut 0.8 1.3 :pre
As an example of using the same pair style multiple times, consider a
simulation with 2 atom types. Type 1 is Si, type 2 is C. The
following commands would model the Si atoms with Tersoff, the C atoms
with Tersoff, and the cross-interactions with Lennard-Jones:
pair_style hybrid lj/cut 2.5 tersoff tersoff
pair_coeff * * tersoff 1 Si.tersoff Si NULL
pair_coeff * * tersoff 2 C.tersoff NULL C
pair_coeff 1 2 lj/cut 1.0 1.5 :pre
If pair coefficients are specified in the data file read via the
"read_data"_read_data.html command, then the same rule applies.
E.g. "eam/alloy" or "lj/cut" must be added after the atom type, for
each line in the "Pair Coeffs" section, e.g.
Pair Coeffs :pre
1 lj/cut/coul/cut 1.0 1.0
... :pre
Note that the pair_coeff command for some potentials such as
"pair_style eam/alloy"_pair_eam.html includes a mapping specification
of elements to all atom types, which in the hybrid case, can include
atom types not assigned to the {eam/alloy} potential. The NULL
keyword is used by many such potentials (eam/alloy, Tersoff, AIREBO,
etc), to denote an atom type that will be assigned to a different
sub-style.
For the {hybrid} style, each atom type pair I,J is assigned to exactly
one sub-style. Just as with a simulation using a single pair style,
if you specify the same atom type pair in a second pair_coeff command,
the previous assignment will be overwritten.
For the {hybrid/overlay} style, each atom type pair I,J can be
assigned to one or more sub-styles. If you specify the same atom type
pair in a second pair_coeff command with a new sub-style, then the
second sub-style is added to the list of potentials that will be
calculated for two interacting atoms of those types. If you specify
the same atom type pair in a second pair_coeff command with a
sub-style that has already been defined for that pair of atoms, then
the new pair coefficients simply override the previous ones, as in the
normal usage of the pair_coeff command. E.g. these two sets of
commands are the same:
pair_style lj/cut 2.5
pair_coeff * * 1.0 1.0
pair_coeff 2 2 1.5 0.8 :pre
pair_style hybrid/overlay lj/cut 2.5
pair_coeff * * lj/cut 1.0 1.0
pair_coeff 2 2 lj/cut 1.5 0.8 :pre
Coefficients must be defined for each pair of atoms types via the
"pair_coeff"_pair_coeff.html command as described above, or in the
data file or restart files read by the "read_data"_read_data.html or
"read_restart"_read_restart.html commands, or by mixing as described
below.
For both the {hybrid} and {hybrid/overlay} styles, every atom type
pair I,J (where I <= J) must be assigned to at least one sub-style via
the "pair_coeff"_pair_coeff.html command as in the examples above, or
in the data file read by the "read_data"_read_data.html, or by mixing
as described below.
If you want there to be no interactions between a particular pair of
atom types, you have 3 choices. You can assign the type pair to some
sub-style and use the "neigh_modify exclude type"_neigh_modify.html
command. You can assign it to some sub-style and set the coefficients
so that there is effectively no interaction (e.g. epsilon = 0.0 in a
LJ potential). Or, for {hybrid} and {hybrid/overlay} simulations, you
can use this form of the pair_coeff command in your input script:
pair_coeff 2 3 none :pre
or this form in the "Pair Coeffs" section of the data file:
3 none :pre
If an assignment to {none} is made in a simulation with the
{hybrid/overlay} pair style, it wipes out all previous assignments of
that atom type pair to sub-styles.
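As a sketch of the first choice above (the atom types are
hypothetical), the 2,3 type pair would still be assigned to one of the
sub-styles via a pair_coeff command, and then excluded from the
neighbor list:
neigh_modify exclude type 2 3 :pre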
Note that you may need to use an "atom_style"_atom_style.html hybrid
command in your input script, if atoms in the simulation will need
attributes from several atom styles, due to using multiple pair
potentials.
:line
Different force fields (e.g. CHARMM vs AMBER) may have different rules
for applying weightings that change the strength of pairwise
interactions between pairs of atoms that are also 1-2, 1-3, and 1-4
neighbors in the molecular bond topology, as normally set by the
"special_bonds"_special_bonds.html command. Different weights can be
assigned to different pair hybrid sub-styles via the "pair_modify
special"_pair_modify.html command. This allows multiple force fields
to be used in a model of a hybrid system. However, since there is no
consistent approach for determining parameters automatically for the
interactions between the two force fields, this is only recommended
when particles described by the different force fields do not mix.
Here is an example for mixing CHARMM and AMBER: The global {amber}
setting sets the 1-4 interactions to non-zero scaling factors and
then overrides them with 0.0 only for CHARMM:
special_bonds amber
pair_hybrid lj/charmm/coul/long 8.0 10.0 lj/cut/coul/long 10.0
pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.0 :pre
This input achieves the same effect:
special_bonds 0.0 0.0 0.1
pair_hybrid lj/charmm/coul/long 8.0 10.0 lj/cut/coul/long 10.0
pair_modify pair lj/cut/coul/long special lj 0.0 0.0 0.5
pair_modify pair lj/cut/coul/long special coul 0.0 0.0 0.83333333
pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.0 :pre
Here is an example for mixing Tersoff with OPLS/AA based on
a data file that defines bonds for all atoms where for the
Tersoff part of the system the force constants for the bonded
interactions have been set to 0. Note the global settings are
effectively {lj/coul 0.0 0.0 0.5} as required for OPLS/AA:
special_bonds lj/coul 1e-20 1e-20 0.5
pair_hybrid tersoff lj/cut/coul/long 12.0
pair_modify pair tersoff special lj/coul 1.0 1.0 1.0 :pre
+For use with the various "compute */tally"_compute_tally.html
+computes, the "pair_modify compute/tally"_pair_modify.html
+command can be used to selectively turn off processing of
+the compute tally styles, for example, if those pair styles
+(e.g. manybody styles) do not support this feature.
+
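For example, a minimal sketch (assuming the hybrid list contains a
{tersoff} sub-style):
pair_modify pair tersoff compute/tally no :pre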
See the "pair_modify"_pair_modify.html doc page for details on
the specific syntax, requirements and restrictions.
:line
The potential energy contribution to the overall system due to an
individual sub-style can be accessed and output via the "compute
pair"_compute_pair.html command.
:line
NOTE: Several of the potentials defined via the pair_style command in
LAMMPS are really many-body potentials, such as Tersoff, AIREBO, MEAM,
ReaxFF, etc. The way to think about using these potentials in a
hybrid setting is as follows.
A subset of atom types is assigned to the many-body potential with a
single "pair_coeff"_pair_coeff.html command, using "* *" to include
all types and the NULL keywords described above to exclude specific
types not assigned to that potential. If types 1,3,4 were assigned in
that way (but not type 2), this means that all many-body interactions
between all atoms of types 1,3,4 will be computed by that potential.
Pair_style hybrid allows interactions between type pairs 2-2, 1-2,
2-3, 2-4 to be specified for computation by other pair styles. You
could even add a second interaction for 1-1 to be computed by another
pair style, assuming pair_style hybrid/overlay is used.
But you should not, as a general rule, attempt to exclude the
many-body interactions for some subset of the type pairs within the
set of 1,3,4 interactions, e.g. exclude 1-1 or 1-3 interactions. That
is not conceptually well-defined for many-body interactions, since the
potential will typically calculate energies and forces for small groups
of atoms, e.g. 3 or 4 atoms, using the neighbor lists of the atoms to
find the additional atoms in the group. It is typically non-physical
to think of excluding an interaction between a particular pair of
atoms when the potential computes 3-body or 4-body interactions.
However, you can still use the pair_coeff none setting or the
"neigh_modify exclude"_neigh_modify.html command to exclude certain
type pairs from the neighbor list that will be passed to a manybody
sub-style. This will alter the calculations made by a many-body
potential, since it builds its list of 3-body, 4-body, etc
interactions from the pair list. You will need to think carefully as
to whether it produces a physically meaningful result for your model.
For example, imagine you have two atom types in your model, type 1 for
atoms in one surface, and type 2 for atoms in the other, and you wish
to use a Tersoff potential to compute interactions within each
surface, but not between surfaces. Then either of these two command
sequences would implement that model:
pair_style hybrid tersoff
pair_coeff * * tersoff SiC.tersoff C C
pair_coeff 1 2 none :pre
pair_style tersoff
pair_coeff * * SiC.tersoff C C
neigh_modify exclude type 1 2 :pre
Either way, only neighbor lists with 1-1 or 2-2 interactions would be
passed to the Tersoff potential, which means it would compute no
3-body interactions containing both type 1 and 2 atoms.
Here is another example, using hybrid/overlay, to use 2 many-body
potentials together, in an overlapping manner. Imagine you have CNT
(C atoms) on a Si surface. You want to use Tersoff for Si/Si and Si/C
interactions, and AIREBO for C/C interactions. Si atoms are type 1; C
atoms are type 2. Something like this will work:
pair_style hybrid/overlay tersoff airebo 3.0
pair_coeff * * tersoff SiC.tersoff.custom Si C
pair_coeff * * airebo CH.airebo NULL C :pre
Note that to prevent the Tersoff potential from computing C/C
interactions, you would need to modify the SiC.tersoff file to turn
off C/C interaction, i.e. by setting the appropriate coefficients to
0.0.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual.
Since the {hybrid} and {hybrid/overlay} styles delegate computation to
the individual sub-styles, the suffix versions of the {hybrid} and
{hybrid/overlay} styles are used to propagate the corresponding suffix
to all sub-styles, if those versions exist. Otherwise the
non-accelerated version will be used.
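For example, a minimal sketch (assuming a build with the USER-OMP
package installed), where the {omp} suffix is propagated to both
sub-styles:
pair_style hybrid/overlay/omp lj/cut 2.5 coul/long 2.0 :pre
This is equivalent to using the plain {hybrid/overlay} style together
with the "suffix omp"_suffix.html command.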
The individual accelerated sub-styles are part of the GPU,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the
"Making LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
Any pair potential settings made via the
"pair_modify"_pair_modify.html command are passed along to all
sub-styles of the hybrid potential.
For atom type pairs I,J and I != J, if the sub-style assigned to I,I
and J,J is the same, and if the sub-style allows for mixing, then the
coefficients for I,J can be mixed. This means you do not have to
specify a pair_coeff command for I,J since the I,J type pair will be
assigned automatically to the sub-style defined for both I,I and J,J
and its coefficients generated by the mixing rule used by that
sub-style. For the {hybrid/overlay} style, there is an additional
requirement that both the I,I and J,J pairs are assigned to a single
sub-style. See the "pair_modify" command for details of mixing rules.
See the doc page for the sub-style to see if it supports mixing.
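For illustration, a minimal sketch (the coefficients are hypothetical)
in which no pair_coeff command is needed for the 1,2 pair, because
types 1 and 2 are both assigned to the {lj/cut} sub-style and its
mixing rule generates the 1,2 coefficients:
pair_style hybrid lj/cut 10.0 morse 3.0
pair_coeff 1 1 lj/cut 1.0 1.0
pair_coeff 2 2 lj/cut 1.5 0.9
pair_coeff 3 3 morse 1.0 2.0 1.5
pair_coeff 1 3 morse 1.0 2.0 2.0
pair_coeff 2 3 morse 1.0 2.0 2.0 :pre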
The hybrid pair styles support the "pair_modify"_pair_modify.html
shift, table, and tail options for an I,J pair interaction, if the
associated sub-style supports it.
For the hybrid pair styles, the list of sub-styles and their
respective settings are written to "binary restart
files"_restart.html, so a "pair_style"_pair_style.html command does
not need to be specified in an input script that reads a restart file.
However, the coefficient information is not stored in the restart
file. Thus, pair_coeff commands need to be re-specified in the
restart input script.
These pair styles support the use of the {inner}, {middle}, and
{outer} keywords of the "run_style respa"_run_style.html command, if
their sub-styles do.
[Restrictions:]
When using a long-range Coulombic solver (via the
"kspace_style"_kspace_style.html command) with a hybrid pair_style,
one or more sub-styles will be of the "long" variety,
e.g. {lj/cut/coul/long} or {buck/coul/long}. You must ensure that the
short-range Coulombic cutoff used by each of these long pair styles is
the same or else LAMMPS will generate an error.
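For example, a hypothetical sketch in which both "long" sub-styles use
the same 10.0 short-range Coulombic cutoff, so a single Kspace solver
can serve both:
pair_style hybrid/overlay lj/cut/coul/long 12.0 10.0 buck/coul/long 10.0
kspace_style pppm 1.0e-4 :pre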
[Related commands:]
"pair_coeff"_pair_coeff.html
[Default:] none
diff --git a/doc/src/pair_modify.txt b/doc/src/pair_modify.txt
index 03fb80ae5..34dbb5bc3 100644
--- a/doc/src/pair_modify.txt
+++ b/doc/src/pair_modify.txt
@@ -1,260 +1,275 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_modify command :h3
[Syntax:]
pair_modify keyword values ... :pre
one or more keyword/value pairs may be listed :ulb,l
keyword = {pair} or {shift} or {mix} or {table} or {table/disp} or {tabinner} or {tabinner/disp} or {tail} or {compute} :l
{pair} values = sub-style N {special} which wt1 wt2 wt3
+ or sub-style N {compute/tally} flag
sub-style = sub-style of "pair hybrid"_pair_hybrid.html
N = which instance of sub-style (only if sub-style is used multiple times)
- {special} which wt1 wt2 wt3 = override {special_bonds} settings (optional)
- which = {lj/coul} or {lj} or {coul}
- w1,w2,w3 = 1-2, 1-3, and 1-4 weights from 0.0 to 1.0 inclusive
+ {special} which wt1 wt2 wt3 = override {special_bonds} settings (optional)
+ which = {lj/coul} or {lj} or {coul}
+ w1,w2,w3 = 1-2, 1-3, and 1-4 weights from 0.0 to 1.0 inclusive
+ {compute/tally} flag = {yes} or {no}
{mix} value = {geometric} or {arithmetic} or {sixthpower}
{shift} value = {yes} or {no}
{table} value = N
2^N = # of values in table
{table/disp} value = N
2^N = # of values in table
{tabinner} value = cutoff
cutoff = inner cutoff at which to begin table (distance units)
{tabinner/disp} value = cutoff
cutoff = inner cutoff at which to begin table (distance units)
{tail} value = {yes} or {no}
{compute} value = {yes} or {no} :pre
:ule
[Examples:]
pair_modify shift yes mix geometric
pair_modify tail yes
pair_modify table 12
pair_modify pair lj/cut compute no
+pair_modify pair tersoff compute/tally no
pair_modify pair lj/cut/coul/long 1 special lj/coul 0.0 0.0 0.0 :pre
[Description:]
Modify the parameters of the currently defined pair style. Not all
parameters are relevant to all pair styles.
If used, the {pair} keyword must appear first in the list of keywords.
It can only be used with the "hybrid and
hybrid/overlay"_pair_hybrid.html pair styles. It means that all the
following parameters will only be modified for the specified
sub-style. If the sub-style is defined multiple times, then an
additional numeric argument {N} must also be specified, which is a
number from 1 to M where M is the number of times the sub-style was
listed in the "pair_style hybrid"_pair_hybrid.html command. The extra
number indicates which instance of the sub-style the remaining
keywords will be applied to. Note that if the {pair} keyword is not
used, and the pair style is {hybrid} or {hybrid/overlay}, then all the
specified keywords will be applied to all sub-styles.
-The {special} keyword can only be used in conjunction with the {pair}
-keyword and must directly follow it. It allows to override the
+The {special} and {compute/tally} keywords can [only] be used in
+conjunction with the {pair} keyword and must directly follow it.
+{special} allows you to override the
"special_bonds"_special_bonds.html settings for the specified sub-style.
+{compute/tally} allows you to disable or enable the registration of
+"compute */tally"_compute_tally.html computes for a given sub-style.
More details are given below.
The {mix} keyword affects pair coefficients for interactions between
atoms of type I and J, when I != J and the coefficients are not
explicitly set in the input script. Note that coefficients for I = J
must be set explicitly, either in the input script via the
"pair_coeff" command or in the "Pair Coeffs" section of the "data
file"_read_data.html. For some pair styles it is not necessary to
specify coefficients when I != J, since a "mixing" rule will create
them from the I,I and J,J settings. The pair_modify {mix} value
determines what formulas are used to compute the mixed coefficients.
In each case, the cutoff distance is mixed the same way as sigma.
Note that not all pair styles support mixing. Also, some mix options
are not available for certain pair styles. See the doc page for
individual pair styles for those restrictions. Note also that the
"pair_coeff"_pair_coeff.html command also can be to directly set
coefficients for a specific I != J pairing, in which case no mixing is
performed.
mix {geometric}
epsilon_ij = sqrt(epsilon_i * epsilon_j)
sigma_ij = sqrt(sigma_i * sigma_j) :pre
mix {arithmetic}
epsilon_ij = sqrt(epsilon_i * epsilon_j)
sigma_ij = (sigma_i + sigma_j) / 2 :pre
mix {sixthpower}
epsilon_ij = (2 * sqrt(epsilon_i*epsilon_j) * sigma_i^3 * sigma_j^3) /
(sigma_i^6 + sigma_j^6)
sigma_ij = ((sigma_i^6 + sigma_j^6) / 2) ^ (1/6) :pre
The {shift} keyword determines whether a Lennard-Jones potential is
shifted at its cutoff to 0.0. If so, this adds an energy term to each
pairwise interaction which will be included in the thermodynamic
output, but does not affect pair forces or atom trajectories. See the
doc page for individual pair styles to see which ones support this
option.
The {table} and {table/disp} keywords apply to pair styles with a
long-range Coulombic term or long-range dispersion term respectively;
see the doc page for individual styles to see which potentials support
these options. If N is non-zero, a table of length 2^N is
pre-computed for forces and energies, which can shrink their
computational cost by up to a factor of 2. The table is indexed via a
bit-mapping technique "(Wolff)"_#Wolff1 and a linear interpolation is
performed between adjacent table values. In our experiments with
different table styles (lookup, linear, spline), this method typically
gave the best performance in terms of speed and accuracy.
The choice of table length is a tradeoff in accuracy versus speed. A
larger N yields more accurate force computations, but requires more
memory which can slow down the computation due to cache misses. A
reasonable value of N is between 8 and 16. The default value of 12
(table of length 4096) gives approximately the same accuracy as the
no-table (N = 0) option. For N = 0, forces and energies are computed
directly, using a polynomial fit for the needed erfc() function
evaluation, which is what earlier versions of LAMMPS did. Values
greater than 16 typically slow down the simulation and will not
improve accuracy; values from 1 to 8 give unreliable results.
The {tabinner} and {tabinner/disp} keywords set an inner cutoff above
which the pairwise computation is done by table lookup (if tables are
invoked), for the corresponding Coulombic and dispersion tables
discussed with the {table} and {table/disp} keywords. The smaller the
cutoff is set, the less accurate the table becomes (for a given number
of table values), which can require use of larger tables. The default
cutoff value is sqrt(2.0) distance units which means nearly all
pairwise interactions are computed via table lookup for simulations
with "real" units, but some close pairs may be computed directly
(non-table) for simulations with "lj" units.
When the {tail} keyword is set to {yes}, certain pair styles will add
a long-range van der Waals tail "correction" to the energy and pressure.
These corrections are bookkeeping terms which do not affect dynamics,
unless a constant-pressure simulation is being performed. See the doc
page for individual styles to see which support this option. These
corrections are included in the calculation and printing of
thermodynamic quantities (see the "thermo_style"_thermo_style.html
command). Their effect will also be included in constant NPT or NPH
simulations where the pressure influences the simulation box
dimensions (e.g. the "fix npt"_fix_nh.html and "fix nph"_fix_nh.html
commands). The formulas used for the long-range corrections come from
equation 5 of "(Sun)"_#Sun.
NOTE: The tail correction terms are computed at the beginning of each
run, using the current atom counts of each atom type. If atoms are
deleted (or lost) or created during a simulation, e.g. via the "fix
gcmc"_fix_gcmc.html command, the correction factors are not
re-computed. If you expect the counts to change dramatically, you can
break a run into a series of shorter runs so that the correction
factors are re-computed more frequently.
Several additional assumptions are inherent in using tail corrections,
including the following:
The simulated system is a 3d bulk homogeneous liquid. This option
should not be used for systems that are non-liquid, 2d, have a slab
geometry (only 2d periodic), or inhomogeneous. :ulb,l
G(r), the radial distribution function (rdf), is unity beyond the
cutoff, so a fairly large cutoff should be used (i.e. 2.5 sigma for an
LJ fluid), and it is probably a good idea to verify this assumption by
checking the rdf. The rdf is not exactly unity beyond the cutoff for
each pair of interaction types, so the tail correction is necessarily
an approximation. :l
The tail corrections are computed at the beginning of each simulation
run. If the number of atoms changes during the run, e.g. due to atoms
leaving the simulation domain, or use of the "fix gcmc"_fix_gcmc.html
command, then the corrections are not updated to reflect the changed
atom count. If this is a large effect in your simulation, you should
break the long run into several short runs, so that the correction
factors are re-computed multiple times. :l
Thermophysical properties obtained from calculations with this option
enabled will not be thermodynamically consistent with the truncated
force-field that was used. In other words, atoms do not feel any LJ
pair interactions beyond the cutoff, but the energy and pressure
reported by the simulation include an estimated contribution from
those interactions. :l
:ule
The {compute} keyword allows pairwise computations to be turned off,
even though a "pair_style"_pair_style.html is defined. This is not
useful for running a real simulation, but can be useful for debugging
purposes or for performing a "rerun"_rerun.html simulation, when you
only wish to compute partial forces that do not include the pairwise
contribution.
Two examples are as follows. First, this option allows you to perform
a simulation with "pair_style hybrid"_pair_hybrid.html with only a
subset of the hybrid sub-styles enabled. Second, this option allows
you to perform a simulation with only long-range interactions but no
short-range pairwise interactions. Doing this by simply not defining
a pair style will not work, because the
"kspace_style"_kspace_style.html command requires a Kspace-compatible
pair style be defined.
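A minimal sketch of the "rerun"_rerun.html usage mentioned above (the
dump file name and column list are illustrative):
pair_modify compute no
rerun dump.lammpstrj dump x y z :pre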
:line
The {special} keyword allows you to override the 1-2, 1-3, and 1-4
exclusion settings for individual sub-styles of a
"hybrid pair style"_pair_hybrid.html. It requires 4 arguments similar
to the "special_bonds"_special_bonds.html command, {which} and
wt1,wt2,wt3. The {which} argument can be {lj} to change the
Lennard-Jones settings, {coul} to change the Coulombic settings,
or {lj/coul} to change both to the same set of 3 values. The wt1,wt2,wt3
values are numeric weights from 0.0 to 1.0 inclusive, for the 1-2,
1-3, and 1-4 bond topology neighbors, respectively. The {special}
keyword can only be used in conjunction with the {pair} keyword
and has to directly follow it.
NOTE: The global settings specified by the
"special_bonds"_special_bonds.html command affect the construction of
neighbor lists. Weights of 0.0 (for 1-2, 1-3, or 1-4 neighbors)
exclude those pairs from the neighbor list entirely. Weights of 1.0
store the neighbor with no weighting applied. Thus only global values
different from exactly 0.0 or 1.0 can be overridden and an error is
generated if the requested setting is not compatible with the global
setting. Substituting 1.0e-10 for 0.0 and 0.9999999999 for 1.0 is
usually a sufficient workaround in this case without causing a
significant error.
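A sketch of this workaround (assuming a hybrid list that contains
{lj/cut/coul/long}; the weights are illustrative):
special_bonds lj/coul 1.0e-10 1.0e-10 0.5
pair_modify pair lj/cut/coul/long special lj/coul 0.0 0.0 0.5 :pre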
+The {compute/tally} keyword takes exactly 1 argument ({no} or {yes}),
+and allows you to selectively disable or enable processing of the various
+"compute */tally"_compute_tally.html styles for a given
+"pair hybrid or hybrid/overlay"_pair_hybrid.html sub-style.
+
+NOTE: Any "pair_modify pair compute/tally" command must be issued
+[before] the corresponding compute style is defined.
+
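A sketch of the required ordering (assuming the USER-TALLY package is
installed; the sub-styles, potential file, and IDs are illustrative):
pair_style hybrid/overlay tersoff lj/cut 10.0
pair_coeff * * tersoff SiC.tersoff Si C
pair_coeff 1 2 lj/cut 1.0 3.0
pair_modify pair tersoff compute/tally no
compute 1 all pe/tally all :pre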
:line
[Restrictions:]
You cannot use {shift} yes with {tail} yes, since those are
conflicting options. You cannot use {tail} yes with 2d simulations.
[Related commands:]
-"pair_style"_pair_style.html, "pair_coeff"_pair_coeff.html,
-"thermo_style"_thermo_style.html
+"pair_style"_pair_style.html, "pair_style hybrid"_pair_hybrid.html,
+pair_coeff"_pair_coeff.html, "thermo_style"_thermo_style.html,
+"compute */tally"_compute_tally.html
[Default:]
The option defaults are mix = geometric, shift = no, table = 12,
tabinner = sqrt(2.0), tail = no, and compute = yes.
Note that some pair styles perform mixing, but only a certain style of
mixing. See the doc pages for individual pair styles for details.
:line
:link(Wolff1)
[(Wolff)] Wolff and Rudd, Comp Phys Comm, 120, 200-32 (1999).
:link(Sun)
[(Sun)] Sun, J Phys Chem B, 102, 7338-7364 (1998).
diff --git a/doc/src/pair_reax.txt b/doc/src/pair_reax.txt
index 7215c12ce..1d13f9370 100644
--- a/doc/src/pair_reax.txt
+++ b/doc/src/pair_reax.txt
@@ -1,216 +1,216 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style reax command :h3
[Syntax:]
pair_style reax hbcut hbnewflag tripflag precision :pre
hbcut = hydrogen-bond cutoff (optional) (distance units)
hbnewflag = use old or new hbond function style (0 or 1) (optional)
tripflag = apply stabilization to all triple bonds (0 or 1) (optional)
precision = precision for charge equilibration (optional) :ul
[Examples:]
pair_style reax
pair_style reax 10.0 0 1 1.0e-5
pair_coeff * * ffield.reax 3 1 2 2
pair_coeff * * ffield.reax 3 NULL NULL 3 :pre
[Description:]
Style {reax} computes the ReaxFF potential of van Duin, Goddard and
co-workers. ReaxFF uses distance-dependent bond-order functions to
represent the contributions of chemical bonding to the potential
energy. There is more than one version of ReaxFF. The version
implemented in LAMMPS uses the functional forms documented in the
supplemental information of the following paper:
"(Chenoweth)"_#Chenoweth_20081. The version integrated into LAMMPS matches
the most up-to-date version of ReaxFF as of summer 2010.
WARNING: pair style reax is now deprecated and will soon be retired. Users
-should switch to "pair_style reax/c"_pair_reax_c.html. The {reax} style
+should switch to "pair_style reax/c"_pair_reaxc.html. The {reax} style
differs from the {reax/c} style in the low-level implementation details.
The {reax} style is a
Fortran library, linked to LAMMPS. The {reax/c} style was initially
implemented as stand-alone C code and is now integrated into LAMMPS as
a package.
LAMMPS requires that a file called ffield.reax be provided, containing
the ReaxFF parameters for each atom type, bond type, etc. The format
is identical to the ffield file used by van Duin and co-workers. The
filename is required as an argument in the pair_coeff command. Any
value other than "ffield.reax" will be rejected (see below).
LAMMPS provides several different versions of ffield.reax in its
potentials dir, each called potentials/ffield.reax.label. These are
documented in potentials/README.reax. The default ffield.reax
contains parameterizations for the following elements: C, H, O, N.
NOTE: We do not distribute a wide variety of ReaxFF force field files
with LAMMPS. Adri van Duin's group at PSU is the central repository
for this kind of data as they are continuously deriving and updating
parameterizations for different classes of materials. You can submit
a contact request at the Materials Computation Center (MCC) website
"https://www.mri.psu.edu/materials-computation-center/connect-mcc"_https://www.mri.psu.edu/materials-computation-center/connect-mcc,
describing the material(s) you are interested in modeling with ReaxFF.
They can tell
you what is currently available or what it would take to create a
suitable ReaxFF parameterization.
The format of these files is identical to that used originally by van
Duin. We have tested the accuracy of {pair_style reax} potential
against the original ReaxFF code for the systems mentioned above. You
can use other ffield files for specific chemical systems that may be
available elsewhere (but note that their accuracy may not have been
tested).
The {hbcut}, {hbnewflag}, {tripflag}, and {precision} settings are
optional arguments. If none are provided, default settings are used:
{hbcut} = 6 (which is Angstroms in real units), {hbnewflag} = 1 (use
new hbond function style), {tripflag} = 1 (apply stabilization to all
triple bonds), and {precision} = 1.0e-6 (one part in 10^6). If you
wish to override any of these defaults, then all of the settings must
be specified.
Two examples using {pair_style reax} are provided in the examples/reax
sub-directory, along with corresponding examples for
-"pair_style reax/c"_pair_reax_c.html. Note that while the energy and force
+"pair_style reax/c"_pair_reaxc.html. Note that while the energy and force
calculated by both of these pair styles match very closely, the
contributions due to the valence angles differ slightly due to
the fact that with {pair_style reax/c} the default value of {thb_cutoff_sq}
is 0.00001, while for {pair_style reax} it is hard-coded to be 0.001.
Use of this pair style requires that a charge be defined for every
atom since the {reax} pair style performs a charge equilibration (QEq)
calculation. See the "atom_style"_atom_style.html and
"read_data"_read_data.html commands for details on how to specify
charges.
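A minimal setup sketch (the data file name is illustrative; the 4 atom
types are mapped to the default C, H, O, N parameterization):
units real
atom_style charge
read_data data.RDX
pair_style reax
pair_coeff * * ffield.reax 1 2 3 4 :pre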
The thermo variable {evdwl} stores the sum of all the ReaxFF potential
energy contributions, with the exception of the Coulombic and charge
equilibration contributions which are stored in the thermo variable
{ecoul}. The output of these quantities is controlled by the
"thermo"_thermo.html command.
This pair style tallies a breakdown of the total ReaxFF potential
energy into sub-categories, which can be accessed via the "compute
pair"_compute_pair.html command as a vector of values of length 14.
The 14 values correspond to the following sub-categories (the variable
names in italics match those used in the ReaxFF FORTRAN library):
{eb} = bond energy
{ea} = atom energy
{elp} = lone-pair energy
{emol} = molecule energy (always 0.0)
{ev} = valence angle energy
{epen} = double-bond valence angle penalty
{ecoa} = valence angle conjugation energy
{ehb} = hydrogen bond energy
{et} = torsion energy
{eco} = conjugation energy
{ew} = van der Waals energy
{ep} = Coulomb energy
{efi} = electric field energy (always 0.0)
{eqeq} = charge equilibration energy :ol
To print these quantities to the log file (with descriptive column
headings) the following commands could be included in an input script:
compute reax all pair reax
variable eb equal c_reax\[1\]
variable ea equal c_reax\[2\]
...
variable eqeq equal c_reax\[14\]
thermo_style custom step temp epair v_eb v_ea ... v_eqeq :pre
Only a single pair_coeff command is used with the {reax} style which
specifies a ReaxFF potential file with parameters for all needed
elements. These are mapped to LAMMPS atom types by specifying N
additional arguments after the filename in the pair_coeff command,
where N is the number of LAMMPS atom types:
filename
N indices = mapping of ReaxFF elements to atom types :ul
The specification of the filename and the mapping of LAMMPS atom types
recognized by the ReaxFF is done differently than for other LAMMPS
potentials, due to the difficulty of portably passing character
strings (e.g. filenames, element names) between C++ and Fortran.
The filename has to be "ffield.reax" and it has to exist in the
directory you are running LAMMPS in. This means you cannot prepend a
path to the file in the potentials dir. Rather, you should copy that
file into the directory you are running from. If you wish to use
another ReaxFF potential file, then name it "ffield.reax" and put it
in the directory you run from.
In the ReaxFF potential file, near the top, after the general
parameters, is the atomic parameters section that contains element
names, each with a couple dozen numeric parameters. If there are M
elements specified in the {ffield} file, think of these as numbered 1
to M. Each of the N indices you specify for the N atom types of LAMMPS
atoms must be an integer from 1 to M. Atoms with LAMMPS type 1 will
be mapped to whatever element you specify as the first index value,
etc. If a mapping value is specified as NULL, the mapping is not
performed. This can be used when a ReaxFF potential is used as part
of the {hybrid} pair style. The NULL values are placeholders for atom
types that will be used with other potentials.
NOTE: Currently the reax pair style cannot be used as part of the
{hybrid} pair style. Some additional changes still need to be made to
enable this.
As an example, say your LAMMPS simulation has 4 atom types and the
elements are ordered as C, H, O, N in the {ffield} file. If you want
the LAMMPS atom type 1 and 2 to be C, type 3 to be N, and type 4 to be
H, you would use the following pair_coeff command:
pair_coeff * * ffield.reax 1 1 4 2 :pre
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
This pair style does not support the "pair_modify"_pair_modify.html
mix, shift, table, and tail options.
This pair style does not write its information to "binary restart
files"_restart.html, since it is stored in potential files. Thus, you
need to re-specify the pair_style and pair_coeff commands in an input
script that reads a restart file.
This pair style can only be used via the {pair} keyword of the
"run_style respa"_run_style.html command. It does not support the
{inner}, {middle}, {outer} keywords.
[Restrictions:]
The ReaxFF potential files provided with LAMMPS in the potentials
directory are parameterized for real "units"_units.html. You can use
the ReaxFF potential with any LAMMPS units, but you would need to
create your own potential file with coefficients listed in the
appropriate units if your simulation doesn't use "real" units.
[Related commands:]
-"pair_coeff"_pair_coeff.html, "pair_style reax/c"_pair_reax_c.html,
+"pair_coeff"_pair_coeff.html, "pair_style reax/c"_pair_reaxc.html,
"fix_reax_bonds"_fix_reax_bonds.html
[Default:]
The keyword defaults are {hbcut} = 6, {hbnewflag} = 1, {tripflag} = 1,
{precision} = 1.0e-6.
:line
:link(Chenoweth_20081)
[(Chenoweth_2008)] Chenoweth, van Duin and Goddard,
Journal of Physical Chemistry A, 112, 1040-1053 (2008).
diff --git a/doc/src/pair_reax_c.txt b/doc/src/pair_reaxc.txt
similarity index 96%
rename from doc/src/pair_reax_c.txt
rename to doc/src/pair_reaxc.txt
index c1d719d22..76a8e6fd5 100644
--- a/doc/src/pair_reax_c.txt
+++ b/doc/src/pair_reaxc.txt
@@ -1,349 +1,357 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style reax/c command :h3
pair_style reax/c/kk command :h3
[Syntax:]
pair_style reax/c cfile keyword value :pre
cfile = NULL or name of a control file :ulb,l
zero or more keyword/value pairs may be appended :l
keyword = {checkqeq} or {enobonds} or {lgvdw} or {safezone} or {mincap}
{checkqeq} value = {yes} or {no} = whether or not to require qeq/reax fix
+ {enobonds} value = {yes} or {no} = whether or not to tally energy of atoms with no bonds
{lgvdw} value = {yes} or {no} = whether or not to use a low gradient vdW correction
{safezone} = factor used for array allocation
{mincap} = minimum size for array allocation :pre
:ule
[Examples:]
pair_style reax/c NULL
pair_style reax/c controlfile checkqeq no
pair_style reax/c NULL lgvdw yes
pair_style reax/c NULL safezone 1.6 mincap 100
pair_coeff * * ffield.reax C H O N :pre
[Description:]
Style {reax/c} computes the ReaxFF potential of van Duin, Goddard and
co-workers. ReaxFF uses distance-dependent bond-order functions to
represent the contributions of chemical bonding to the potential
energy. There is more than one version of ReaxFF. The version
implemented in LAMMPS uses the functional forms documented in the
supplemental information of the following paper: "(Chenoweth et al.,
2008)"_#Chenoweth_20082. The version integrated into LAMMPS matches
the most up-to-date version of ReaxFF as of summer 2010. For more
technical details about the pair reax/c implementation of ReaxFF, see
the "(Aktulga)"_#Aktulga paper. The {reax/c} style was initially
implemented as a stand-alone C code and is now integrated into LAMMPS
as a package.
The {reax/c/kk} style is a Kokkos version of the ReaxFF potential that is
derived from the {reax/c} style. The Kokkos version can run on GPUs and
can also use OpenMP multithreading. For more information about the Kokkos package,
see "Section 4"_Section_packages.html#kokkos and "Section 5.3.3"_accelerate_kokkos.html.
One important consideration when using the {reax/c/kk} style is the choice of either
half or full neighbor lists. This setting can be changed using the Kokkos "package"_package.html
command.
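For example, half neighbor lists for the Kokkos style could be requested
as follows; this is a sketch that assumes LAMMPS was built with the
KOKKOS package (see the "package"_package.html doc page for all options):
package kokkos neigh half
pair_style reax/c/kk NULL
pair_coeff * * ffield.reax C H O N :pre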
The {reax/c} style differs from the "pair_style reax"_pair_reax.html
command in the low-level implementation details. The {reax} style is a
Fortran library, linked to LAMMPS. The {reax/c} style was initially
implemented as stand-alone C code and is now integrated into LAMMPS as
a package.
LAMMPS provides several different versions of ffield.reax in its
potentials dir, each called potentials/ffield.reax.label. These are
documented in potentials/README.reax. The default ffield.reax
contains parameterizations for the following elements: C, H, O, N.
The format of these files is identical to that used originally by van
Duin. We have tested the accuracy of the {pair_style reax/c} potential
against the original ReaxFF code for the systems mentioned above. You
can use other ffield files for specific chemical systems that may be
available elsewhere (but note that their accuracy may not have been
tested).
NOTE: We do not distribute a wide variety of ReaxFF force field files
with LAMMPS. Adri van Duin's group at PSU is the central repository
for this kind of data as they are continuously deriving and updating
parameterizations for different classes of materials. You can submit
a contact request at the Materials Computation Center (MCC) website
"https://www.mri.psu.edu/materials-computation-center/connect-mcc"_https://www.mri.psu.edu/materials-computation-center/connect-mcc,
describing the material(s) you are interested in modeling with ReaxFF.
They can tell
you what is currently available or what it would take to create a
suitable ReaxFF parameterization.
The {cfile} setting can be specified as NULL, in which case default
settings are used. A control file can be specified which defines
values of control variables. Some control variables are
global parameters for the ReaxFF potential. Others define certain
performance and output settings.
Each line in the control file specifies the value for
a control variable. The format of the control file is described
below.
NOTE: The LAMMPS default values for the ReaxFF global parameters
correspond to those used by Adri van Duin's stand-alone serial
code. If these are changed by setting control variables in the control
file, the results from LAMMPS and the serial code will not agree.
Two examples using {pair_style reax/c} are provided in the examples/reax
sub-directory, along with corresponding examples for
"pair_style reax"_pair_reax.html.
Use of this pair style requires that a charge be defined for every
atom. See the "atom_style"_atom_style.html and
"read_data"_read_data.html commands for details on how to specify
charges.
The ReaxFF parameter files provided were created using a charge
equilibration (QEq) model for handling the electrostatic interactions.
Therefore, by default, LAMMPS requires that the "fix
qeq/reax"_fix_qeq_reax.html command be used with {pair_style reax/c}
when simulating a ReaxFF model, to equilibrate charge each timestep.
Using the keyword {checkqeq} with the value {no}
turns off the check for {fix qeq/reax},
allowing a simulation to be run without charge equilibration.
In this case, the static charges you
assign to each atom will be used for computing the electrostatic
interactions in the system.
See the "fix qeq/reax"_fix_qeq_reax.html command for details.
Using the optional keyword {lgvdw} with the value {yes} turns on
the low-gradient correction of {reax/c} for long-range
London dispersion, as described in the "(Liu)"_#Liu_2011 paper. The force
field file {ffield.reax.lg} is designed for this correction, and is
trained for several energetic materials (see "Liu"). When using the
lg-correction, the recommended value for the parameter {thb} is 0.01,
which can be set in the control file. Note: Force field files differ
between the original and lg-corrected pair styles; using the wrong
ffield file generates an error message.
+Using the optional keyword {enobonds} with the value {yes}, the energy
+of atoms with no bonds (i.e. isolated atoms) is included in the total
+potential energy and the per-atom energy of that atom. If the value
+{no} is specified then the energy of atoms with no bonds is set to zero.
+The latter behavior is usually not desired, as it causes discontinuities
+in the potential energy when the bonding of an atom drops to zero.
+
Optional keywords {safezone} and {mincap} are used for allocating
reax/c arrays. Increasing these values can avoid memory problems, such
as segmentation faults and bondchk failed errors, that could occur under
certain conditions. These keywords aren't used by the Kokkos version, which
instead uses a more robust memory allocation scheme that checks if the sizes of
the arrays have been exceeded and automatically allocates more memory.
The thermo variable {evdwl} stores the sum of all the ReaxFF potential
energy contributions, with the exception of the Coulombic and charge
equilibration contributions which are stored in the thermo variable
{ecoul}. The output of these quantities is controlled by the
"thermo"_thermo.html command.
This pair style tallies a breakdown of the total ReaxFF potential
energy into sub-categories, which can be accessed via the "compute
pair"_compute_pair.html command as a vector of values of length 14.
The 14 values correspond to the following sub-categories (the variable
names in italics match those used in the original FORTRAN ReaxFF code):
{eb} = bond energy
{ea} = atom energy
{elp} = lone-pair energy
{emol} = molecule energy (always 0.0)
{ev} = valence angle energy
{epen} = double-bond valence angle penalty
{ecoa} = valence angle conjugation energy
{ehb} = hydrogen bond energy
{et} = torsion energy
{eco} = conjugation energy
{ew} = van der Waals energy
{ep} = Coulomb energy
{efi} = electric field energy (always 0.0)
{eqeq} = charge equilibration energy :ol
To print these quantities to the log file (with descriptive column
headings) the following commands could be included in an input script:
compute reax all pair reax/c
variable eb equal c_reax\[1\]
variable ea equal c_reax\[2\]
\[...\]
variable eqeq equal c_reax\[14\]
thermo_style custom step temp epair v_eb v_ea \[...\] v_eqeq :pre
Only a single pair_coeff command is used with the {reax/c} style which
specifies a ReaxFF potential file with parameters for all needed
elements. These are mapped to LAMMPS atom types by specifying N
additional arguments after the filename in the pair_coeff command,
where N is the number of LAMMPS atom types:
filename
N indices = ReaxFF elements :ul
The filename is the ReaxFF potential file. Unlike for the {reax}
pair style, any filename can be used.
In the ReaxFF potential file, near the top, after the general
parameters, is the atomic parameters section that contains element
names, each with a couple dozen numeric parameters. If there are M
elements specified in the {ffield} file, think of these as numbered 1
to M. Each of the N indices you specify for the N atom types of LAMMPS
atoms must be an integer from 1 to M. Atoms with LAMMPS type 1 will
be mapped to whatever element you specify as the first index value,
etc. If a mapping value is specified as NULL, the mapping is not
performed. This can be used when the {reax/c} style is used as part
of the {hybrid} pair style. The NULL values are placeholders for atom
types that will be used with other potentials.
As an example, say your LAMMPS simulation has 4 atom types and the
elements are ordered as C, H, O, N in the {ffield} file. If you want
the LAMMPS atom type 1 and 2 to be C, type 3 to be N, and type 4 to be
H, you would use the following pair_coeff command:
pair_coeff * * ffield.reax C C N H :pre
:line
The format of a line in the control file is as follows:
variable_name value :pre
and it may be followed by an "!" character and a trailing comment.
If the value of a control variable is not specified, then default
values are used. What follows is the list of variables along with a
brief description of their use and default values.
simulation_name: Output files produced by {pair_style reax/c} carry
this name + extensions specific to their contents. Partial energies
are reported with a ".pot" extension, while the trajectory file has
a ".trj" extension.
tabulate_long_range: To improve performance, long-range interactions
can optionally be tabulated (0 means no tabulation). The value of this
variable is the number of entries in the long-range interaction table.
The range from 0 to the long-range cutoff (defined in the {ffield}
file) is divided into {tabulate_long_range} points. At the start of
the simulation the entries of the table are filled in by computing the
energies and forces resulting from van der Waals and Coulomb
interactions between every possible pair of atom types present in the
input system. During the simulation the table is consulted to estimate
the energy and forces between a pair of atoms, using linear
interpolation. (default value = 0)
energy_update_freq: Denotes the frequency (in number of steps) of
writes into the partial energies file. (default value = 0)
nbrhood_cutoff: Denotes the near neighbors cutoff (in Angstroms)
regarding the bonded interactions. (default value = 5.0)
hbond_cutoff: Denotes the cutoff distance (in Angstroms) for hydrogen
bond interactions. (default value = 7.5; a value of 0.0 turns off
hydrogen bonds)
bond_graph_cutoff: Denotes the threshold used in determining what is
and is not a physical bond. Bonds and angles reported in the
trajectory file rely on this cutoff. (default value = 0.3)
thb_cutoff: cutoff value for the strength of bonds to be considered in
three body interactions. (default value = 0.001)
thb_cutoff_sq: cutoff value for the strength of bond order products
to be considered in three body interactions. (default value = 0.00001)
write_freq: Frequency of writes into the trajectory file. (default
value = 0)
traj_title: Title of the trajectory - not the name of the trajectory
file.
atom_info: 1 means print only atomic positions + charge (default = 0)
atom_forces: 1 adds net forces to atom lines in the trajectory file
(default = 0)
atom_velocities: 1 adds atomic velocities to atom lines in the trajectory file (default = 0)
bond_info: 1 prints bonds in the trajectory file (default = 0)
angle_info: 1 prints angles in the trajectory file (default = 0)
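Putting the format and the variables above together, a small control
file might look like this (the values are illustrative only; any
variable not listed keeps its default):
simulation_name reaxc_run ! prefix for the .pot and .trj output files
tabulate_long_range 10000 ! 10000 table entries, 0 = no tabulation
energy_update_freq 10
nbrhood_cutoff 5.0
hbond_cutoff 7.5
bond_graph_cutoff 0.3
thb_cutoff 0.001
write_freq 100
traj_title reaxc_run_trajectory
atom_info 1
bond_info 1 :pre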
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
This pair style does not support the "pair_modify"_pair_modify.html
mix, shift, table, and tail options.
This pair style does not write its information to "binary restart
files"_restart.html, since it is stored in potential files. Thus, you
need to re-specify the pair_style and pair_coeff commands in an input
script that reads a restart file.
This pair style can only be used via the {pair} keyword of the
"run_style respa"_run_style.html command. It does not support the
{inner}, {middle}, {outer} keywords.
:line
Styles with a {gpu}, {intel}, {kk}, {omp}, or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP and OPT packages, respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Restrictions:]
This pair style is part of the USER-REAXC package. It is only enabled
if LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The ReaxFF potential files provided with LAMMPS in the potentials
directory are parameterized for real "units"_units.html. You can use
the ReaxFF potential with any LAMMPS units, but you would need to
create your own potential file with coefficients listed in the
appropriate units if your simulation doesn't use "real" units.
[Related commands:]
"pair_coeff"_pair_coeff.html, "fix qeq/reax"_fix_qeq_reax.html, "fix
reax/c/bonds"_fix_reax_bonds.html, "fix
reax/c/species"_fix_reaxc_species.html, "pair_style
reax"_pair_reax.html
[Default:]
-The keyword defaults are checkqeq = yes, lgvdw = no, safezone = 1.2,
+The keyword defaults are checkqeq = yes, enobonds = yes, lgvdw = no, safezone = 1.2,
mincap = 50.
:line
:link(Chenoweth_20082)
[(Chenoweth_2008)] Chenoweth, van Duin and Goddard,
Journal of Physical Chemistry A, 112, 1040-1053 (2008).
:link(Aktulga)
(Aktulga) Aktulga, Fogarty, Pandit, Grama, Parallel Computing, 38,
245-259 (2012).
:link(Liu_2011)
[(Liu)] L. Liu, Y. Liu, S. V. Zybin, H. Sun and W. A. Goddard, Journal
of Physical Chemistry A, 115, 11016-11022 (2011).
diff --git a/doc/src/pair_sdk.txt b/doc/src/pair_sdk.txt
index 212760e03..1c348eaaf 100644
--- a/doc/src/pair_sdk.txt
+++ b/doc/src/pair_sdk.txt
@@ -1,156 +1,156 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style lj/sdk command :h3
pair_style lj/sdk/gpu command :h3
pair_style lj/sdk/kk command :h3
pair_style lj/sdk/omp command :h3
pair_style lj/sdk/coul/long command :h3
pair_style lj/sdk/coul/long/gpu command :h3
pair_style lj/sdk/coul/long/omp command :h3
[Syntax:]
pair_style style args :pre
style = {lj/sdk} or {lj/sdk/coul/long}
args = list of arguments for a particular style :ul
{lj/sdk} args = cutoff
cutoff = global cutoff for Lennard Jones interactions (distance units)
{lj/sdk/coul/long} args = cutoff (cutoff2)
cutoff = global cutoff for LJ (and Coulombic if only 1 arg) (distance units)
cutoff2 = global cutoff for Coulombic (optional) (distance units) :pre
[Examples:]
pair_style lj/sdk 2.5
pair_coeff 1 1 lj12_6 1 1.1 2.8 :pre
pair_style lj/sdk/coul/long 10.0
pair_style lj/sdk/coul/long 10.0 12.0
pair_coeff 1 1 lj9_6 100.0 3.5 12.0 :pre
[Description:]
The {lj/sdk} styles compute a 9/6, 12/4, or 12/6 Lennard-Jones potential,
given by
:c,image(Eqs/pair_cmm.jpg)
as required for the SDK Coarse-grained MD parametrization discussed in
"(Shinoda)"_#Shinoda3 and "(DeVane)"_#DeVane. Rc is the cutoff.
Style {lj/sdk/coul/long} computes the same Lennard-Jones interactions
and adds Coulombic interactions with an additional damping factor
applied, so it can be used in conjunction with the
"kspace_style"_kspace_style.html command and its {ewald} or {pppm} or
{pppm/cg} option. The Coulombic cutoff specified for this style means
that pairwise interactions within this distance are computed directly;
interactions outside that distance are computed in reciprocal space.
The following coefficients must be defined for each pair of atom
types via the "pair_coeff"_pair_coeff.html command as in the examples
above, or in the data file or restart files read by the
"read_data"_read_data.html or "read_restart"_read_restart.html
commands, or by mixing as described below:
cg_type (lj9_6, lj12_4, or lj12_6)
epsilon (energy units)
sigma (distance units)
cutoff1 (distance units) :ul
Note that sigma is defined in the LJ formula as the zero-crossing
distance for the potential, not as the energy minimum. The prefactors
are chosen so that the potential minimum is at -epsilon.
The latter 2 coefficients are optional. If not specified, the global
LJ and Coulombic cutoffs specified in the pair_style command are used.
If only one cutoff is specified, it is used as the cutoff for both LJ
and Coulombic interactions for this type pair. If both coefficients
are specified, they are used as the LJ and Coulombic cutoffs for this
type pair.
For {lj/sdk/coul/long} only the LJ cutoff can be specified since a
Coulombic cutoff cannot be specified for an individual I,J type pair.
All type pairs use the same global Coulombic cutoff specified in the
pair_style command.
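As an illustration of per-pair settings, the following is a sketch
only; the epsilon and sigma values are made up and not taken from an
actual SDK parameter set:
pair_style lj/sdk/coul/long 12.0
pair_coeff 1 1 lj9_6 0.40 4.5          # uses the global 12.0 cutoff
pair_coeff 1 2 lj12_4 0.35 4.1 10.0    # overrides only the LJ cutoff for this pair
pair_coeff 2 2 lj12_6 0.45 4.7 :pre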
:line
Styles with a {gpu}, {intel}, {kk}, {omp} or {opt} suffix are
functionally the same as the corresponding style without the suffix.
They have been optimized to run faster, depending on your available
hardware, as discussed in "Section 5"_Section_accelerate.html
of the manual. The accelerated styles take the same arguments and
should produce the same results, except for round-off and precision
issues.
These accelerated styles are part of the GPU, USER-INTEL, KOKKOS,
USER-OMP, and OPT packages respectively. They are only enabled if
LAMMPS was built with those packages. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
You can specify the accelerated styles explicitly in your input script
by including their suffix, or you can use the "-suffix command-line
switch"_Section_start.html#start_7 when you invoke LAMMPS, or you can
use the "suffix"_suffix.html command in your input script.
See "Section 5"_Section_accelerate.html of the manual for
more instructions on how to use the accelerated styles effectively.
:line
[Mixing, shift, table, tail correction, restart, and rRESPA info]:
For atom type pairs I,J and I != J, the epsilon and sigma coefficients
and cutoff distance for all of the lj/sdk pair styles {cannot} be mixed,
since different pairs may have different exponents. So all parameters
for all pairs have to be specified explicitly through the "pair_coeff"
command. Defining them in a data file is also not supported, due to
limitations of that file format.
All of the lj/sdk pair styles support the
"pair_modify"_pair_modify.html shift option for the energy of the
Lennard-Jones portion of the pair interaction.
The {lj/sdk/coul/long} pair styles support the
"pair_modify"_pair_modify.html table option since they can tabulate
the short-range portion of the long-range Coulombic interaction.
All of the lj/sdk pair styles write their information to "binary
restart files"_restart.html, so pair_style and pair_coeff commands do
not need to be specified in an input script that reads a restart file.
The lj/sdk and lj/sdk/coul/long pair styles do not support
the use of the {inner}, {middle}, and {outer} keywords of the "run_style
respa"_run_style.html command.
:line
[Restrictions:]
-All of the lj/sdk pair styles are part of the USER-CG-CMM package.
+All of the lj/sdk pair styles are part of the USER-CGSDK package.
The {lj/sdk/coul/long} style also requires the KSPACE package to be
built (which is enabled by default). They are only enabled if LAMMPS
was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"pair_coeff"_pair_coeff.html, "angle_style sdk"_angle_sdk.html
[Default:] none
:line
:link(Shinoda3)
[(Shinoda)] Shinoda, DeVane, Klein, Mol Sim, 33, 27 (2007).
:link(DeVane)
[(DeVane)] Shinoda, DeVane, Klein, Soft Matter, 4, 2453-2462 (2008).
diff --git a/doc/src/pair_srp.txt b/doc/src/pair_srp.txt
index 3f54445ba..e7f1e00d1 100644
--- a/doc/src/pair_srp.txt
+++ b/doc/src/pair_srp.txt
@@ -1,166 +1,168 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
pair_style srp command :h3
[Syntax:]
pair_style srp cutoff btype dist keyword value ...
cutoff = global cutoff for SRP interactions (distance units) :ulb,l
btype = bond type to apply SRP interactions to (can be wildcard, see below) :l
dist = {min} or {mid} :l
zero or more keyword/value pairs may be appended :l
keyword = {bptype} or {exclude} :l
{bptype} value = atom type for bond particles
{exclude} value = {yes} or {no} :pre
:ule
[Examples:]
pair_style hybrid dpd 1.0 1.0 12345 srp 0.8 1 mid exclude yes
pair_coeff 1 1 dpd 60.0 4.5 1.0
pair_coeff 1 2 none
pair_coeff 2 2 srp 100.0 0.8 :pre
pair_style hybrid dpd 1.0 1.0 12345 srp 0.8 * min exclude yes
pair_coeff 1 1 dpd 60.0 50 1.0
pair_coeff 1 2 none
pair_coeff 2 2 srp 40.0 :pre
pair_style hybrid srp 0.8 2 mid
pair_coeff 1 1 none
pair_coeff 1 2 none
pair_coeff 2 2 srp 100.0 0.8 :pre
[Description:]
Style {srp} computes a soft segmental repulsive potential (SRP) that
acts between pairs of bonds. This potential is useful for preventing
bonds from passing through one another when a soft non-bonded
potential acts between beads in, for example, DPD polymer chains. An
example input script that uses this command is provided in
examples/USER/srp.
Bonds of specified type {btype} interact with one another through a
bond-pairwise potential, such that the force on bond {i} due to bond
{j} is as follows
:c,image(Eqs/pair_srp1.jpg)
where {r} and {rij} are the distance and unit vector between the two
bonds. Note that {btype} can be specified as an asterisk "*", in which
case the interaction is applied to all bond types. The {mid} option
computes {r} and {rij} from the midpoint distance between bonds. The
{min} option computes {r} and {rij} from the minimum distance between
bonds. The force acting on a bond is mapped onto the two bond atoms
according to the lever rule,
:c,image(Eqs/pair_srp2.jpg)
where {L} is the normalized distance from the atom to the point of
closest approach of bond {i} and {j}. The {mid} option takes {L} as
0.5 for each interaction as described in "(Sirk)"_#Sirk2.
The following coefficients must be defined via the
"pair_coeff"_pair_coeff.html command as in the examples above, or in
the data file or restart file read by the "read_data"_read_data.html
or "read_restart"_read_restart.html commands:
{C} (force units)
{rc} (distance units) :ul
The last coefficient is optional. If not specified, the global cutoff
is used.
NOTE: Pair style srp considers each bond of type {btype} to be a
fictitious "particle" of type {bptype}, where {bptype} is either the
largest atom type in the system, or the type set by the {bptype} flag.
Any actual existing particles with this atom type will be deleted at
the beginning of a run. This means you must specify the number of
atom types in your system accordingly, usually one larger than it
would normally be, e.g. via the "create_box"_create_box.html command
or by changing the header in your "data file"_read_data.html. The
fictitious "bond particles" are inserted at the beginning of the run,
and serve as placeholders that define the position of the bonds. This
allows neighbor lists to be constructed and pairwise interactions to
be computed in almost the same way as is done for actual particles.
Because bonds interact only with other bonds, "pair_style
hybrid"_pair_hybrid.html should be used to turn off interactions
between atom type {bptype} and all other types of atoms. An error
will be flagged if "pair_style hybrid"_pair_hybrid.html is not used.
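A hypothetical setup with one real bead type plus one extra type
reserved for the bond particles might look like this (the DPD and SRP
coefficients are illustrative only):
create_box 2 simbox                 # simbox = a previously defined region
pair_style hybrid dpd 1.0 1.0 12345 srp 0.8 1 mid
pair_coeff 1 1 dpd 60.0 4.5 1.0
pair_coeff 1 2 none                 # no interactions with the bond particles
pair_coeff 2 2 srp 100.0 0.8 :pre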
The optional {exclude} keyword determines if forces are computed
between first neighbor (directly connected) bonds. For a setting of
{no}, first neighbor forces are computed; for {yes} they are not
computed. A setting of {no} cannot be used with the {min} option for
distance calculation because the minimum distance between directly
connected bonds is zero.
Pair style {srp} turns off normalization of thermodynamic properties
by particle number, as if the command "thermo_modify norm
no"_thermo_modify.html had been issued.
The pairwise energy associated with style {srp} is shifted to be zero
at the cutoff distance {rc}.
:line
[Mixing, shift, table, tail correction, restart, rRESPA info]:
This pair style does not support mixing.
This pair style does not support the "pair_modify"_pair_modify.html
shift option for the energy of the pair interaction. Note that as
discussed above, the energy term is already shifted to be 0.0 at the
cutoff distance {rc}.
The "pair_modify"_pair_modify.html table option is not relevant for
this pair style.
This pair style does not support the "pair_modify"_pair_modify.html
tail option for adding long-range tail corrections to energy and
pressure.
This pair style writes global and per-atom information to "binary
restart files"_restart.html. Pair srp should be used with "pair_style
hybrid"_pair_hybrid.html, thus the pair_coeff commands need to be
specified in the input script when reading a restart file.
This pair style can only be used via the {pair} keyword of the
"run_style respa"_run_style.html command. It does not support the
{inner}, {middle}, {outer} keywords.
:line
[Restrictions:]
This pair style is part of the USER-MISC package. It is only enabled
if LAMMPS was built with that package. See the Making LAMMPS section
for more info.
This pair style must be used with "pair_style
hybrid"_pair_hybrid.html.
This pair style requires the "newton"_newton.html command to be {on}
for non-bonded interactions.
+This pair style is not compatible with "rigid body integrators"_fix_rigid.html.
+
[Related commands:]
"pair_style hybrid"_pair_hybrid.html, "pair_coeff"_pair_coeff.html,
"pair dpd"_pair_dpd.html
[Default:]
The default keyword value is exclude = yes.
:line
:link(Sirk2)
[(Sirk)] Sirk TW, Sliozberg YR, Brennan JK, Lisal M, Andzelm JW, J
Chem Phys, 136 (13) 134903, 2012.
diff --git a/doc/src/pairs.txt b/doc/src/pairs.txt
index 8694747da..0898906e7 100644
--- a/doc/src/pairs.txt
+++ b/doc/src/pairs.txt
@@ -1,107 +1,107 @@
Pair Styles :h1
<!-- RST
.. toctree::
:maxdepth: 1
pair_adp
pair_agni
pair_airebo
pair_awpmd
pair_beck
pair_body
pair_bop
pair_born
pair_brownian
pair_buck
pair_buck_long
pair_charmm
pair_class2
pair_colloid
pair_comb
pair_coul
pair_coul_diel
pair_cs
pair_dipole
pair_dpd
pair_dpd_fdt
pair_dsmc
pair_eam
pair_edip
pair_eff
pair_eim
pair_exp6_rx
pair_gauss
pair_gayberne
pair_gran
pair_gromacs
pair_hbond_dreiding
pair_hybrid
pair_kim
pair_kolmogorov_crespi_z
pair_lcbop
pair_line_lj
pair_list
pair_lj
pair_lj96
pair_lj_cubic
pair_lj_expand
pair_lj_long
pair_lj_sf
pair_lj_smooth
pair_lj_smooth_linear
pair_lj_soft
pair_lubricate
pair_lubricateU
pair_mdf
pair_meam
pair_meam_spline
pair_meam_sw_spline
pair_mgpt
pair_mie
pair_momb
pair_morse
pair_multi_lucy
pair_multi_lucy_rx
pair_nb3b_harmonic
pair_nm
pair_none
pair_oxdna
pair_oxdna2
pair_peri
pair_polymorphic
pair_quip
pair_reax
- pair_reax_c
+ pair_reaxc
pair_resquared
pair_sdk
pair_smd_hertz
pair_smd_tlsph
pair_smd_triangulated_surface
pair_smd_ulsph
pair_smtbq
pair_snap
pair_soft
pair_sph_heatconduction
pair_sph_idealgas
pair_sph_lj
pair_sph_rhosum
pair_sph_taitwater
pair_sph_taitwater_morris
pair_srp
pair_sw
pair_table
pair_table_rx
pair_tersoff
pair_tersoff_mod
pair_tersoff_zbl
pair_thole
pair_tri_lj
pair_vashishta
pair_yukawa
pair_yukawa_colloid
pair_zbl
pair_zero
END_RST -->
diff --git a/doc/src/python.txt b/doc/src/python.txt
index a5003be54..e8a76c0e3 100644
--- a/doc/src/python.txt
+++ b/doc/src/python.txt
@@ -1,481 +1,481 @@
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
python command :h3
[Syntax:]
python func keyword args ... :pre
func = name of Python function :ulb,l
one or more keyword/args pairs must be appended :l
keyword = {invoke} or {input} or {return} or {format} or {length} or {file} or {here} or {exists}
{invoke} arg = none = invoke the previously defined Python function
{input} args = N i1 i2 ... iN
N = # of inputs to function
i1,...,iN = value, SELF, or LAMMPS variable name
value = integer number, floating point number, or string
SELF = reference to LAMMPS itself which can be accessed by Python function
variable = v_name, where name = name of LAMMPS variable, e.g. v_abc
{return} arg = varReturn
varReturn = v_name = LAMMPS variable name which return value of function will be assigned to
{format} arg = fstring with M characters
M = N if no return value, where N = # of inputs
M = N+1 if there is a return value
fstring = each character (i,f,s,p) corresponds in order to an input or return value
'i' = integer, 'f' = floating point, 's' = string, 'p' = SELF
{length} arg = Nlen
Nlen = max length of string returned from Python function
{file} arg = filename
filename = file of Python code, which defines func
{here} arg = inline
inline = one or more lines of Python code which defines func
must be a single argument, typically enclosed between triple quotes
{exists} arg = none = Python code has been loaded by previous python command :pre
:ule
[Examples:]
python pForce input 2 v_x 20.0 return v_f format fff file force.py
python pForce invoke :pre
python factorial input 1 myN return v_fac format ii here """
def factorial(n):
  if n == 1: return n
  return n * factorial(n-1)
""" :pre
python loop input 1 SELF return v_value format pf here """
def loop(lmpptr,N,cut0):
  from lammps import lammps
  lmp = lammps(ptr=lmpptr) :pre
  # loop N times, increasing cutoff each time :pre
  for i in range(N):
    cut = cut0 + i*0.1
    lmp.set_variable("cut",cut)                # set a variable in LAMMPS
    lmp.command("pair_style lj/cut $\{cut\}")  # LAMMPS commands
    lmp.command("pair_coeff * * 1.0 1.0")
    lmp.command("run 100")
""" :pre
[Description:]
Define a Python function or execute a previously defined function.
Arguments, including LAMMPS variables, can be passed to the function
from the LAMMPS input script and a value returned by the Python
function to a LAMMPS variable. The Python code for the function can
be included directly in the input script or in a separate Python file.
The function can be standard Python code or it can make "callbacks" to
LAMMPS through its library interface to query or set internal values
within LAMMPS. This is a powerful mechanism for performing complex
operations in a LAMMPS input script that are not possible with the
simple input script and variable syntax which LAMMPS defines. Thus
your input script can operate more like a true programming language.
Use of this command requires building LAMMPS with the PYTHON package
which links to the Python library so that the Python interpreter is
embedded in LAMMPS. More details about this process are given below.
There are two ways to invoke a Python function once it has been
defined. One is using the {invoke} keyword. The other is to assign
the function to a "python-style variable"_variable.html defined in
your input script. Whenever the variable is evaluated, it will
execute the Python function to assign a value to the variable. Note
that variables can be evaluated in many different ways within LAMMPS.
They can be substituted for directly in an input script. Or they can
be passed to various commands as arguments, so that the variable is
evaluated during a simulation run.
A broader overview of how Python can be used with LAMMPS is
given in "Section 11"_Section_python.html. There is an
examples/python directory which illustrates use of the python
command.
:line
The {func} setting specifies the name of the Python function. The
code for the function is defined using the {file} or {here} keywords
as explained below.
If the {invoke} keyword is used, no other keywords can be used, and a
previous python command must have defined the Python function
referenced by this command. This invokes the Python function with the
previously defined arguments and return value processed as explained
below. You can invoke the function as many times as you wish in your
input script.
The {input} keyword defines how many arguments {N} the Python function
expects. If it takes no arguments, then the {input} keyword should
not be used. Each argument can be specified directly as a value,
e.g. 6 or 3.14159 or abc (a string of characters). The type of each
argument is specified by the {format} keyword as explained below, so
that Python will know how to interpret the value. If the word SELF is
used for an argument it has a special meaning. A pointer is passed to
the Python function which it converts into a reference to LAMMPS
itself. This enables the function to call back to LAMMPS through its
library interface as explained below. This allows the Python function
to query or set values internal to LAMMPS which can affect the
subsequent execution of the input script. A LAMMPS variable can also
be used as an argument, specified as v_name, where "name" is the name
of the variable. Any style of LAMMPS variable can be used, as defined
by the "variable"_variable.html command. Each time the Python
function is invoked, the LAMMPS variable is evaluated and its value is
passed to the Python function.
The {return} keyword is only needed if the Python function returns a
value. The specified {varReturn} must be of the form v_name, where
"name" is the name of a python-style LAMMPS variable, defined by the
"variable"_variable.html command. The Python function can return a
numeric or string value, as specified by the {format} keyword.
As explained on the "variable"_variable.html doc page, the definition
of a python-style variable associates a Python function name with the
variable. This must match the {func} setting for this command. For
example these two commands would be self-consistent:
variable foo python myMultiply
python myMultiply return v_foo format f file funcs.py :pre
The two commands can appear in either order in the input script so
long as both are specified before the Python function is invoked for
the first time.
The {format} keyword must be used if the {input} or {return} keyword
is used. It defines an {fstring} with M characters, where M = sum of
number of inputs and outputs. The order of characters corresponds to
the N inputs, followed by the return value (if it exists). Each
character must be one of the following: "i" for integer, "f" for
floating point, "s" for string, or "p" for SELF. Each character
defines the type of the corresponding input or output value of the
Python function and affects the type conversion that is performed
internally as data is passed back and forth between LAMMPS and Python.
Note that it is permissible to use a "python-style
variable"_variable.html in a LAMMPS command that allows for an
equal-style variable as an argument, but only if the output of the
Python function is flagged as a numeric value ("i" or "f") via the
{format} keyword.
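For example, a python-style variable whose function returns a
floating-point value could be used in thermo output just like an
equal-style variable; here myScale and funcs.py are hypothetical names:
variable natoms equal atoms
variable scale python myScale
python myScale input 1 v_natoms return v_scale format ff file funcs.py
thermo_style custom step temp pe v_scale :pre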
If the {return} keyword is used and the {format} keyword specifies the
output as a string, then the default maximum length of that string is
63 characters (64-1 for the string terminator). If you want to return
a longer string, the {length} keyword can be specified with its {Nlen}
value set to a larger number (the code allocates space for Nlen+1 to
include the string terminator). If the Python function generates a
string longer than the default 63 or the specified {Nlen}, it will be
truncated.
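For example, to allow a string of up to 128 characters to be returned
(longString and funcs.py are hypothetical names):
variable msg python longString
python longString return v_msg format s length 128 file funcs.py
print "$\{msg\}" :pre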
:line
Either the {file}, {here}, or {exists} keyword must be used, but only
one of them. These keywords specify what Python code to load into the
Python interpreter. The {file} keyword gives the name of a file,
which should end with a ".py" suffix, which contains Python code. The
code will be immediately loaded into and run in the "main" module of
the Python interpreter. Note that Python code which contains a
function definition does not "execute" the function when it is run; it
simply defines the function so that it can be invoked later.
The {here} keyword does the same thing, except that the Python code
follows as a single argument to the {here} keyword. This can be done
using triple quotes as delimiters, as in the examples above. This
allows Python code to be listed verbatim in your input script, with
proper indentation, blank lines, and comments, as desired. See
"Section 3.2"_Section_commands.html#cmd_2, for an explanation of how
triple quotes can be used as part of input script syntax.
The {exists} keyword takes no argument. It means that Python code
containing the required Python function defined by the {func} setting,
is assumed to have been previously loaded by another python command.
Note that the Python code that is loaded and run must contain a
function with the specified {func} name. To operate properly when
later invoked, the function code must match the {input} and
{return} and {format} keywords specified by the python command.
Otherwise Python will generate an error.
:line
This section describes how Python code can be written to work with
LAMMPS.
Whether you load Python code from a file or directly from your input
script, via the {file} and {here} keywords, the code can be identical.
It must be indented properly as Python requires. It can contain
comments or blank lines. If the code is in your input script, it
cannot however contain triple-quoted Python strings, since that will
conflict with the triple-quote parsing that the LAMMPS input script
performs.
All the Python code you specify via one or more python commands is
loaded into the Python "main" module, i.e. __main__. The code can
define global variables or statements that are outside of function
definitions. It can contain multiple functions, only one of which
matches the {func} setting in the python command. This means you can
use the {file} keyword once to load several functions, and the
{exists} keyword thereafter in subsequent python commands to access
the other functions previously loaded.
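For example, if a single file defines two functions, the first python
command can load the file and the second can reference it with the
{exists} keyword (first, second, and funcs.py are hypothetical names):
variable a equal 1.0
variable b equal 2.0
variable x python first
variable y python second
python first input 1 v_a return v_x format ff file funcs.py
python second input 1 v_b return v_y format ff exists :pre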
A Python function you define (or more generally, the code you load)
can import other Python modules or classes, it can make calls to other
system functions or functions you define, and it can access or modify
global variables (in the "main" module) which will persist between
successive function calls. The latter can be useful, for example, to
prevent a function from being invoked multiple times per timestep by
different commands in a LAMMPS input script that access the returned
python-style variable associated with the function. For example,
consider this function loaded with two global variables defined
outside the function:
nsteplast = -1
nvaluelast = 0 :pre
def expensive(nstep):
  global nsteplast,nvaluelast
  if nstep == nsteplast: return nvaluelast
  nsteplast = nstep
  # perform complicated calculation
  nvalue = ...
  nvaluelast = nvalue
  return nvalue :pre
Nsteplast stores the previous timestep the function was invoked
(passed as an argument to the function). Nvaluelast stores the return
value computed on the last function invocation. If the function is
invoked again on the same timestep, the previous value is simply
returned, without re-computing it. The "global" statement inside the
Python function allows it to overwrite the global variables.
Note that if you load Python code multiple times (via multiple python
commands), you can overwrite previously loaded variables and functions
if you are not careful. E.g. if the code above were loaded twice, the
global variables would be re-initialized, which might not be what you
want. Likewise, if a function with the same name exists in two chunks
of Python code you load, the function loaded second will override the
function loaded first.
It's important to realize that if you are running LAMMPS in parallel,
each MPI task will load the Python interpreter and execute a local
copy of the Python function(s) you define. There is no connection
between the Python interpreters running on different processors.
This implies three important things.
First, if you put a print statement in your Python function, you will
see P copies of the output, when running on P processors. If the
prints occur at (nearly) the same time, the P copies of the output may
be mixed together. Welcome to the world of parallel programming and
debugging.
Second, if your Python code loads modules that are not pre-loaded by
the Python library, then it will load the module from disk. This may
be a bottleneck if 1000s of processors try to load a module at the
same time. On some large supercomputers, loading of modules from disk
by Python may be disabled. In this case you would need to pre-build a
Python library that has the required modules pre-loaded and link
LAMMPS with that library.
Third, if your Python code calls back to LAMMPS (discussed in the
next section) and causes LAMMPS to perform an MPI operation that
requires global communication (e.g. via MPI_Allreduce), such as
computing the global temperature of the system, then you must ensure
all your Python functions (running independently on different
processors) call back to LAMMPS. Otherwise the code may hang.
:line
Your Python function can "call back" to LAMMPS through its
library interface, if you use the SELF input to pass Python
a pointer to LAMMPS. The mechanism for doing this in your
Python function is as follows:
def foo(lmpptr,...):
  from lammps import lammps
  lmp = lammps(ptr=lmpptr)
  lmp.command('print "Hello from inside Python"')
  ... :pre
The function definition must include a variable (lmpptr in this case)
which corresponds to SELF in the python command. The first line of
the function imports the Python module lammps.py in the python dir of
the distribution. The second line creates a Python object "lmp" which
wraps the instance of LAMMPS that called the function. The
-"ptr=lmpptr" argument is what makes that happen. The thrid line
+"ptr=lmpptr" argument is what makes that happen. The third line
invokes the command() function in the LAMMPS library interface. It
takes a single string argument which is a LAMMPS input script command
for LAMMPS to execute, the same as if it appeared in your input
script. In this case, LAMMPS should output
Hello from inside Python :pre
to the screen and log file. Note that since the LAMMPS print command
itself takes a string in quotes as its argument, the Python string
must be delimited with a different style of quotes.
"Section 11.7"_Section_python.html#py_7 describes the syntax for how
Python wraps the various functions included in the LAMMPS library
interface.
A more interesting example is in the examples/python/in.python script
which loads and runs the following function from examples/python/funcs.py:
def loop(N,cut0,thresh,lmpptr):
print "LOOP ARGS",N,cut0,thresh,lmpptr
from lammps import lammps
lmp = lammps(ptr=lmpptr)
natoms = lmp.get_natoms() :pre
for i in range(N):
cut = cut0 + i*0.1 :pre
lmp.set_variable("cut",cut) # set a variable in LAMMPS
lmp.command("pair_style lj/cut $\{cut\}") # LAMMPS command
#lmp.command("pair_style lj/cut %d" % cut) # LAMMPS command option :pre
lmp.command("pair_coeff * * 1.0 1.0") # ditto
lmp.command("run 10") # ditto
pe = lmp.extract_compute("thermo_pe",0,0) # extract total PE from LAMMPS
print "PE",pe/natoms,thresh
if pe/natoms < thresh: return :pre
with these input script commands:
python loop input 4 10 1.0 -4.0 SELF format iffp file funcs.py
python loop invoke :pre
This has the effect of looping over a series of 10 short runs (10
timesteps each) where the pair style cutoff is increased from a value
of 1.0 in distance units, in increments of 0.1. The looping stops
when the per-atom potential energy falls below a threshold of -4.0 in
energy units. More generally, Python can be used to implement a loop
with complex logic, much more so than can be created using the LAMMPS
"jump"_jump.html and "if"_if.html commands.
Several LAMMPS library functions are called from the loop function.
Get_natoms() returns the number of atoms in the simulation, so that it
can be used to normalize the potential energy that is returned by
extract_compute() for the "thermo_pe" compute that is defined by
default for LAMMPS thermodynamic output. Set_variable() sets the
value of a string variable defined in LAMMPS. This library function
is a useful way for a Python function to return multiple values to
LAMMPS, more than the single value that can be passed back via a
return statement. The cutoff value stored in the "cut" variable is then
substituted (by LAMMPS) into the pair_style command that is executed
next. Alternatively, the "LAMMPS command option" line could be used
in place of the 2 preceding lines, to have Python insert the value
into the LAMMPS command string.
NOTE: When using the callback mechanism just described, recognize that
there are some operations you should not attempt because LAMMPS cannot
execute them correctly. If the Python function is invoked between
runs in the LAMMPS input script, then it should be OK to invoke any
LAMMPS input script command via the library interface command() or
file() functions, so long as the command would work if it were
executed in the LAMMPS input script directly at the same point.
However, a Python function can also be invoked during a run, whenever
the LAMMPS variable it is assigned to is evaluated. If the
variable is an input argument to another LAMMPS command (e.g. "fix
setforce"_fix_setforce.html), then the Python function will be invoked
inside the class for that command, in one of its methods that is
invoked in the middle of a timestep. You cannot execute arbitrary
input script commands from the Python function (again, via the
command() or file() functions) at that point in the run and expect it
to work. Other library functions such as those that invoke computes
or other variables may have hidden side effects as well. In these
cases, LAMMPS has no simple way to check that something illogical is
being attempted.
:line
If you run Python code directly on your workstation, either
interactively or by using Python to launch a Python script stored in a
file, and your code has an error, you will typically see informative
error messages. That is not the case when you run Python code from
LAMMPS using an embedded Python interpreter. The code will typically
fail silently. LAMMPS will catch some errors but cannot tell you
where in the Python code the problem occurred. For example, if the
Python code cannot be loaded and run because it has syntax or other
logic errors, you may get an error from Python pointing to the
offending line, or you may get one of these generic errors from
LAMMPS:
Could not process Python file
Could not process Python string :pre
When the Python function is invoked, if it does not return properly,
you will typically get this generic error from LAMMPS:
Python function evaluation failed :pre
Here are three suggestions for debugging your Python code while
running it under LAMMPS.
First, don't run it under LAMMPS, at least to start with! Debug it
using plain Python. Load and invoke your function, pass it arguments,
check return values, etc.
Second, add Python print statements to the function to check how far
it gets and intermediate values it calculates. See the discussion
above about printing from Python when running in parallel.
Third, use Python exception handling. For example, say this statement
in your Python function is failing, because you have not initialized the
variable foo:
foo += 1 :pre
If you put one (or more) statements inside a "try" statement,
like this:
import exceptions
print "Inside simple function"
try:
  foo += 1      # one or more statements here
except Exception, e:
  print "FOO error:",e :pre
then you will get this message printed to the screen:
FOO error: local variable 'foo' referenced before assignment :pre
If there is no error in the try statements, then nothing is printed.
Either way the function continues on (unless you put a return or
sys.exit() in the except clause).
:line
[Restrictions:]
This command is part of the PYTHON package. It is only enabled if
LAMMPS was built with that package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
Building LAMMPS with the PYTHON package will link LAMMPS with the
Python library on your system. Settings to enable this are in the
lib/python/Makefile.lammps file. See the lib/python/README file for
information on those settings.
If you use Python code which calls back to LAMMPS, via the SELF input
argument explained above, there is an extra step required when
building LAMMPS. LAMMPS must also be built as a shared library and
your Python function must be able to load the Python module in
python/lammps.py that wraps the LAMMPS library interface. These are
the same steps required to use Python by itself to wrap LAMMPS.
Details on these steps are explained in "Section
python"_Section_python.html. Note that it is important that the
stand-alone LAMMPS executable and the LAMMPS shared library be
consistent (built from the same source code files) in order for this
to work. If the two have been built at different times using
different source files, problems may occur.
[Related commands:]
"shell"_shell.html, "variable"_variable.html
[Default:] none
diff --git a/examples/USER/cg-cmm/README b/examples/USER/cgsdk/README
similarity index 95%
rename from examples/USER/cg-cmm/README
rename to examples/USER/cgsdk/README
index 6a283114b..5d3a49377 100644
--- a/examples/USER/cg-cmm/README
+++ b/examples/USER/cgsdk/README
@@ -1,19 +1,19 @@
-LAMMPS USER-CMM-CG example problems
+LAMMPS USER-CGSDK example problems
Each of these sub-directories contains a sample problem for the SDK
coarse grained MD potentials that you can run with LAMMPS.
These are the two sample systems
peg-verlet: coarse grained PEG surfactant/water mixture lamella
verlet version
this example uses the plain LJ term only, no charges.
two variants are provided: regular harmonic angles and
the SDK variant that includes 1-3 LJ repulsion.
sds-monolayer: coarse grained SDS surfactant monolayers at water/vapor
interface.
this example uses the SDK LJ term with coulomb and shows
how to use the combined coulomb style vs. hybrid/overlay
with possible optimizations due to the small number of
charged particles in this system
diff --git a/examples/USER/cg-cmm/peg-verlet/data.pegc12e8.gz b/examples/USER/cgsdk/peg-verlet/data.pegc12e8.gz
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/data.pegc12e8.gz
rename to examples/USER/cgsdk/peg-verlet/data.pegc12e8.gz
diff --git a/examples/USER/cg-cmm/peg-verlet/in.pegc12e8 b/examples/USER/cgsdk/peg-verlet/in.pegc12e8
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/in.pegc12e8
rename to examples/USER/cgsdk/peg-verlet/in.pegc12e8
diff --git a/examples/USER/cg-cmm/peg-verlet/in.pegc12e8-angle b/examples/USER/cgsdk/peg-verlet/in.pegc12e8-angle
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/in.pegc12e8-angle
rename to examples/USER/cgsdk/peg-verlet/in.pegc12e8-angle
diff --git a/examples/USER/cg-cmm/peg-verlet/log.pegc12e8 b/examples/USER/cgsdk/peg-verlet/log.pegc12e8
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/log.pegc12e8
rename to examples/USER/cgsdk/peg-verlet/log.pegc12e8
diff --git a/examples/USER/cg-cmm/peg-verlet/log.pegc12e8-angle b/examples/USER/cgsdk/peg-verlet/log.pegc12e8-angle
similarity index 100%
rename from examples/USER/cg-cmm/peg-verlet/log.pegc12e8-angle
rename to examples/USER/cgsdk/peg-verlet/log.pegc12e8-angle
diff --git a/examples/USER/cg-cmm/sds-monolayer/data.sds.gz b/examples/USER/cgsdk/sds-monolayer/data.sds.gz
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/data.sds.gz
rename to examples/USER/cgsdk/sds-monolayer/data.sds.gz
diff --git a/examples/USER/cg-cmm/sds-monolayer/in.sds-hybrid b/examples/USER/cgsdk/sds-monolayer/in.sds-hybrid
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/in.sds-hybrid
rename to examples/USER/cgsdk/sds-monolayer/in.sds-hybrid
diff --git a/examples/USER/cg-cmm/sds-monolayer/in.sds-regular b/examples/USER/cgsdk/sds-monolayer/in.sds-regular
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/in.sds-regular
rename to examples/USER/cgsdk/sds-monolayer/in.sds-regular
diff --git a/examples/USER/cg-cmm/sds-monolayer/log.sds-hybrid b/examples/USER/cgsdk/sds-monolayer/log.sds-hybrid
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/log.sds-hybrid
rename to examples/USER/cgsdk/sds-monolayer/log.sds-hybrid
diff --git a/examples/USER/cg-cmm/sds-monolayer/log.sds-regular b/examples/USER/cgsdk/sds-monolayer/log.sds-regular
similarity index 100%
rename from examples/USER/cg-cmm/sds-monolayer/log.sds-regular
rename to examples/USER/cgsdk/sds-monolayer/log.sds-regular
diff --git a/examples/USER/flow_gauss/README b/examples/USER/misc/flow_gauss/README
similarity index 100%
rename from examples/USER/flow_gauss/README
rename to examples/USER/misc/flow_gauss/README
diff --git a/examples/USER/flow_gauss/in.GD b/examples/USER/misc/flow_gauss/in.GD
similarity index 100%
rename from examples/USER/flow_gauss/in.GD
rename to examples/USER/misc/flow_gauss/in.GD
diff --git a/examples/cmap/log.11Apr17.cmap.g++.1 b/examples/cmap/log.11Apr17.cmap.g++.1
new file mode 100644
index 000000000..9b4fc2999
--- /dev/null
+++ b/examples/cmap/log.11Apr17.cmap.g++.1
@@ -0,0 +1,205 @@
+LAMMPS (31 Mar 2017)
+# Created by charmm2lammps v1.8.2.6 beta on Thu Mar 3 20:56:57 EST 2016
+
+units real
+neigh_modify delay 2 every 1
+#newton off
+
+boundary p p p
+
+atom_style full
+bond_style harmonic
+angle_style charmm
+dihedral_style charmmfsw
+improper_style harmonic
+
+pair_style lj/charmmfsw/coul/charmmfsh 8 12
+pair_modify mix arithmetic
+
+fix cmap all cmap charmm22.cmap
+Reading potential file charmm22.cmap with DATE: 2016-09-26
+fix_modify cmap energy yes
+
+read_data gagg.data fix cmap crossterm CMAP
+ orthogonal box = (-34.4147 -36.1348 -39.3491) to (45.5853 43.8652 40.6509)
+ 1 by 1 by 1 MPI processor grid
+ reading atoms ...
+ 34 atoms
+ scanning bonds ...
+ 4 = max bonds/atom
+ scanning angles ...
+ 6 = max angles/atom
+ scanning dihedrals ...
+ 12 = max dihedrals/atom
+ scanning impropers ...
+ 1 = max impropers/atom
+ reading bonds ...
+ 33 bonds
+ reading angles ...
+ 57 angles
+ reading dihedrals ...
+ 75 dihedrals
+ reading impropers ...
+ 7 impropers
+ 4 = max # of 1-2 neighbors
+ 7 = max # of 1-3 neighbors
+ 13 = max # of 1-4 neighbors
+ 16 = max # of special neighbors
+
+special_bonds charmm
+fix 1 all nve
+
+#fix 1 all nvt temp 300 300 100.0
+#fix 2 all shake 1e-9 500 0 m 1.0
+
+velocity all create 0.0 12345678 dist uniform
+
+thermo 1000
+thermo_style custom step ecoul evdwl ebond eangle edihed f_cmap eimp
+timestep 2.0
+
+run 100000
+Neighbor list info ...
+ update every 1 steps, delay 2 steps, check yes
+ max neighbors/atom: 2000, page size: 100000
+ master list distance cutoff = 14
+ ghost atom cutoff = 14
+ binsize = 7, bins = 12 12 12
+ 1 neighbor lists, perpetual/occasional/extra = 1 0 0
+ (1) pair lj/charmmfsw/coul/charmmfsh, perpetual
+ attributes: half, newton on
+ pair build: half/bin/newton
+ stencil: half/bin/3d/newton
+ bin: standard
+Per MPI rank memory allocation (min/avg/max) = 14.96 | 14.96 | 14.96 Mbytes
+Step E_coul E_vdwl E_bond E_angle E_dihed f_cmap E_impro
+ 0 16.287573 -0.85933785 1.2470497 4.8441789 4.5432816 -1.473352 0.10453023
+ 1000 18.816462 -0.84379243 0.78931817 2.7554247 4.4371421 -2.7762038 0.12697656
+ 2000 18.091571 -1.045888 0.72306589 3.0951524 4.6725102 -2.3580092 0.22712496
+ 3000 17.835596 -1.2171641 0.72666403 2.6696491 5.4373798 -2.0737041 0.075101693
+ 4000 16.211232 -0.42713611 0.99472642 3.8961462 5.2009895 -2.5626866 0.17356243
+ 5000 17.72183 -0.57081189 0.90733068 3.4376382 4.5457582 -2.3727543 0.12354518
+ 6000 18.753977 -1.5772499 0.81468321 2.9236782 4.6033216 -2.3380859 0.12835782
+ 7000 18.186024 -0.84205608 0.58996182 3.0329585 4.7221473 -2.5733243 0.10047631
+ 8000 18.214306 -1.1360938 0.72597611 3.7493028 4.7319958 -2.8957969 0.2006046
+ 9000 17.248408 -0.48641993 0.90266229 2.9721743 4.7651056 -2.1473354 0.1302043
+ 10000 17.760655 -1.2968444 0.92384663 3.7007455 4.7378947 -2.2147779 0.06940579
+ 11000 17.633929 -0.57368413 0.84872849 3.4277114 4.285393 -2.236944 0.17204973
+ 12000 18.305835 -1.0675148 0.75879532 2.8853173 4.685027 -2.409087 0.087538866
+ 13000 17.391558 -0.9975291 0.66671947 3.8065638 5.2285578 -2.4198822 0.06253594
+ 14000 17.483387 -0.67727643 0.91966477 3.7317031 4.7770445 -2.6080027 0.11487095
+ 15000 18.131749 -1.1918751 1.0025684 3.1238131 4.789742 -2.2546745 0.13782813
+ 16000 16.972343 -0.43926531 0.60644597 3.7551592 4.8658618 -2.2627659 0.12353145
+ 17000 18.080785 -1.2073565 0.7867072 3.5671106 4.43754 -2.5092904 0.17429146
+ 18000 17.474576 -0.97836065 0.8678524 3.7961537 4.3409032 -1.8922572 0.134048
+ 19000 17.000911 -1.2286864 0.83615834 3.9322908 4.9319492 -2.3281576 0.056689619
+ 20000 17.043286 -0.8506561 0.80966589 3.5087339 4.8603878 -2.3365263 0.096794824
+ 21000 17.314495 -1.1430889 0.95363892 4.2446032 4.2756745 -2.1829483 0.17119518
+ 22000 18.954881 -0.998673 0.58688334 2.71536 4.6634319 -2.6862804 0.20328442
+ 23000 17.160427 -0.97803282 0.86894041 4.0897736 4.3146238 -2.1962289 0.075339092
+ 24000 17.602026 -1.0833323 0.94888776 3.7341878 4.3084335 -2.1640414 0.081493681
+ 25000 17.845584 -1.3432612 0.93497086 3.8911043 4.468032 -2.3475883 0.093204333
+ 26000 17.833261 -1.1020534 0.77931087 3.7628141 4.512381 -2.3134761 0.15568465
+ 27000 17.68607 -1.3222026 1.1985872 3.5817624 4.6360755 -2.3492774 0.08427906
+ 28000 18.326649 -1.2669291 0.74809075 3.2624429 4.4698564 -2.3679076 0.14677293
+ 29000 17.720933 -1.0773886 0.83099482 3.7652834 4.6584594 -2.8255303 0.23092596
+ 30000 18.201999 -1.0168706 1.0637455 3.453095 4.3738593 -2.8063214 0.18658217
+ 31000 17.823502 -1.2685768 0.84805585 3.8600661 4.2195821 -2.1169716 0.12517101
+ 32000 16.883133 -0.62062648 0.84434922 3.5042683 5.1264906 -2.2674699 0.030138165
+ 33000 17.805715 -1.679553 1.2430372 4.314677 4.2523894 -2.3008321 0.18591872
+ 34000 16.723767 -0.54189072 1.1282827 3.8542159 4.3026559 -2.2186336 0.05392425
+ 35000 17.976909 -0.72092075 0.5876319 2.9726396 5.0881439 -2.491692 0.17356291
+ 36000 18.782492 -1.514246 0.63237955 3.2777164 4.6077164 -2.502574 0.082537318
+ 37000 17.247716 -0.6344626 0.79885976 3.452491 4.7618281 -2.3902444 0.11450271
+ 38000 17.996494 -1.6712877 1.0111769 4.1689136 4.46963 -2.4076725 0.11875756
+ 39000 17.586857 -0.74508086 0.95970486 3.7395038 4.6011357 -2.9854953 0.30143284
+ 40000 17.494879 -0.30772446 0.72047991 3.2604877 4.7283734 -2.3812495 0.16399034
+ 41000 15.855772 -0.49642605 0.82496448 4.5139653 4.76884 -2.214141 0.10899661
+ 42000 17.898568 -1.3078863 1.1505144 4.0429873 4.3889581 -2.8696559 0.23336417
+ 43000 19.014372 -1.6325979 1.1553166 3.5660772 4.4047997 -2.9302044 0.13672127
+ 44000 18.250782 -0.97211613 0.72714301 3.2258362 4.7257298 -2.5533613 0.11968073
+ 45000 17.335174 0.24746331 1.0415866 3.3220992 4.5251095 -3.0415216 0.24453084
+ 46000 17.72846 -0.9541418 0.88153841 3.7893452 4.5251883 -2.4003613 0.051809816
+ 47000 18.226762 -0.67057787 0.84352989 3.0609522 4.5449078 -2.4694254 0.073703949
+ 48000 17.838074 -0.88768441 1.3812262 3.5890492 4.5827868 -3.0137515 0.21417113
+ 49000 17.973733 -0.75118705 0.69667886 3.3989025 4.7058886 -2.8243945 0.26665792
+ 50000 17.461583 -0.65040016 0.68943524 2.9374743 5.6971777 -2.4438011 0.1697603
+ 51000 16.79766 -0.010684434 0.89795555 3.959039 4.56763 -2.5101098 0.15048853
+ 52000 17.566543 -0.7262764 0.74354418 3.3423185 4.8426523 -2.4187649 0.16908776
+ 53000 17.964274 -0.9270914 1.065952 3.0397181 4.4682262 -2.2179503 0.07873406
+ 54000 17.941256 -0.5807578 0.76516121 3.7262371 4.6975126 -3.179899 0.24433708
+ 55000 17.079478 -0.48559832 0.95364453 3.0414645 5.2811414 -2.7064882 0.30102814
+ 56000 17.632179 -0.75403299 0.97577942 3.3672363 4.4851336 -2.3683659 0.051117638
+ 57000 16.17128 -0.44699325 0.76341543 4.267716 5.0881056 -2.4122329 0.16671692
+ 58000 16.899276 -0.76481024 1.0400825 3.973493 4.8823309 -2.4270284 0.048716383
+ 59000 18.145412 -0.84968335 0.71698306 3.2024358 4.6115739 -2.2520353 0.19466966
+ 60000 17.578258 -1.0067331 0.72822527 3.5375208 4.9110255 -2.2319607 0.11922362
+ 61000 17.434762 -1.0244393 0.90593099 3.8446915 4.8571191 -2.6228357 0.23259208
+ 62000 17.580489 -1.1135917 0.79577432 3.7043524 4.6058114 -2.351492 0.042904152
+ 63000 18.207335 -1.1512268 0.82684507 3.4114738 4.351069 -2.1878441 0.082922105
+ 64000 18.333083 -1.1182287 0.74058959 3.6905164 4.3226172 -2.7110393 0.14721704
+ 65000 16.271579 -0.7122151 1.0200168 4.6983643 4.3681131 -2.194921 0.12831024
+ 66000 17.316444 -0.5729385 0.85254108 3.5769963 4.5526705 -2.3321328 0.040452643
+ 67000 17.19011 -0.8814312 1.1381258 3.8605789 4.4183813 -2.299607 0.091527355
+ 68000 18.223367 -1.362189 0.74472056 3.259165 4.486512 -2.2181134 0.048952796
+ 69000 17.646348 -0.91647162 0.73990335 3.9313692 5.2663097 -3.3816778 0.27769877
+ 70000 18.173493 -1.3107718 0.96484426 3.219728 4.5045124 -2.3349534 0.082327407
+ 71000 17.0627 -0.58509083 0.85964129 3.8490884 4.437895 -2.1673348 0.24151404
+ 72000 17.809764 -0.35128902 0.65479258 3.3945008 4.6160508 -2.5486166 0.10829531
+ 73000 18.27769 -1.0739758 0.80890957 3.6070901 4.6256762 -2.4576547 0.080025736
+ 74000 18.109437 -1.0691837 0.66679323 3.5923203 4.4825716 -2.5048169 0.21372319
+ 75000 17.914569 -1.3500765 1.2993494 3.362421 4.4160377 -2.1278163 0.19397641
+ 76000 16.563928 -0.16539261 1.0067302 3.5742755 4.8581915 -2.1362429 0.059822408
+ 77000 18.130477 -0.38361279 0.43406954 3.4725995 4.7005855 -2.8836242 0.11958174
+ 78000 16.746204 -1.1732959 0.7455507 3.6296638 5.6344113 -2.459208 0.16099803
+ 79000 18.243999 -1.5850155 1.0108545 3.4727867 4.3367411 -2.316686 0.070480814
+ 80000 16.960715 -0.84100929 0.91604996 3.862215 4.780949 -2.3711596 0.073916605
+ 81000 17.697722 -1.1126605 0.952804 3.7114455 4.4216316 -2.2770085 0.091372066
+ 82000 17.835901 -1.3091474 0.71867629 3.8168122 5.0150205 -2.4730634 0.062592852
+ 83000 19.168418 -1.476938 0.75592316 3.2304519 4.3946471 -2.2991395 0.13083324
+ 84000 17.945778 -1.5223622 1.0859941 3.4334011 5.0286682 -2.7550892 0.2476269
+ 85000 17.950251 -0.85843846 0.86888218 3.3101287 4.5511879 -2.3640013 0.12080834
+ 86000 17.480699 -0.97493649 0.85049761 3.4973085 4.6344922 -2.343121 0.2009677
+ 87000 17.980244 -1.114983 0.88796989 3.4113329 4.3535853 -2.2535412 0.14494917
+ 88000 18.023866 -1.226683 0.62339706 3.7649269 4.5923973 -2.3923523 0.10464375
+ 89000 16.362829 -0.311462 1.0265375 4.0101723 4.4184777 -2.0314129 0.056570704
+ 90000 17.533149 -0.41526788 1.0362029 3.4247412 4.2734431 -2.4776658 0.16960663
+ 91000 17.719099 -1.1956801 1.0069945 3.2380672 4.8982805 -2.2154906 0.12950936
+ 92000 17.762654 -1.170027 0.95814525 3.5217717 4.5405343 -2.5983677 0.15037754
+ 93000 17.393958 -0.45641026 0.6579069 3.6002204 4.5942053 -2.5559641 0.12026544
+ 94000 16.8182 -0.92962066 0.86801362 4.2914398 4.659848 -2.5251987 0.18000415
+ 95000 17.642086 -0.7994896 0.7003756 3.8036697 4.5252487 -2.4166307 0.15686517
+ 96000 18.114292 -1.5102104 1.2635908 3.2764427 5.0659496 -2.2777806 0.054309645
+ 97000 18.575765 -1.6015311 0.69500699 3.1649317 4.9945742 -2.4012125 0.067373724
+ 98000 16.578893 -0.78030229 0.91524222 4.4429655 4.4622392 -2.4052655 0.15355705
+ 99000 17.26063 -0.57832833 0.7098846 3.9000046 4.5576484 -2.5333026 0.25517222
+ 100000 18.377235 -0.89109577 0.68988617 2.8751751 4.4115591 -2.3560731 0.12185212
+Loop time of 2.96043 on 1 procs for 100000 steps with 34 atoms
+
+Performance: 5836.990 ns/day, 0.004 hours/ns, 33778.875 timesteps/s
+99.9% CPU use with 1 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 1.074 | 1.074 | 1.074 | 0.0 | 36.28
+Bond | 1.6497 | 1.6497 | 1.6497 | 0.0 | 55.72
+Neigh | 0.007576 | 0.007576 | 0.007576 | 0.0 | 0.26
+Comm | 0.012847 | 0.012847 | 0.012847 | 0.0 | 0.43
+Output | 0.0010746 | 0.0010746 | 0.0010746 | 0.0 | 0.04
+Modify | 0.16485 | 0.16485 | 0.16485 | 0.0 | 5.57
+Other | | 0.05037 | | | 1.70
+
+Nlocal: 34 ave 34 max 34 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Nghost: 0 ave 0 max 0 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Neighs: 395 ave 395 max 395 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+
+Total # of neighbors = 395
+Ave neighs/atom = 11.6176
+Ave special neighs/atom = 9.52941
+Neighbor list builds = 253
+Dangerous builds = 0
+Total wall time: 0:00:02
diff --git a/examples/cmap/log.11Apr17.cmap.g++.4 b/examples/cmap/log.11Apr17.cmap.g++.4
new file mode 100644
index 000000000..ec471d5a7
--- /dev/null
+++ b/examples/cmap/log.11Apr17.cmap.g++.4
@@ -0,0 +1,205 @@
+LAMMPS (31 Mar 2017)
+# Created by charmm2lammps v1.8.2.6 beta on Thu Mar 3 20:56:57 EST 2016
+
+units real
+neigh_modify delay 2 every 1
+#newton off
+
+boundary p p p
+
+atom_style full
+bond_style harmonic
+angle_style charmm
+dihedral_style charmmfsw
+improper_style harmonic
+
+pair_style lj/charmmfsw/coul/charmmfsh 8 12
+pair_modify mix arithmetic
+
+fix cmap all cmap charmm22.cmap
+Reading potential file charmm22.cmap with DATE: 2016-09-26
+fix_modify cmap energy yes
+
+read_data gagg.data fix cmap crossterm CMAP
+ orthogonal box = (-34.4147 -36.1348 -39.3491) to (45.5853 43.8652 40.6509)
+ 1 by 2 by 2 MPI processor grid
+ reading atoms ...
+ 34 atoms
+ scanning bonds ...
+ 4 = max bonds/atom
+ scanning angles ...
+ 6 = max angles/atom
+ scanning dihedrals ...
+ 12 = max dihedrals/atom
+ scanning impropers ...
+ 1 = max impropers/atom
+ reading bonds ...
+ 33 bonds
+ reading angles ...
+ 57 angles
+ reading dihedrals ...
+ 75 dihedrals
+ reading impropers ...
+ 7 impropers
+ 4 = max # of 1-2 neighbors
+ 7 = max # of 1-3 neighbors
+ 13 = max # of 1-4 neighbors
+ 16 = max # of special neighbors
+
+special_bonds charmm
+fix 1 all nve
+
+#fix 1 all nvt temp 300 300 100.0
+#fix 2 all shake 1e-9 500 0 m 1.0
+
+velocity all create 0.0 12345678 dist uniform
+
+thermo 1000
+thermo_style custom step ecoul evdwl ebond eangle edihed f_cmap eimp
+timestep 2.0
+
+run 100000
+Neighbor list info ...
+ update every 1 steps, delay 2 steps, check yes
+ max neighbors/atom: 2000, page size: 100000
+ master list distance cutoff = 14
+ ghost atom cutoff = 14
+ binsize = 7, bins = 12 12 12
+ 1 neighbor lists, perpetual/occasional/extra = 1 0 0
+ (1) pair lj/charmmfsw/coul/charmmfsh, perpetual
+ attributes: half, newton on
+ pair build: half/bin/newton
+ stencil: half/bin/3d/newton
+ bin: standard
+Per MPI rank memory allocation (min/avg/max) = 14.94 | 15.57 | 16.2 Mbytes
+Step E_coul E_vdwl E_bond E_angle E_dihed f_cmap E_impro
+ 0 16.287573 -0.85933785 1.2470497 4.8441789 4.5432816 -1.473352 0.10453023
+ 1000 18.816462 -0.84379243 0.78931817 2.7554247 4.4371421 -2.7762038 0.12697656
+ 2000 18.091571 -1.045888 0.72306589 3.0951524 4.6725102 -2.3580092 0.22712496
+ 3000 17.835596 -1.2171641 0.72666403 2.6696491 5.4373798 -2.0737041 0.075101693
+ 4000 16.211232 -0.42713611 0.99472642 3.8961462 5.2009895 -2.5626866 0.17356243
+ 5000 17.72183 -0.57081189 0.90733068 3.4376382 4.5457582 -2.3727543 0.12354518
+ 6000 18.753977 -1.5772499 0.81468321 2.9236782 4.6033216 -2.3380859 0.12835782
+ 7000 18.186024 -0.84205609 0.58996181 3.0329584 4.7221473 -2.5733244 0.10047631
+ 8000 18.214306 -1.1360934 0.72597583 3.7493032 4.7319959 -2.8957975 0.20060467
+ 9000 17.248415 -0.48642024 0.90266262 2.9721744 4.7651003 -2.1473349 0.13020438
+ 10000 17.760663 -1.2968458 0.92384687 3.7007432 4.7378917 -2.2147799 0.06940514
+ 11000 17.63395 -0.57366075 0.84871737 3.4276851 4.2853865 -2.2369491 0.17205075
+ 12000 18.305713 -1.0672299 0.75876262 2.8852171 4.6850229 -2.4090072 0.087568888
+ 13000 17.383367 -0.99678627 0.66712651 3.8060954 5.233865 -2.4180629 0.062014239
+ 14000 17.510901 -0.68723297 0.92448551 3.7550867 4.7321218 -2.6059088 0.11504409
+ 15000 18.080165 -1.13316 0.99982253 3.09947 4.8171402 -2.2713372 0.14580371
+ 16000 17.383245 -0.4535296 0.57826268 3.6453593 4.6541138 -2.2434512 0.13285609
+ 17000 17.111153 -0.3414839 0.73667584 3.7485311 4.6262965 -2.6166049 0.12635815
+ 18000 16.862046 -1.3592061 1.2371142 4.4878937 4.2937117 -2.2112584 0.066145125
+ 19000 18.313891 -1.654238 0.90644101 3.3934089 4.550735 -2.1862171 0.081267736
+ 20000 19.083561 -1.3081747 0.56257812 2.7633848 4.6211438 -2.5196707 0.13763071
+ 21000 18.23741 -1.051353 0.64408722 3.1735565 4.6912533 -2.2491947 0.099394904
+ 22000 17.914515 -0.89769621 0.61793801 3.1224992 4.8683543 -2.282475 0.14524537
+ 23000 16.756122 -0.98277883 1.2554905 3.7916115 4.7301443 -2.3094994 0.10226772
+ 24000 16.109857 -0.54593177 0.86934462 4.4293574 4.926985 -2.2652264 0.11414331
+ 25000 18.590559 -1.497327 1.1898361 2.9134403 4.7854107 -2.4437918 0.067416154
+ 26000 18.493391 -1.0533797 0.4889578 3.6563013 4.6171721 -2.3240835 0.11607829
+ 27000 18.646522 -1.1229601 0.67956815 2.7937638 4.8991207 -2.4068997 0.10109147
+ 28000 18.545103 -1.7237438 0.72488022 3.8041665 4.6459974 -2.4339333 0.21943258
+ 29000 17.840505 -1.0909667 0.88133248 3.3698456 5.0311644 -2.5116617 0.08102693
+ 30000 17.649527 -0.65409177 0.86781692 3.24112 4.9903073 -2.6234925 0.14799777
+ 31000 18.156812 -0.77476556 0.83192789 2.9620784 4.9160635 -2.8571635 0.22283201
+ 32000 18.251583 -1.3384075 0.8059007 3.2588176 4.4365328 -2.1875071 0.087883637
+ 33000 17.702785 -0.88311587 0.98573641 3.4645713 4.2650091 -2.0909158 0.14233004
+ 34000 17.123413 -1.4873429 1.0419563 4.2628178 4.6318762 -2.2292095 0.105354
+ 35000 18.162061 -1.0136007 0.82436129 3.6365024 4.5801677 -2.6856989 0.28648222
+ 36000 17.65618 -1.094718 0.8872444 3.5075241 4.6382423 -2.3895134 0.18116961
+ 37000 17.336475 -1.0657995 0.98869254 3.9252927 4.4383632 -2.2048244 0.22285949
+ 38000 17.369467 -0.97623132 0.6712095 4.1349304 4.597754 -2.4088341 0.14608514
+ 39000 18.170206 -1.2344285 0.77546195 3.6451049 4.7482287 -2.9895286 0.25768859
+ 40000 16.210866 -0.81407781 0.99246271 4.2676233 5.0253763 -2.2929865 0.13348624
+ 41000 17.641798 -1.0868157 0.80119513 3.4302526 5.280872 -2.4025406 0.22747391
+ 42000 18.349848 -1.613759 1.1497004 3.7800682 4.3237683 -2.8676401 0.2120425
+ 43000 19.130245 -1.196778 0.71845659 2.9325758 4.3684415 -2.433424 0.12240982
+ 44000 18.061321 -1.2410101 1.0329373 3.0751569 4.7138313 -2.2880904 0.075814461
+ 45000 18.162713 -1.4414622 1.009159 4.2298758 4.589593 -2.8502298 0.21606844
+ 46000 18.591574 -0.99730412 1.0955215 3.3965004 4.359466 -3.1049731 0.17322629
+ 47000 18.380259 -1.2717381 0.72291269 3.3958016 4.6099628 -2.4605065 0.19825185
+ 48000 18.130478 -1.5051279 1.2087492 3.2488529 4.6690881 -2.2518174 0.05633061
+ 49000 16.419912 -0.89320635 0.98926144 4.0388252 4.9919488 -2.1699511 0.15646479
+ 50000 16.453196 -1.0433497 0.778346 4.6078069 4.7320614 -2.3760788 0.17161976
+ 51000 18.245221 -0.89550444 0.9310446 3.0758194 4.3944595 -2.3082379 0.19983428
+ 52000 17.839632 -1.0221781 0.76425017 3.3331547 4.5368437 -2.0988773 0.21098435
+ 53000 18.693035 -1.4231915 0.76333082 3.1612761 4.583242 -2.4485762 0.089191206
+ 54000 16.334672 -0.36309884 1.0200365 4.6700448 4.1628702 -2.1713841 0.11431995
+ 55000 17.33842 -0.61522682 0.89847366 3.4970659 4.673495 -2.4743036 0.068004878
+ 56000 17.790294 -1.0150845 0.73697112 3.6000297 4.5988343 -2.4822509 0.11434632
+ 57000 18.913486 -1.0985507 1.0231848 2.7483267 4.4421755 -2.574424 0.1763388
+ 58000 17.586896 -0.98284126 0.96965633 3.3330357 4.5325543 -2.1936869 0.083230915
+ 59000 17.77788 -1.1649953 0.83092298 3.8004148 4.3940176 -2.3136642 0.017207608
+ 60000 17.013042 -0.21728023 1.1688832 3.5374476 4.5462244 -2.4425301 0.15028297
+ 61000 17.236242 -1.1342147 1.0301086 3.685948 4.6842331 -2.328108 0.070210812
+ 62000 17.529852 -1.2961547 1.0323133 3.4474598 5.1435839 -2.4553423 0.060842687
+ 63000 18.754704 -1.1816999 0.51806039 3.140172 4.5832701 -2.2713213 0.06327871
+ 64000 17.54594 -1.3592836 0.9694558 4.1363258 4.3547729 -2.3818433 0.12634448
+ 65000 16.962312 -0.54192775 0.90321315 4.0788618 4.2008255 -2.1376711 0.039504515
+ 66000 18.078619 -1.3552947 1.0716861 3.3285374 4.7229362 -2.3331115 0.21978698
+ 67000 17.132732 -1.4376876 0.91486534 4.4461852 4.6894176 -2.3655045 0.068150385
+ 68000 18.69286 -1.2856207 0.3895394 3.0620063 4.9922992 -2.3459189 0.079879643
+ 69000 18.329552 -1.1545957 0.88632275 3.1741058 4.4562418 -2.7094867 0.25329613
+ 70000 16.681168 -0.94434373 1.2450393 4.5737944 4.4902996 -2.4581775 0.15313095
+ 71000 17.375032 -1.0514442 1.0741595 3.4896146 4.8407713 -2.5302576 0.13640847
+ 72000 17.833013 -0.9047134 0.87067876 3.1658924 4.8825932 -2.4398117 0.2343991
+ 73000 17.421411 -1.2190741 0.73706811 4.2895 4.6464636 -2.3872727 0.19696525
+ 74000 17.383158 -0.34208984 0.71333984 3.2718891 4.2718495 -2.2484281 0.10827022
+ 75000 17.20885 -1.2710479 1.125102 3.8414467 5.3222741 -2.375505 0.12910797
+ 76000 16.811578 -0.545162 0.59076961 3.9118604 4.8031296 -2.2777895 0.063015508
+ 77000 16.679231 -0.080955983 0.7253398 3.4203454 5.0987608 -2.379614 0.12961874
+ 78000 18.164524 -1.3115525 0.92526408 3.5764487 4.3814882 -2.3712488 0.073436724
+ 79000 17.738686 -1.0697859 1.2186866 3.0593848 4.6551053 -2.2505871 0.075340661
+ 80000 16.767483 -0.84777477 1.03128 4.1982958 4.6992227 -2.4146425 0.079774219
+ 81000 16.257265 0.62803774 0.84032194 3.3873471 5.0961071 -2.7219776 0.20467848
+ 82000 18.232082 -1.2129302 0.50746051 3.9207128 4.5073437 -2.599371 0.094522372
+ 83000 16.618985 -0.60917055 0.8825847 3.805497 4.9560959 -2.2194726 0.14852687
+ 84000 17.90762 -0.82336075 0.90504161 3.0324198 4.7444271 -2.5036073 0.15860682
+ 85000 16.699883 -0.50297228 0.83405307 3.8598996 4.7971968 -2.2427788 0.10338668
+ 86000 16.353038 -0.0096880616 0.80705167 4.0865115 4.5364338 -2.4548873 0.098456203
+ 87000 17.887331 -0.75281219 1.0030148 4.0117123 4.3443074 -2.9774392 0.16190152
+ 88000 18.583708 -1.4867053 0.86324814 3.3971237 4.3526221 -2.221239 0.14459352
+ 89000 17.684828 -1.283764 1.0021118 3.5426808 4.9057005 -2.3921967 0.05844702
+ 90000 17.2597 -0.84306489 0.99797936 3.8896866 4.4315457 -2.5662899 0.18270206
+ 91000 16.705581 -0.44704047 0.75239556 3.470805 4.976868 -2.1894571 0.12312848
+ 92000 17.548071 -1.2222664 0.92898812 4.0813773 4.3432647 -2.1631158 0.14071343
+ 93000 17.163675 -0.94994776 0.96876981 3.9137692 4.4388666 -2.1260232 0.13187968
+ 94000 18.842071 -1.2822113 0.58767049 3.1393475 4.5820965 -2.7264682 0.10406266
+ 95000 18.112287 -1.1011381 0.63546648 3.4672667 4.486275 -2.2991936 0.041589685
+ 96000 17.102713 -0.6877313 0.8389032 3.6892719 4.5676004 -2.1905327 0.13507011
+ 97000 16.778253 -1.2902153 1.1588744 4.2820083 4.9537657 -2.4798159 0.35696636
+ 98000 18.34638 -1.2908146 1.185356 3.0739807 4.4575453 -2.3959144 0.22407922
+ 99000 17.995148 -1.3939639 0.7727299 3.8774144 4.4345458 -2.1142776 0.13550099
+ 100000 18.444746 -1.2456693 0.86061526 3.468696 4.5264336 -2.4239851 0.074369539
+Loop time of 2.52011 on 4 procs for 100000 steps with 34 atoms
+
+Performance: 6856.851 ns/day, 0.004 hours/ns, 39680.850 timesteps/s
+98.8% CPU use with 4 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 0.072506 | 0.28131 | 0.69088 | 46.2 | 11.16
+Bond | 0.050544 | 0.45307 | 0.9416 | 57.6 | 17.98
+Neigh | 0.0060885 | 0.0061619 | 0.0062056 | 0.1 | 0.24
+Comm | 0.44686 | 1.3679 | 2.0111 | 53.5 | 54.28
+Output | 0.0028057 | 0.0029956 | 0.003264 | 0.3 | 0.12
+Modify | 0.028202 | 0.095174 | 0.15782 | 19.8 | 3.78
+Other | | 0.3135 | | | 12.44
+
+Nlocal: 8.5 ave 14 max 2 min
+Histogram: 1 0 1 0 0 0 0 0 0 2
+Nghost: 25.5 ave 32 max 20 min
+Histogram: 2 0 0 0 0 0 0 1 0 1
+Neighs: 98.75 ave 242 max 31 min
+Histogram: 2 0 1 0 0 0 0 0 0 1
+
+Total # of neighbors = 395
+Ave neighs/atom = 11.6176
+Ave special neighs/atom = 9.52941
+Neighbor list builds = 246
+Dangerous builds = 0
+Total wall time: 0:00:02
diff --git a/examples/mscg/log.31Mar17.g++.1 b/examples/mscg/log.31Mar17.g++.1
new file mode 100644
index 000000000..c67bc483d
--- /dev/null
+++ b/examples/mscg/log.31Mar17.g++.1
@@ -0,0 +1,145 @@
+LAMMPS (13 Apr 2017)
+units real
+atom_style full
+pair_style zero 10.0
+
+read_data data.meoh
+ orthogonal box = (-20.6917 -20.6917 -20.6917) to (20.6917 20.6917 20.6917)
+ 1 by 1 by 1 MPI processor grid
+ reading atoms ...
+ 1000 atoms
+ 0 = max # of 1-2 neighbors
+ 0 = max # of 1-3 neighbors
+ 0 = max # of 1-4 neighbors
+ 1 = max # of special neighbors
+pair_coeff * *
+
+thermo 1
+thermo_style custom step
+
+# Test 1a: range finder functionality
+fix 1 all mscg 1 range on
+rerun dump.meoh first 0 last 4500 every 250 dump x y z fx fy fz
+Neighbor list info ...
+ update every 1 steps, delay 10 steps, check yes
+ max neighbors/atom: 2000, page size: 100000
+ master list distance cutoff = 12
+ ghost atom cutoff = 12
+ binsize = 6, bins = 7 7 7
+ 1 neighbor lists, perpetual/occasional/extra = 1 0 0
+ (1) pair zero, perpetual
+ attributes: half, newton on
+ pair build: half/bin/newton
+ stencil: half/bin/3d/newton
+ bin: standard
+Per MPI rank memory allocation (min/avg/max) = 5.794 | 5.794 | 5.794 Mbytes
+Step
+ 0
+ 250
+ 500
+ 750
+ 1000
+ 1250
+ 1500
+ 1750
+ 2000
+ 2250
+ 2500
+ 2750
+ 3000
+ 3250
+ 3500
+ 3750
+ 4000
+ 4250
+ 4500
+Loop time of 0.581537 on 1 procs for 19 steps with 1000 atoms
+
+Performance: 2.823 ns/day, 8.502 hours/ns, 32.672 timesteps/s
+99.2% CPU use with 1 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 0 | 0 | 0 | 0.0 | 0.00
+Bond | 0 | 0 | 0 | 0.0 | 0.00
+Neigh | 0 | 0 | 0 | 0.0 | 0.00
+Comm | 0 | 0 | 0 | 0.0 | 0.00
+Output | 0 | 0 | 0 | 0.0 | 0.00
+Modify | 0 | 0 | 0 | 0.0 | 0.00
+Other | | 0.5815 | | |100.00
+
+Nlocal: 1000 ave 1000 max 1000 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Nghost: 2934 ave 2934 max 2934 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Neighs: 50654 ave 50654 max 50654 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+
+Total # of neighbors = 50654
+Ave neighs/atom = 50.654
+Ave special neighs/atom = 0
+Neighbor list builds = 0
+Dangerous builds = 0
+print "TEST_1a mscg range finder"
+TEST_1a mscg range finder
+unfix 1
+
+# Test 1b: force matching functionality
+fix 1 all mscg 1
+rerun dump.meoh first 0 last 4500 every 250 dump x y z fx fy fz
+Per MPI rank memory allocation (min/avg/max) = 5.794 | 5.794 | 5.794 Mbytes
+Step
+ 0
+ 250
+ 500
+ 750
+ 1000
+ 1250
+ 1500
+ 1750
+ 2000
+ 2250
+ 2500
+ 2750
+ 3000
+ 3250
+ 3500
+ 3750
+ 4000
+ 4250
+ 4500
+Loop time of 0.841917 on 1 procs for 19 steps with 1000 atoms
+
+Performance: 1.950 ns/day, 12.309 hours/ns, 22.568 timesteps/s
+99.8% CPU use with 1 MPI tasks x no OpenMP threads
+
+MPI task timing breakdown:
+Section | min time | avg time | max time |%varavg| %total
+---------------------------------------------------------------
+Pair | 0 | 0 | 0 | 0.0 | 0.00
+Bond | 0 | 0 | 0 | 0.0 | 0.00
+Neigh | 0 | 0 | 0 | 0.0 | 0.00
+Comm | 0 | 0 | 0 | 0.0 | 0.00
+Output | 0 | 0 | 0 | 0.0 | 0.00
+Modify | 0 | 0 | 0 | 0.0 | 0.00
+Other | | 0.8419 | | |100.00
+
+Nlocal: 1000 ave 1000 max 1000 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Nghost: 2934 ave 2934 max 2934 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+Neighs: 50654 ave 50654 max 50654 min
+Histogram: 1 0 0 0 0 0 0 0 0 0
+
+Total # of neighbors = 50654
+Ave neighs/atom = 50.654
+Ave special neighs/atom = 0
+Neighbor list builds = 0
+Dangerous builds = 0
+print "TEST_1b mscg force matching"
+TEST_1b mscg force matching
+
+print TEST_DONE
+TEST_DONE
+Total wall time: 0:00:01
diff --git a/lib/Install.py b/lib/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
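As a minimal usage sketch of the generic script above (assuming the library
directory provides a Makefile.g++ and that Makefile.lammps.installed is the
desired EXTRAMAKE target), it is run from within a lib/ sub-directory, e.g.

  python Install.py -m g++ -e installed

This performs the clean, builds the library via the generated Makefile.auto,
and reports whether lib<name>.a and Makefile.lammps were produced.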
diff --git a/lib/README b/lib/README
index 72ebb0a5f..3c8f46dd0 100644
--- a/lib/README
+++ b/lib/README
@@ -1,57 +1,59 @@
This directory contains libraries that can be linked to when building
LAMMPS, if particular packages are included in the LAMMPS build.
Most of these directories contain code for the library; some contain
a Makefile.lammps file that points to where the library is installed
elsewhere on your system.
In either case, the library itself must be installed and/or built
first, so that the appropriate library files exist for LAMMPS to link
against.
Each library directory contains a README with additional info about
how to acquire and/or build the library. This may require you to edit
one of the provided Makefiles to make it suitable for your machine.
The libraries in this directory are the following:
atc atomistic-to-continuum methods, USER-ATC package
from Reese Jones, Jeremy Templeton, Jon Zimmerman (Sandia)
awpmd antisymmetrized wave packet molecular dynamics, AWPMD package
from Ilya Valuev (JIHT RAS)
colvars collective variable module (Metadynamics, ABF and more)
from Giacomo Fiorin and Jerome Henin (ICMS, Temple U)
compress hook to system lib for performing I/O compression, COMPRESS pkg
from Axel Kohlmeyer (Temple U)
gpu general GPU routines, GPU package
from Mike Brown (ORNL)
h5md ch5md library for output of MD data in HDF5 format
from Pierre de Buyl (KU Leuven)
kim hooks to the KIM library, used by KIM package
from Ryan Elliott and Ellad Tadmor (U Minn)
kokkos Kokkos package for GPU and many-core acceleration
from Kokkos development team (Sandia)
linalg set of BLAS and LAPACK routines needed by USER-ATC package
from Axel Kohlmeyer (Temple U)
-poems POEMS rigid-body integration package, POEMS package
- from Rudranarayan Mukherjee (RPI)
meam modified embedded atom method (MEAM) potential, MEAM package
from Greg Wagner (Sandia)
molfile hooks to VMD molfile plugins, used by the USER-MOLFILE package
from Axel Kohlmeyer (Temple U) and the VMD development team
mscg hooks to the MSCG library, used by fix_mscg command
from Jacob Wagner and Greg Voth group (U Chicago)
+netcdf hooks to a NetCDF library installed on your system
+ from Lars Pastewka (Karlsruhe Institute of Technology)
+poems POEMS rigid-body integration package, POEMS package
+ from Rudranarayan Mukherjee (RPI)
python hooks to the system Python library, used by the PYTHON package
from the LAMMPS development team
qmmm quantum mechanics/molecular mechanics coupling interface
from Axel Kohlmeyer (Temple U)
quip interface to QUIP/libAtoms framework, USER-QUIP package
from Albert Bartok-Partay and Gabor Csanyi (U Cambridge)
reax ReaxFF potential, REAX package
from Adri van Duin (Penn State) and Aidan Thompson (Sandia)
smd hooks to Eigen library, used by USER-SMD package
from Georg Ganzenmueller (Ernst Mach Institute, Germany)
voronoi hooks to the Voro++ library, used by compute voronoi/atom command
from Daniel Schwen (LANL)
vtk hooks to the VTK library, used by dump custom/vtk command
from Richard Berger (JKU)
diff --git a/lib/atc/Install.py b/lib/atc/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/atc/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/atc/README b/lib/atc/README
index 106c303dd..d3adfdafe 100644
--- a/lib/atc/README
+++ b/lib/atc/README
@@ -1,59 +1,64 @@
ATC (Atom To Continuum methods)
Reese Jones, Jeremy Templeton, Jonathan Zimmerman (Sandia National Labs)
rjones, jatempl, jzimmer at sandia.gov
September 2009
This is version 1.0 of the ATC library, which provides continuum field
estimation and molecular dynamics-finite element coupling methods.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the USER-ATC package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-atc" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
Note that the ATC library makes MPI calls, so you must build it with
the same MPI library that is used to build LAMMPS, i.e. as specified
by settings in the lammps/src/MAKE/Makefile.machine file you are
using.
When you are done building this library, two files should
exist in this directory:
libatc.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-atc_SYSINC = leave blank for this package
user-atc_SYSLIB = BLAS and LAPACK libraries needed by this package
user-atc_SYSPATH = path(s) to where those libraries are
-You have several choices for these settings:
+You have 3 choices for these settings:
-If the 2 libraries are already installed on your system, the settings
-in Makefile.lammps.installed should work.
+a) If the 2 libraries are already installed on your system, the
+settings in Makefile.lammps.installed should work.
-If they are not, you can install them yourself, and speficy the
-appropriate settings accordingly.
+b) If they are not, you can install them yourself, and specify the
+appropriate settings accordingly in a Makefile.lammps.* file
+and set the EXTRAMAKE setting in Makefile.* to that file.
-If you want to use the minimalist version of these libraries provided
-with LAMMPS in lib/linalg, then the settings in Makefile.lammps.linalg
-should work. Note that in this case you also need to build the
-linear-algebra in lib/linalg; see the lib/linalg/README for more
-details.
+c) Use the minimalist version of these libraries provided with LAMMPS
+in lib/linalg, by using Makefile.lammps.linalg. In this case you also
+need to build the library in lib/linalg; see the lib/linalg/README
+file for more details.
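As a minimal sketch of choice (c) above (assuming Makefile.g++ matches your
compiler and MPI setup), the ATC library could be built with

  python Install.py -m g++ -e linalg

which is roughly equivalent to pointing EXTRAMAKE at Makefile.lammps.linalg
and running "make -f Makefile.g++" by hand; the library in lib/linalg must
still be built first, as described in lib/linalg/README.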
diff --git a/lib/awpmd/Install.py b/lib/awpmd/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/awpmd/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/awpmd/README b/lib/awpmd/README
index 3c0248041..20e142f74 100644
--- a/lib/awpmd/README
+++ b/lib/awpmd/README
@@ -1,62 +1,67 @@
AWPMD (Antisymmetrized Wave Packet Molecular Dynamics) library
Ilya Valuev, Igor Morozov, JIHT RAS
valuev at physik.hu-berlin.de
June 2011
This is version 0.9 of the AWPMD library taken from JIHT GridMD project.
It contains interface to calculate electronic and electron-ion Hamiltonian,
norm matrix and forces for AWPMD method.
AWPMD is an open source program distributed under the terms
of wxWidgets Library License (see license directory for details).
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the USER-AWPMD package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-awpmd" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
Note that this library makes MPI calls, so you must build it with the
same MPI library that is used to build LAMMPS, i.e. as specified by
settings in the lammps/src/MAKE/Makefile.machine file you are using.
When you are done building this library, two files should
exist in this directory:
libawpmd.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-awpmd_SYSINC = leave blank for this package
user-awpmd_SYSLIB = BLAS and LAPACK libraries needed by this package
user-awpmd_SYSPATH = path(s) to where those libraries are
-You have several choices for these settings:
+You have 3 choices for these settings:
-If the 2 libraries are already installed on your system, the settings
-in Makefile.lammps.installed should work.
+a) If the 2 libraries are already installed on your system, the
+settings in Makefile.lammps.installed should work.
-If they are not, you can install them yourself, and speficy the
-appropriate settings accordingly.
+b) If they are not, you can install them yourself, and specify the
+appropriate settings accordingly in a Makefile.lammps.* file
+and set the EXTRAMAKE setting in Makefile.* to that file.
-If you want to use the minimalist version of these libraries provided
-with LAMMPS in lib/linalg, then the settings in Makefile.lammps.linalg
-should work. Note that in this case you also need to build the
-linear-algebra in lib/linalg; see the lib/linalg/README for more
-details.
+c) Use the minimalist version of these libraries provided with LAMMPS
+in lib/linalg, by using Makefile.lammps.linalg. In this case you also
+need to build the library in lib/linalg; see the lib/linalg/README
+file for more details.
diff --git a/lib/colvars/Install.py b/lib/colvars/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/colvars/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/colvars/README b/lib/colvars/README
index d6efc333a..a5e5938b2 100644
--- a/lib/colvars/README
+++ b/lib/colvars/README
@@ -1,68 +1,73 @@
This library is the portable "colvars" module, originally interfaced
with the NAMD MD code, to provide an extensible software framework
that allows enhanced sampling in molecular dynamics simulations.
The module is written to maximize performance, portability,
flexibility of usage for the user, and extensibility for the developer.
The development of the colvars library is now hosted on github at:
http://colvars.github.io/
You can use this site to get access to the latest development sources
and the up-to-date documentation.
A copy of the specific documentation is also in
doc/PDF/colvars-refman-lammps.pdf
Please report bugs and request new features at:
https://github.com/colvars/colvars/issues
The following publications describe the principles of
the implementation of this library:
Using collective variables to drive molecular dynamics simulations,
Giacomo Fiorin, Michael L. Klein & Jérôme Hénin (2013):
Molecular Physics DOI:10.1080/00268976.2013.813594
Exploring Multidimensional Free Energy Landscapes Using
Time-Dependent Biases on Collective Variables,
J. Hénin, G. Fiorin, C. Chipot, and M. L. Klein,
J. Chem. Theory Comput., 6, 35-47 (2010).
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the USER-COLVARS package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-colvars" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
When you are done building this library, two files should
exist in this directory:
libcolvars.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-colvars_SYSINC = leave blank for this package unless debugging
user-colvars_SYSLIB = leave blank for this package
user-colvars_SYSPATH = leave blank for this package
You have several choices for these settings:
Since they do not normally need to be set, the settings in
Makefile.lammps.empty should work.
If you want to set a debug flag recognized by the library, the
settings in Makefile.lammps.debug should work.
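As a sketch (assuming Makefile.g++ is appropriate for your compiler), a
debug-enabled build of this library could be done with

  python Install.py -m g++ -e debug

which selects Makefile.lammps.debug as the EXTRAMAKE file; omit -e to keep
whatever EXTRAMAKE target the chosen Makefile.* already selects.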
diff --git a/lib/gpu/Install.py b/lib/gpu/Install.py
new file mode 100644
index 000000000..d396be5e1
--- /dev/null
+++ b/lib/gpu/Install.py
@@ -0,0 +1,146 @@
+#!/usr/bin/env python
+
+# Install.py tool to build the GPU library
+# used to automate the steps described in the README file in this dir
+
+import sys,os,re,commands
+
+# help message
+
+help = """
+Syntax: python Install.py -i isuffix -h hdir -a arch -p precision -e esuffix -m -o osuffix
+ specify one or more options, order does not matter
+ copies an existing Makefile.isuffix in lib/gpu to Makefile.auto
+ optionally edits these variables in Makefile.auto:
+ CUDA_HOME, CUDA_ARCH, CUDA_PRECISION, EXTRAMAKE
+ optionally uses Makefile.auto to build the GPU library -> libgpu.a
+ and to copy a Makefile.lammps.esuffix -> Makefile.lammps
+ optionally copies Makefile.auto to a new Makefile.osuffix
+
+ -i = use Makefile.isuffix as starting point, copy to Makefile.auto
+ default isuffix = linux
+ -h = set CUDA_HOME variable in Makefile.auto to hdir
+ hdir = path to NVIDIA Cuda software, e.g. /usr/local/cuda
+ -a = set CUDA_ARCH variable in Makefile.auto to arch
+ use arch = ?? for K40 (Tesla)
+ use arch = 37 for dual K80 (Tesla)
+ use arch = 60 for P100 (Pascal)
+ -p = set CUDA_PRECISION variable in Makefile.auto to precision
+ use precision = double or mixed or single
+ -e = set EXTRAMAKE variable in Makefile.auto to Makefile.lammps.esuffix
+ -m = make the GPU library using Makefile.auto
+ first performs a "make clean"
+ produces libgpu.a if successful
+ also copies EXTRAMAKE file -> Makefile.lammps
+ -e can set which Makefile.lammps.esuffix file is copied
+ -o = copy final Makefile.auto to Makefile.osuffix
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+isuffix = "linux"
+hflag = aflag = pflag = eflag = 0
+makeflag = 0
+outflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-i":
+ if iarg+2 > nargs: error()
+ isuffix = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-h":
+ if iarg+2 > nargs: error()
+ hflag = 1
+ hdir = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-a":
+ if iarg+2 > nargs: error()
+ aflag = 1
+ arch = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-p":
+ if iarg+2 > nargs: error()
+ pflag = 1
+ precision = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ eflag = 1
+ lmpsuffix = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-m":
+ makeflag = 1
+ iarg += 1
+ elif args[iarg] == "-o":
+ if iarg+2 > nargs: error()
+ outflag = 1
+ osuffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+if pflag:
+ if precision == "double": precstr = "-D_DOUBLE_DOUBLE"
+ elif precision == "mixed": precstr = "-D_SINGLE_DOUBLE"
+ elif precision == "single": precstr = "-D_SINGLE_SINGLE"
+ else: error("Invalid precision setting")
+
+# create Makefile.auto
+# reset EXTRAMAKE, CUDA_HOME, CUDA_ARCH, CUDA_PRECISION if requested
+
+if not os.path.exists("Makefile.%s" % isuffix):
+ error("lib/gpu/Makefile.%s does not exist" % isuffix)
+
+lines = open("Makefile.%s" % isuffix,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) != 3:
+ print >>fp,line,
+ continue
+
+ if hflag and words[0] == "CUDA_HOME" and words[1] == '=':
+ line = line.replace(words[2],hdir)
+ if aflag and words[0] == "CUDA_ARCH" and words[1] == '=':
+ line = line.replace(words[2],"-arch=sm_%s" % arch)
+ if pflag and words[0] == "CUDA_PRECISION" and words[1] == '=':
+ line = line.replace(words[2],precstr)
+ if eflag and words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % lmpsuffix)
+
+ print >>fp,line,
+
+fp.close()
+
+# perform make
+# the make operation copies the EXTRAMAKE file to Makefile.lammps
+
+if makeflag:
+ print "Building libgpu.a ..."
+ cmd = "rm -f libgpu.a"
+ commands.getoutput(cmd)
+ cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+ commands.getoutput(cmd)
+ if not os.path.exists("libgpu.a"):
+ error("Build of lib/gpu/libgpu.a was NOT successful")
+ if not os.path.exists("Makefile.lammps"):
+ error("lib/gpu/Makefile.lammps was NOT created")
+
+# copy new Makefile.auto to Makefile.osuffix
+
+if outflag:
+ print "Creating new Makefile.%s" % osuffix
+ cmd = "cp Makefile.auto Makefile.%s" % osuffix
+ commands.getoutput(cmd)
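As an illustrative invocation (assuming CUDA is installed in /usr/local/cuda
and a Pascal P100 card, hence -a 60 per the help text above), the GPU library
could be built in mixed precision with

  python Install.py -h /usr/local/cuda -a 60 -p mixed -m

This copies the default Makefile.linux to Makefile.auto, rewrites CUDA_HOME,
CUDA_ARCH and CUDA_PRECISION there, runs the make step to produce libgpu.a,
and relies on that make to copy the EXTRAMAKE file to Makefile.lammps.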
diff --git a/lib/gpu/Nvidia.makefile b/lib/gpu/Nvidia.makefile
index e02849cfe..660544cfa 100644
--- a/lib/gpu/Nvidia.makefile
+++ b/lib/gpu/Nvidia.makefile
@@ -1,798 +1,798 @@
CUDA = $(NVCC) $(CUDA_INCLUDE) $(CUDA_OPTS) -Icudpp_mini $(CUDA_ARCH) \
$(CUDA_PRECISION)
CUDR = $(CUDR_CPP) $(CUDR_OPTS) $(CUDA_PRECISION) $(CUDA_INCLUDE) \
$(CUDPP_OPT)
CUDA_LINK = $(CUDA_LIB) -lcudart
BIN2C = $(CUDA_HOME)/bin/bin2c
GPU_LIB = $(LIB_DIR)/libgpu.a
# Headers for Geryon
UCL_H = $(wildcard ./geryon/ucl*.h)
NVC_H = $(wildcard ./geryon/nvc*.h) $(UCL_H)
NVD_H = $(wildcard ./geryon/nvd*.h) $(UCL_H) lal_preprocessor.h
# Headers for Pair Stuff
PAIR_H = lal_atom.h lal_answer.h lal_neighbor_shared.h \
lal_neighbor.h lal_precision.h lal_device.h \
lal_balance.h lal_pppm.h
ALL_H = $(NVD_H) $(PAIR_H)
EXECS = $(BIN_DIR)/nvc_get_devices
ifdef CUDPP_OPT
CUDPP = $(OBJ_DIR)/cudpp.o $(OBJ_DIR)/cudpp_plan.o \
$(OBJ_DIR)/cudpp_maximal_launch.o $(OBJ_DIR)/cudpp_plan_manager.o \
$(OBJ_DIR)/radixsort_app.cu_o $(OBJ_DIR)/scan_app.cu_o
endif
OBJS = $(OBJ_DIR)/lal_atom.o $(OBJ_DIR)/lal_ans.o \
$(OBJ_DIR)/lal_neighbor.o $(OBJ_DIR)/lal_neighbor_shared.o \
$(OBJ_DIR)/lal_device.o $(OBJ_DIR)/lal_base_atomic.o \
$(OBJ_DIR)/lal_base_charge.o $(OBJ_DIR)/lal_base_ellipsoid.o \
$(OBJ_DIR)/lal_base_dipole.o $(OBJ_DIR)/lal_base_three.o \
$(OBJ_DIR)/lal_base_dpd.o \
$(OBJ_DIR)/lal_pppm.o $(OBJ_DIR)/lal_pppm_ext.o \
$(OBJ_DIR)/lal_gayberne.o $(OBJ_DIR)/lal_gayberne_ext.o \
$(OBJ_DIR)/lal_re_squared.o $(OBJ_DIR)/lal_re_squared_ext.o \
$(OBJ_DIR)/lal_lj.o $(OBJ_DIR)/lal_lj_ext.o \
$(OBJ_DIR)/lal_lj96.o $(OBJ_DIR)/lal_lj96_ext.o \
$(OBJ_DIR)/lal_lj_expand.o $(OBJ_DIR)/lal_lj_expand_ext.o \
$(OBJ_DIR)/lal_lj_coul.o $(OBJ_DIR)/lal_lj_coul_ext.o \
$(OBJ_DIR)/lal_lj_coul_long.o $(OBJ_DIR)/lal_lj_coul_long_ext.o \
$(OBJ_DIR)/lal_lj_dsf.o $(OBJ_DIR)/lal_lj_dsf_ext.o \
$(OBJ_DIR)/lal_lj_class2_long.o $(OBJ_DIR)/lal_lj_class2_long_ext.o \
$(OBJ_DIR)/lal_coul_long.o $(OBJ_DIR)/lal_coul_long_ext.o \
$(OBJ_DIR)/lal_morse.o $(OBJ_DIR)/lal_morse_ext.o \
$(OBJ_DIR)/lal_charmm_long.o $(OBJ_DIR)/lal_charmm_long_ext.o \
- $(OBJ_DIR)/lal_cg_cmm.o $(OBJ_DIR)/lal_cg_cmm_ext.o \
- $(OBJ_DIR)/lal_cg_cmm_long.o $(OBJ_DIR)/lal_cg_cmm_long_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk.o $(OBJ_DIR)/lal_lj_sdk_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk_long.o $(OBJ_DIR)/lal_lj_sdk_long_ext.o \
$(OBJ_DIR)/lal_eam.o $(OBJ_DIR)/lal_eam_ext.o \
$(OBJ_DIR)/lal_eam_fs_ext.o $(OBJ_DIR)/lal_eam_alloy_ext.o \
$(OBJ_DIR)/lal_buck.o $(OBJ_DIR)/lal_buck_ext.o \
$(OBJ_DIR)/lal_buck_coul.o $(OBJ_DIR)/lal_buck_coul_ext.o \
$(OBJ_DIR)/lal_buck_coul_long.o $(OBJ_DIR)/lal_buck_coul_long_ext.o \
$(OBJ_DIR)/lal_table.o $(OBJ_DIR)/lal_table_ext.o \
$(OBJ_DIR)/lal_yukawa.o $(OBJ_DIR)/lal_yukawa_ext.o \
$(OBJ_DIR)/lal_born.o $(OBJ_DIR)/lal_born_ext.o \
$(OBJ_DIR)/lal_born_coul_wolf.o $(OBJ_DIR)/lal_born_coul_wolf_ext.o \
$(OBJ_DIR)/lal_born_coul_long.o $(OBJ_DIR)/lal_born_coul_long_ext.o \
$(OBJ_DIR)/lal_dipole_lj.o $(OBJ_DIR)/lal_dipole_lj_ext.o \
$(OBJ_DIR)/lal_dipole_lj_sf.o $(OBJ_DIR)/lal_dipole_lj_sf_ext.o \
$(OBJ_DIR)/lal_colloid.o $(OBJ_DIR)/lal_colloid_ext.o \
$(OBJ_DIR)/lal_gauss.o $(OBJ_DIR)/lal_gauss_ext.o \
$(OBJ_DIR)/lal_yukawa_colloid.o $(OBJ_DIR)/lal_yukawa_colloid_ext.o \
$(OBJ_DIR)/lal_lj_coul_debye.o $(OBJ_DIR)/lal_lj_coul_debye_ext.o \
$(OBJ_DIR)/lal_coul_dsf.o $(OBJ_DIR)/lal_coul_dsf_ext.o \
$(OBJ_DIR)/lal_sw.o $(OBJ_DIR)/lal_sw_ext.o \
$(OBJ_DIR)/lal_beck.o $(OBJ_DIR)/lal_beck_ext.o \
$(OBJ_DIR)/lal_mie.o $(OBJ_DIR)/lal_mie_ext.o \
$(OBJ_DIR)/lal_soft.o $(OBJ_DIR)/lal_soft_ext.o \
$(OBJ_DIR)/lal_lj_coul_msm.o $(OBJ_DIR)/lal_lj_coul_msm_ext.o \
$(OBJ_DIR)/lal_lj_gromacs.o $(OBJ_DIR)/lal_lj_gromacs_ext.o \
$(OBJ_DIR)/lal_dpd.o $(OBJ_DIR)/lal_dpd_ext.o \
$(OBJ_DIR)/lal_tersoff.o $(OBJ_DIR)/lal_tersoff_ext.o \
$(OBJ_DIR)/lal_tersoff_zbl.o $(OBJ_DIR)/lal_tersoff_zbl_ext.o \
$(OBJ_DIR)/lal_tersoff_mod.o $(OBJ_DIR)/lal_tersoff_mod_ext.o \
$(OBJ_DIR)/lal_coul.o $(OBJ_DIR)/lal_coul_ext.o \
$(OBJ_DIR)/lal_coul_debye.o $(OBJ_DIR)/lal_coul_debye_ext.o \
$(OBJ_DIR)/lal_zbl.o $(OBJ_DIR)/lal_zbl_ext.o \
$(OBJ_DIR)/lal_lj_cubic.o $(OBJ_DIR)/lal_lj_cubic_ext.o
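# CBNS lists the CUDA kernel binaries (.cubin) and the matching
# bin2c-generated *_cubin.h headers included by the host-side lal_*.cpp files.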
CBNS = $(OBJ_DIR)/device.cubin $(OBJ_DIR)/device_cubin.h \
$(OBJ_DIR)/atom.cubin $(OBJ_DIR)/atom_cubin.h \
$(OBJ_DIR)/neighbor_cpu.cubin $(OBJ_DIR)/neighbor_cpu_cubin.h \
$(OBJ_DIR)/neighbor_gpu.cubin $(OBJ_DIR)/neighbor_gpu_cubin.h \
$(OBJ_DIR)/pppm_f.cubin $(OBJ_DIR)/pppm_f_cubin.h \
$(OBJ_DIR)/pppm_d.cubin $(OBJ_DIR)/pppm_d_cubin.h \
$(OBJ_DIR)/ellipsoid_nbor.cubin $(OBJ_DIR)/ellipsoid_nbor_cubin.h \
$(OBJ_DIR)/gayberne.cubin $(OBJ_DIR)/gayberne_lj.cubin \
$(OBJ_DIR)/gayberne_cubin.h $(OBJ_DIR)/gayberne_lj_cubin.h \
$(OBJ_DIR)/re_squared.cubin $(OBJ_DIR)/re_squared_lj.cubin \
$(OBJ_DIR)/re_squared_cubin.h $(OBJ_DIR)/re_squared_lj_cubin.h \
$(OBJ_DIR)/lj.cubin $(OBJ_DIR)/lj_cubin.h \
$(OBJ_DIR)/lj96.cubin $(OBJ_DIR)/lj96_cubin.h \
$(OBJ_DIR)/lj_expand.cubin $(OBJ_DIR)/lj_expand_cubin.h \
$(OBJ_DIR)/lj_coul.cubin $(OBJ_DIR)/lj_coul_cubin.h \
$(OBJ_DIR)/lj_coul_long.cubin $(OBJ_DIR)/lj_coul_long_cubin.h \
$(OBJ_DIR)/lj_dsf.cubin $(OBJ_DIR)/lj_dsf_cubin.h \
$(OBJ_DIR)/lj_class2_long.cubin $(OBJ_DIR)/lj_class2_long_cubin.h \
$(OBJ_DIR)/coul_long.cubin $(OBJ_DIR)/coul_long_cubin.h \
$(OBJ_DIR)/morse.cubin $(OBJ_DIR)/morse_cubin.h \
$(OBJ_DIR)/charmm_long.cubin $(OBJ_DIR)/charmm_long_cubin.h \
- $(OBJ_DIR)/cg_cmm.cubin $(OBJ_DIR)/cg_cmm_cubin.h \
- $(OBJ_DIR)/cg_cmm_long.cubin $(OBJ_DIR)/cg_cmm_long_cubin.h \
+ $(OBJ_DIR)/lj_sdk.cubin $(OBJ_DIR)/lj_sdk_cubin.h \
+ $(OBJ_DIR)/lj_sdk_long.cubin $(OBJ_DIR)/lj_sdk_long_cubin.h \
$(OBJ_DIR)/eam.cubin $(OBJ_DIR)/eam_cubin.h \
$(OBJ_DIR)/buck.cubin $(OBJ_DIR)/buck_cubin.h \
$(OBJ_DIR)/buck_coul_long.cubin $(OBJ_DIR)/buck_coul_long_cubin.h \
$(OBJ_DIR)/buck_coul.cubin $(OBJ_DIR)/buck_coul_cubin.h \
$(OBJ_DIR)/table.cubin $(OBJ_DIR)/table_cubin.h \
$(OBJ_DIR)/yukawa.cubin $(OBJ_DIR)/yukawa_cubin.h \
$(OBJ_DIR)/born.cubin $(OBJ_DIR)/born_cubin.h \
$(OBJ_DIR)/born_coul_wolf.cubin $(OBJ_DIR)/born_coul_wolf_cubin.h \
$(OBJ_DIR)/born_coul_long.cubin $(OBJ_DIR)/born_coul_long_cubin.h \
$(OBJ_DIR)/dipole_lj.cubin $(OBJ_DIR)/dipole_lj_cubin.h \
$(OBJ_DIR)/dipole_lj_sf.cubin $(OBJ_DIR)/dipole_lj_sf_cubin.h \
$(OBJ_DIR)/colloid.cubin $(OBJ_DIR)/colloid_cubin.h \
$(OBJ_DIR)/gauss.cubin $(OBJ_DIR)/gauss_cubin.h \
$(OBJ_DIR)/yukawa_colloid.cubin $(OBJ_DIR)/yukawa_colloid_cubin.h \
$(OBJ_DIR)/lj_coul_debye.cubin $(OBJ_DIR)/lj_coul_debye_cubin.h \
$(OBJ_DIR)/coul_dsf.cubin $(OBJ_DIR)/coul_dsf_cubin.h \
$(OBJ_DIR)/sw.cubin $(OBJ_DIR)/sw_cubin.h \
$(OBJ_DIR)/beck.cubin $(OBJ_DIR)/beck_cubin.h \
$(OBJ_DIR)/mie.cubin $(OBJ_DIR)/mie_cubin.h \
$(OBJ_DIR)/soft.cubin $(OBJ_DIR)/soft_cubin.h \
$(OBJ_DIR)/lj_coul_msm.cubin $(OBJ_DIR)/lj_coul_msm_cubin.h \
$(OBJ_DIR)/lj_gromacs.cubin $(OBJ_DIR)/lj_gromacs_cubin.h \
$(OBJ_DIR)/dpd.cubin $(OBJ_DIR)/dpd_cubin.h \
$(OBJ_DIR)/tersoff.cubin $(OBJ_DIR)/tersoff_cubin.h \
$(OBJ_DIR)/tersoff_zbl.cubin $(OBJ_DIR)/tersoff_zbl_cubin.h \
$(OBJ_DIR)/tersoff_mod.cubin $(OBJ_DIR)/tersoff_mod_cubin.h \
$(OBJ_DIR)/coul.cubin $(OBJ_DIR)/coul_cubin.h \
$(OBJ_DIR)/coul_debye.cubin $(OBJ_DIR)/coul_debye_cubin.h \
$(OBJ_DIR)/zbl.cubin $(OBJ_DIR)/zbl_cubin.h \
$(OBJ_DIR)/lj_cubic.cubin $(OBJ_DIR)/lj_cubic_cubin.h
all: $(OBJ_DIR) $(GPU_LIB) $(EXECS)
$(OBJ_DIR):
mkdir -p $@
$(OBJ_DIR)/cudpp.o: cudpp_mini/cudpp.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp.cpp -Icudpp_mini
$(OBJ_DIR)/cudpp_plan.o: cudpp_mini/cudpp_plan.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp_plan.cpp -Icudpp_mini
$(OBJ_DIR)/cudpp_maximal_launch.o: cudpp_mini/cudpp_maximal_launch.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp_maximal_launch.cpp -Icudpp_mini
$(OBJ_DIR)/cudpp_plan_manager.o: cudpp_mini/cudpp_plan_manager.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp_plan_manager.cpp -Icudpp_mini
$(OBJ_DIR)/radixsort_app.cu_o: cudpp_mini/radixsort_app.cu
$(CUDA) -o $@ -c cudpp_mini/radixsort_app.cu
$(OBJ_DIR)/scan_app.cu_o: cudpp_mini/scan_app.cu
$(CUDA) -o $@ -c cudpp_mini/scan_app.cu
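# Pattern for each style below: nvcc compiles lal_<style>.cu to a .cubin,
# bin2c wraps it into <style>_cubin.h, and that header is picked up when the
# host-side lal_<style>.cpp is compiled with -I$(OBJ_DIR).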
$(OBJ_DIR)/atom.cubin: lal_atom.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_atom.cu
$(OBJ_DIR)/atom_cubin.h: $(OBJ_DIR)/atom.cubin
$(BIN2C) -c -n atom $(OBJ_DIR)/atom.cubin > $(OBJ_DIR)/atom_cubin.h
$(OBJ_DIR)/lal_atom.o: lal_atom.cpp lal_atom.h $(NVD_H) $(OBJ_DIR)/atom_cubin.h
$(CUDR) -o $@ -c lal_atom.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_ans.o: lal_answer.cpp lal_answer.h $(NVD_H)
$(CUDR) -o $@ -c lal_answer.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/neighbor_cpu.cubin: lal_neighbor_cpu.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_neighbor_cpu.cu
$(OBJ_DIR)/neighbor_cpu_cubin.h: $(OBJ_DIR)/neighbor_cpu.cubin
$(BIN2C) -c -n neighbor_cpu $(OBJ_DIR)/neighbor_cpu.cubin > $(OBJ_DIR)/neighbor_cpu_cubin.h
$(OBJ_DIR)/neighbor_gpu.cubin: lal_neighbor_gpu.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_neighbor_gpu.cu
$(OBJ_DIR)/neighbor_gpu_cubin.h: $(OBJ_DIR)/neighbor_gpu.cubin
$(BIN2C) -c -n neighbor_gpu $(OBJ_DIR)/neighbor_gpu.cubin > $(OBJ_DIR)/neighbor_gpu_cubin.h
$(OBJ_DIR)/lal_neighbor_shared.o: lal_neighbor_shared.cpp lal_neighbor_shared.h $(OBJ_DIR)/neighbor_cpu_cubin.h $(OBJ_DIR)/neighbor_gpu_cubin.h $(NVD_H)
$(CUDR) -o $@ -c lal_neighbor_shared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_neighbor.o: lal_neighbor.cpp lal_neighbor.h lal_neighbor_shared.h $(NVD_H)
$(CUDR) -o $@ -c lal_neighbor.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/device.cubin: lal_device.cu lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_device.cu
$(OBJ_DIR)/device_cubin.h: $(OBJ_DIR)/device.cubin
$(BIN2C) -c -n device $(OBJ_DIR)/device.cubin > $(OBJ_DIR)/device_cubin.h
$(OBJ_DIR)/lal_device.o: lal_device.cpp lal_device.h $(ALL_H) $(OBJ_DIR)/device_cubin.h
$(CUDR) -o $@ -c lal_device.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_atomic.o: $(ALL_H) lal_base_atomic.h lal_base_atomic.cpp
$(CUDR) -o $@ -c lal_base_atomic.cpp
$(OBJ_DIR)/lal_base_charge.o: $(ALL_H) lal_base_charge.h lal_base_charge.cpp
$(CUDR) -o $@ -c lal_base_charge.cpp
$(OBJ_DIR)/lal_base_ellipsoid.o: $(ALL_H) lal_base_ellipsoid.h lal_base_ellipsoid.cpp $(OBJ_DIR)/ellipsoid_nbor_cubin.h
$(CUDR) -o $@ -c lal_base_ellipsoid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_dipole.o: $(ALL_H) lal_base_dipole.h lal_base_dipole.cpp
$(CUDR) -o $@ -c lal_base_dipole.cpp
$(OBJ_DIR)/lal_base_three.o: $(ALL_H) lal_base_three.h lal_base_three.cpp
$(CUDR) -o $@ -c lal_base_three.cpp
$(OBJ_DIR)/lal_base_dpd.o: $(ALL_H) lal_base_dpd.h lal_base_dpd.cpp
$(CUDR) -o $@ -c lal_base_dpd.cpp
$(OBJ_DIR)/pppm_f.cubin: lal_pppm.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -Dgrdtyp=float -Dgrdtyp4=float4 -o $@ lal_pppm.cu
$(OBJ_DIR)/pppm_f_cubin.h: $(OBJ_DIR)/pppm_f.cubin
$(BIN2C) -c -n pppm_f $(OBJ_DIR)/pppm_f.cubin > $(OBJ_DIR)/pppm_f_cubin.h
$(OBJ_DIR)/pppm_d.cubin: lal_pppm.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -Dgrdtyp=double -Dgrdtyp4=double4 -o $@ lal_pppm.cu
$(OBJ_DIR)/pppm_d_cubin.h: $(OBJ_DIR)/pppm_d.cubin
$(BIN2C) -c -n pppm_d $(OBJ_DIR)/pppm_d.cubin > $(OBJ_DIR)/pppm_d_cubin.h
$(OBJ_DIR)/lal_pppm.o: $(ALL_H) lal_pppm.h lal_pppm.cpp $(OBJ_DIR)/pppm_f_cubin.h $(OBJ_DIR)/pppm_d_cubin.h
$(CUDR) -o $@ -c lal_pppm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_pppm_ext.o: $(ALL_H) lal_pppm.h lal_pppm_ext.cpp
$(CUDR) -o $@ -c lal_pppm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/ellipsoid_nbor.cubin: lal_ellipsoid_nbor.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_ellipsoid_nbor.cu
$(OBJ_DIR)/ellipsoid_nbor_cubin.h: $(OBJ_DIR)/ellipsoid_nbor.cubin
$(BIN2C) -c -n ellipsoid_nbor $(OBJ_DIR)/ellipsoid_nbor.cubin > $(OBJ_DIR)/ellipsoid_nbor_cubin.h
$(OBJ_DIR)/gayberne.cubin: lal_gayberne.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_gayberne.cu
$(OBJ_DIR)/gayberne_lj.cubin: lal_gayberne_lj.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_gayberne_lj.cu
$(OBJ_DIR)/gayberne_cubin.h: $(OBJ_DIR)/gayberne.cubin
$(BIN2C) -c -n gayberne $(OBJ_DIR)/gayberne.cubin > $(OBJ_DIR)/gayberne_cubin.h
$(OBJ_DIR)/gayberne_lj_cubin.h: $(OBJ_DIR)/gayberne_lj.cubin
$(BIN2C) -c -n gayberne_lj $(OBJ_DIR)/gayberne_lj.cubin > $(OBJ_DIR)/gayberne_lj_cubin.h
$(OBJ_DIR)/lal_gayberne.o: $(ALL_H) lal_gayberne.h lal_gayberne.cpp $(OBJ_DIR)/gayberne_cubin.h $(OBJ_DIR)/gayberne_lj_cubin.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(CUDR) -o $@ -c lal_gayberne.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gayberne_ext.o: $(ALL_H) $(OBJ_DIR)/lal_gayberne.o lal_gayberne_ext.cpp
$(CUDR) -o $@ -c lal_gayberne_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/re_squared.cubin: lal_re_squared.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_re_squared.cu
$(OBJ_DIR)/re_squared_lj.cubin: lal_re_squared_lj.cu lal_precision.h lal_ellipsoid_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_re_squared_lj.cu
$(OBJ_DIR)/re_squared_cubin.h: $(OBJ_DIR)/re_squared.cubin
$(BIN2C) -c -n re_squared $(OBJ_DIR)/re_squared.cubin > $(OBJ_DIR)/re_squared_cubin.h
$(OBJ_DIR)/re_squared_lj_cubin.h: $(OBJ_DIR)/re_squared_lj.cubin
$(BIN2C) -c -n re_squared_lj $(OBJ_DIR)/re_squared_lj.cubin > $(OBJ_DIR)/re_squared_lj_cubin.h
$(OBJ_DIR)/lal_re_squared.o: $(ALL_H) lal_re_squared.h lal_re_squared.cpp $(OBJ_DIR)/re_squared_cubin.h $(OBJ_DIR)/re_squared_lj_cubin.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(CUDR) -o $@ -c lal_re_squared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_re_squared_ext.o: $(ALL_H) $(OBJ_DIR)/lal_re_squared.o lal_re_squared_ext.cpp
$(CUDR) -o $@ -c lal_re_squared_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj.cubin: lal_lj.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj.cu
$(OBJ_DIR)/lj_cubin.h: $(OBJ_DIR)/lj.cubin $(OBJ_DIR)/lj.cubin
$(BIN2C) -c -n lj $(OBJ_DIR)/lj.cubin > $(OBJ_DIR)/lj_cubin.h
$(OBJ_DIR)/lal_lj.o: $(ALL_H) lal_lj.h lal_lj.cpp $(OBJ_DIR)/lj_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_ext.o: $(ALL_H) lal_lj.h lal_lj_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul.cubin: lal_lj_coul.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul.cu
$(OBJ_DIR)/lj_coul_cubin.h: $(OBJ_DIR)/lj_coul.cubin $(OBJ_DIR)/lj_coul.cubin
$(BIN2C) -c -n lj_coul $(OBJ_DIR)/lj_coul.cubin > $(OBJ_DIR)/lj_coul_cubin.h
$(OBJ_DIR)/lal_lj_coul.o: $(ALL_H) lal_lj_coul.h lal_lj_coul.cpp $(OBJ_DIR)/lj_coul_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_ext.o: $(ALL_H) lal_lj_coul.h lal_lj_coul_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_class2_long.cubin: lal_lj_class2_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_class2_long.cu
$(OBJ_DIR)/lj_class2_long_cubin.h: $(OBJ_DIR)/lj_class2_long.cubin $(OBJ_DIR)/lj_class2_long.cubin
$(BIN2C) -c -n lj_class2_long $(OBJ_DIR)/lj_class2_long.cubin > $(OBJ_DIR)/lj_class2_long_cubin.h
$(OBJ_DIR)/lal_lj_class2_long.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long.cpp $(OBJ_DIR)/lj_class2_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_class2_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_class2_long_ext.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_class2_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_long.cubin: lal_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul_long.cu
$(OBJ_DIR)/coul_long_cubin.h: $(OBJ_DIR)/coul_long.cubin $(OBJ_DIR)/coul_long.cubin
$(BIN2C) -c -n coul_long $(OBJ_DIR)/coul_long.cubin > $(OBJ_DIR)/coul_long_cubin.h
$(OBJ_DIR)/lal_coul_long.o: $(ALL_H) lal_coul_long.h lal_coul_long.cpp $(OBJ_DIR)/coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_long_ext.o: $(ALL_H) lal_coul_long.h lal_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_long.cubin: lal_lj_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul_long.cu
$(OBJ_DIR)/lj_coul_long_cubin.h: $(OBJ_DIR)/lj_coul_long.cubin $(OBJ_DIR)/lj_coul_long.cubin
$(BIN2C) -c -n lj_coul_long $(OBJ_DIR)/lj_coul_long.cubin > $(OBJ_DIR)/lj_coul_long_cubin.h
$(OBJ_DIR)/lal_lj_coul_long.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long.cpp $(OBJ_DIR)/lj_coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_long_ext.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_dsf.cubin: lal_lj_dsf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_dsf.cu
$(OBJ_DIR)/lj_dsf_cubin.h: $(OBJ_DIR)/lj_dsf.cubin $(OBJ_DIR)/lj_dsf.cubin
$(BIN2C) -c -n lj_dsf $(OBJ_DIR)/lj_dsf.cubin > $(OBJ_DIR)/lj_dsf_cubin.h
$(OBJ_DIR)/lal_lj_dsf.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf.cpp $(OBJ_DIR)/lj_dsf_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_dsf_ext.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/morse.cubin: lal_morse.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_morse.cu
$(OBJ_DIR)/morse_cubin.h: $(OBJ_DIR)/morse.cubin $(OBJ_DIR)/morse.cubin
$(BIN2C) -c -n morse $(OBJ_DIR)/morse.cubin > $(OBJ_DIR)/morse_cubin.h
$(OBJ_DIR)/lal_morse.o: $(ALL_H) lal_morse.h lal_morse.cpp $(OBJ_DIR)/morse_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_morse.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_morse_ext.o: $(ALL_H) lal_morse.h lal_morse_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_morse_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/charmm_long.cubin: lal_charmm_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_charmm_long.cu
$(OBJ_DIR)/charmm_long_cubin.h: $(OBJ_DIR)/charmm_long.cubin $(OBJ_DIR)/charmm_long.cubin
$(BIN2C) -c -n charmm_long $(OBJ_DIR)/charmm_long.cubin > $(OBJ_DIR)/charmm_long_cubin.h
$(OBJ_DIR)/lal_charmm_long.o: $(ALL_H) lal_charmm_long.h lal_charmm_long.cpp $(OBJ_DIR)/charmm_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_charmm_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_charmm_long_ext.o: $(ALL_H) lal_charmm_long.h lal_charmm_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_charmm_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj96.cubin: lal_lj96.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj96.cu
$(OBJ_DIR)/lj96_cubin.h: $(OBJ_DIR)/lj96.cubin $(OBJ_DIR)/lj96.cubin
$(BIN2C) -c -n lj96 $(OBJ_DIR)/lj96.cubin > $(OBJ_DIR)/lj96_cubin.h
$(OBJ_DIR)/lal_lj96.o: $(ALL_H) lal_lj96.h lal_lj96.cpp $(OBJ_DIR)/lj96_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj96.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj96_ext.o: $(ALL_H) lal_lj96.h lal_lj96_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj96_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_expand.cubin: lal_lj_expand.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_expand.cu
$(OBJ_DIR)/lj_expand_cubin.h: $(OBJ_DIR)/lj_expand.cubin $(OBJ_DIR)/lj_expand.cubin
$(BIN2C) -c -n lj_expand $(OBJ_DIR)/lj_expand.cubin > $(OBJ_DIR)/lj_expand_cubin.h
$(OBJ_DIR)/lal_lj_expand.o: $(ALL_H) lal_lj_expand.h lal_lj_expand.cpp $(OBJ_DIR)/lj_expand_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj_expand.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_expand_ext.o: $(ALL_H) lal_lj_expand.h lal_lj_expand_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_expand_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm.cubin: lal_cg_cmm.cu lal_precision.h lal_preprocessor.h
- $(CUDA) --cubin -DNV_KERNEL -o $@ lal_cg_cmm.cu
+$(OBJ_DIR)/lj_sdk.cubin: lal_lj_sdk.cu lal_precision.h lal_preprocessor.h
+ $(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_sdk.cu
-$(OBJ_DIR)/cg_cmm_cubin.h: $(OBJ_DIR)/cg_cmm.cubin $(OBJ_DIR)/cg_cmm.cubin
- $(BIN2C) -c -n cg_cmm $(OBJ_DIR)/cg_cmm.cubin > $(OBJ_DIR)/cg_cmm_cubin.h
+$(OBJ_DIR)/lj_sdk_cubin.h: $(OBJ_DIR)/lj_sdk.cubin $(OBJ_DIR)/lj_sdk.cubin
+ $(BIN2C) -c -n lj_sdk $(OBJ_DIR)/lj_sdk.cubin > $(OBJ_DIR)/lj_sdk_cubin.h
-$(OBJ_DIR)/lal_cg_cmm.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm.cpp $(OBJ_DIR)/cg_cmm_cubin.h $(OBJ_DIR)/lal_base_atomic.o
- $(CUDR) -o $@ -c lal_cg_cmm.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk.cpp $(OBJ_DIR)/lj_sdk_cubin.h $(OBJ_DIR)/lal_base_atomic.o
+ $(CUDR) -o $@ -c lal_lj_sdk.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_ext.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm_ext.cpp lal_base_atomic.h
- $(CUDR) -o $@ -c lal_cg_cmm_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_ext.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk_ext.cpp lal_base_atomic.h
+ $(CUDR) -o $@ -c lal_lj_sdk_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm_long.cubin: lal_cg_cmm_long.cu lal_precision.h lal_preprocessor.h
- $(CUDA) --cubin -DNV_KERNEL -o $@ lal_cg_cmm_long.cu
+$(OBJ_DIR)/lj_sdk_long.cubin: lal_lj_sdk_long.cu lal_precision.h lal_preprocessor.h
+ $(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_sdk_long.cu
-$(OBJ_DIR)/cg_cmm_long_cubin.h: $(OBJ_DIR)/cg_cmm_long.cubin $(OBJ_DIR)/cg_cmm_long.cubin
- $(BIN2C) -c -n cg_cmm_long $(OBJ_DIR)/cg_cmm_long.cubin > $(OBJ_DIR)/cg_cmm_long_cubin.h
+$(OBJ_DIR)/lj_sdk_long_cubin.h: $(OBJ_DIR)/lj_sdk_long.cubin $(OBJ_DIR)/lj_sdk_long.cubin
+ $(BIN2C) -c -n lj_sdk_long $(OBJ_DIR)/lj_sdk_long.cubin > $(OBJ_DIR)/lj_sdk_long_cubin.h
-$(OBJ_DIR)/lal_cg_cmm_long.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long.cpp $(OBJ_DIR)/cg_cmm_long_cubin.h $(OBJ_DIR)/lal_base_atomic.o
- $(CUDR) -o $@ -c lal_cg_cmm_long.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long.cpp $(OBJ_DIR)/lj_sdk_long_cubin.h $(OBJ_DIR)/lal_base_atomic.o
+ $(CUDR) -o $@ -c lal_lj_sdk_long.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_long_ext.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long_ext.cpp lal_base_charge.h
- $(CUDR) -o $@ -c lal_cg_cmm_long_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long_ext.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long_ext.cpp lal_base_charge.h
+ $(CUDR) -o $@ -c lal_lj_sdk_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/eam.cubin: lal_eam.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_eam.cu
$(OBJ_DIR)/eam_cubin.h: $(OBJ_DIR)/eam.cubin $(OBJ_DIR)/eam.cubin
$(BIN2C) -c -n eam $(OBJ_DIR)/eam.cubin > $(OBJ_DIR)/eam_cubin.h
$(OBJ_DIR)/lal_eam.o: $(ALL_H) lal_eam.h lal_eam.cpp $(OBJ_DIR)/eam_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_eam.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_ext.o: $(ALL_H) lal_eam.h lal_eam_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_eam_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_fs_ext.o: $(ALL_H) lal_eam.h lal_eam_fs_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_eam_fs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_alloy_ext.o: $(ALL_H) lal_eam.h lal_eam_alloy_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_eam_alloy_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck.cubin: lal_buck.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_buck.cu
$(OBJ_DIR)/buck_cubin.h: $(OBJ_DIR)/buck.cubin $(OBJ_DIR)/buck.cubin
$(BIN2C) -c -n buck $(OBJ_DIR)/buck.cubin > $(OBJ_DIR)/buck_cubin.h
$(OBJ_DIR)/lal_buck.o: $(ALL_H) lal_buck.h lal_buck.cpp $(OBJ_DIR)/buck_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_buck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_ext.o: $(ALL_H) lal_buck.h lal_buck_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_buck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul.cubin: lal_buck_coul.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_buck_coul.cu
$(OBJ_DIR)/buck_coul_cubin.h: $(OBJ_DIR)/buck_coul.cubin $(OBJ_DIR)/buck_coul.cubin
$(BIN2C) -c -n buck_coul $(OBJ_DIR)/buck_coul.cubin > $(OBJ_DIR)/buck_coul_cubin.h
$(OBJ_DIR)/lal_buck_coul.o: $(ALL_H) lal_buck_coul.h lal_buck_coul.cpp $(OBJ_DIR)/buck_coul_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_buck_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_ext.o: $(ALL_H) lal_buck_coul.h lal_buck_coul_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_buck_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul_long.cubin: lal_buck_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_buck_coul_long.cu
$(OBJ_DIR)/buck_coul_long_cubin.h: $(OBJ_DIR)/buck_coul_long.cubin $(OBJ_DIR)/buck_coul_long.cubin
$(BIN2C) -c -n buck_coul_long $(OBJ_DIR)/buck_coul_long.cubin > $(OBJ_DIR)/buck_coul_long_cubin.h
$(OBJ_DIR)/lal_buck_coul_long.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long.cpp $(OBJ_DIR)/buck_coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_buck_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_long_ext.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_buck_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/table.cubin: lal_table.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_table.cu
$(OBJ_DIR)/table_cubin.h: $(OBJ_DIR)/table.cubin $(OBJ_DIR)/table.cubin
$(BIN2C) -c -n table $(OBJ_DIR)/table.cubin > $(OBJ_DIR)/table_cubin.h
$(OBJ_DIR)/lal_table.o: $(ALL_H) lal_table.h lal_table.cpp $(OBJ_DIR)/table_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_table.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_table_ext.o: $(ALL_H) lal_table.h lal_table_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_table_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa.cubin: lal_yukawa.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_yukawa.cu
$(OBJ_DIR)/yukawa_cubin.h: $(OBJ_DIR)/yukawa.cubin $(OBJ_DIR)/yukawa.cubin
$(BIN2C) -c -n yukawa $(OBJ_DIR)/yukawa.cubin > $(OBJ_DIR)/yukawa_cubin.h
$(OBJ_DIR)/lal_yukawa.o: $(ALL_H) lal_yukawa.h lal_yukawa.cpp $(OBJ_DIR)/yukawa_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_yukawa.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_ext.o: $(ALL_H) lal_yukawa.h lal_yukawa_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_yukawa_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born.cubin: lal_born.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_born.cu
$(OBJ_DIR)/born_cubin.h: $(OBJ_DIR)/born.cubin $(OBJ_DIR)/born.cubin
$(BIN2C) -c -n born $(OBJ_DIR)/born.cubin > $(OBJ_DIR)/born_cubin.h
$(OBJ_DIR)/lal_born.o: $(ALL_H) lal_born.h lal_born.cpp $(OBJ_DIR)/born_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_born.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_ext.o: $(ALL_H) lal_born.h lal_born_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_born_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_wolf.cubin: lal_born_coul_wolf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_born_coul_wolf.cu
$(OBJ_DIR)/born_coul_wolf_cubin.h: $(OBJ_DIR)/born_coul_wolf.cubin $(OBJ_DIR)/born_coul_wolf.cubin
$(BIN2C) -c -n born_coul_wolf $(OBJ_DIR)/born_coul_wolf.cubin > $(OBJ_DIR)/born_coul_wolf_cubin.h
$(OBJ_DIR)/lal_born_coul_wolf.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf.cpp $(OBJ_DIR)/born_coul_wolf_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_born_coul_wolf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_wolf_ext.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_born_coul_wolf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_long.cubin: lal_born_coul_long.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_born_coul_long.cu
$(OBJ_DIR)/born_coul_long_cubin.h: $(OBJ_DIR)/born_coul_long.cubin $(OBJ_DIR)/born_coul_long.cubin
$(BIN2C) -c -n born_coul_long $(OBJ_DIR)/born_coul_long.cubin > $(OBJ_DIR)/born_coul_long_cubin.h
$(OBJ_DIR)/lal_born_coul_long.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long.cpp $(OBJ_DIR)/born_coul_long_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_born_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_long_ext.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_born_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj.cubin: lal_dipole_lj.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_dipole_lj.cu
$(OBJ_DIR)/dipole_lj_cubin.h: $(OBJ_DIR)/dipole_lj.cubin $(OBJ_DIR)/dipole_lj.cubin
$(BIN2C) -c -n dipole_lj $(OBJ_DIR)/dipole_lj.cubin > $(OBJ_DIR)/dipole_lj_cubin.h
$(OBJ_DIR)/lal_dipole_lj.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj.cpp $(OBJ_DIR)/dipole_lj_cubin.h $(OBJ_DIR)/lal_base_dipole.o
$(CUDR) -o $@ -c lal_dipole_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_ext.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj_ext.cpp lal_base_dipole.h
$(CUDR) -o $@ -c lal_dipole_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj_sf.cubin: lal_dipole_lj_sf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_dipole_lj_sf.cu
$(OBJ_DIR)/dipole_lj_sf_cubin.h: $(OBJ_DIR)/dipole_lj_sf.cubin $(OBJ_DIR)/dipole_lj_sf.cubin
$(BIN2C) -c -n dipole_lj_sf $(OBJ_DIR)/dipole_lj_sf.cubin > $(OBJ_DIR)/dipole_lj_sf_cubin.h
$(OBJ_DIR)/lal_dipole_lj_sf.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf.cpp $(OBJ_DIR)/dipole_lj_sf_cubin.h $(OBJ_DIR)/lal_base_dipole.o
$(CUDR) -o $@ -c lal_dipole_lj_sf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_sf_ext.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf_ext.cpp lal_base_dipole.h
$(CUDR) -o $@ -c lal_dipole_lj_sf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/colloid.cubin: lal_colloid.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_colloid.cu
$(OBJ_DIR)/colloid_cubin.h: $(OBJ_DIR)/colloid.cubin $(OBJ_DIR)/colloid.cubin
$(BIN2C) -c -n colloid $(OBJ_DIR)/colloid.cubin > $(OBJ_DIR)/colloid_cubin.h
$(OBJ_DIR)/lal_colloid.o: $(ALL_H) lal_colloid.h lal_colloid.cpp $(OBJ_DIR)/colloid_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_colloid_ext.o: $(ALL_H) lal_colloid.h lal_colloid_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/gauss.cubin: lal_gauss.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_gauss.cu
$(OBJ_DIR)/gauss_cubin.h: $(OBJ_DIR)/gauss.cubin $(OBJ_DIR)/gauss.cubin
$(BIN2C) -c -n gauss $(OBJ_DIR)/gauss.cubin > $(OBJ_DIR)/gauss_cubin.h
$(OBJ_DIR)/lal_gauss.o: $(ALL_H) lal_gauss.h lal_gauss.cpp $(OBJ_DIR)/gauss_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_gauss.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gauss_ext.o: $(ALL_H) lal_gauss.h lal_gauss_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_gauss_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa_colloid.cubin: lal_yukawa_colloid.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_yukawa_colloid.cu
$(OBJ_DIR)/yukawa_colloid_cubin.h: $(OBJ_DIR)/yukawa_colloid.cubin $(OBJ_DIR)/yukawa_colloid.cubin
$(BIN2C) -c -n yukawa_colloid $(OBJ_DIR)/yukawa_colloid.cubin > $(OBJ_DIR)/yukawa_colloid_cubin.h
$(OBJ_DIR)/lal_yukawa_colloid.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid.cpp $(OBJ_DIR)/yukawa_colloid_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_yukawa_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_colloid_ext.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_yukawa_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_debye.cubin: lal_lj_coul_debye.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul_debye.cu
$(OBJ_DIR)/lj_coul_debye_cubin.h: $(OBJ_DIR)/lj_coul_debye.cubin $(OBJ_DIR)/lj_coul_debye.cubin
$(BIN2C) -c -n lj_coul_debye $(OBJ_DIR)/lj_coul_debye.cubin > $(OBJ_DIR)/lj_coul_debye_cubin.h
$(OBJ_DIR)/lal_lj_coul_debye.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye.cpp $(OBJ_DIR)/lj_coul_debye_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_debye_ext.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_dsf.cubin: lal_coul_dsf.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul_dsf.cu
$(OBJ_DIR)/coul_dsf_cubin.h: $(OBJ_DIR)/coul_dsf.cubin $(OBJ_DIR)/coul_dsf.cubin
$(BIN2C) -c -n coul_dsf $(OBJ_DIR)/coul_dsf.cubin > $(OBJ_DIR)/coul_dsf_cubin.h
$(OBJ_DIR)/lal_coul_dsf.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf.cpp $(OBJ_DIR)/coul_dsf_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_dsf_ext.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/sw.cubin: lal_sw.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_sw.cu
$(OBJ_DIR)/sw_cubin.h: $(OBJ_DIR)/sw.cubin $(OBJ_DIR)/sw.cubin
$(BIN2C) -c -n sw $(OBJ_DIR)/sw.cubin > $(OBJ_DIR)/sw_cubin.h
$(OBJ_DIR)/lal_sw.o: $(ALL_H) lal_sw.h lal_sw.cpp $(OBJ_DIR)/sw_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_sw.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_sw_ext.o: $(ALL_H) lal_sw.h lal_sw_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_sw_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/beck.cubin: lal_beck.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_beck.cu
$(OBJ_DIR)/beck_cubin.h: $(OBJ_DIR)/beck.cubin $(OBJ_DIR)/beck.cubin
$(BIN2C) -c -n beck $(OBJ_DIR)/beck.cubin > $(OBJ_DIR)/beck_cubin.h
$(OBJ_DIR)/lal_beck.o: $(ALL_H) lal_beck.h lal_beck.cpp $(OBJ_DIR)/beck_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_beck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_beck_ext.o: $(ALL_H) lal_beck.h lal_beck_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_beck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/mie.cubin: lal_mie.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_mie.cu
$(OBJ_DIR)/mie_cubin.h: $(OBJ_DIR)/mie.cubin $(OBJ_DIR)/mie.cubin
$(BIN2C) -c -n mie $(OBJ_DIR)/mie.cubin > $(OBJ_DIR)/mie_cubin.h
$(OBJ_DIR)/lal_mie.o: $(ALL_H) lal_mie.h lal_mie.cpp $(OBJ_DIR)/mie_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_mie.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_mie_ext.o: $(ALL_H) lal_mie.h lal_mie_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_mie_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/soft.cubin: lal_soft.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_soft.cu
$(OBJ_DIR)/soft_cubin.h: $(OBJ_DIR)/soft.cubin $(OBJ_DIR)/soft.cubin
$(BIN2C) -c -n soft $(OBJ_DIR)/soft.cubin > $(OBJ_DIR)/soft_cubin.h
$(OBJ_DIR)/lal_soft.o: $(ALL_H) lal_soft.h lal_soft.cpp $(OBJ_DIR)/soft_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_soft.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_soft_ext.o: $(ALL_H) lal_soft.h lal_soft_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_soft_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_msm.cubin: lal_lj_coul_msm.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_coul_msm.cu
$(OBJ_DIR)/lj_coul_msm_cubin.h: $(OBJ_DIR)/lj_coul_msm.cubin $(OBJ_DIR)/lj_coul_msm.cubin
$(BIN2C) -c -n lj_coul_msm $(OBJ_DIR)/lj_coul_msm.cubin > $(OBJ_DIR)/lj_coul_msm_cubin.h
$(OBJ_DIR)/lal_lj_coul_msm.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm.cpp $(OBJ_DIR)/lj_coul_msm_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_lj_coul_msm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_msm_ext.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_lj_coul_msm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_gromacs.cubin: lal_lj_gromacs.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_gromacs.cu
$(OBJ_DIR)/lj_gromacs_cubin.h: $(OBJ_DIR)/lj_gromacs.cubin $(OBJ_DIR)/lj_gromacs.cubin
$(BIN2C) -c -n lj_gromacs $(OBJ_DIR)/lj_gromacs.cubin > $(OBJ_DIR)/lj_gromacs_cubin.h
$(OBJ_DIR)/lal_lj_gromacs.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs.cpp $(OBJ_DIR)/lj_gromacs_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj_gromacs.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_gromacs_ext.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_gromacs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dpd.cubin: lal_dpd.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_dpd.cu
$(OBJ_DIR)/dpd_cubin.h: $(OBJ_DIR)/dpd.cubin $(OBJ_DIR)/dpd.cubin
$(BIN2C) -c -n dpd $(OBJ_DIR)/dpd.cubin > $(OBJ_DIR)/dpd_cubin.h
$(OBJ_DIR)/lal_dpd.o: $(ALL_H) lal_dpd.h lal_dpd.cpp $(OBJ_DIR)/dpd_cubin.h $(OBJ_DIR)/lal_base_dpd.o
$(CUDR) -o $@ -c lal_dpd.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dpd_ext.o: $(ALL_H) lal_dpd.h lal_dpd_ext.cpp lal_base_dpd.h
$(CUDR) -o $@ -c lal_dpd_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff.cubin: lal_tersoff.cu lal_precision.h lal_tersoff_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_tersoff.cu
$(OBJ_DIR)/tersoff_cubin.h: $(OBJ_DIR)/tersoff.cubin $(OBJ_DIR)/tersoff.cubin
$(BIN2C) -c -n tersoff $(OBJ_DIR)/tersoff.cubin > $(OBJ_DIR)/tersoff_cubin.h
$(OBJ_DIR)/lal_tersoff.o: $(ALL_H) lal_tersoff.h lal_tersoff.cpp $(OBJ_DIR)/tersoff_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_tersoff.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_ext.o: $(ALL_H) lal_tersoff.h lal_tersoff_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_tersoff_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_zbl.cubin: lal_tersoff_zbl.cu lal_precision.h lal_tersoff_zbl_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_tersoff_zbl.cu
$(OBJ_DIR)/tersoff_zbl_cubin.h: $(OBJ_DIR)/tersoff_zbl.cubin $(OBJ_DIR)/tersoff_zbl.cubin
$(BIN2C) -c -n tersoff_zbl $(OBJ_DIR)/tersoff_zbl.cubin > $(OBJ_DIR)/tersoff_zbl_cubin.h
$(OBJ_DIR)/lal_tersoff_zbl.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl.cpp $(OBJ_DIR)/tersoff_zbl_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_tersoff_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_zbl_ext.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_tersoff_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_mod.cubin: lal_tersoff_mod.cu lal_precision.h lal_tersoff_mod_extra.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_tersoff_mod.cu
$(OBJ_DIR)/tersoff_mod_cubin.h: $(OBJ_DIR)/tersoff_mod.cubin $(OBJ_DIR)/tersoff_mod.cubin
$(BIN2C) -c -n tersoff_mod $(OBJ_DIR)/tersoff_mod.cubin > $(OBJ_DIR)/tersoff_mod_cubin.h
$(OBJ_DIR)/lal_tersoff_mod.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod.cpp $(OBJ_DIR)/tersoff_mod_cubin.h $(OBJ_DIR)/lal_base_three.o
$(CUDR) -o $@ -c lal_tersoff_mod.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_mod_ext.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod_ext.cpp lal_base_three.h
$(CUDR) -o $@ -c lal_tersoff_mod_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul.cubin: lal_coul.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul.cu
$(OBJ_DIR)/coul_cubin.h: $(OBJ_DIR)/coul.cubin $(OBJ_DIR)/coul.cubin
$(BIN2C) -c -n coul $(OBJ_DIR)/coul.cubin > $(OBJ_DIR)/coul_cubin.h
$(OBJ_DIR)/lal_coul.o: $(ALL_H) lal_coul.h lal_coul.cpp $(OBJ_DIR)/coul_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_ext.o: $(ALL_H) lal_coul.h lal_coul_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_debye.cubin: lal_coul_debye.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_coul_debye.cu
$(OBJ_DIR)/coul_debye_cubin.h: $(OBJ_DIR)/coul_debye.cubin $(OBJ_DIR)/coul_debye.cubin
$(BIN2C) -c -n coul_debye $(OBJ_DIR)/coul_debye.cubin > $(OBJ_DIR)/coul_debye_cubin.h
$(OBJ_DIR)/lal_coul_debye.o: $(ALL_H) lal_coul_debye.h lal_coul_debye.cpp $(OBJ_DIR)/coul_debye_cubin.h $(OBJ_DIR)/lal_base_charge.o
$(CUDR) -o $@ -c lal_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_debye_ext.o: $(ALL_H) lal_coul_debye.h lal_coul_debye_ext.cpp lal_base_charge.h
$(CUDR) -o $@ -c lal_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/zbl.cubin: lal_zbl.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_zbl.cu
$(OBJ_DIR)/zbl_cubin.h: $(OBJ_DIR)/zbl.cubin $(OBJ_DIR)/zbl.cubin
$(BIN2C) -c -n zbl $(OBJ_DIR)/zbl.cubin > $(OBJ_DIR)/zbl_cubin.h
$(OBJ_DIR)/lal_zbl.o: $(ALL_H) lal_zbl.h lal_zbl.cpp $(OBJ_DIR)/zbl_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_zbl_ext.o: $(ALL_H) lal_zbl.h lal_zbl_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_cubic.cubin: lal_lj_cubic.cu lal_precision.h lal_preprocessor.h
$(CUDA) --cubin -DNV_KERNEL -o $@ lal_lj_cubic.cu
$(OBJ_DIR)/lj_cubic_cubin.h: $(OBJ_DIR)/lj_cubic.cubin $(OBJ_DIR)/lj_cubic.cubin
$(BIN2C) -c -n lj_cubic $(OBJ_DIR)/lj_cubic.cubin > $(OBJ_DIR)/lj_cubic_cubin.h
$(OBJ_DIR)/lal_lj_cubic.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic.cpp $(OBJ_DIR)/lj_cubic_cubin.h $(OBJ_DIR)/lal_base_atomic.o
$(CUDR) -o $@ -c lal_lj_cubic.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_cubic_ext.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic_ext.cpp lal_base_atomic.h
$(CUDR) -o $@ -c lal_lj_cubic_ext.cpp -I$(OBJ_DIR)
$(BIN_DIR)/nvc_get_devices: ./geryon/ucl_get_devices.cpp $(NVD_H)
$(CUDR) -o $@ ./geryon/ucl_get_devices.cpp -DUCL_CUDADR $(CUDA_LIB) -lcuda
$(GPU_LIB): $(OBJS) $(CUDPP)
$(AR) -crusv $(GPU_LIB) $(OBJS) $(CUDPP)
@cp $(EXTRAMAKE) Makefile.lammps
clean:
-rm -f $(EXECS) $(GPU_LIB) $(OBJS) $(CUDPP) $(CBNS) *.linkinfo
veryclean: clean
-rm -rf *~ *.linkinfo
cleanlib:
-rm -f $(EXECS) $(GPU_LIB) $(OBJS) $(CBNS) *.linkinfo
diff --git a/lib/gpu/Opencl.makefile b/lib/gpu/Opencl.makefile
index 7ef1dfba0..4a5959531 100644
--- a/lib/gpu/Opencl.makefile
+++ b/lib/gpu/Opencl.makefile
@@ -1,584 +1,584 @@
OCL = $(OCL_CPP) $(OCL_PREC) $(OCL_TUNE) -DUSE_OPENCL
OCL_LIB = $(LIB_DIR)/libgpu.a
# Headers for Geryon
UCL_H = $(wildcard ./geryon/ucl*.h)
OCL_H = $(wildcard ./geryon/ocl*.h) $(UCL_H)
# Headers for Pair Stuff
PAIR_H = lal_atom.h lal_answer.h lal_neighbor_shared.h \
lal_neighbor.h lal_precision.h lal_device.h \
lal_balance.h lal_pppm.h
# Headers for Preprocessor/Auxiliary Functions
PRE1_H = lal_preprocessor.h lal_aux_fun1.h
ALL_H = $(OCL_H) $(PAIR_H)
EXECS = $(BIN_DIR)/ocl_get_devices
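# Object list for the OpenCL build of libgpu.a; as in the CUDA makefile,
# the cg_cmm entries below are replaced by their lj_sdk counterparts.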
OBJS = $(OBJ_DIR)/lal_atom.o $(OBJ_DIR)/lal_answer.o \
$(OBJ_DIR)/lal_neighbor_shared.o $(OBJ_DIR)/lal_neighbor.o \
$(OBJ_DIR)/lal_device.o $(OBJ_DIR)/lal_base_atomic.o \
$(OBJ_DIR)/lal_base_charge.o $(OBJ_DIR)/lal_base_ellipsoid.o \
$(OBJ_DIR)/lal_base_dipole.o $(OBJ_DIR)/lal_base_three.o \
$(OBJ_DIR)/lal_base_dpd.o \
$(OBJ_DIR)/lal_pppm.o $(OBJ_DIR)/lal_pppm_ext.o \
$(OBJ_DIR)/lal_gayberne.o $(OBJ_DIR)/lal_gayberne_ext.o \
$(OBJ_DIR)/lal_re_squared.o $(OBJ_DIR)/lal_re_squared_ext.o \
$(OBJ_DIR)/lal_lj.o $(OBJ_DIR)/lal_lj_ext.o \
$(OBJ_DIR)/lal_lj96.o $(OBJ_DIR)/lal_lj96_ext.o \
$(OBJ_DIR)/lal_lj_expand.o $(OBJ_DIR)/lal_lj_expand_ext.o \
$(OBJ_DIR)/lal_lj_coul.o $(OBJ_DIR)/lal_lj_coul_ext.o \
$(OBJ_DIR)/lal_lj_coul_long.o $(OBJ_DIR)/lal_lj_coul_long_ext.o \
$(OBJ_DIR)/lal_lj_dsf.o $(OBJ_DIR)/lal_lj_dsf_ext.o \
$(OBJ_DIR)/lal_lj_class2_long.o $(OBJ_DIR)/lal_lj_class2_long_ext.o \
$(OBJ_DIR)/lal_coul_long.o $(OBJ_DIR)/lal_coul_long_ext.o \
$(OBJ_DIR)/lal_morse.o $(OBJ_DIR)/lal_morse_ext.o \
$(OBJ_DIR)/lal_charmm_long.o $(OBJ_DIR)/lal_charmm_long_ext.o \
- $(OBJ_DIR)/lal_cg_cmm.o $(OBJ_DIR)/lal_cg_cmm_ext.o \
- $(OBJ_DIR)/lal_cg_cmm_long.o $(OBJ_DIR)/lal_cg_cmm_long_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk.o $(OBJ_DIR)/lal_lj_sdk_ext.o \
+ $(OBJ_DIR)/lal_lj_sdk_long.o $(OBJ_DIR)/lal_lj_sdk_long_ext.o \
$(OBJ_DIR)/lal_eam.o $(OBJ_DIR)/lal_eam_ext.o \
$(OBJ_DIR)/lal_eam_fs_ext.o $(OBJ_DIR)/lal_eam_alloy_ext.o \
$(OBJ_DIR)/lal_buck.o $(OBJ_DIR)/lal_buck_ext.o \
$(OBJ_DIR)/lal_buck_coul.o $(OBJ_DIR)/lal_buck_coul_ext.o \
$(OBJ_DIR)/lal_buck_coul_long.o $(OBJ_DIR)/lal_buck_coul_long_ext.o \
$(OBJ_DIR)/lal_table.o $(OBJ_DIR)/lal_table_ext.o \
$(OBJ_DIR)/lal_yukawa.o $(OBJ_DIR)/lal_yukawa_ext.o \
$(OBJ_DIR)/lal_born.o $(OBJ_DIR)/lal_born_ext.o \
$(OBJ_DIR)/lal_born_coul_wolf.o $(OBJ_DIR)/lal_born_coul_wolf_ext.o \
$(OBJ_DIR)/lal_born_coul_long.o $(OBJ_DIR)/lal_born_coul_long_ext.o \
$(OBJ_DIR)/lal_dipole_lj.o $(OBJ_DIR)/lal_dipole_lj_ext.o \
$(OBJ_DIR)/lal_dipole_lj_sf.o $(OBJ_DIR)/lal_dipole_lj_sf_ext.o \
$(OBJ_DIR)/lal_colloid.o $(OBJ_DIR)/lal_colloid_ext.o \
$(OBJ_DIR)/lal_gauss.o $(OBJ_DIR)/lal_gauss_ext.o \
$(OBJ_DIR)/lal_yukawa_colloid.o $(OBJ_DIR)/lal_yukawa_colloid_ext.o \
$(OBJ_DIR)/lal_lj_coul_debye.o $(OBJ_DIR)/lal_lj_coul_debye_ext.o \
$(OBJ_DIR)/lal_coul_dsf.o $(OBJ_DIR)/lal_coul_dsf_ext.o \
$(OBJ_DIR)/lal_sw.o $(OBJ_DIR)/lal_sw_ext.o \
$(OBJ_DIR)/lal_beck.o $(OBJ_DIR)/lal_beck_ext.o \
$(OBJ_DIR)/lal_mie.o $(OBJ_DIR)/lal_mie_ext.o \
$(OBJ_DIR)/lal_soft.o $(OBJ_DIR)/lal_soft_ext.o \
$(OBJ_DIR)/lal_lj_coul_msm.o $(OBJ_DIR)/lal_lj_coul_msm_ext.o \
$(OBJ_DIR)/lal_lj_gromacs.o $(OBJ_DIR)/lal_lj_gromacs_ext.o \
$(OBJ_DIR)/lal_dpd.o $(OBJ_DIR)/lal_dpd_ext.o \
$(OBJ_DIR)/lal_tersoff.o $(OBJ_DIR)/lal_tersoff_ext.o \
$(OBJ_DIR)/lal_tersoff_zbl.o $(OBJ_DIR)/lal_tersoff_zbl_ext.o \
$(OBJ_DIR)/lal_tersoff_mod.o $(OBJ_DIR)/lal_tersoff_mod_ext.o \
$(OBJ_DIR)/lal_coul.o $(OBJ_DIR)/lal_coul_ext.o \
$(OBJ_DIR)/lal_coul_debye.o $(OBJ_DIR)/lal_coul_debye_ext.o \
$(OBJ_DIR)/lal_zbl.o $(OBJ_DIR)/lal_zbl_ext.o \
$(OBJ_DIR)/lal_lj_cubic.o $(OBJ_DIR)/lal_lj_cubic_ext.o
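# KERS lists the *_cl.h headers generated by geryon/file_to_cstr.sh, which
# embeds each OpenCL kernel source as a C string so it can be built at run time.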
KERS = $(OBJ_DIR)/device_cl.h $(OBJ_DIR)/atom_cl.h \
$(OBJ_DIR)/neighbor_cpu_cl.h $(OBJ_DIR)/pppm_cl.h \
$(OBJ_DIR)/ellipsoid_nbor_cl.h $(OBJ_DIR)/gayberne_cl.h \
$(OBJ_DIR)/gayberne_lj_cl.h $(OBJ_DIR)/re_squared_cl.h \
$(OBJ_DIR)/re_squared_lj_cl.h $(OBJ_DIR)/lj_cl.h $(OBJ_DIR)/lj96_cl.h \
$(OBJ_DIR)/lj_expand_cl.h $(OBJ_DIR)/lj_coul_cl.h \
$(OBJ_DIR)/lj_coul_long_cl.h $(OBJ_DIR)/lj_dsf_cl.h \
$(OBJ_DIR)/lj_class2_long_cl.h \
$(OBJ_DIR)/coul_long_cl.h $(OBJ_DIR)/morse_cl.h \
- $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/cg_cmm_cl.h \
- $(OBJ_DIR)/cg_cmm_long_cl.h $(OBJ_DIR)/neighbor_gpu_cl.h \
+ $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/lj_sdk_cl.h \
+ $(OBJ_DIR)/lj_sdk_long_cl.h $(OBJ_DIR)/neighbor_gpu_cl.h \
$(OBJ_DIR)/eam_cl.h $(OBJ_DIR)/buck_cl.h \
$(OBJ_DIR)/buck_coul_cl.h $(OBJ_DIR)/buck_coul_long_cl.h \
$(OBJ_DIR)/table_cl.h $(OBJ_DIR)/yukawa_cl.h \
$(OBJ_DIR)/born_cl.h $(OBJ_DIR)/born_coul_wolf_cl.h \
$(OBJ_DIR)/born_coul_long_cl.h $(OBJ_DIR)/dipole_lj_cl.h \
$(OBJ_DIR)/dipole_lj_sf_cl.h $(OBJ_DIR)/colloid_cl.h \
$(OBJ_DIR)/gauss_cl.h $(OBJ_DIR)/yukawa_colloid_cl.h \
$(OBJ_DIR)/lj_coul_debye_cl.h $(OBJ_DIR)/coul_dsf_cl.h \
$(OBJ_DIR)/sw_cl.h $(OBJ_DIR)/beck_cl.h $(OBJ_DIR)/mie_cl.h \
$(OBJ_DIR)/soft_cl.h $(OBJ_DIR)/lj_coul_msm_cl.h \
$(OBJ_DIR)/lj_gromacs_cl.h $(OBJ_DIR)/dpd_cl.h \
$(OBJ_DIR)/lj_gauss_cl.h $(OBJ_DIR)/dzugutov_cl.h \
$(OBJ_DIR)/tersoff_cl.h $(OBJ_DIR)/tersoff_zbl_cl.h \
$(OBJ_DIR)/tersoff_mod_cl.h $(OBJ_DIR)/coul_cl.h \
$(OBJ_DIR)/coul_debye_cl.h $(OBJ_DIR)/zbl_cl.h \
$(OBJ_DIR)/lj_cubic_cl.h
OCL_EXECS = $(BIN_DIR)/ocl_get_devices
all: $(OBJ_DIR) $(OCL_LIB) $(EXECS)
$(OBJ_DIR):
mkdir -p $@
$(OBJ_DIR)/atom_cl.h: lal_atom.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh atom lal_preprocessor.h lal_atom.cu $(OBJ_DIR)/atom_cl.h
$(OBJ_DIR)/lal_atom.o: lal_atom.cpp lal_atom.h $(OCL_H) $(OBJ_DIR)/atom_cl.h
$(OCL) -o $@ -c lal_atom.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_answer.o: lal_answer.cpp lal_answer.h $(OCL_H)
$(OCL) -o $@ -c lal_answer.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/neighbor_cpu_cl.h: lal_neighbor_cpu.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh neighbor_cpu lal_preprocessor.h lal_neighbor_cpu.cu $(OBJ_DIR)/neighbor_cpu_cl.h
$(OBJ_DIR)/neighbor_gpu_cl.h: lal_neighbor_gpu.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh neighbor_gpu lal_preprocessor.h lal_neighbor_gpu.cu $(OBJ_DIR)/neighbor_gpu_cl.h
$(OBJ_DIR)/lal_neighbor_shared.o: lal_neighbor_shared.cpp lal_neighbor_shared.h $(OCL_H) $(OBJ_DIR)/neighbor_cpu_cl.h $(OBJ_DIR)/neighbor_gpu_cl.h
$(OCL) -o $@ -c lal_neighbor_shared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_neighbor.o: lal_neighbor.cpp lal_neighbor.h $(OCL_H) lal_neighbor_shared.h
$(OCL) -o $@ -c lal_neighbor.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/device_cl.h: lal_device.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh device lal_preprocessor.h lal_device.cu $(OBJ_DIR)/device_cl.h
$(OBJ_DIR)/lal_device.o: lal_device.cpp lal_device.h $(ALL_H) $(OBJ_DIR)/device_cl.h
$(OCL) -o $@ -c lal_device.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_atomic.o: $(OCL_H) lal_base_atomic.h lal_base_atomic.cpp
$(OCL) -o $@ -c lal_base_atomic.cpp
$(OBJ_DIR)/lal_base_charge.o: $(OCL_H) lal_base_charge.h lal_base_charge.cpp
$(OCL) -o $@ -c lal_base_charge.cpp
$(OBJ_DIR)/lal_base_ellipsoid.o: $(OCL_H) lal_base_ellipsoid.h lal_base_ellipsoid.cpp $(OBJ_DIR)/ellipsoid_nbor_cl.h
$(OCL) -o $@ -c lal_base_ellipsoid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_dipole.o: $(OCL_H) lal_base_dipole.h lal_base_dipole.cpp
$(OCL) -o $@ -c lal_base_dipole.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_base_three.o: $(OCL_H) lal_base_three.h lal_base_three.cpp
$(OCL) -o $@ -c lal_base_three.cpp
$(OBJ_DIR)/lal_base_dpd.o: $(OCL_H) lal_base_dpd.h lal_base_dpd.cpp
$(OCL) -o $@ -c lal_base_dpd.cpp
$(OBJ_DIR)/pppm_cl.h: lal_pppm.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh pppm lal_preprocessor.h lal_pppm.cu $(OBJ_DIR)/pppm_cl.h;
$(OBJ_DIR)/lal_pppm.o: $(ALL_H) lal_pppm.h lal_pppm.cpp $(OBJ_DIR)/pppm_cl.h $(OBJ_DIR)/pppm_cl.h
$(OCL) -o $@ -c lal_pppm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_pppm_ext.o: $(ALL_H) lal_pppm.h lal_pppm_ext.cpp
$(OCL) -o $@ -c lal_pppm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/ellipsoid_nbor_cl.h: lal_ellipsoid_nbor.cu lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh ellipsoid_nbor lal_preprocessor.h lal_ellipsoid_nbor.cu $(OBJ_DIR)/ellipsoid_nbor_cl.h
$(OBJ_DIR)/gayberne_cl.h: lal_gayberne.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh gayberne lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_gayberne.cu $(OBJ_DIR)/gayberne_cl.h;
$(OBJ_DIR)/gayberne_lj_cl.h: lal_gayberne_lj.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh gayberne_lj lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_gayberne_lj.cu $(OBJ_DIR)/gayberne_lj_cl.h;
$(OBJ_DIR)/lal_gayberne.o: $(ALL_H) lal_gayberne.h lal_gayberne.cpp $(OBJ_DIR)/gayberne_cl.h $(OBJ_DIR)/gayberne_lj_cl.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(OCL) -o $@ -c lal_gayberne.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gayberne_ext.o: $(ALL_H) $(OBJ_DIR)/lal_gayberne.o lal_gayberne_ext.cpp
$(OCL) -o $@ -c lal_gayberne_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/re_squared_cl.h: lal_re_squared.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh re_squared lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_re_squared.cu $(OBJ_DIR)/re_squared_cl.h;
$(OBJ_DIR)/re_squared_lj_cl.h: lal_re_squared_lj.cu lal_ellipsoid_extra.h lal_aux_fun1.h lal_preprocessor.h
$(BSH) ./geryon/file_to_cstr.sh re_squared_lj lal_preprocessor.h lal_aux_fun1.h lal_ellipsoid_extra.h lal_re_squared_lj.cu $(OBJ_DIR)/re_squared_lj_cl.h;
$(OBJ_DIR)/lal_re_squared.o: $(ALL_H) lal_re_squared.h lal_re_squared.cpp $(OBJ_DIR)/re_squared_cl.h $(OBJ_DIR)/re_squared_lj_cl.h $(OBJ_DIR)/lal_base_ellipsoid.o
$(OCL) -o $@ -c lal_re_squared.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_re_squared_ext.o: $(ALL_H) $(OBJ_DIR)/lal_re_squared.o lal_re_squared_ext.cpp
$(OCL) -o $@ -c lal_re_squared_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_cl.h: lal_lj.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj $(PRE1_H) lal_lj.cu $(OBJ_DIR)/lj_cl.h;
$(OBJ_DIR)/lal_lj.o: $(ALL_H) lal_lj.h lal_lj.cpp $(OBJ_DIR)/lj_cl.h $(OBJ_DIR)/lj_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_ext.o: $(ALL_H) lal_lj.h lal_lj_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_cl.h: lal_lj_coul.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul $(PRE1_H) lal_lj_coul.cu $(OBJ_DIR)/lj_coul_cl.h;
$(OBJ_DIR)/lal_lj_coul.o: $(ALL_H) lal_lj_coul.h lal_lj_coul.cpp $(OBJ_DIR)/lj_coul_cl.h $(OBJ_DIR)/lj_coul_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_ext.o: $(ALL_H) lal_lj_coul.h lal_lj_coul_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_long_cl.h: lal_lj_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul_long $(PRE1_H) lal_lj_coul_long.cu $(OBJ_DIR)/lj_coul_long_cl.h;
$(OBJ_DIR)/lal_lj_coul_long.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long.cpp $(OBJ_DIR)/lj_coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_long_ext.o: $(ALL_H) lal_lj_coul_long.h lal_lj_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_dsf_cl.h: lal_lj_dsf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_dsf $(PRE1_H) lal_lj_dsf.cu $(OBJ_DIR)/lj_dsf_cl.h;
$(OBJ_DIR)/lal_lj_dsf.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf.cpp $(OBJ_DIR)/lj_dsf_cl.h $(OBJ_DIR)/lj_dsf_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_dsf_ext.o: $(ALL_H) lal_lj_dsf.h lal_lj_dsf_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_class2_long_cl.h: lal_lj_class2_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_class2_long $(PRE1_H) lal_lj_class2_long.cu $(OBJ_DIR)/lj_class2_long_cl.h;
$(OBJ_DIR)/lal_lj_class2_long.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long.cpp $(OBJ_DIR)/lj_class2_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_class2_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_class2_long_ext.o: $(ALL_H) lal_lj_class2_long.h lal_lj_class2_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_class2_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_long_cl.h: lal_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul_long $(PRE1_H) lal_coul_long.cu $(OBJ_DIR)/coul_long_cl.h;
$(OBJ_DIR)/lal_coul_long.o: $(ALL_H) lal_coul_long.h lal_coul_long.cpp $(OBJ_DIR)/coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_long_ext.o: $(ALL_H) lal_coul_long.h lal_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/morse_cl.h: lal_morse.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh morse $(PRE1_H) lal_morse.cu $(OBJ_DIR)/morse_cl.h;
$(OBJ_DIR)/lal_morse.o: $(ALL_H) lal_morse.h lal_morse.cpp $(OBJ_DIR)/morse_cl.h $(OBJ_DIR)/morse_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_morse.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_morse_ext.o: $(ALL_H) lal_morse.h lal_morse_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_morse_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/charmm_long_cl.h: lal_charmm_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh charmm_long $(PRE1_H) lal_charmm_long.cu $(OBJ_DIR)/charmm_long_cl.h;
$(OBJ_DIR)/lal_charmm_long.o: $(ALL_H) lal_charmm_long.h lal_charmm_long.cpp $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/charmm_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_charmm_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_charmm_long_ext.o: $(ALL_H) lal_charmm_long.h lal_charmm_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_charmm_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj96_cl.h: lal_lj96.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj96 $(PRE1_H) lal_lj96.cu $(OBJ_DIR)/lj96_cl.h;
$(OBJ_DIR)/lal_lj96.o: $(ALL_H) lal_lj96.h lal_lj96.cpp $(OBJ_DIR)/lj96_cl.h $(OBJ_DIR)/lj96_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj96.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj96_ext.o: $(ALL_H) lal_lj96.h lal_lj96_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj96_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_expand_cl.h: lal_lj_expand.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_expand $(PRE1_H) lal_lj_expand.cu $(OBJ_DIR)/lj_expand_cl.h;
$(OBJ_DIR)/lal_lj_expand.o: $(ALL_H) lal_lj_expand.h lal_lj_expand.cpp $(OBJ_DIR)/lj_expand_cl.h $(OBJ_DIR)/lj_expand_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj_expand.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_expand_ext.o: $(ALL_H) lal_lj_expand.h lal_lj_expand_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_expand_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm_cl.h: lal_cg_cmm.cu $(PRE1_H)
- $(BSH) ./geryon/file_to_cstr.sh cg_cmm $(PRE1_H) lal_cg_cmm.cu $(OBJ_DIR)/cg_cmm_cl.h;
+$(OBJ_DIR)/lj_sdk_cl.h: lal_lj_sdk.cu $(PRE1_H)
+ $(BSH) ./geryon/file_to_cstr.sh lj_sdk $(PRE1_H) lal_lj_sdk.cu $(OBJ_DIR)/lj_sdk_cl.h;
-$(OBJ_DIR)/lal_cg_cmm.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm.cpp $(OBJ_DIR)/cg_cmm_cl.h $(OBJ_DIR)/cg_cmm_cl.h $(OBJ_DIR)/lal_base_atomic.o
- $(OCL) -o $@ -c lal_cg_cmm.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk.cpp $(OBJ_DIR)/lj_sdk_cl.h $(OBJ_DIR)/lj_sdk_cl.h $(OBJ_DIR)/lal_base_atomic.o
+ $(OCL) -o $@ -c lal_lj_sdk.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_ext.o: $(ALL_H) lal_cg_cmm.h lal_cg_cmm_ext.cpp lal_base_atomic.h
- $(OCL) -o $@ -c lal_cg_cmm_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_ext.o: $(ALL_H) lal_lj_sdk.h lal_lj_sdk_ext.cpp lal_base_atomic.h
+ $(OCL) -o $@ -c lal_lj_sdk_ext.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/cg_cmm_long_cl.h: lal_cg_cmm_long.cu $(PRE1_H)
- $(BSH) ./geryon/file_to_cstr.sh cg_cmm_long $(PRE1_H) lal_cg_cmm_long.cu $(OBJ_DIR)/cg_cmm_long_cl.h;
+$(OBJ_DIR)/lj_sdk_long_cl.h: lal_lj_sdk_long.cu $(PRE1_H)
+ $(BSH) ./geryon/file_to_cstr.sh lj_sdk_long $(PRE1_H) lal_lj_sdk_long.cu $(OBJ_DIR)/lj_sdk_long_cl.h;
-$(OBJ_DIR)/lal_cg_cmm_long.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long.cpp $(OBJ_DIR)/cg_cmm_long_cl.h $(OBJ_DIR)/cg_cmm_long_cl.h $(OBJ_DIR)/lal_base_atomic.o
- $(OCL) -o $@ -c lal_cg_cmm_long.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long.cpp $(OBJ_DIR)/lj_sdk_long_cl.h $(OBJ_DIR)/lj_sdk_long_cl.h $(OBJ_DIR)/lal_base_atomic.o
+ $(OCL) -o $@ -c lal_lj_sdk_long.cpp -I$(OBJ_DIR)
-$(OBJ_DIR)/lal_cg_cmm_long_ext.o: $(ALL_H) lal_cg_cmm_long.h lal_cg_cmm_long_ext.cpp lal_base_charge.h
- $(OCL) -o $@ -c lal_cg_cmm_long_ext.cpp -I$(OBJ_DIR)
+$(OBJ_DIR)/lal_lj_sdk_long_ext.o: $(ALL_H) lal_lj_sdk_long.h lal_lj_sdk_long_ext.cpp lal_base_charge.h
+ $(OCL) -o $@ -c lal_lj_sdk_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/eam_cl.h: lal_eam.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh eam $(PRE1_H) lal_eam.cu $(OBJ_DIR)/eam_cl.h;
$(OBJ_DIR)/lal_eam.o: $(ALL_H) lal_eam.h lal_eam.cpp $(OBJ_DIR)/eam_cl.h $(OBJ_DIR)/eam_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_eam.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_ext.o: $(ALL_H) lal_eam.h lal_eam_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_eam_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_fs_ext.o: $(ALL_H) lal_eam.h lal_eam_fs_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_eam_fs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_eam_alloy_ext.o: $(ALL_H) lal_eam.h lal_eam_alloy_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_eam_alloy_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_cl.h: lal_buck.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh buck $(PRE1_H) lal_buck.cu $(OBJ_DIR)/buck_cl.h;
$(OBJ_DIR)/lal_buck.o: $(ALL_H) lal_buck.h lal_buck.cpp $(OBJ_DIR)/buck_cl.h $(OBJ_DIR)/buck_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_buck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_ext.o: $(ALL_H) lal_buck.h lal_buck_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_buck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul_cl.h: lal_buck_coul.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh buck_coul $(PRE1_H) lal_buck_coul.cu $(OBJ_DIR)/buck_coul_cl.h;
$(OBJ_DIR)/lal_buck_coul.o: $(ALL_H) lal_buck_coul.h lal_buck_coul.cpp $(OBJ_DIR)/buck_coul_cl.h $(OBJ_DIR)/buck_coul_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_buck_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_ext.o: $(ALL_H) lal_buck_coul.h lal_buck_coul_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_buck_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/buck_coul_long_cl.h: lal_buck_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh buck_coul_long $(PRE1_H) lal_buck_coul_long.cu $(OBJ_DIR)/buck_coul_long_cl.h;
$(OBJ_DIR)/lal_buck_coul_long.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long.cpp $(OBJ_DIR)/buck_coul_long_cl.h $(OBJ_DIR)/buck_coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_buck_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_buck_coul_long_ext.o: $(ALL_H) lal_buck_coul_long.h lal_buck_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_buck_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/table_cl.h: lal_table.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh table $(PRE1_H) lal_table.cu $(OBJ_DIR)/table_cl.h;
$(OBJ_DIR)/lal_table.o: $(ALL_H) lal_table.h lal_table.cpp $(OBJ_DIR)/table_cl.h $(OBJ_DIR)/table_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_table.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_table_ext.o: $(ALL_H) lal_table.h lal_table_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_table_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa_cl.h: lal_yukawa.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh yukawa $(PRE1_H) lal_yukawa.cu $(OBJ_DIR)/yukawa_cl.h;
$(OBJ_DIR)/lal_yukawa.o: $(ALL_H) lal_yukawa.h lal_yukawa.cpp $(OBJ_DIR)/yukawa_cl.h $(OBJ_DIR)/yukawa_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_yukawa.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_ext.o: $(ALL_H) lal_yukawa.h lal_yukawa_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_yukawa_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_cl.h: lal_born.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh born $(PRE1_H) lal_born.cu $(OBJ_DIR)/born_cl.h;
$(OBJ_DIR)/lal_born.o: $(ALL_H) lal_born.h lal_born.cpp $(OBJ_DIR)/born_cl.h $(OBJ_DIR)/born_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_born.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_ext.o: $(ALL_H) lal_born.h lal_born_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_born_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_wolf_cl.h: lal_born_coul_wolf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh born_coul_wolf $(PRE1_H) lal_born_coul_wolf.cu $(OBJ_DIR)/born_coul_wolf_cl.h;
$(OBJ_DIR)/lal_born_coul_wolf.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf.cpp $(OBJ_DIR)/born_coul_wolf_cl.h $(OBJ_DIR)/born_coul_wolf_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_born_coul_wolf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_wolf_ext.o: $(ALL_H) lal_born_coul_wolf.h lal_born_coul_wolf_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_born_coul_wolf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/born_coul_long_cl.h: lal_born_coul_long.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh born_coul_long $(PRE1_H) lal_born_coul_long.cu $(OBJ_DIR)/born_coul_long_cl.h;
$(OBJ_DIR)/lal_born_coul_long.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long.cpp $(OBJ_DIR)/born_coul_long_cl.h $(OBJ_DIR)/born_coul_long_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_born_coul_long.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_born_coul_long_ext.o: $(ALL_H) lal_born_coul_long.h lal_born_coul_long_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_born_coul_long_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj_cl.h: lal_dipole_lj.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh dipole_lj $(PRE1_H) lal_dipole_lj.cu $(OBJ_DIR)/dipole_lj_cl.h;
$(OBJ_DIR)/lal_dipole_lj.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj.cpp $(OBJ_DIR)/dipole_lj_cl.h $(OBJ_DIR)/dipole_lj_cl.h $(OBJ_DIR)/lal_base_dipole.o
$(OCL) -o $@ -c lal_dipole_lj.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_ext.o: $(ALL_H) lal_dipole_lj.h lal_dipole_lj_ext.cpp lal_base_dipole.h
$(OCL) -o $@ -c lal_dipole_lj_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dipole_lj_sf_cl.h: lal_dipole_lj_sf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh dipole_lj_sf $(PRE1_H) lal_dipole_lj_sf.cu $(OBJ_DIR)/dipole_lj_sf_cl.h;
$(OBJ_DIR)/lal_dipole_lj_sf.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf.cpp $(OBJ_DIR)/dipole_lj_sf_cl.h $(OBJ_DIR)/dipole_lj_sf_cl.h $(OBJ_DIR)/lal_base_dipole.o
$(OCL) -o $@ -c lal_dipole_lj_sf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dipole_lj_sf_ext.o: $(ALL_H) lal_dipole_lj_sf.h lal_dipole_lj_sf_ext.cpp lal_base_dipole.h
$(OCL) -o $@ -c lal_dipole_lj_sf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/colloid_cl.h: lal_colloid.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh colloid $(PRE1_H) lal_colloid.cu $(OBJ_DIR)/colloid_cl.h;
$(OBJ_DIR)/lal_colloid.o: $(ALL_H) lal_colloid.h lal_colloid.cpp $(OBJ_DIR)/colloid_cl.h $(OBJ_DIR)/colloid_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_colloid_ext.o: $(ALL_H) lal_colloid.h lal_colloid_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/gauss_cl.h: lal_gauss.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh gauss $(PRE1_H) lal_gauss.cu $(OBJ_DIR)/gauss_cl.h;
$(OBJ_DIR)/lal_gauss.o: $(ALL_H) lal_gauss.h lal_gauss.cpp $(OBJ_DIR)/gauss_cl.h $(OBJ_DIR)/gauss_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_gauss.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_gauss_ext.o: $(ALL_H) lal_gauss.h lal_gauss_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_gauss_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/yukawa_colloid_cl.h: lal_yukawa_colloid.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh yukawa_colloid $(PRE1_H) lal_yukawa_colloid.cu $(OBJ_DIR)/yukawa_colloid_cl.h;
$(OBJ_DIR)/lal_yukawa_colloid.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid.cpp $(OBJ_DIR)/yukawa_colloid_cl.h $(OBJ_DIR)/yukawa_colloid_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_yukawa_colloid.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_yukawa_colloid_ext.o: $(ALL_H) lal_yukawa_colloid.h lal_yukawa_colloid_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_yukawa_colloid_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_debye_cl.h: lal_lj_coul_debye.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul_debye $(PRE1_H) lal_lj_coul_debye.cu $(OBJ_DIR)/lj_coul_debye_cl.h;
$(OBJ_DIR)/lal_lj_coul_debye.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye.cpp $(OBJ_DIR)/lj_coul_debye_cl.h $(OBJ_DIR)/lj_coul_debye_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_debye_ext.o: $(ALL_H) lal_lj_coul_debye.h lal_lj_coul_debye_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_dsf_cl.h: lal_coul_dsf.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul_dsf $(PRE1_H) lal_coul_dsf.cu $(OBJ_DIR)/coul_dsf_cl.h;
$(OBJ_DIR)/lal_coul_dsf.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf.cpp $(OBJ_DIR)/coul_dsf_cl.h $(OBJ_DIR)/coul_dsf_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul_dsf.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_dsf_ext.o: $(ALL_H) lal_coul_dsf.h lal_coul_dsf_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_dsf_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/sw_cl.h: lal_sw.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh sw $(PRE1_H) lal_sw.cu $(OBJ_DIR)/sw_cl.h;
$(OBJ_DIR)/lal_sw.o: $(ALL_H) lal_sw.h lal_sw.cpp $(OBJ_DIR)/sw_cl.h $(OBJ_DIR)/sw_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_sw.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_sw_ext.o: $(ALL_H) lal_sw.h lal_sw_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_sw_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/beck_cl.h: lal_beck.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh beck $(PRE1_H) lal_beck.cu $(OBJ_DIR)/beck_cl.h;
$(OBJ_DIR)/lal_beck.o: $(ALL_H) lal_beck.h lal_beck.cpp $(OBJ_DIR)/beck_cl.h $(OBJ_DIR)/beck_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_beck.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_beck_ext.o: $(ALL_H) lal_beck.h lal_beck_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_beck_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/mie_cl.h: lal_mie.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh mie $(PRE1_H) lal_mie.cu $(OBJ_DIR)/mie_cl.h;
$(OBJ_DIR)/lal_mie.o: $(ALL_H) lal_mie.h lal_mie.cpp $(OBJ_DIR)/mie_cl.h $(OBJ_DIR)/mie_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_mie.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_mie_ext.o: $(ALL_H) lal_mie.h lal_mie_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_mie_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/soft_cl.h: lal_soft.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh soft $(PRE1_H) lal_soft.cu $(OBJ_DIR)/soft_cl.h;
$(OBJ_DIR)/lal_soft.o: $(ALL_H) lal_soft.h lal_soft.cpp $(OBJ_DIR)/soft_cl.h $(OBJ_DIR)/soft_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_soft.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_soft_ext.o: $(ALL_H) lal_soft.h lal_soft_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_soft_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_coul_msm_cl.h: lal_lj_coul_msm.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_coul_msm $(PRE1_H) lal_lj_coul_msm.cu $(OBJ_DIR)/lj_coul_msm_cl.h;
$(OBJ_DIR)/lal_lj_coul_msm.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm.cpp $(OBJ_DIR)/lj_coul_msm_cl.h $(OBJ_DIR)/lj_coul_msm_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_lj_coul_msm.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_coul_msm_ext.o: $(ALL_H) lal_lj_coul_msm.h lal_lj_coul_msm_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_lj_coul_msm_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_gromacs_cl.h: lal_lj_gromacs.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_gromacs $(PRE1_H) lal_lj_gromacs.cu $(OBJ_DIR)/lj_gromacs_cl.h;
$(OBJ_DIR)/lal_lj_gromacs.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs.cpp $(OBJ_DIR)/lj_gromacs_cl.h $(OBJ_DIR)/lj_gromacs_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj_gromacs.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_gromacs_ext.o: $(ALL_H) lal_lj_gromacs.h lal_lj_gromacs_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_gromacs_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/dpd_cl.h: lal_dpd.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh dpd $(PRE1_H) lal_dpd.cu $(OBJ_DIR)/dpd_cl.h;
$(OBJ_DIR)/lal_dpd.o: $(ALL_H) lal_dpd.h lal_dpd.cpp $(OBJ_DIR)/dpd_cl.h $(OBJ_DIR)/dpd_cl.h $(OBJ_DIR)/lal_base_dpd.o
$(OCL) -o $@ -c lal_dpd.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_dpd_ext.o: $(ALL_H) lal_dpd.h lal_dpd_ext.cpp lal_base_dpd.h
$(OCL) -o $@ -c lal_dpd_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_cl.h: lal_tersoff.cu lal_tersoff_extra.h $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh tersoff $(PRE1_H) lal_tersoff_extra.h lal_tersoff.cu $(OBJ_DIR)/tersoff_cl.h;
$(OBJ_DIR)/lal_tersoff.o: $(ALL_H) lal_tersoff.h lal_tersoff.cpp $(OBJ_DIR)/tersoff_cl.h $(OBJ_DIR)/tersoff_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_tersoff.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_ext.o: $(ALL_H) lal_tersoff.h lal_tersoff_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_tersoff_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_zbl_cl.h: lal_tersoff_zbl.cu lal_tersoff_zbl_extra.h $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh tersoff_zbl $(PRE1_H) lal_tersoff_zbl_extra.h lal_tersoff_zbl.cu $(OBJ_DIR)/tersoff_zbl_cl.h;
$(OBJ_DIR)/lal_tersoff_zbl.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl.cpp $(OBJ_DIR)/tersoff_zbl_cl.h $(OBJ_DIR)/tersoff_zbl_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_tersoff_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_zbl_ext.o: $(ALL_H) lal_tersoff_zbl.h lal_tersoff_zbl_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_tersoff_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/tersoff_mod_cl.h: lal_tersoff_mod.cu lal_tersoff_mod_extra.h $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh tersoff_mod $(PRE1_H) lal_tersoff_mod_extra.h lal_tersoff_mod.cu $(OBJ_DIR)/tersoff_mod_cl.h;
$(OBJ_DIR)/lal_tersoff_mod.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod.cpp $(OBJ_DIR)/tersoff_mod_cl.h $(OBJ_DIR)/tersoff_mod_cl.h $(OBJ_DIR)/lal_base_three.o
$(OCL) -o $@ -c lal_tersoff_mod.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_tersoff_mod_ext.o: $(ALL_H) lal_tersoff_mod.h lal_tersoff_mod_ext.cpp lal_base_three.h
$(OCL) -o $@ -c lal_tersoff_mod_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_cl.h: lal_coul.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul $(PRE1_H) lal_coul.cu $(OBJ_DIR)/coul_cl.h;
$(OBJ_DIR)/lal_coul.o: $(ALL_H) lal_coul.h lal_coul.cpp $(OBJ_DIR)/coul_cl.h $(OBJ_DIR)/coul_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_ext.o: $(ALL_H) lal_coul.h lal_coul_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/coul_debye_cl.h: lal_coul_debye.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh coul_debye $(PRE1_H) lal_coul_debye.cu $(OBJ_DIR)/coul_debye_cl.h;
$(OBJ_DIR)/lal_coul_debye.o: $(ALL_H) lal_coul_debye.h lal_coul_debye.cpp $(OBJ_DIR)/coul_debye_cl.h $(OBJ_DIR)/coul_debye_cl.h $(OBJ_DIR)/lal_base_charge.o
$(OCL) -o $@ -c lal_coul_debye.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_coul_debye_ext.o: $(ALL_H) lal_coul_debye.h lal_coul_debye_ext.cpp lal_base_charge.h
$(OCL) -o $@ -c lal_coul_debye_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/zbl_cl.h: lal_zbl.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh zbl $(PRE1_H) lal_zbl.cu $(OBJ_DIR)/zbl_cl.h;
$(OBJ_DIR)/lal_zbl.o: $(ALL_H) lal_zbl.h lal_zbl.cpp $(OBJ_DIR)/zbl_cl.h $(OBJ_DIR)/zbl_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_zbl.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_zbl_ext.o: $(ALL_H) lal_zbl.h lal_zbl_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_zbl_ext.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lj_cubic_cl.h: lal_lj_cubic.cu $(PRE1_H)
$(BSH) ./geryon/file_to_cstr.sh lj_cubic $(PRE1_H) lal_lj_cubic.cu $(OBJ_DIR)/lj_cubic_cl.h;
$(OBJ_DIR)/lal_lj_cubic.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic.cpp $(OBJ_DIR)/lj_cubic_cl.h $(OBJ_DIR)/lj_cubic_cl.h $(OBJ_DIR)/lal_base_atomic.o
$(OCL) -o $@ -c lal_lj_cubic.cpp -I$(OBJ_DIR)
$(OBJ_DIR)/lal_lj_cubic_ext.o: $(ALL_H) lal_lj_cubic.h lal_lj_cubic_ext.cpp lal_base_atomic.h
$(OCL) -o $@ -c lal_lj_cubic_ext.cpp -I$(OBJ_DIR)
$(BIN_DIR)/ocl_get_devices: ./geryon/ucl_get_devices.cpp
$(OCL) -o $@ ./geryon/ucl_get_devices.cpp -DUCL_OPENCL $(OCL_LINK)
$(OCL_LIB): $(OBJS) $(PTXS)
$(AR) -crusv $(OCL_LIB) $(OBJS)
@cp $(EXTRAMAKE) Makefile.lammps
opencl: $(OCL_EXECS)
clean:
-rm -rf $(EXECS) $(OCL_EXECS) $(OCL_LIB) $(OBJS) $(KERS) *.linkinfo
veryclean: clean
-rm -rf *~ *.linkinfo
diff --git a/lib/gpu/README b/lib/gpu/README
index 45c8ce49b..b26897e88 100644
--- a/lib/gpu/README
+++ b/lib/gpu/README
@@ -1,217 +1,222 @@
--------------------------------
LAMMPS ACCELERATOR LIBRARY
--------------------------------
W. Michael Brown (ORNL)
Trung Dac Nguyen (ORNL)
Peng Wang (NVIDIA)
Axel Kohlmeyer (Temple)
Steve Plimpton (SNL)
Inderaj Bains (NVIDIA)
-------------------------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the GPU package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-gpu" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.linux
When you are done building this library, two files should
exist in this directory:
libgpu.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You should examine the final Makefile.lammps to ensure it is
correct for your system, otherwise the LAMMPS build can fail.
IMPORTANT: If you re-build the library, e.g. for a different precision
(see below), you should do a "make clean" first, e.g. make -f
Makefile.linux clean, to ensure all previously derived files are removed
before the new build is done.
Makefile.lammps has settings for 3 variables:
user-gpu_SYSINC = leave blank for this package
user-gpu_SYSLIB = CUDA libraries needed by this package
user-gpu_SYSPATH = path(s) to where those libraries are
Because you have the CUDA compilers on your system, you should have
the needed libraries. If the CUDA development tools were installed
in the standard manner, the settings in the Makefile.lammps.standard
file should work.
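For illustration only, the settings for a standard CUDA installation are
often along the lines of the sketch below (the library flags and the
install path are assumptions and must be adjusted to your system):
  user-gpu_SYSINC =
  user-gpu_SYSLIB = -lcudart -lcuda
  user-gpu_SYSPATH = -L/usr/local/cuda/lib64   # assumed install path; adjust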
-------------------------------------------------------------------
GENERAL NOTES
--------------------------------
This library, libgpu.a, provides routines for GPU acceleration
of certain LAMMPS styles and neighbor list builds. Compilation of this
library requires installing the CUDA GPU driver and CUDA toolkit for
your operating system. Installation of the CUDA SDK is not necessary.
In addition to the LAMMPS library, the binary nvc_get_devices will also
be built. This can be used to query the names and properties of the GPU
devices on your system. A Makefile for OpenCL compilation is provided,
but OpenCL use is not currently supported by the developers.
Details of the implementation are provided in:
----
Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Short Range
Forces. Computer Physics Communications. 2011. 182: p. 898-911.
and
Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle
Particle-Mesh. Computer Physics Communications. 2012. 183: p. 449-459.
and
Brown, W.M., Masako, Y. Implementing Molecular Dynamics on Hybrid High
Performance Computers - Three-Body Potentials. Computer Physics Communications.
2013. 184: p. 2785–2793.
----
NOTE: Installation of the CUDA SDK is not required.
Current styles supporting GPU acceleration:
1 beck
2 born/coul/long
3 born/coul/wolf
4 born
5 buck/coul/cut
6 buck/coul/long
7 buck
8 colloid
9 coul/dsf
10 coul/long
11 eam/alloy
12 eam/fs
13 eam
14 gauss
15 gayberne
16 lj96/cut
17 lj/charmm/coul/long
18 lj/class2/coul/long
19 lj/class2
20 lj/cut/coul/cut
21 lj/cut/coul/debye
22 lj/cut/coul/dsf
23 lj/cut/coul/long
24 lj/cut/coul/msm
25 lj/cut/dipole/cut
26 lj/cut
27 lj/expand
28 lj/gromacs
29 lj/sdk/coul/long
30 lj/sdk
31 lj/sf/dipole/sf
32 mie/cut
33 morse
34 resquared
35 soft
36 sw
37 table
38 yukawa/colloid
39 yukawa
40 pppm
MULTIPLE LAMMPS PROCESSES
--------------------------------
Multiple LAMMPS MPI processes can share GPUs on the system, but multiple
GPUs cannot be utilized by a single MPI process. In many cases, the
best performance will be obtained by running as many MPI processes as
there are CPU cores available, with the condition that the number of MPI
processes is an integer multiple of the number of GPUs being used. See the
LAMMPS user manual for details on running with GPU acceleration.
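For illustration (a sketch only; the executable name and input script are
placeholders), a node with 2 GPUs and 8 CPU cores could be driven by 8
MPI processes, e.g.
  mpirun -np 8 lmp_linux -sf gpu -pk gpu 2 -in in.script   # names are placeholders
so that the number of processes is an integer multiple of the number of
GPUs. The -sf and -pk command-line switches are documented in the LAMMPS
user manual.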
BUILDING AND PRECISION MODES
--------------------------------
To build, edit the CUDA_ARCH, CUDA_PRECISION, CUDA_HOME variables in one of
the Makefiles. CUDA_ARCH should be set based on the compute capability of
your GPU. This can be verified by running the nvc_get_devices executable after
the build is complete. Additionally, the GPU package must be installed and
compiled for LAMMPS. This may require editing the gpu_SYSPATH variable in the
LAMMPS makefile.
Please note that the GPU library accesses the CUDA driver library directly,
so it needs to be linked not only to the CUDA runtime library (libcudart.so)
that ships with the CUDA toolkit, but also to the CUDA driver library
(libcuda.so) that ships with the Nvidia driver. If you are compiling LAMMPS
on the head node of a GPU cluster, this library may not be installed,
so you may need to copy it over from one of the compute nodes (preferably
into this directory).
The gpu library supports 3 precision modes as determined by
the CUDA_PRECISION variable:
- CUDA_PREC = -D_SINGLE_SINGLE # Single precision for all calculations
- CUDA_PREC = -D_DOUBLE_DOUBLE # Double precision for all calculations
- CUDA_PREC = -D_SINGLE_DOUBLE # Accumulation of forces, etc. in double
+ CUDA_PRECISION = -D_SINGLE_SINGLE # Single precision for all calculations
+ CUDA_PRECISION = -D_DOUBLE_DOUBLE # Double precision for all calculations
+ CUDA_PRECISION = -D_SINGLE_DOUBLE # Accumulation of forces, etc. in double
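As a sketch of such an edit (the install path, compute capability, and
precision chosen below are assumptions that must match your CUDA
installation, hardware, and needs), the relevant Makefile lines could
look like:
  CUDA_HOME = /usr/local/cuda       # assumed install path; adjust
  CUDA_ARCH = -arch=sm_21           # assumed compute capability; adjust
  CUDA_PRECISION = -D_SINGLE_DOUBLE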
NOTE: PPPM acceleration can only be run on GPUs with compute capability>=1.1.
You will get the error "GPU library not compiled for this accelerator."
when attempting to run PPPM on a GPU with compute capability 1.0.
NOTE: Double precision is only supported on certain GPUs (with
      compute capability>=1.3). If you compile the GPU library for
      a GPU with compute capability 1.1 or 1.2, then only single
      precision FFTs are supported, i.e. LAMMPS has to be compiled
      with -DFFT_SINGLE. For details on configuring FFT support in
      LAMMPS, see http://lammps.sandia.gov/doc/Section_start.html#2_2_4
NOTE: For graphics cards with compute capability>=1.3 (e.g. Tesla C1060),
make sure that -arch=sm_13 is set on the CUDA_ARCH line.
NOTE: For newer graphics cards (a.k.a. "Fermi", e.g. Tesla C2050), make
      sure that either -arch=sm_20 or -arch=sm_21 is set on the
      CUDA_ARCH line, depending on hardware and CUDA toolkit version.
NOTE: The gayberne/gpu pair style will only be installed if the ASPHERE
package has been installed.
NOTE: The cg/cmm/gpu and cg/cmm/coul/long/gpu pair styles will only be
installed if the USER-CG-CMM package has been installed.
NOTE: The lj/cut/coul/long/gpu, cg/cmm/coul/long/gpu, coul/long/gpu,
lj/charmm/coul/long/gpu and pppm/gpu styles will only be installed
if the KSPACE package has been installed.
NOTE: The system-specific size setting LAMMPS_SMALLBIG (default),
      LAMMPS_BIGBIG, or LAMMPS_SMALLSMALL, if specified when building
      LAMMPS (i.e. in src/MAKE/Makefile.foo), should be consistent with
      the setting specified when building libgpu.a (i.e. via LMP_INC in
      lib/gpu/Makefile.bar).
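For example (a sketch assuming the default size setting), a consistent
choice is to set
  LMP_INC = -DLAMMPS_SMALLBIG
in both the LAMMPS machine makefile (src/MAKE/Makefile.foo) and the GPU
library makefile (lib/gpu/Makefile.bar).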
EXAMPLE BUILD PROCESS
--------------------------------
cd ~/lammps/lib/gpu
emacs Makefile.linux
make -f Makefile.linux
./nvc_get_devices
cd ../../src
emacs ./MAKE/Makefile.linux
make yes-asphere
make yes-kspace
make yes-gpu
make linux
diff --git a/lib/gpu/lal_cg_cmm.cpp b/lib/gpu/lal_lj_sdk.cpp
similarity index 85%
rename from lib/gpu/lal_cg_cmm.cpp
rename to lib/gpu/lal_lj_sdk.cpp
index d361e32b0..618555e38 100644
--- a/lib/gpu/lal_cg_cmm.cpp
+++ b/lib/gpu/lal_lj_sdk.cpp
@@ -1,154 +1,154 @@
/***************************************************************************
- cg_cmm.cpp
+ lj_sdk.cpp
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk/cut pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#if defined(USE_OPENCL)
-#include "cg_cmm_cl.h"
+#include "lj_sdk_cl.h"
#elif defined(USE_CUDART)
-const char *cg_cmm=0;
+const char *lj_sdk=0;
#else
-#include "cg_cmm_cubin.h"
+#include "lj_sdk_cubin.h"
#endif
-#include "lal_cg_cmm.h"
+#include "lal_lj_sdk.h"
#include <cassert>
using namespace LAMMPS_AL;
#define CGCMMT CGCMM<numtyp, acctyp>
extern Device<PRECISION,ACC_PRECISION> device;
template <class numtyp, class acctyp>
CGCMMT::CGCMM() : BaseAtomic<numtyp,acctyp>(), _allocated(false) {
}
template <class numtyp, class acctyp>
CGCMMT::~CGCMM() {
clear();
}
template <class numtyp, class acctyp>
int CGCMMT::bytes_per_atom(const int max_nbors) const {
return this->bytes_per_atom_atomic(max_nbors);
}
template <class numtyp, class acctyp>
int CGCMMT::init(const int ntypes, double **host_cutsq,
int **host_cg_type, double **host_lj1,
double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset,
double *host_special_lj, const int nlocal,
const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *_screen) {
int success;
success=this->init_atomic(nlocal,nall,max_nbors,maxspecial,cell_size,gpu_split,
- _screen,cg_cmm,"k_cg_cmm");
+ _screen,lj_sdk,"k_lj_sdk");
if (success!=0)
return success;
// If atom type constants fit in shared memory use fast kernel
- int cmm_types=ntypes;
+ int sdk_types=ntypes;
shared_types=false;
int max_shared_types=this->device->max_shared_types();
- if (cmm_types<=max_shared_types && this->_block_size>=max_shared_types) {
- cmm_types=max_shared_types;
+ if (sdk_types<=max_shared_types && this->_block_size>=max_shared_types) {
+ sdk_types=max_shared_types;
shared_types=true;
}
- _cmm_types=cmm_types;
+ _sdk_types=sdk_types;
// Allocate a host write buffer for data initialization
- UCL_H_Vec<numtyp> host_write(cmm_types*cmm_types*32,*(this->ucl_device),
+ UCL_H_Vec<numtyp> host_write(sdk_types*sdk_types*32,*(this->ucl_device),
UCL_WRITE_ONLY);
- for (int i=0; i<cmm_types*cmm_types; i++)
+ for (int i=0; i<sdk_types*sdk_types; i++)
host_write[i]=0.0;
- lj1.alloc(cmm_types*cmm_types,*(this->ucl_device),UCL_READ_ONLY);
- this->atom->type_pack4(ntypes,cmm_types,lj1,host_write,host_cutsq,
+ lj1.alloc(sdk_types*sdk_types,*(this->ucl_device),UCL_READ_ONLY);
+ this->atom->type_pack4(ntypes,sdk_types,lj1,host_write,host_cutsq,
host_cg_type,host_lj1,host_lj2);
- lj3.alloc(cmm_types*cmm_types,*(this->ucl_device),UCL_READ_ONLY);
- this->atom->type_pack4(ntypes,cmm_types,lj3,host_write,host_lj3,host_lj4,
+ lj3.alloc(sdk_types*sdk_types,*(this->ucl_device),UCL_READ_ONLY);
+ this->atom->type_pack4(ntypes,sdk_types,lj3,host_write,host_lj3,host_lj4,
host_offset);
UCL_H_Vec<double> dview;
sp_lj.alloc(4,*(this->ucl_device),UCL_READ_ONLY);
dview.view(host_special_lj,4,*(this->ucl_device));
ucl_copy(sp_lj,dview,false);
_allocated=true;
this->_max_bytes=lj1.row_bytes()+lj3.row_bytes()+sp_lj.row_bytes();
return 0;
}
template <class numtyp, class acctyp>
void CGCMMT::clear() {
if (!_allocated)
return;
_allocated=false;
lj1.clear();
lj3.clear();
sp_lj.clear();
this->clear_atomic();
}
template <class numtyp, class acctyp>
double CGCMMT::host_memory_usage() const {
return this->host_memory_usage_atomic()+sizeof(CGCMM<numtyp,acctyp>);
}
// ---------------------------------------------------------------------------
// Calculate energies, forces, and torques
// ---------------------------------------------------------------------------
template <class numtyp, class acctyp>
void CGCMMT::loop(const bool _eflag, const bool _vflag) {
// Compute the block size and grid size to keep all cores busy
const int BX=this->block_size();
int eflag, vflag;
if (_eflag)
eflag=1;
else
eflag=0;
if (_vflag)
vflag=1;
else
vflag=0;
int GX=static_cast<int>(ceil(static_cast<double>(this->ans->inum())/
(BX/this->_threads_per_atom)));
int ainum=this->ans->inum();
int nbor_pitch=this->nbor->nbor_pitch();
this->time_pair.start();
if (shared_types) {
this->k_pair_fast.set_size(GX,BX);
this->k_pair_fast.run(&this->atom->x, &lj1, &lj3, &sp_lj,
&this->nbor->dev_nbor, &this->_nbor_data->begin(),
&this->ans->force, &this->ans->engv, &eflag,
&vflag, &ainum, &nbor_pitch,
&this->_threads_per_atom);
} else {
this->k_pair.set_size(GX,BX);
this->k_pair.run(&this->atom->x, &lj1, &lj3,
- &_cmm_types, &sp_lj, &this->nbor->dev_nbor,
+ &_sdk_types, &sp_lj, &this->nbor->dev_nbor,
&this->_nbor_data->begin(), &this->ans->force,
&this->ans->engv, &eflag, &vflag, &ainum,
&nbor_pitch, &this->_threads_per_atom);
}
this->time_pair.stop();
}
template class CGCMM<PRECISION,ACC_PRECISION>;
diff --git a/lib/gpu/lal_cg_cmm.cu b/lib/gpu/lal_lj_sdk.cu
similarity index 97%
rename from lib/gpu/lal_cg_cmm.cu
rename to lib/gpu/lal_lj_sdk.cu
index 70d2ab609..01b2cdd18 100644
--- a/lib/gpu/lal_cg_cmm.cu
+++ b/lib/gpu/lal_lj_sdk.cu
@@ -1,216 +1,216 @@
// **************************************************************************
-// cg_cmm.cu
+// lj_sdk.cu
// -------------------
// W. Michael Brown (ORNL)
//
// Device code for acceleration of the lj/sdk pair style
//
// __________________________________________________________________________
// This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
// __________________________________________________________________________
//
// begin :
// email : brownw@ornl.gov
// ***************************************************************************/
#ifdef NV_KERNEL
#include "lal_aux_fun1.h"
#ifndef _DOUBLE_DOUBLE
texture<float4> pos_tex;
#else
texture<int4,1> pos_tex;
#endif
#else
#define pos_tex x_
#endif
-__kernel void k_cg_cmm(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1,
const __global numtyp4 *restrict lj3,
const int lj_types,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag, const int inum,
const int nbor_pitch, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp sp_lj[4];
sp_lj[0]=sp_lj_in[0];
sp_lj[1]=sp_lj_in[1];
sp_lj[2]=sp_lj_in[2];
sp_lj[3]=sp_lj_in[3];
acctyp energy=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
int itype=ix.w;
numtyp factor_lj;
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
factor_lj = sp_lj[sbmask(j)];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int jtype=jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp r2inv = delx*delx+dely*dely+delz*delz;
int mtype=itype*lj_types+jtype;
if (r2inv<lj1[mtype].x) {
r2inv=ucl_recip(r2inv);
numtyp inv1,inv2;
if (lj1[mtype].y == 2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj1[mtype].y == 1) {
inv2=r2inv*ucl_sqrt(r2inv);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
numtyp force = factor_lj*r2inv*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0)
energy += factor_lj*inv1*(lj3[mtype].x*inv2-lj3[mtype].y)-
lj3[mtype].z;
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers(f,energy,virial,ii,inum,tid,t_per_atom,offset,eflag,vflag,
ans,engv);
} // if ii
}
-__kernel void k_cg_cmm_fast(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk_fast(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1_in,
const __global numtyp4 *restrict lj3_in,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag, const int inum,
const int nbor_pitch, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp4 lj1[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp4 lj3[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp sp_lj[4];
if (tid<4)
sp_lj[tid]=sp_lj_in[tid];
if (tid<MAX_SHARED_TYPES*MAX_SHARED_TYPES) {
lj1[tid]=lj1_in[tid];
if (eflag>0)
lj3[tid]=lj3_in[tid];
}
acctyp energy=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
__syncthreads();
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
int iw=ix.w;
int itype=fast_mul((int)MAX_SHARED_TYPES,iw);
numtyp factor_lj;
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
factor_lj = sp_lj[sbmask(j)];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int mtype=itype+jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp r2inv = delx*delx+dely*dely+delz*delz;
if (r2inv<lj1[mtype].x) {
r2inv=ucl_recip(r2inv);
numtyp inv1,inv2;
if (lj1[mtype].y == (numtyp)2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj1[mtype].y == (numtyp)1) {
inv2=r2inv*ucl_sqrt(r2inv);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
numtyp force = factor_lj*r2inv*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0)
energy += factor_lj*inv1*(lj3[mtype].x*inv2-lj3[mtype].y)-
lj3[mtype].z;
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers(f,energy,virial,ii,inum,tid,t_per_atom,offset,eflag,vflag,
ans,engv);
} // if ii
}
diff --git a/lib/gpu/lal_cg_cmm.h b/lib/gpu/lal_lj_sdk.h
similarity index 97%
rename from lib/gpu/lal_cg_cmm.h
rename to lib/gpu/lal_lj_sdk.h
index b7895b589..ac2b9aafe 100644
--- a/lib/gpu/lal_cg_cmm.h
+++ b/lib/gpu/lal_lj_sdk.h
@@ -1,79 +1,79 @@
/***************************************************************************
- cg_cmm.h
+ lj_sdk.h
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#ifndef LAL_CG_CMM_H
#define LAL_CG_CMM_H
#include "lal_base_atomic.h"
namespace LAMMPS_AL {
template <class numtyp, class acctyp>
class CGCMM : public BaseAtomic<numtyp, acctyp> {
public:
CGCMM();
~CGCMM();
/// Clear any previous data and set up for a new LAMMPS run
/** \param max_nbors initial number of rows in the neighbor matrix
* \param cell_size cutoff + skin
* \param gpu_split fraction of particles handled by device
*
* Returns:
* - 0 if successful
* - -1 if fix gpu not found
* - -3 if there is an out of memory error
* - -4 if the GPU library was not compiled for GPU
* - -5 Double precision is not supported on card **/
int init(const int ntypes, double **host_cutsq, int **host_cg_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset, double *host_special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *screen);
/// Clear all host and device data
/** \note This is called at the beginning of the init() routine **/
void clear();
/// Returns memory usage on device per atom
int bytes_per_atom(const int max_nbors) const;
/// Total host memory used by library for pair style
double host_memory_usage() const;
// --------------------------- TYPE DATA --------------------------
/// lj1.x = cutsq, lj1.y=cg_type, lj1.z = lj1, lj1.w = lj2
UCL_D_Vec<numtyp4> lj1;
/// lj3.x = lj3, lj3.y = lj4, lj3.z = offset
UCL_D_Vec<numtyp4> lj3;
/// Special LJ values
UCL_D_Vec<numtyp> sp_lj;
/// If atom type constants fit in shared memory, use fast kernels
bool shared_types;
/// Number of atom types
- int _cmm_types;
+ int _sdk_types;
private:
bool _allocated;
void loop(const bool _eflag, const bool _vflag);
};
}
#endif
diff --git a/lib/gpu/lal_cg_cmm_ext.cpp b/lib/gpu/lal_lj_sdk_ext.cpp
similarity index 93%
rename from lib/gpu/lal_cg_cmm_ext.cpp
rename to lib/gpu/lal_lj_sdk_ext.cpp
index b6fc110b1..386106161 100644
--- a/lib/gpu/lal_cg_cmm_ext.cpp
+++ b/lib/gpu/lal_lj_sdk_ext.cpp
@@ -1,121 +1,121 @@
/***************************************************************************
- cg_cmm.h
+ lj_sdk.h
-------------------
W. Michael Brown (ORNL)
Functions for LAMMPS access to lj/sdk pair acceleration routines
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#include <iostream>
#include <cassert>
#include <math.h>
-#include "lal_cg_cmm.h"
+#include "lal_lj_sdk.h"
using namespace std;
using namespace LAMMPS_AL;
static CGCMM<PRECISION,ACC_PRECISION> CMMMF;
// ---------------------------------------------------------------------------
// Allocate memory on host and device and copy constants to device
// ---------------------------------------------------------------------------
-int cmm_gpu_init(const int ntypes, double **cutsq, int **cg_types,
+int sdk_gpu_init(const int ntypes, double **cutsq, int **cg_types,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int inum, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen) {
CMMMF.clear();
gpu_mode=CMMMF.device->gpu_mode();
double gpu_split=CMMMF.device->particle_split();
int first_gpu=CMMMF.device->first_device();
int last_gpu=CMMMF.device->last_device();
int world_me=CMMMF.device->world_me();
int gpu_rank=CMMMF.device->gpu_rank();
int procs_per_gpu=CMMMF.device->procs_per_gpu();
CMMMF.device->init_message(screen,"lj/sdk",first_gpu,last_gpu);
bool message=false;
if (CMMMF.device->replica_me()==0 && screen)
message=true;
if (message) {
fprintf(screen,"Initializing Device and compiling on process 0...");
fflush(screen);
}
int init_ok=0;
if (world_me==0)
init_ok=CMMMF.init(ntypes,cutsq,cg_types,host_lj1,host_lj2,host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen);
CMMMF.device->world_barrier();
if (message)
fprintf(screen,"Done.\n");
for (int i=0; i<procs_per_gpu; i++) {
if (message) {
if (last_gpu-first_gpu==0)
fprintf(screen,"Initializing Device %d on core %d...",first_gpu,i);
else
fprintf(screen,"Initializing Devices %d-%d on core %d...",first_gpu,
last_gpu,i);
fflush(screen);
}
if (gpu_rank==i && world_me!=0)
init_ok=CMMMF.init(ntypes,cutsq,cg_types,host_lj1,host_lj2,host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen);
CMMMF.device->gpu_barrier();
if (message)
fprintf(screen,"Done.\n");
}
if (message)
fprintf(screen,"\n");
if (init_ok==0)
CMMMF.estimate_gpu_overhead();
return init_ok;
}
-void cmm_gpu_clear() {
+void sdk_gpu_clear() {
CMMMF.clear();
}
-int** cmm_gpu_compute_n(const int ago, const int inum_full,
+int** sdk_gpu_compute_n(const int ago, const int inum_full,
const int nall, double **host_x, int *host_type,
double *sublo, double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum, const double cpu_time,
bool &success) {
return CMMMF.compute(ago, inum_full, nall, host_x, host_type, sublo,
subhi, tag, nspecial, special, eflag, vflag, eatom,
vatom, host_start, ilist, jnum, cpu_time, success);
}
-void cmm_gpu_compute(const int ago, const int inum_full, const int nall,
+void sdk_gpu_compute(const int ago, const int inum_full, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success) {
CMMMF.compute(ago,inum_full,nall,host_x,host_type,ilist,numj,
firstneigh,eflag,vflag,eatom,vatom,host_start,cpu_time,success);
}
-double cmm_gpu_bytes() {
+double sdk_gpu_bytes() {
return CMMMF.host_memory_usage();
}
diff --git a/lib/gpu/lal_cg_cmm_long.cpp b/lib/gpu/lal_lj_sdk_long.cpp
similarity index 96%
rename from lib/gpu/lal_cg_cmm_long.cpp
rename to lib/gpu/lal_lj_sdk_long.cpp
index 14b5b7622..46caf6bd3 100644
--- a/lib/gpu/lal_cg_cmm_long.cpp
+++ b/lib/gpu/lal_lj_sdk_long.cpp
@@ -1,166 +1,166 @@
/***************************************************************************
- cg_cmm_long.cpp
+ lj_sdk_long.cpp
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk/coul/long pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#if defined(USE_OPENCL)
-#include "cg_cmm_long_cl.h"
+#include "lj_sdk_long_cl.h"
#elif defined(USE_CUDART)
-const char *cg_cmm_long=0;
+const char *lj_sdk_long=0;
#else
-#include "cg_cmm_long_cubin.h"
+#include "lj_sdk_long_cubin.h"
#endif
-#include "lal_cg_cmm_long.h"
+#include "lal_lj_sdk_long.h"
#include <cassert>
using namespace LAMMPS_AL;
#define CGCMMLongT CGCMMLong<numtyp, acctyp>
extern Device<PRECISION,ACC_PRECISION> device;
template <class numtyp, class acctyp>
CGCMMLongT::CGCMMLong() : BaseCharge<numtyp,acctyp>(),
_allocated(false) {
}
template <class numtyp, class acctyp>
CGCMMLongT::~CGCMMLong() {
clear();
}
template <class numtyp, class acctyp>
int CGCMMLongT::bytes_per_atom(const int max_nbors) const {
return this->bytes_per_atom_atomic(max_nbors);
}
template <class numtyp, class acctyp>
int CGCMMLongT::init(const int ntypes, double **host_cutsq,
int **host_cg_type, double **host_lj1,
double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset,
double *host_special_lj, const int nlocal,
const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *_screen,
double **host_cut_ljsq,
const double host_cut_coulsq,
double *host_special_coul, const double qqrd2e,
const double g_ewald) {
int success;
success=this->init_atomic(nlocal,nall,max_nbors,maxspecial,cell_size,gpu_split,
- _screen,cg_cmm_long,"k_cg_cmm_long");
+ _screen,lj_sdk_long,"k_lj_sdk_long");
if (success!=0)
return success;
// If atom type constants fit in shared memory use fast kernel
int lj_types=ntypes;
shared_types=false;
int max_shared_types=this->device->max_shared_types();
if (lj_types<=max_shared_types && this->_block_size>=max_shared_types) {
lj_types=max_shared_types;
shared_types=true;
}
_lj_types=lj_types;
// Allocate a host write buffer for data initialization
UCL_H_Vec<numtyp> host_write(lj_types*lj_types*32,*(this->ucl_device),
UCL_WRITE_ONLY);
for (int i=0; i<lj_types*lj_types; i++)
host_write[i]=0.0;
lj1.alloc(lj_types*lj_types,*(this->ucl_device),UCL_READ_ONLY);
this->atom->type_pack4(ntypes,lj_types,lj1,host_write,host_cutsq,
host_cut_ljsq,host_lj1,host_lj2);
lj3.alloc(lj_types*lj_types,*(this->ucl_device),UCL_READ_ONLY);
this->atom->type_pack4(ntypes,lj_types,lj3,host_write,host_cg_type,host_lj3,
host_lj4,host_offset);
sp_lj.alloc(8,*(this->ucl_device),UCL_READ_ONLY);
for (int i=0; i<4; i++) {
host_write[i]=host_special_lj[i];
host_write[i+4]=host_special_coul[i];
}
ucl_copy(sp_lj,host_write,8,false);
_cut_coulsq=host_cut_coulsq;
_qqrd2e=qqrd2e;
_g_ewald=g_ewald;
_allocated=true;
this->_max_bytes=lj1.row_bytes()+lj3.row_bytes()+sp_lj.row_bytes();
return 0;
}
template <class numtyp, class acctyp>
void CGCMMLongT::clear() {
if (!_allocated)
return;
_allocated=false;
lj1.clear();
lj3.clear();
sp_lj.clear();
this->clear_atomic();
}
template <class numtyp, class acctyp>
double CGCMMLongT::host_memory_usage() const {
return this->host_memory_usage_atomic()+sizeof(CGCMMLong<numtyp,acctyp>);
}
// ---------------------------------------------------------------------------
// Calculate energies, forces, and torques
// ---------------------------------------------------------------------------
template <class numtyp, class acctyp>
void CGCMMLongT::loop(const bool _eflag, const bool _vflag) {
// Compute the block size and grid size to keep all cores busy
const int BX=this->block_size();
int eflag, vflag;
if (_eflag)
eflag=1;
else
eflag=0;
if (_vflag)
vflag=1;
else
vflag=0;
int GX=static_cast<int>(ceil(static_cast<double>(this->ans->inum())/
(BX/this->_threads_per_atom)));
int ainum=this->ans->inum();
int nbor_pitch=this->nbor->nbor_pitch();
this->time_pair.start();
if (shared_types) {
this->k_pair_fast.set_size(GX,BX);
this->k_pair_fast.run(&this->atom->x, &lj1, &lj3, &sp_lj,
&this->nbor->dev_nbor, &this->_nbor_data->begin(),
&this->ans->force, &this->ans->engv, &eflag,
&vflag, &ainum, &nbor_pitch, &this->atom->q,
&_cut_coulsq, &_qqrd2e, &_g_ewald,
&this->_threads_per_atom);
} else {
this->k_pair.set_size(GX,BX);
this->k_pair.run(&this->atom->x, &lj1, &lj3, &_lj_types, &sp_lj,
&this->nbor->dev_nbor, &this->_nbor_data->begin(),
&this->ans->force, &this->ans->engv, &eflag, &vflag,
&ainum, &nbor_pitch, &this->atom->q, &_cut_coulsq,
&_qqrd2e, &_g_ewald, &this->_threads_per_atom);
}
this->time_pair.stop();
}
template class CGCMMLong<PRECISION,ACC_PRECISION>;
diff --git a/lib/gpu/lal_cg_cmm_long.cu b/lib/gpu/lal_lj_sdk_long.cu
similarity index 98%
rename from lib/gpu/lal_cg_cmm_long.cu
rename to lib/gpu/lal_lj_sdk_long.cu
index f6942d180..5ff64b225 100644
--- a/lib/gpu/lal_cg_cmm_long.cu
+++ b/lib/gpu/lal_lj_sdk_long.cu
@@ -1,282 +1,282 @@
// **************************************************************************
-// cg_cmm_long.cu
+// lj_sdk_long.cu
// -------------------
// W. Michael Brown (ORNL)
//
// Device code for acceleration of the lj/sdk/coul/long pair style
//
// __________________________________________________________________________
// This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
// __________________________________________________________________________
//
// begin :
// email : brownw@ornl.gov
// ***************************************************************************/
#ifdef NV_KERNEL
#include "lal_aux_fun1.h"
#ifndef _DOUBLE_DOUBLE
texture<float4> pos_tex;
texture<float> q_tex;
#else
texture<int4,1> pos_tex;
texture<int2> q_tex;
#endif
#else
#define pos_tex x_
#define q_tex q_
#endif
-__kernel void k_cg_cmm_long(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk_long(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1,
const __global numtyp4 *restrict lj3,
const int lj_types,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag, const int inum,
const int nbor_pitch,
const __global numtyp *restrict q_ ,
const numtyp cut_coulsq, const numtyp qqrd2e,
const numtyp g_ewald, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp sp_lj[8];
sp_lj[0]=sp_lj_in[0];
sp_lj[1]=sp_lj_in[1];
sp_lj[2]=sp_lj_in[2];
sp_lj[3]=sp_lj_in[3];
sp_lj[4]=sp_lj_in[4];
sp_lj[5]=sp_lj_in[5];
sp_lj[6]=sp_lj_in[6];
sp_lj[7]=sp_lj_in[7];
acctyp energy=(acctyp)0;
acctyp e_coul=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
numtyp qtmp; fetch(qtmp,i,q_tex);
int itype=ix.w;
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
numtyp factor_lj, factor_coul;
factor_lj = sp_lj[sbmask(j)];
factor_coul = (numtyp)1.0-sp_lj[sbmask(j)+4];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int jtype=jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp rsq = delx*delx+dely*dely+delz*delz;
int mtype=itype*lj_types+jtype;
if (rsq<lj1[mtype].x) {
numtyp forcecoul, force_lj, force, inv1, inv2, prefactor, _erfc;
numtyp r2inv=ucl_recip(rsq);
if (rsq < lj1[mtype].y) {
if (lj3[mtype].x == (numtyp)2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj3[mtype].x == (numtyp)1) {
inv2=r2inv*ucl_rsqrt(rsq);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
force_lj = factor_lj*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
} else
force_lj = (numtyp)0.0;
if (rsq < cut_coulsq) {
numtyp r = ucl_rsqrt(r2inv);
numtyp grij = g_ewald * r;
numtyp expm2 = ucl_exp(-grij*grij);
numtyp t = ucl_recip((numtyp)1.0 + EWALD_P*grij);
_erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
fetch(prefactor,j,q_tex);
prefactor *= qqrd2e * qtmp/r;
forcecoul = prefactor * (_erfc + EWALD_F*grij*expm2-factor_coul);
} else
forcecoul = (numtyp)0.0;
force = (force_lj + forcecoul) * r2inv;
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0) {
if (rsq < cut_coulsq)
e_coul += prefactor*(_erfc-factor_coul);
if (rsq < lj1[mtype].y) {
energy += factor_lj*inv1*(lj3[mtype].y*inv2-lj3[mtype].z)-
lj3[mtype].w;
}
}
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers_q(f,energy,e_coul,virial,ii,inum,tid,t_per_atom,offset,eflag,
vflag,ans,engv);
} // if ii
}
-__kernel void k_cg_cmm_long_fast(const __global numtyp4 *restrict x_,
+__kernel void k_lj_sdk_long_fast(const __global numtyp4 *restrict x_,
const __global numtyp4 *restrict lj1_in,
const __global numtyp4 *restrict lj3_in,
const __global numtyp *restrict sp_lj_in,
const __global int *dev_nbor,
const __global int *dev_packed,
__global acctyp4 *restrict ans,
__global acctyp *restrict engv,
const int eflag, const int vflag,
const int inum, const int nbor_pitch,
const __global numtyp *restrict q_,
const numtyp cut_coulsq, const numtyp qqrd2e,
const numtyp g_ewald, const int t_per_atom) {
int tid, ii, offset;
atom_info(t_per_atom,ii,tid,offset);
__local numtyp4 lj1[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp4 lj3[MAX_SHARED_TYPES*MAX_SHARED_TYPES];
__local numtyp sp_lj[8];
if (tid<8)
sp_lj[tid]=sp_lj_in[tid];
if (tid<MAX_SHARED_TYPES*MAX_SHARED_TYPES) {
lj1[tid]=lj1_in[tid];
lj3[tid]=lj3_in[tid];
}
acctyp energy=(acctyp)0;
acctyp e_coul=(acctyp)0;
acctyp4 f;
f.x=(acctyp)0; f.y=(acctyp)0; f.z=(acctyp)0;
acctyp virial[6];
for (int i=0; i<6; i++)
virial[i]=(acctyp)0;
__syncthreads();
if (ii<inum) {
int nbor, nbor_end;
int i, numj;
__local int n_stride;
nbor_info(dev_nbor,dev_packed,nbor_pitch,t_per_atom,ii,offset,i,numj,
n_stride,nbor_end,nbor);
numtyp4 ix; fetch4(ix,i,pos_tex); //x_[i];
numtyp qtmp; fetch(qtmp,i,q_tex);
int iw=ix.w;
int itype=fast_mul((int)MAX_SHARED_TYPES,iw);
for ( ; nbor<nbor_end; nbor+=n_stride) {
int j=dev_packed[nbor];
numtyp factor_lj, factor_coul;
factor_lj = sp_lj[sbmask(j)];
factor_coul = (numtyp)1.0-sp_lj[sbmask(j)+4];
j &= NEIGHMASK;
numtyp4 jx; fetch4(jx,j,pos_tex); //x_[j];
int mtype=itype+jx.w;
// Compute r12
numtyp delx = ix.x-jx.x;
numtyp dely = ix.y-jx.y;
numtyp delz = ix.z-jx.z;
numtyp rsq = delx*delx+dely*dely+delz*delz;
if (rsq<lj1[mtype].x) {
numtyp forcecoul, force_lj, force, inv1, inv2, prefactor, _erfc;
numtyp r2inv=ucl_recip(rsq);
if (rsq < lj1[mtype].y) {
if (lj3[mtype].x == (numtyp)2) {
inv1=r2inv*r2inv;
inv2=inv1*inv1;
} else if (lj3[mtype].x == (numtyp)1) {
inv2=r2inv*ucl_rsqrt(rsq);
inv1=inv2*inv2;
} else {
inv1=r2inv*r2inv*r2inv;
inv2=inv1;
}
force_lj = factor_lj*inv1*(lj1[mtype].z*inv2-lj1[mtype].w);
} else
force_lj = (numtyp)0.0;
if (rsq < cut_coulsq) {
numtyp r = ucl_rsqrt(r2inv);
numtyp grij = g_ewald * r;
numtyp expm2 = ucl_exp(-grij*grij);
numtyp t = ucl_recip((numtyp)1.0 + EWALD_P*grij);
_erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
fetch(prefactor,j,q_tex);
prefactor *= qqrd2e * qtmp/r;
forcecoul = prefactor * (_erfc + EWALD_F*grij*expm2-factor_coul);
} else
forcecoul = (numtyp)0.0;
force = (force_lj + forcecoul) * r2inv;
f.x+=delx*force;
f.y+=dely*force;
f.z+=delz*force;
if (eflag>0) {
if (rsq < cut_coulsq)
e_coul += prefactor*(_erfc-factor_coul);
if (rsq < lj1[mtype].y) {
energy += factor_lj*inv1*(lj3[mtype].y*inv2-lj3[mtype].z)-
lj3[mtype].w;
}
}
if (vflag>0) {
virial[0] += delx*delx*force;
virial[1] += dely*dely*force;
virial[2] += delz*delz*force;
virial[3] += delx*dely*force;
virial[4] += delx*delz*force;
virial[5] += dely*delz*force;
}
}
} // for nbor
store_answers_q(f,energy,e_coul,virial,ii,inum,tid,t_per_atom,offset,eflag,
vflag,ans,engv);
} // if ii
}
diff --git a/lib/gpu/lal_cg_cmm_long.h b/lib/gpu/lal_lj_sdk_long.h
similarity index 98%
rename from lib/gpu/lal_cg_cmm_long.h
rename to lib/gpu/lal_lj_sdk_long.h
index aa0cbfbaf..f56687cd7 100644
--- a/lib/gpu/lal_cg_cmm_long.h
+++ b/lib/gpu/lal_lj_sdk_long.h
@@ -1,83 +1,83 @@
/***************************************************************************
- cg_cmm_long.h
+ lj_sdk_long.h
-------------------
W. Michael Brown (ORNL)
Class for acceleration of the lj/sdk/coul/long pair style
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#ifndef LAL_CG_CMM_LONG_H
#define LAL_CG_CMM_LONG_H
#include "lal_base_charge.h"
namespace LAMMPS_AL {
template <class numtyp, class acctyp>
class CGCMMLong : public BaseCharge<numtyp, acctyp> {
public:
CGCMMLong();
~CGCMMLong();
/// Clear any previous data and set up for a new LAMMPS run
/** \param max_nbors initial number of rows in the neighbor matrix
* \param cell_size cutoff + skin
* \param gpu_split fraction of particles handled by device
*
* Returns:
* - 0 if successful
* - -1 if fix gpu not found
* - -3 if there is an out of memory error
* - -4 if the GPU library was not compiled for GPU
* - -5 Double precision is not supported on card **/
int init(const int ntypes, double **host_cutsq, int ** cg_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **host_offset, double *host_special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size,
const double gpu_split, FILE *screen, double **host_cut_ljsq,
const double host_cut_coulsq, double *host_special_coul,
const double qqrd2e, const double g_ewald);
/// Clear all host and device data
/** \note This is called at the beginning of the init() routine **/
void clear();
/// Returns memory usage on device per atom
int bytes_per_atom(const int max_nbors) const;
/// Total host memory used by library for pair style
double host_memory_usage() const;
// --------------------------- TYPE DATA --------------------------
/// lj1.x = cutsq, lj1.y = cutsq_vdw, lj1.z = lj1, lj1.w = lj2,
UCL_D_Vec<numtyp4> lj1;
/// lj3.x = cg_type, lj3.y = lj3, lj3.z = lj4, lj3.w = offset
UCL_D_Vec<numtyp4> lj3;
/// Special LJ values [0-3] and Special Coul values [4-7]
UCL_D_Vec<numtyp> sp_lj;
/// If atom type constants fit in shared memory, use fast kernels
bool shared_types;
/// Number of atom types
int _lj_types;
numtyp _cut_coulsq, _qqrd2e, _g_ewald;
private:
bool _allocated;
void loop(const bool _eflag, const bool _vflag);
};
}
#endif
diff --git a/lib/gpu/lal_cg_cmm_long_ext.cpp b/lib/gpu/lal_lj_sdk_long_ext.cpp
similarity index 93%
rename from lib/gpu/lal_cg_cmm_long_ext.cpp
rename to lib/gpu/lal_lj_sdk_long_ext.cpp
index ee0a0269e..08390d3ee 100644
--- a/lib/gpu/lal_cg_cmm_long_ext.cpp
+++ b/lib/gpu/lal_lj_sdk_long_ext.cpp
@@ -1,129 +1,129 @@
/***************************************************************************
- cg_cmm_long.h
+ lj_sdk_long.h
-------------------
W. Michael Brown (ORNL)
Functions for LAMMPS access to lj/sdk/coul/long acceleration functions
__________________________________________________________________________
This file is part of the LAMMPS Accelerator Library (LAMMPS_AL)
__________________________________________________________________________
begin :
email : brownw@ornl.gov
***************************************************************************/
#include <iostream>
#include <cassert>
#include <math.h>
-#include "lal_cg_cmm_long.h"
+#include "lal_lj_sdk_long.h"
using namespace std;
using namespace LAMMPS_AL;
static CGCMMLong<PRECISION,ACC_PRECISION> CMMLMF;
// ---------------------------------------------------------------------------
// Allocate memory on host and device and copy constants to device
// ---------------------------------------------------------------------------
-int cmml_gpu_init(const int ntypes, double **cutsq, int **cg_type,
+int sdkl_gpu_init(const int ntypes, double **cutsq, int **cg_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int inum, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen, double **host_cut_ljsq, double host_cut_coulsq,
double *host_special_coul, const double qqrd2e,
const double g_ewald) {
CMMLMF.clear();
gpu_mode=CMMLMF.device->gpu_mode();
double gpu_split=CMMLMF.device->particle_split();
int first_gpu=CMMLMF.device->first_device();
int last_gpu=CMMLMF.device->last_device();
int world_me=CMMLMF.device->world_me();
int gpu_rank=CMMLMF.device->gpu_rank();
int procs_per_gpu=CMMLMF.device->procs_per_gpu();
CMMLMF.device->init_message(screen,"lj/sdk/coul/long",first_gpu,last_gpu);
bool message=false;
if (CMMLMF.device->replica_me()==0 && screen)
message=true;
if (message) {
fprintf(screen,"Initializing Device and compiling on process 0...");
fflush(screen);
}
int init_ok=0;
if (world_me==0)
init_ok=CMMLMF.init(ntypes, cutsq, cg_type, host_lj1, host_lj2, host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen, host_cut_ljsq,
host_cut_coulsq, host_special_coul, qqrd2e,g_ewald);
CMMLMF.device->world_barrier();
if (message)
fprintf(screen,"Done.\n");
for (int i=0; i<procs_per_gpu; i++) {
if (message) {
if (last_gpu-first_gpu==0)
fprintf(screen,"Initializing Device %d on core %d...",first_gpu,i);
else
fprintf(screen,"Initializing Devices %d-%d on core %d...",first_gpu,
last_gpu,i);
fflush(screen);
}
if (gpu_rank==i && world_me!=0)
init_ok=CMMLMF.init(ntypes, cutsq, cg_type, host_lj1, host_lj2, host_lj3,
host_lj4, offset, special_lj, inum, nall, 300,
maxspecial, cell_size, gpu_split, screen,
host_cut_ljsq, host_cut_coulsq, host_special_coul,
qqrd2e, g_ewald);
CMMLMF.device->gpu_barrier();
if (message)
fprintf(screen,"Done.\n");
}
if (message)
fprintf(screen,"\n");
if (init_ok==0)
CMMLMF.estimate_gpu_overhead();
return init_ok;
}
-void cmml_gpu_clear() {
+void sdkl_gpu_clear() {
CMMLMF.clear();
}
-int** cmml_gpu_compute_n(const int ago, const int inum_full,
+int** sdkl_gpu_compute_n(const int ago, const int inum_full,
const int nall, double **host_x, int *host_type,
double *sublo, double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum, const double cpu_time,
bool &success, double *host_q, double *boxlo,
double *prd) {
return CMMLMF.compute(ago, inum_full, nall, host_x, host_type, sublo,
subhi, tag, nspecial, special, eflag, vflag, eatom,
vatom, host_start, ilist, jnum, cpu_time, success,
host_q,boxlo,prd);
}
-void cmml_gpu_compute(const int ago, const int inum_full, const int nall,
+void sdkl_gpu_compute(const int ago, const int inum_full, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success, double *host_q,
const int nlocal, double *boxlo, double *prd) {
CMMLMF.compute(ago,inum_full,nall,host_x,host_type,ilist,numj,
firstneigh,eflag,vflag,eatom,vatom,host_start,cpu_time,success,
host_q,nlocal,boxlo,prd);
}
-double cmml_gpu_bytes() {
+double sdkl_gpu_bytes() {
return CMMLMF.host_memory_usage();
}
diff --git a/lib/h5md/Install.py b/lib/h5md/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/h5md/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/h5md/Makefile b/lib/h5md/Makefile.h5cc
similarity index 95%
rename from lib/h5md/Makefile
rename to lib/h5md/Makefile.h5cc
index 085d21ff6..bd3e8a978 100644
--- a/lib/h5md/Makefile
+++ b/lib/h5md/Makefile.h5cc
@@ -1,33 +1,33 @@
EXTRAMAKE=Makefile.lammps.empty
CC=h5cc
# -DH5_NO_DEPRECATED_SYMBOLS is required here to ensure we are using
# the v1.8 API when HDF5 is configured to default to using the v1.6 API.
CFLAGS=-D_DEFAULT_SOURCE -O2 -DH5_NO_DEPRECATED_SYMBOLS -Wall -fPIC
HDF5_PATH=/usr
INC=-I include
AR=ar
ARFLAGS=rc
LIB=libch5md.a
all: lib Makefile.lammps
build:
mkdir -p build
build/ch5md.o: src/ch5md.c | build
$(CC) $(INC) $(CFLAGS) -c $< -o $@
Makefile.lammps:
- cp Makefile.lammps.empty $@
+ cp $(EXTRAMAKE) $@
.PHONY: all lib clean
$(LIB): build/ch5md.o
$(AR) $(ARFLAGS) $(LIB) build/ch5md.o
lib: $(LIB)
clean:
rm -f build/*.o $(LIB)
diff --git a/lib/h5md/README b/lib/h5md/README
index 62a4979cb..fb7d82bfc 100644
--- a/lib/h5md/README
+++ b/lib/h5md/README
@@ -1,27 +1,38 @@
This directory contains the ch5md library, which is bundled with
LAMMPS under its own BSD license; see below. This library is used
when the USER-H5MD package is included in a LAMMPS build and the dump
h5md command is invoked in a LAMMPS input script.
+You can type "make lib-h5md" from the src directory to see help on how
+to build this library via make commands. You can do the same thing by
+typing "python Install.py" from within this directory, or you can build
+it manually by following the instructions below.
+
---------------------
ch5md : Read and write H5MD files in C
======================================
Copyright (C) 2013-2014 Pierre de Buyl
ch5md is a set of C routines to manipulate H5MD files. H5MD is a file format
specification based on [HDF5](http://www.hdfgroup.org/HDF5/) for storing
molecular data, whose development is found at <http://nongnu.org/h5md/>.
ch5md is developed by Pierre de Buyl and is released under the 3-clause BSD
license that can be found in the file LICENSE.
-To use the h5md dump style in lammps, execute make in this directory then 'make
-yes-user-h5md' in the src directory of lammps. Rebuild lammps.
+To use the h5md dump style in LAMMPS, execute
+make -f Makefile.h5cc
+in this directory, then
+make yes-user-h5md
+in the src directory of LAMMPS to rebuild LAMMPS.
+
+Note that you must have the h5cc compiler installed to use
+Makefile.h5cc. It should be part of your HDF5 installation.
If HDF5 is not in a standard system location, edit Makefile.lammps accordingly.
On Debian and Ubuntu systems from 2015 onward, where concurrent serial and MPI
HDF5 installations are possible, use the full platform-dependent path, i.e.
`HDF5_PATH=/usr/lib/x86_64-linux-gnu/hdf5/serial`
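Since the README above requires the h5cc wrapper, a minimal pre-flight check could look like the Python 3 sketch below. It assumes only that h5cc, when installed, is on PATH; it is an illustration, not part of the build system.

import shutil
import sys

# Abort early if the HDF5 compiler wrapper is missing; otherwise report
# where it was found.
h5cc = shutil.which("h5cc")
if h5cc is None:
    sys.exit("h5cc not found: install the HDF5 development tools or adjust "
             "HDF5_PATH in Makefile.lammps")
print("found h5cc at", h5cc)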
diff --git a/lib/kokkos/CHANGELOG.md b/lib/kokkos/CHANGELOG.md
index 4a96e2441..c6fe991b9 100644
--- a/lib/kokkos/CHANGELOG.md
+++ b/lib/kokkos/CHANGELOG.md
@@ -1,306 +1,329 @@
# Change Log
+## [2.03.00](https://github.com/kokkos/kokkos/tree/2.03.00) (2017-04-25)
+[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.15...2.03.00)
+
+**Implemented enhancements:**
+
+- UnorderedMap: make it accept Devices or MemorySpaces [\#711](https://github.com/kokkos/kokkos/issues/711)
+- sort to accept DynamicView and \[begin,end\) indices [\#691](https://github.com/kokkos/kokkos/issues/691)
+- ENABLE Macros should only be used via \#ifdef or \#if defined [\#675](https://github.com/kokkos/kokkos/issues/675)
+- Remove impl/Kokkos\_Synchronic\_\* [\#666](https://github.com/kokkos/kokkos/issues/666)
+- Turning off IVDEP for Intel 14. [\#638](https://github.com/kokkos/kokkos/issues/638)
+- Using an installed Kokkos in a target application using CMake [\#633](https://github.com/kokkos/kokkos/issues/633)
+- Create Kokkos Bill of Materials [\#632](https://github.com/kokkos/kokkos/issues/632)
+- MDRangePolicy and tagged evaluators [\#547](https://github.com/kokkos/kokkos/issues/547)
+- Add PGI support [\#289](https://github.com/kokkos/kokkos/issues/289)
+
+**Fixed bugs:**
+
+- Output from PerTeam fails [\#733](https://github.com/kokkos/kokkos/issues/733)
+- Cuda: architecture flag not added to link line [\#688](https://github.com/kokkos/kokkos/issues/688)
+- Getting large chunks of memory for a thread team in a universal way [\#664](https://github.com/kokkos/kokkos/issues/664)
+- Kokkos RNG normal\(\) function hangs for small seed value [\#655](https://github.com/kokkos/kokkos/issues/655)
+- Kokkos Tests Errors on Shepard/HSW Builds [\#644](https://github.com/kokkos/kokkos/issues/644)
+
## [2.02.15](https://github.com/kokkos/kokkos/tree/2.02.15) (2017-02-10)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.07...2.02.15)
**Implemented enhancements:**
- Containers: Adding block partitioning to StaticCrsGraph [\#625](https://github.com/kokkos/kokkos/issues/625)
- Kokkos Make System can induce Errors on Cray Volta System [\#610](https://github.com/kokkos/kokkos/issues/610)
- OpenMP: error out if KOKKOS\_HAVE\_OPENMP is defined but not \_OPENMP [\#605](https://github.com/kokkos/kokkos/issues/605)
- CMake: fix standalone build with tests [\#604](https://github.com/kokkos/kokkos/issues/604)
- Change README \(that GitHub shows when opening Kokkos project page\) to tell users how to submit PRs [\#597](https://github.com/kokkos/kokkos/issues/597)
- Add correctness testing for all operators of Atomic View [\#420](https://github.com/kokkos/kokkos/issues/420)
- Allow assignment of Views with compatible memory spaces [\#290](https://github.com/kokkos/kokkos/issues/290)
- Build only one version of Kokkos library for tests [\#213](https://github.com/kokkos/kokkos/issues/213)
- Clean out old KOKKOS\_HAVE\_CXX11 macros clauses [\#156](https://github.com/kokkos/kokkos/issues/156)
- Harmonize Macro names [\#150](https://github.com/kokkos/kokkos/issues/150)
**Fixed bugs:**
- Cray and PGI: Kokkos\_Parallel\_Reduce [\#634](https://github.com/kokkos/kokkos/issues/634)
- Kokkos Make System can induce Errors on Cray Volta System [\#610](https://github.com/kokkos/kokkos/issues/610)
- Normal\(\) function random number generator doesn't give the expected distribution [\#592](https://github.com/kokkos/kokkos/issues/592)
## [2.02.07](https://github.com/kokkos/kokkos/tree/2.02.07) (2016-12-16)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.01...2.02.07)
**Implemented enhancements:**
- Add CMake option to enable Cuda Lambda support [\#589](https://github.com/kokkos/kokkos/issues/589)
- Add CMake option to enable Cuda RDC support [\#588](https://github.com/kokkos/kokkos/issues/588)
- Add Initial Intel Sky Lake Xeon-HPC Compiler Support to Kokkos Make System [\#584](https://github.com/kokkos/kokkos/issues/584)
- Building Tutorial Examples [\#582](https://github.com/kokkos/kokkos/issues/582)
- Internal way for using ThreadVectorRange without TeamHandle [\#574](https://github.com/kokkos/kokkos/issues/574)
- Testing: Add testing for uvm and rdc [\#571](https://github.com/kokkos/kokkos/issues/571)
- Profiling: Add Memory Tracing and Region Markers [\#557](https://github.com/kokkos/kokkos/issues/557)
- nvcc\_wrapper not installed with Kokkos built with CUDA through CMake [\#543](https://github.com/kokkos/kokkos/issues/543)
- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
- Benchmarks: Add Gather benchmark [\#536](https://github.com/kokkos/kokkos/issues/536)
- Testing: add spot\_check option to test\_all\_sandia [\#535](https://github.com/kokkos/kokkos/issues/535)
- Deprecate Kokkos::Impl::VerifyExecutionCanAccessMemorySpace [\#527](https://github.com/kokkos/kokkos/issues/527)
- Add AtomicAdd support for 64bit float for Pascal [\#522](https://github.com/kokkos/kokkos/issues/522)
- Add Restrict and Aligned memory trait [\#517](https://github.com/kokkos/kokkos/issues/517)
- Kokkos Tests are Not Run using Compiler Optimization [\#501](https://github.com/kokkos/kokkos/issues/501)
- Add support for clang 3.7 w/ openmp backend [\#393](https://github.com/kokkos/kokkos/issues/393)
- Provide an error throw class [\#79](https://github.com/kokkos/kokkos/issues/79)
**Fixed bugs:**
- Cuda UVM Allocation test broken with UVM as default space [\#586](https://github.com/kokkos/kokkos/issues/586)
- Bug \(develop branch only\): multiple tests are now failing when forcing uvm usage. [\#570](https://github.com/kokkos/kokkos/issues/570)
- Error in generate\_makefile.sh for Kokkos when Compiler is Empty String/Fails [\#568](https://github.com/kokkos/kokkos/issues/568)
- XL 13.1.4 incorrect C++11 flag [\#553](https://github.com/kokkos/kokkos/issues/553)
- Improve DynRankView debug check [\#541](https://github.com/kokkos/kokkos/issues/541)
- Installing Library on MAC broken due to cp -u [\#539](https://github.com/kokkos/kokkos/issues/539)
- Intel Nightly Testing with Debug enabled fails [\#534](https://github.com/kokkos/kokkos/issues/534)
## [2.02.01](https://github.com/kokkos/kokkos/tree/2.02.01) (2016-11-01)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.00...2.02.01)
**Implemented enhancements:**
- Add Changelog generation to our process. [\#506](https://github.com/kokkos/kokkos/issues/506)
**Fixed bugs:**
- Test scratch\_request fails in Serial with Debug enabled [\#520](https://github.com/kokkos/kokkos/issues/520)
- Bug In BoundsCheck for DynRankView [\#516](https://github.com/kokkos/kokkos/issues/516)
## [2.02.00](https://github.com/kokkos/kokkos/tree/2.02.00) (2016-10-30)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.10...2.02.00)
**Implemented enhancements:**
- Add PowerPC assembly for grabbing clock register in memory pool [\#511](https://github.com/kokkos/kokkos/issues/511)
- Add GCC 6.x support [\#508](https://github.com/kokkos/kokkos/issues/508)
- Test install and build against installed library [\#498](https://github.com/kokkos/kokkos/issues/498)
- Makefile.kokkos adds expt-extended-lambda to cuda build with clang [\#490](https://github.com/kokkos/kokkos/issues/490)
- Add top-level makefile option to just test kokkos-core unit-test [\#485](https://github.com/kokkos/kokkos/issues/485)
- Split and harmonize Object Files of Core UnitTests to increase build parallelism [\#484](https://github.com/kokkos/kokkos/issues/484)
- LayoutLeft to LayoutLeft subview for 3D and 4D views [\#473](https://github.com/kokkos/kokkos/issues/473)
- Add official Cuda 8.0 support [\#468](https://github.com/kokkos/kokkos/issues/468)
- Allow C++1Z Flag for Class Lambda capture [\#465](https://github.com/kokkos/kokkos/issues/465)
- Add Clang 4.0+ compilation of Cuda code [\#455](https://github.com/kokkos/kokkos/issues/455)
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
- Add name of view to "View bounds error" [\#432](https://github.com/kokkos/kokkos/issues/432)
- Move Sort Binning Operators into Kokkos namespace [\#421](https://github.com/kokkos/kokkos/issues/421)
- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
- Import WithoutInitializing and AllowPadding into Kokkos namespace [\#325](https://github.com/kokkos/kokkos/issues/325)
- TeamThreadRange requires begin, end to be the same type [\#305](https://github.com/kokkos/kokkos/issues/305)
- CudaUVMSpace should track \# allocations, due to CUDA limit on \# UVM allocations [\#300](https://github.com/kokkos/kokkos/issues/300)
- Remove old View and its infrastructure [\#259](https://github.com/kokkos/kokkos/issues/259)
**Fixed bugs:**
- Bug in TestCuda\_Other.cpp: most likely assembly inserted into Device code [\#515](https://github.com/kokkos/kokkos/issues/515)
- Cuda Compute Capability check of GPU is outdated [\#509](https://github.com/kokkos/kokkos/issues/509)
- multi\_scratch test with hwloc and pthreads seg-faults. [\#504](https://github.com/kokkos/kokkos/issues/504)
- generate\_makefile.bash: "make install" is broken [\#503](https://github.com/kokkos/kokkos/issues/503)
- make clean in Out of Source Build/Tests Does Not Work Correctly [\#502](https://github.com/kokkos/kokkos/issues/502)
- Makefiles for test and examples have issues in Cuda when CXX is not explicitly specified [\#497](https://github.com/kokkos/kokkos/issues/497)
- Dispatch lambda test directly inside GTEST macro doesn't work with nvcc [\#491](https://github.com/kokkos/kokkos/issues/491)
- UnitTests with HWLOC enabled fail if run with mpirun bound to a single core [\#489](https://github.com/kokkos/kokkos/issues/489)
- Failing Reducer Test on Mac with Pthreads [\#479](https://github.com/kokkos/kokkos/issues/479)
- make test Dumps Error with Clang Not Found [\#471](https://github.com/kokkos/kokkos/issues/471)
- OpenMP TeamPolicy member broadcast not using correct volatile shared variable [\#424](https://github.com/kokkos/kokkos/issues/424)
- TaskPolicy - generate error when attempt to use uninitialized [\#396](https://github.com/kokkos/kokkos/issues/396)
- New task policy implementation is pulling in old experimental code. [\#372](https://github.com/kokkos/kokkos/issues/372)
- MemoryPool unit test hangs on Power8 with GCC 6.1.0 [\#298](https://github.com/kokkos/kokkos/issues/298)
## [2.01.10](https://github.com/kokkos/kokkos/tree/2.01.10) (2016-09-27)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.06...2.01.10)
**Implemented enhancements:**
- Enable Profiling by default in Tribits build [\#438](https://github.com/kokkos/kokkos/issues/438)
- parallel\_reduce\(0\), parallel\_scan\(0\) unit tests [\#436](https://github.com/kokkos/kokkos/issues/436)
- data\(\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
- Fix tutorials to track new Kokkos::View [\#323](https://github.com/kokkos/kokkos/issues/323)
- Rename team policy set\_scratch\_size. [\#195](https://github.com/kokkos/kokkos/issues/195)
**Fixed bugs:**
- Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch [\#445](https://github.com/kokkos/kokkos/issues/445)
- Makefile spits syntax error [\#435](https://github.com/kokkos/kokkos/issues/435)
- Kokkos::sort fails for view with all the same values [\#422](https://github.com/kokkos/kokkos/issues/422)
- Generic Reducers: can't accept inline constructed reducer [\#404](https://github.com/kokkos/kokkos/issues/404)
- data\\(\\)==NULL after realloc with LayoutStride [\#351](https://github.com/kokkos/kokkos/issues/351)
- const subview of const view with compile time dimensions on Cuda backend [\#310](https://github.com/kokkos/kokkos/issues/310)
- Kokkos \(in Trilinos\) Causes Internal Compiler Error on CUDA 8.0.21-EA on POWER8 [\#307](https://github.com/kokkos/kokkos/issues/307)
- Core Oversubscription Detection Broken? [\#159](https://github.com/kokkos/kokkos/issues/159)
## [2.01.06](https://github.com/kokkos/kokkos/tree/2.01.06) (2016-09-02)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.01.00...2.01.06)
**Implemented enhancements:**
- Add "standard" reducers for lambda-supportable customized reduce [\#411](https://github.com/kokkos/kokkos/issues/411)
- TaskPolicy - single thread back-end execution [\#390](https://github.com/kokkos/kokkos/issues/390)
- Kokkos master clone tag [\#387](https://github.com/kokkos/kokkos/issues/387)
- Query memory requirements from task policy [\#378](https://github.com/kokkos/kokkos/issues/378)
- Output order of test\_atomic.cpp is confusing [\#373](https://github.com/kokkos/kokkos/issues/373)
- Missing testing for atomics [\#341](https://github.com/kokkos/kokkos/issues/341)
- Feature request for Kokkos to provide Kokkos::atomic\_fetch\_max and atomic\_fetch\_min [\#336](https://github.com/kokkos/kokkos/issues/336)
- TaskPolicy\<Cuda\> performance requires teams mapped to warps [\#218](https://github.com/kokkos/kokkos/issues/218)
**Fixed bugs:**
- Reduce with Teams broken for custom initialize [\#407](https://github.com/kokkos/kokkos/issues/407)
- Failing Kokkos build on Debian [\#402](https://github.com/kokkos/kokkos/issues/402)
- Failing Tests on NVIDIA Pascal GPUs [\#398](https://github.com/kokkos/kokkos/issues/398)
- Algorithms: fill\_random assumes dimensions fit in unsigned int [\#389](https://github.com/kokkos/kokkos/issues/389)
- Kokkos::subview with RandomAccess Memory Trait [\#385](https://github.com/kokkos/kokkos/issues/385)
- Build warning \(signed / unsigned comparison\) in Cuda implementation [\#365](https://github.com/kokkos/kokkos/issues/365)
- wrong results for a parallel\_reduce with CUDA8 / Maxwell50 [\#352](https://github.com/kokkos/kokkos/issues/352)
- Hierarchical parallelism - 3 level unit test [\#344](https://github.com/kokkos/kokkos/issues/344)
- Can I allocate a View w/ both WithoutInitializing & AllowPadding? [\#324](https://github.com/kokkos/kokkos/issues/324)
- subview View layout determination [\#309](https://github.com/kokkos/kokkos/issues/309)
- Unit tests with Cuda - Maxwell [\#196](https://github.com/kokkos/kokkos/issues/196)
## [2.01.00](https://github.com/kokkos/kokkos/tree/2.01.00) (2016-07-21)
[Full Changelog](https://github.com/kokkos/kokkos/compare/End_C++98...2.01.00)
**Implemented enhancements:**
- Edit ViewMapping so assigning Views with the same custom layout compiles when const casting [\#327](https://github.com/kokkos/kokkos/issues/327)
- DynRankView: Performance improvement for operator\(\) [\#321](https://github.com/kokkos/kokkos/issues/321)
- Interoperability between static and dynamic rank views [\#295](https://github.com/kokkos/kokkos/issues/295)
- subview member function ? [\#280](https://github.com/kokkos/kokkos/issues/280)
- Inter-operatibility between View and DynRankView. [\#245](https://github.com/kokkos/kokkos/issues/245)
- \(Trilinos\) build warning in atomic\_assign, with Kokkos::complex [\#177](https://github.com/kokkos/kokkos/issues/177)
- View\<\>::shmem\_size should runtime check for number of arguments equal to rank [\#176](https://github.com/kokkos/kokkos/issues/176)
- Custom reduction join via lambda argument [\#99](https://github.com/kokkos/kokkos/issues/99)
- DynRankView with 0 dimensions passed in at construction [\#293](https://github.com/kokkos/kokkos/issues/293)
- Inject view\_alloc and friends into Kokkos namespace [\#292](https://github.com/kokkos/kokkos/issues/292)
- Less restrictive TeamPolicy reduction on Cuda [\#286](https://github.com/kokkos/kokkos/issues/286)
- deep\_copy using remap with source execution space [\#267](https://github.com/kokkos/kokkos/issues/267)
- Suggestion: Enable opt-in L1 caching via nvcc-wrapper [\#261](https://github.com/kokkos/kokkos/issues/261)
- More flexible create\_mirror functions [\#260](https://github.com/kokkos/kokkos/issues/260)
- Rename View::memory\_span to View::required\_allocation\_size [\#256](https://github.com/kokkos/kokkos/issues/256)
- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
- Use of subviews and views with compile-time dimensions [\#237](https://github.com/kokkos/kokkos/issues/237)
- Kokkos::Timer [\#234](https://github.com/kokkos/kokkos/issues/234)
- Fence CudaUVMSpace allocations [\#230](https://github.com/kokkos/kokkos/issues/230)
- View::operator\(\) accept std::is\_integral and std::is\_enum [\#227](https://github.com/kokkos/kokkos/issues/227)
- Allocating zero size View [\#216](https://github.com/kokkos/kokkos/issues/216)
- Thread scalable memory pool [\#212](https://github.com/kokkos/kokkos/issues/212)
- Add a way to disable memory leak output [\#194](https://github.com/kokkos/kokkos/issues/194)
- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
- Runtime rank wrapper for View [\#189](https://github.com/kokkos/kokkos/issues/189)
- Profiling Interface [\#158](https://github.com/kokkos/kokkos/issues/158)
- Fix View assignment \(of managed to unmanaged\) [\#153](https://github.com/kokkos/kokkos/issues/153)
- Add unit test for assignment of managed View to unmanaged View [\#152](https://github.com/kokkos/kokkos/issues/152)
- Check for oversubscription of threads with MPI in Kokkos::initialize [\#149](https://github.com/kokkos/kokkos/issues/149)
- Dynamic resizeable 1dimensional view [\#143](https://github.com/kokkos/kokkos/issues/143)
- Develop TaskPolicy for CUDA [\#142](https://github.com/kokkos/kokkos/issues/142)
- New View : Test Compilation Downstream [\#138](https://github.com/kokkos/kokkos/issues/138)
- New View Implementation [\#135](https://github.com/kokkos/kokkos/issues/135)
- Add variant of subview that lets users add traits [\#134](https://github.com/kokkos/kokkos/issues/134)
- NVCC-WRAPPER: Add --host-only flag [\#121](https://github.com/kokkos/kokkos/issues/121)
- Address gtest issue with TriBITS Kokkos build outside of Trilinos [\#117](https://github.com/kokkos/kokkos/issues/117)
- Make tests pass with -expt-extended-lambda on CUDA [\#108](https://github.com/kokkos/kokkos/issues/108)
- Dynamic scheduling for parallel\_for and parallel\_reduce [\#106](https://github.com/kokkos/kokkos/issues/106)
- Runtime or compile time error when reduce functor's join is not properly specified as const member function or with volatile arguments [\#105](https://github.com/kokkos/kokkos/issues/105)
- Error out when the number of threads is modified after kokkos is initialized [\#104](https://github.com/kokkos/kokkos/issues/104)
- Porting to POWER and remove assumption of X86 default [\#103](https://github.com/kokkos/kokkos/issues/103)
- Dynamic scheduling option for RangePolicy [\#100](https://github.com/kokkos/kokkos/issues/100)
- SharedMemory Support for Lambdas [\#81](https://github.com/kokkos/kokkos/issues/81)
- Recommended TeamSize for Lambdas [\#80](https://github.com/kokkos/kokkos/issues/80)
- Add Aggressive Vectorization Compilation mode [\#72](https://github.com/kokkos/kokkos/issues/72)
- Dynamic scheduling team execution policy [\#53](https://github.com/kokkos/kokkos/issues/53)
- UVM allocations in multi-GPU systems [\#50](https://github.com/kokkos/kokkos/issues/50)
- Synchronic in Kokkos::Impl [\#44](https://github.com/kokkos/kokkos/issues/44)
- index and dimension types in for loops [\#28](https://github.com/kokkos/kokkos/issues/28)
- Subview assign of 1D Strided with stride 1 to LayoutLeft/Right [\#1](https://github.com/kokkos/kokkos/issues/1)
**Fixed bugs:**
- misspelled variable name in Kokkos\_Atomic\_Fetch + missing unit tests [\#340](https://github.com/kokkos/kokkos/issues/340)
- seg fault Kokkos::Impl::CudaInternal::print\_configuration [\#338](https://github.com/kokkos/kokkos/issues/338)
- Clang compiler error with named parallel\_reduce, tags, and TeamPolicy. [\#335](https://github.com/kokkos/kokkos/issues/335)
- Shared Memory Allocation Error at parallel\_reduce [\#311](https://github.com/kokkos/kokkos/issues/311)
- DynRankView: Fix resize and realloc [\#303](https://github.com/kokkos/kokkos/issues/303)
- Scratch memory and dynamic scheduling [\#279](https://github.com/kokkos/kokkos/issues/279)
- MemoryPool infinite loop when out of memory [\#312](https://github.com/kokkos/kokkos/issues/312)
- Kokkos DynRankView changes break Sacado and Panzer [\#299](https://github.com/kokkos/kokkos/issues/299)
- MemoryPool fails to compile on non-cuda non-x86 [\#297](https://github.com/kokkos/kokkos/issues/297)
- Random Number Generator Fix [\#296](https://github.com/kokkos/kokkos/issues/296)
- View template parameter ordering Bug [\#282](https://github.com/kokkos/kokkos/issues/282)
- Serial task policy broken. [\#281](https://github.com/kokkos/kokkos/issues/281)
- deep\_copy with LayoutStride should not memcpy [\#262](https://github.com/kokkos/kokkos/issues/262)
- DualView::need\_sync should be a const method [\#248](https://github.com/kokkos/kokkos/issues/248)
- Arbitrary-sized atomics on GPUs broken; loop forever [\#238](https://github.com/kokkos/kokkos/issues/238)
- boolean reduction value\_type changes answer [\#225](https://github.com/kokkos/kokkos/issues/225)
- Custom init\(\) function for parallel\_reduce with array value\_type [\#210](https://github.com/kokkos/kokkos/issues/210)
- unit\_test Makefile is Broken - Recursively Calls itself until Machine Apocalypse. [\#202](https://github.com/kokkos/kokkos/issues/202)
- nvcc\_wrapper Does Not Support -Xcompiler \<compiler option\> [\#198](https://github.com/kokkos/kokkos/issues/198)
- Kokkos exec space init should init Kokkos profiling [\#192](https://github.com/kokkos/kokkos/issues/192)
- Kokkos Threads Backend impl\_shared\_alloc Broken on Intel 16.1 \(Shepard Haswell\) [\#186](https://github.com/kokkos/kokkos/issues/186)
- pthread back end hangs if used uninitialized [\#182](https://github.com/kokkos/kokkos/issues/182)
- parallel\_reduce of size 0, not calling init/join [\#175](https://github.com/kokkos/kokkos/issues/175)
- Bug in Threads with OpenMP enabled [\#173](https://github.com/kokkos/kokkos/issues/173)
- KokkosExp\_SharedAlloc, m\_team\_work\_index inaccessible [\#166](https://github.com/kokkos/kokkos/issues/166)
- 128-bit CAS without Assembly Broken? [\#161](https://github.com/kokkos/kokkos/issues/161)
- fatal error: Cuda/Kokkos\_Cuda\_abort.hpp: No such file or directory [\#157](https://github.com/kokkos/kokkos/issues/157)
- Power8: Fix OpenMP backend [\#139](https://github.com/kokkos/kokkos/issues/139)
- Data race in Kokkos OpenMP initialization [\#131](https://github.com/kokkos/kokkos/issues/131)
- parallel\_launch\_local\_memory and cuda 7.5 [\#125](https://github.com/kokkos/kokkos/issues/125)
- Resize can fail with Cuda due to asynchronous dispatch [\#119](https://github.com/kokkos/kokkos/issues/119)
- Qthread taskpolicy initialization bug. [\#92](https://github.com/kokkos/kokkos/issues/92)
- Windows: sys/mman.h [\#89](https://github.com/kokkos/kokkos/issues/89)
- Windows: atomic\_fetch\_sub\(\) [\#88](https://github.com/kokkos/kokkos/issues/88)
- Windows: snprintf [\#87](https://github.com/kokkos/kokkos/issues/87)
- Parallel\_Reduce with TeamPolicy and league size of 0 returns garbage [\#85](https://github.com/kokkos/kokkos/issues/85)
- Throw with Cuda when using \(2D\) team\_policy parallel\_reduce with less than a warp size [\#76](https://github.com/kokkos/kokkos/issues/76)
- Scalar views don't work with Kokkos::Atomic memory trait [\#69](https://github.com/kokkos/kokkos/issues/69)
- Reduce the number of threads per team for Cuda [\#63](https://github.com/kokkos/kokkos/issues/63)
- Named Kernels fail for reductions with CUDA [\#60](https://github.com/kokkos/kokkos/issues/60)
- Kokkos View dimension\_\(\) for long returning unsigned int [\#20](https://github.com/kokkos/kokkos/issues/20)
- atomic test hangs with LLVM [\#6](https://github.com/kokkos/kokkos/issues/6)
- OpenMP Test should set omp\_set\_num\_threads to 1 [\#4](https://github.com/kokkos/kokkos/issues/4)
**Closed issues:**
- develop branch broken with CUDA 8 and --expt-extended-lambda [\#354](https://github.com/kokkos/kokkos/issues/354)
- --arch=KNL with Intel 2016 build failure [\#349](https://github.com/kokkos/kokkos/issues/349)
- Error building with Cuda when passing -DKOKKOS\_CUDA\_USE\_LAMBDA to generate\_makefile.bash [\#343](https://github.com/kokkos/kokkos/issues/343)
- Can I safely use int indices in a 2-D View with capacity \> 2B? [\#318](https://github.com/kokkos/kokkos/issues/318)
- Kokkos::ViewAllocateWithoutInitializing is not working [\#317](https://github.com/kokkos/kokkos/issues/317)
- Intel build on Mac OS X [\#277](https://github.com/kokkos/kokkos/issues/277)
- deleted [\#271](https://github.com/kokkos/kokkos/issues/271)
- Broken Mira build [\#268](https://github.com/kokkos/kokkos/issues/268)
- 32-bit build [\#246](https://github.com/kokkos/kokkos/issues/246)
- parallel\_reduce with RDC crashes linker [\#232](https://github.com/kokkos/kokkos/issues/232)
- build of Kokkos\_Sparse\_MV\_impl\_spmv\_Serial.cpp.o fails if you use nvcc and have cuda disabled [\#209](https://github.com/kokkos/kokkos/issues/209)
- Kokkos Serial execution space is not tested with TeamPolicy. [\#207](https://github.com/kokkos/kokkos/issues/207)
- Unit test failure on Hansen KokkosCore\_UnitTest\_Cuda\_MPI\_1 [\#200](https://github.com/kokkos/kokkos/issues/200)
- nvcc compiler warning: calling a \_\_host\_\_ function from a \_\_host\_\_ \_\_device\_\_ function is not allowed [\#180](https://github.com/kokkos/kokkos/issues/180)
- Intel 15 build error with defaulted "move" operators [\#171](https://github.com/kokkos/kokkos/issues/171)
- missing libkokkos.a during Trilinos 12.4.2 build, yet other libkokkos\*.a libs are there [\#165](https://github.com/kokkos/kokkos/issues/165)
- Tie atomic updates to execution space or even to thread team? \(speculation\) [\#144](https://github.com/kokkos/kokkos/issues/144)
- New View: Compiletime/size Test [\#137](https://github.com/kokkos/kokkos/issues/137)
- New View : Performance Test [\#136](https://github.com/kokkos/kokkos/issues/136)
- Signed/unsigned comparison warning in CUDA parallel [\#130](https://github.com/kokkos/kokkos/issues/130)
- Kokkos::complex: Need op\* w/ std::complex & real [\#126](https://github.com/kokkos/kokkos/issues/126)
- Use uintptr\_t for casting pointers [\#110](https://github.com/kokkos/kokkos/issues/110)
- Default thread mapping behavior between P and Q threads. [\#91](https://github.com/kokkos/kokkos/issues/91)
- Windows: Atomic\_Fetch\_Exchange\(\) return type [\#90](https://github.com/kokkos/kokkos/issues/90)
- Synchronic unit test is way too long [\#84](https://github.com/kokkos/kokkos/issues/84)
- nvcc\_wrapper -\> $\(NVCC\_WRAPPER\) [\#42](https://github.com/kokkos/kokkos/issues/42)
- Check compiler version and print helpful message [\#39](https://github.com/kokkos/kokkos/issues/39)
- Kokkos shared memory on Cuda uses a lot of registers [\#31](https://github.com/kokkos/kokkos/issues/31)
- Can not pass unit test `cuda.space` without a GT 720 [\#25](https://github.com/kokkos/kokkos/issues/25)
- Makefile.kokkos lacks bounds checking option that CMake has [\#24](https://github.com/kokkos/kokkos/issues/24)
- Kokkos can not complete unit tests with CUDA UVM enabled [\#23](https://github.com/kokkos/kokkos/issues/23)
- Simplify teams + shared memory histogram example to remove vectorization [\#21](https://github.com/kokkos/kokkos/issues/21)
- Kokkos needs to rever to ${PROJECT\_NAME}\_ENABLE\_CXX11 not Trilinos\_ENABLE\_CXX11 [\#17](https://github.com/kokkos/kokkos/issues/17)
- Kokkos Base Makefile adds AVX to KNC Build [\#16](https://github.com/kokkos/kokkos/issues/16)
- MS Visual Studio 2013 Build Errors [\#9](https://github.com/kokkos/kokkos/issues/9)
- subview\(X, ALL\(\), j\) for 2-D LayoutRight View X: should it view a column? [\#5](https://github.com/kokkos/kokkos/issues/5)
## [End_C++98](https://github.com/kokkos/kokkos/tree/End_C++98) (2015-04-15)
\* *This Change Log was automatically generated by [github_changelog_generator](https://github.com/skywinder/Github-Changelog-Generator)*
diff --git a/lib/kokkos/CMakeLists.txt b/lib/kokkos/CMakeLists.txt
index 16854c839..1c820660a 100644
--- a/lib/kokkos/CMakeLists.txt
+++ b/lib/kokkos/CMakeLists.txt
@@ -1,216 +1,215 @@
IF(COMMAND TRIBITS_PACKAGE_DECL)
SET(KOKKOS_HAS_TRILINOS ON CACHE BOOL "")
ELSE()
SET(KOKKOS_HAS_TRILINOS OFF CACHE BOOL "")
ENDIF()
IF(NOT KOKKOS_HAS_TRILINOS)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8.11 FATAL_ERROR)
INCLUDE(cmake/tribits.cmake)
SET(CMAKE_CXX_STANDARD 11)
ENDIF()
#
# A) Forward declare the package so that certain options are also defined for
# subpackages
#
TRIBITS_PACKAGE_DECL(Kokkos) # ENABLE_SHADOWING_WARNINGS)
#------------------------------------------------------------------------------
#
# B) Define the common options for Kokkos first so they can be used by
# subpackages as well.
#
# mfh 01 Aug 2016: See Issue #61:
#
# https://github.com/kokkos/kokkos/issues/61
#
# Don't use TRIBITS_ADD_DEBUG_OPTION() here, because that defines
# HAVE_KOKKOS_DEBUG. We define KOKKOS_HAVE_DEBUG here instead,
# for compatibility with Kokkos' Makefile build system.
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_DEBUG
KOKKOS_HAVE_DEBUG
"Enable run-time debug checks. These checks may be expensive, so they are disabled by default in a release build."
${${PROJECT_NAME}_ENABLE_DEBUG}
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_SIERRA_BUILD
KOKKOS_FOR_SIERRA
"Configure Kokkos for building within the Sierra build system."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda
KOKKOS_HAVE_CUDA
"Enable CUDA support in Kokkos."
"${TPL_ENABLE_CUDA}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_UVM
KOKKOS_USE_CUDA_UVM
"Enable CUDA Unified Virtual Memory as the default in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_RDC
KOKKOS_HAVE_CUDA_RDC
"Enable CUDA Relocatable Device Code support in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_Lambda
KOKKOS_HAVE_CUDA_LAMBDA
"Enable CUDA LAMBDA support in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Pthread
KOKKOS_HAVE_PTHREAD
"Enable Pthread support in Kokkos."
OFF
)
ASSERT_DEFINED(TPL_ENABLE_Pthread)
IF (Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
ENDIF ()
IF (NOT TPL_ENABLE_Pthread)
ADD_DEFINITIONS(-DGTEST_HAS_PTHREAD=0)
ENDIF()
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_OpenMP
KOKKOS_HAVE_OPENMP
"Enable OpenMP support in Kokkos."
"${${PROJECT_NAME}_ENABLE_OpenMP}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
- Kokkos_ENABLE_QTHREAD
- KOKKOS_HAVE_QTHREAD
- "Enable QTHREAD support in Kokkos."
- "${TPL_ENABLE_QTHREAD}"
+ Kokkos_ENABLE_Qthreads
+ KOKKOS_HAVE_QTHREADS
+ "Enable Qthreads support in Kokkos."
+ "${TPL_ENABLE_QTHREADS}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_CXX11
KOKKOS_HAVE_CXX11
"Enable C++11 support in Kokkos."
"${${PROJECT_NAME}_ENABLE_CXX11}"
)
-
+
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_HWLOC
KOKKOS_HAVE_HWLOC
"Enable HWLOC support in Kokkos."
"${TPL_ENABLE_HWLOC}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_MPI
KOKKOS_HAVE_MPI
"Enable MPI support in Kokkos."
"${TPL_ENABLE_MPI}"
)
# Set default value of Kokkos_ENABLE_Debug_Bounds_Check option
#
# CMake is case sensitive. The Kokkos_ENABLE_Debug_Bounds_Check
# option (defined below) is annoyingly not all caps, but we need to
# keep it that way for backwards compatibility. If users forget and
# try using an all-caps variable, then make it count by using the
# all-caps version as the default value of the original, not-all-caps
# option. Otherwise, the default value of this option comes from
# Kokkos_ENABLE_DEBUG (see Issue #367).
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_DEBUG)
IF(DEFINED Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
IF(Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT ON)
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ASSERT_DEFINED(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Debug_Bounds_Check
KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
"Enable Kokkos::View run-time bounds checking."
"${Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT}"
)
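The comment block above explains how the default for the not-all-caps Kokkos_ENABLE_Debug_Bounds_Check option is chosen. A minimal Python sketch of that decision rule follows; the names are illustrative, and the CMake code above is the actual logic.

def debug_bounds_check_default(all_caps_setting, package_enable_debug):
    """If the user defined the all-caps Kokkos_ENABLE_DEBUG_BOUNDS_CHECK
    variable and set it ON, the default becomes ON; otherwise the default
    falls back to the package-level debug setting."""
    if all_caps_setting is not None:     # the all-caps variable was defined
        return True if all_caps_setting else package_enable_debug
    return package_enable_debug

# e.g. debug_bounds_check_default(None, False) -> False
#      debug_bounds_check_default(True, False) -> True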
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Profiling
KOKKOS_ENABLE_PROFILING_INTERNAL
"Enable KokkosP profiling support for kernel data collections."
"${TPL_ENABLE_DLlib}"
)
# placeholder for future device...
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Winthread
KOKKOS_HAVE_WINTHREAD
"Enable Winthread support in Kokkos."
"${TPL_ENABLE_Winthread}"
)
# use new/old View
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_USING_DEPRECATED_VIEW
KOKKOS_USING_DEPRECATED_VIEW
"Choose whether to use the old, deprecated Kokkos::View"
OFF
)
#------------------------------------------------------------------------------
#
# C) Install Kokkos' executable scripts
#
# nvcc_wrapper is Kokkos' wrapper for NVIDIA's NVCC CUDA compiler.
# Kokkos needs nvcc_wrapper in order to build. Other libraries and
# executables also need nvcc_wrapper. Thus, we need to install it.
# If the argument of DESTINATION is a relative path, CMake computes it
# as relative to ${CMAKE_INSTALL_PATH}.
INSTALL(PROGRAMS ${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper DESTINATION bin)
#------------------------------------------------------------------------------
#
# D) Process the subpackages for Kokkos
#
TRIBITS_PROCESS_SUBPACKAGES()
#
# E) If Kokkos itself is enabled, process the Kokkos package
#
TRIBITS_PACKAGE_DEF()
TRIBITS_EXCLUDE_AUTOTOOLS_FILES()
TRIBITS_EXCLUDE_FILES(
classic/doc
classic/LinAlg/doc/CrsRefactorNotesMay2012
)
TRIBITS_PACKAGE_POSTPROCESS()
-
diff --git a/lib/kokkos/Makefile.kokkos b/lib/kokkos/Makefile.kokkos
index 9d00c1902..5b094dba8 100644
--- a/lib/kokkos/Makefile.kokkos
+++ b/lib/kokkos/Makefile.kokkos
@@ -1,676 +1,699 @@
-# Default settings common options
+# Default settings common options.
#LAMMPS specific settings:
KOKKOS_PATH=../../lib/kokkos
CXXFLAGS=$(CCFLAGS)
-#Options: OpenMP,Serial,Pthreads,Cuda
+# Options: Cuda,OpenMP,Pthreads,Qthreads,Serial
KOKKOS_DEVICES ?= "OpenMP"
#KOKKOS_DEVICES ?= "Pthreads"
-#Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal61,ARMv80,ARMv81,ARMv8-ThunderX,BGQ,Power7,Power8,Power9,KNL,BDW,SKX
+# Options: KNC,SNB,HSW,Kepler,Kepler30,Kepler32,Kepler35,Kepler37,Maxwell,Maxwell50,Maxwell52,Maxwell53,Pascal60,Pascal61,ARMv80,ARMv81,ARMv8-ThunderX,BGQ,Power7,Power8,Power9,KNL,BDW,SKX
KOKKOS_ARCH ?= ""
-#Options: yes,no
+# Options: yes,no
KOKKOS_DEBUG ?= "no"
-#Options: hwloc,librt,experimental_memkind
+# Options: hwloc,librt,experimental_memkind
KOKKOS_USE_TPLS ?= ""
-#Options: c++11,c++1z
+# Options: c++11,c++1z
KOKKOS_CXX_STANDARD ?= "c++11"
-#Options: aggressive_vectorization,disable_profiling
+# Options: aggressive_vectorization,disable_profiling
KOKKOS_OPTIONS ?= ""
-#Default settings specific options
-#Options: force_uvm,use_ldg,rdc,enable_lambda
+# Default settings specific options.
+# Options: force_uvm,use_ldg,rdc,enable_lambda
KOKKOS_CUDA_OPTIONS ?= "enable_lambda"
-# Check for general settings
-
+# Check for general settings.
KOKKOS_INTERNAL_ENABLE_DEBUG := $(strip $(shell echo $(KOKKOS_DEBUG) | grep "yes" | wc -l))
KOKKOS_INTERNAL_ENABLE_CXX11 := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++11" | wc -l))
KOKKOS_INTERNAL_ENABLE_CXX1Z := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++1z" | wc -l))
-# Check for external libraries
+# Check for external libraries.
KOKKOS_INTERNAL_USE_HWLOC := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "hwloc" | wc -l))
KOKKOS_INTERNAL_USE_LIBRT := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "librt" | wc -l))
KOKKOS_INTERNAL_USE_MEMKIND := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "experimental_memkind" | wc -l))
-# Check for advanced settings
+# Check for advanced settings.
KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "aggressive_vectorization" | wc -l))
KOKKOS_INTERNAL_DISABLE_PROFILING := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_profiling" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LDG := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "use_ldg" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_UVM := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "force_uvm" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_RELOC := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "rdc" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "enable_lambda" | wc -l))
-# Check for Kokkos Host Execution Spaces one of which must be on
-
+# Check for Kokkos Host Execution Spaces one of which must be on.
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMP | wc -l))
KOKKOS_INTERNAL_USE_PTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Pthread | wc -l))
+KOKKOS_INTERNAL_USE_QTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthreads | wc -l))
KOKKOS_INTERNAL_USE_SERIAL := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Serial | wc -l))
-KOKKOS_INTERNAL_USE_QTHREAD := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthread | wc -l))
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 0)
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
- KOKKOS_INTERNAL_USE_SERIAL := 1
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 0)
+ KOKKOS_INTERNAL_USE_SERIAL := 1
+endif
endif
endif
-# Check for other Execution Spaces
-
+# Check for other Execution Spaces.
KOKKOS_INTERNAL_USE_CUDA := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Cuda | wc -l))
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_INTERNAL_NVCC_PATH := $(shell which nvcc)
CUDA_PATH ?= $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=)
KOKKOS_INTERNAL_COMPILER_NVCC_VERSION := $(shell nvcc --version 2>&1 | grep release | cut -d' ' -f5 | cut -d',' -f1 | tr -d .)
endif
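The backend detection above counts substring matches in KOKKOS_DEVICES via "echo ... | grep ... | wc -l", and falls back to Serial when no host execution space is requested. A minimal Python sketch of the same decision, for illustration only:

def select_backends(kokkos_devices):
    """Substring checks equivalent to the Makefile's grep|wc tests, plus the
    rule that Serial is enabled when no other host space is requested."""
    use = {name: name in kokkos_devices
           for name in ("OpenMP", "Pthread", "Qthreads", "Serial", "Cuda")}
    if not (use["OpenMP"] or use["Pthread"] or use["Qthreads"]):
        use["Serial"] = True   # at least one host execution space must be on
    return use

# e.g. select_backends("Cuda,OpenMP") enables Cuda and OpenMP but not Serial.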
-# Check OS
-
+# Check OS.
KOKKOS_OS := $(shell uname -s)
KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname -s | grep CYGWIN | wc -l)
KOKKOS_INTERNAL_OS_LINUX := $(shell uname -s | grep Linux | wc -l)
KOKKOS_INTERNAL_OS_DARWIN := $(shell uname -s | grep Darwin | wc -l)
-# Check compiler
-
-KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
-KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
-KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
-KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
-KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
+# Check compiler.
+KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
+KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
+KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
+KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
+KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
ifneq ($(OMPI_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(OMPI_CXX) --version 2>&1 | grep "nvcc" | wc -l)
endif
ifneq ($(MPICH_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(MPICH_CXX) --version 2>&1 | grep "nvcc" | wc -l)
endif
-KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
+KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 2)
KOKKOS_INTERNAL_COMPILER_CLANG = 1
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 2)
KOKKOS_INTERNAL_COMPILER_XL = 1
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_COMPILER_CLANG_VERSION := $(shell clang --version | grep version | cut -d ' ' -f3 | tr -d '.')
+
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_CLANG_VERSION) -lt 400; echo $$?),0)
- $(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
+ $(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
endif
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := 1
endif
endif
-
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_INTERNAL_OPENMP_FLAG := -mp
+ KOKKOS_INTERNAL_OPENMP_FLAG := -mp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- # OpenMP is turned on by default in Cray compiler environment
+ # OpenMP is turned on by default in Cray compiler environment.
KOKKOS_INTERNAL_OPENMP_FLAG :=
else
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
endif
endif
endif
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
KOKKOS_INTERNAL_CXX11_FLAG := --c++11
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_CXX11_FLAG := -std=c++11
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
KOKKOS_INTERNAL_CXX11_FLAG := -hstd=c++11
else
KOKKOS_INTERNAL_CXX11_FLAG := --std=c++11
KOKKOS_INTERNAL_CXX1Z_FLAG := --std=c++1z
endif
endif
endif
-# Check for Kokkos Architecture settings
+# Check for Kokkos Architecture settings.
-#Intel based
+# Intel based.
KOKKOS_INTERNAL_USE_ARCH_KNC := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNC | wc -l))
KOKKOS_INTERNAL_USE_ARCH_SNB := $(strip $(shell echo $(KOKKOS_ARCH) | grep SNB | wc -l))
KOKKOS_INTERNAL_USE_ARCH_HSW := $(strip $(shell echo $(KOKKOS_ARCH) | grep HSW | wc -l))
KOKKOS_INTERNAL_USE_ARCH_BDW := $(strip $(shell echo $(KOKKOS_ARCH) | grep BDW | wc -l))
KOKKOS_INTERNAL_USE_ARCH_SKX := $(strip $(shell echo $(KOKKOS_ARCH) | grep SKX | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
-#NVIDIA based
-NVCC_WRAPPER := $(KOKKOS_PATH)/config/nvcc_wrapper
+# NVIDIA based.
+NVCC_WRAPPER := $(KOKKOS_PATH)/config/nvcc_wrapper
KOKKOS_INTERNAL_USE_ARCH_KEPLER30 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler30 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER32 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler32 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler35 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER37 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler37 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell50 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL52 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell52 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL53 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell53 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_PASCAL61 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal61 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_PASCAL60 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal60 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
-KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell | wc -l))
-KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler | wc -l))
-KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
- + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
- + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
- + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
- + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
- + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
- + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
- + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
- + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
-endif
-
-#ARM based
+ KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell | wc -l))
+ KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler | wc -l))
+ KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ + $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
+endif
+
+# ARM based.
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv80 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv81 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8-ThunderX | wc -l))
-#IBM based
+# IBM based.
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER7 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power7 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER8 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power8 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER9 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power9 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_IBM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_BGQ)+$(KOKKOS_INTERNAL_USE_ARCH_POWER7)+$(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc))
-#AMD based
+# AMD based.
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(strip $(shell echo $(KOKKOS_ARCH) | grep AMDAVX | wc -l))
-#Any AVX?
+# Any AVX?
KOKKOS_INTERNAL_USE_ARCH_AVX := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX512XEON := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
-# Decide what ISA level we are able to support
-KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
-KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
-KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc ))
+# Decide what ISA level we are able to support.
+KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
+KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
+KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc ))
-#Incompatible flags?
+# Incompatible flags?
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)>1" | bc ))
KOKKOS_INTERNAL_USE_ARCH_MULTIGPU := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_NVIDIA)>1" | bc))
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIHOST), 1)
$(error Defined Multiple Host architectures: KOKKOS_ARCH=$(KOKKOS_ARCH) )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIGPU), 1)
$(error Defined Multiple GPU architectures: KOKKOS_ARCH=$(KOKKOS_ARCH) )
endif
-#Generating the list of Flags
+# Generating the list of Flags.
KOKKOS_CPPFLAGS = -I./ -I$(KOKKOS_PATH)/core/src -I$(KOKKOS_PATH)/containers/src -I$(KOKKOS_PATH)/algorithms/src
# No warnings:
KOKKOS_CXXFLAGS =
# INTEL and CLANG warnings:
#KOKKOS_CXXFLAGS = -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized
# GCC warnings:
#KOKKOS_CXXFLAGS = -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized -Wignored-qualifiers -Wempty-body -Wclobbered
KOKKOS_LIBS = -lkokkos -ldl
KOKKOS_LDFLAGS = -L$(shell pwd)
-KOKKOS_SRC =
+KOKKOS_SRC =
KOKKOS_HEADERS =
-#Generating the KokkosCore_config.h file
+# Generating the KokkosCore_config.h file.
tmp := $(shell echo "/* ---------------------------------------------" > KokkosCore_config.tmp)
tmp := $(shell echo "Makefile constructed configuration:" >> KokkosCore_config.tmp)
tmp := $(shell date >> KokkosCore_config.tmp)
tmp := $(shell echo "----------------------------------------------*/" >> KokkosCore_config.tmp)
-
tmp := $(shell echo "/* Execution Spaces */" >> KokkosCore_config.tmp)
+
+ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
+endif
+
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
- tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
+ tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- tmp := $(shell echo "\#define KOKKOS_HAVE_PTHREAD 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_HAVE_PTHREAD 1" >> KokkosCore_config.tmp )
endif
-ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
- tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_QTHREADS 1" >> KokkosCore_config.tmp )
endif
-ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
+ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
- tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_KNC), 1)
- tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_POWERPCLE), 1)
- tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
-endif
-
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- KOKKOS_CPPFLAGS += -I$(QTHREAD_PATH)/include
- KOKKOS_LDFLAGS += -L$(QTHREAD_PATH)/lib
- tmp := $(shell echo "\#define KOKKOS_HAVE_QTHREAD 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* General Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX11), 1)
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX11_FLAG)
- tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX11_FLAG)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX1Z), 1)
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX1Z_FLAG)
- tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_HAVE_CXX1Z 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX1Z_FLAG)
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_HAVE_CXX1Z 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
- KOKKOS_CXXFLAGS += -lineinfo
+ KOKKOS_CXXFLAGS += -lineinfo
endif
- KOKKOS_CXXFLAGS += -g
- KOKKOS_LDFLAGS += -g -ldl
- tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += -g
+ KOKKOS_LDFLAGS += -g -ldl
+ tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_HWLOC), 1)
- KOKKOS_CPPFLAGS += -I$(HWLOC_PATH)/include
- KOKKOS_LDFLAGS += -L$(HWLOC_PATH)/lib
- KOKKOS_LIBS += -lhwloc
- tmp := $(shell echo "\#define KOKKOS_HAVE_HWLOC 1" >> KokkosCore_config.tmp )
+ KOKKOS_CPPFLAGS += -I$(HWLOC_PATH)/include
+ KOKKOS_LDFLAGS += -L$(HWLOC_PATH)/lib
+ KOKKOS_LIBS += -lhwloc
+ tmp := $(shell echo "\#define KOKKOS_HAVE_HWLOC 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_LIBRT), 1)
- tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define PREC_TIMER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define PREC_TIMER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOSP_ENABLE_RTLIB 1" >> KokkosCore_config.tmp )
- KOKKOS_LIBS += -lrt
+ KOKKOS_LIBS += -lrt
endif
ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
KOKKOS_CPPFLAGS += -I$(MEMKIND_PATH)/include
- KOKKOS_LDFLAGS += -L$(MEMKIND_PATH)/lib
- KOKKOS_LIBS += -lmemkind
+ KOKKOS_LDFLAGS += -L$(MEMKIND_PATH)/lib
+ KOKKOS_LIBS += -lmemkind
tmp := $(shell echo "\#define KOKKOS_HAVE_HBWSPACE 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING 0" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* Optimization Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION), 1)
tmp := $(shell echo "\#define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION 1" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += --relocatable-device-code=true
- KOKKOS_LDFLAGS += --relocatable-device-code=true
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += --relocatable-device-code=true
+ KOKKOS_LDFLAGS += --relocatable-device-code=true
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
- tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -expt-extended-lambda
+ tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += -expt-extended-lambda
else
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
endif
endif
+
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
endif
endif
+
endif
-#Add Architecture flags
+# Add Architecture flags.
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
- else
- KOKKOS_CXXFLAGS += -march=armv8-a
- KOKKOS_LDFLAGS += -march=armv8-a
- endif
+ KOKKOS_CXXFLAGS += -march=armv8-a
+ KOKKOS_LDFLAGS += -march=armv8-a
endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV81), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV81 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV81 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
- else
- KOKKOS_CXXFLAGS += -march=armv8.1-a
- KOKKOS_LDFLAGS += -march=armv8.1-a
- endif
+ KOKKOS_CXXFLAGS += -march=armv8.1-a
+ KOKKOS_LDFLAGS += -march=armv8.1-a
endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV8_THUNDERX 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV8_THUNDERX 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS +=
+ KOKKOS_LDFLAGS +=
else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS +=
- KOKKOS_LDFLAGS +=
- else
- KOKKOS_CXXFLAGS += -march=armv8-a -mtune=thunderx
- KOKKOS_LDFLAGS += -march=armv8-a -mtune=thunderx
- endif
+ KOKKOS_CXXFLAGS += -march=armv8-a -mtune=thunderx
+ KOKKOS_LDFLAGS += -march=armv8-a -mtune=thunderx
endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -mavx
- KOKKOS_LDFLAGS += -mavx
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
-
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS += -tp=sandybridge
- KOKKOS_LDFLAGS += -tp=sandybridge
- else
- # Assume that this is a really a GNU compiler
- KOKKOS_CXXFLAGS += -mavx
- KOKKOS_LDFLAGS += -mavx
- endif
- endif
- endif
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -mavx
+ KOKKOS_LDFLAGS += -mavx
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS += -tp=sandybridge
+ KOKKOS_LDFLAGS += -tp=sandybridge
+ else
+        # Assume that this is really a GNU compiler.
+ KOKKOS_CXXFLAGS += -mavx
+ KOKKOS_LDFLAGS += -mavx
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER8), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Assume that this is a really a GNU compiler or it could be XL on P8
- KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
- KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
- endif
+ else
+    # Assume that this is really a GNU compiler, or it could be XL on P8.
+ KOKKOS_CXXFLAGS += -mcpu=power8 -mtune=power8
+ KOKKOS_LDFLAGS += -mcpu=power8 -mtune=power8
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER9), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_POWER9 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_POWER9 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Assume that this is a really a GNU compiler or it could be XL on P9
- KOKKOS_CXXFLAGS += -mcpu=power9 -mtune=power9
- KOKKOS_LDFLAGS += -mcpu=power9 -mtune=power9
- endif
+ else
+    # Assume that this is really a GNU compiler, or it could be XL on P9.
+ KOKKOS_CXXFLAGS += -mcpu=power9 -mtune=power9
+ KOKKOS_LDFLAGS += -mcpu=power9 -mtune=power9
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -xCORE-AVX2
- KOKKOS_LDFLAGS += -xCORE-AVX2
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
-
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- KOKKOS_CXXFLAGS += -tp=haswell
- KOKKOS_LDFLAGS += -tp=haswell
- else
- # Assume that this is a really a GNU compiler
- KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2
- KOKKOS_LDFLAGS += -march=core-avx2 -mtune=core-avx2
- endif
- endif
- endif
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -xCORE-AVX2
+ KOKKOS_LDFLAGS += -xCORE-AVX2
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ KOKKOS_CXXFLAGS += -tp=haswell
+ KOKKOS_LDFLAGS += -tp=haswell
+ else
+          # Assume that this is really a GNU compiler.
+ KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2
+ KOKKOS_LDFLAGS += -march=core-avx2 -mtune=core-avx2
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -xMIC-AVX512
- KOKKOS_LDFLAGS += -xMIC-AVX512
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -xMIC-AVX512
+ KOKKOS_LDFLAGS += -xMIC-AVX512
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Asssume that this is really a GNU compiler
- KOKKOS_CXXFLAGS += -march=knl
- KOKKOS_LDFLAGS += -march=knl
- endif
- endif
- endif
+ else
+        # Assume that this is really a GNU compiler.
+ KOKKOS_CXXFLAGS += -march=knl
+ KOKKOS_LDFLAGS += -march=knl
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512XEON 1" >> KokkosCore_config.tmp )
- ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
- KOKKOS_CXXFLAGS += -xCORE-AVX512
- KOKKOS_LDFLAGS += -xCORE-AVX512
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
+ tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512XEON 1" >> KokkosCore_config.tmp )
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
+ KOKKOS_CXXFLAGS += -xCORE-AVX512
+ KOKKOS_LDFLAGS += -xCORE-AVX512
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
- else
- ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
+ else
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
- else
- # Nothing here yet
- KOKKOS_CXXFLAGS += -march=skylake-avx512
- KOKKOS_LDFLAGS += -march=skylake-avx512
- endif
- endif
- endif
+ else
+ # Nothing here yet.
+ KOKKOS_CXXFLAGS += -march=skylake-avx512
+ KOKKOS_LDFLAGS += -march=skylake-avx512
+ endif
+ endif
+ endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KNC), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += -mmic
- KOKKOS_LDFLAGS += -mmic
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += -mmic
+ KOKKOS_LDFLAGS += -mmic
endif
-#Figure out the architecture flag for Cuda
+# Figure out the architecture flag for Cuda.
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
+
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-arch
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
- KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-x cuda --cuda-gpu-arch
+ KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=--cuda-gpu-arch
+ KOKKOS_CXXFLAGS += -x cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
- tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
+ tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
endif
+
endif
-
+
KOKKOS_INTERNAL_LS_CONFIG := $(shell ls KokkosCore_config.h)
ifeq ($(KOKKOS_INTERNAL_LS_CONFIG), KokkosCore_config.h)
-KOKKOS_INTERNAL_NEW_CONFIG := $(strip $(shell diff KokkosCore_config.h KokkosCore_config.tmp | grep define | wc -l))
+ KOKKOS_INTERNAL_NEW_CONFIG := $(strip $(shell diff KokkosCore_config.h KokkosCore_config.tmp | grep define | wc -l))
else
-KOKKOS_INTERNAL_NEW_CONFIG := 1
+ KOKKOS_INTERNAL_NEW_CONFIG := 1
endif
ifneq ($(KOKKOS_INTERNAL_NEW_CONFIG), 0)
- tmp := $(shell cp KokkosCore_config.tmp KokkosCore_config.h)
+ tmp := $(shell cp KokkosCore_config.tmp KokkosCore_config.h)
endif
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/impl/*.cpp)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.cpp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
- KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
- KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
- KOKKOS_LIBS += -lcudart -lcuda
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
+ KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
+ KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
+ KOKKOS_LIBS += -lcudart -lcuda
endif
-ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- KOKKOS_LIBS += -lpthread
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
+ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
+
+ ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
+ KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMP_FLAG)
+ else
+ KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
+ endif
+
+ KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- KOKKOS_LIBS += -lqthread
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.hpp)
+ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
+ KOKKOS_LIBS += -lpthread
endif
-ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
- KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.cpp)
- KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
- ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
- KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMP_FLAG)
- else
- KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
- endif
- KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMP_FLAG)
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.cpp)
+ KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.hpp)
+ KOKKOS_CPPFLAGS += -I$(QTHREADS_PATH)/include
+ KOKKOS_LDFLAGS += -L$(QTHREADS_PATH)/lib
+ KOKKOS_LIBS += -lqthread
endif
-#Explicitly set the GCC Toolchain for Clang
+# Explicitly set the GCC Toolchain for Clang.
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
- KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
- KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
- KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
- KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
+ KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
+ KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
+ KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
+ KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
endif
-#With Cygwin functions such as fdopen and fileno are not defined
-#when strict ansi is enabled. strict ansi gets enabled with --std=c++11
-#though. So we hard undefine it here. Not sure if that has any bad side effects
-#This is needed for gtest actually, not for Kokkos itself!
+# With Cygwin, functions such as fdopen and fileno are not defined
+# when strict ANSI is enabled. Strict ANSI gets enabled with --std=c++11,
+# though, so we hard undefine it here. Not sure if that has any bad side effects.
+# This is needed for gtest, actually, not for Kokkos itself!
ifeq ($(KOKKOS_INTERNAL_OS_CYGWIN), 1)
KOKKOS_CXXFLAGS += -U__STRICT_ANSI__
endif
-# Setting up dependencies
+# Setting up dependencies.
KokkosCore_config.h:
KOKKOS_CPP_DEPENDS := KokkosCore_config.h $(KOKKOS_HEADERS)
KOKKOS_OBJ = $(KOKKOS_SRC:.cpp=.o)
KOKKOS_OBJ_LINK = $(notdir $(KOKKOS_OBJ))
include $(KOKKOS_PATH)/Makefile.targets
kokkos-clean:
rm -f $(KOKKOS_OBJ_LINK) KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
libkokkos.a: $(KOKKOS_OBJ_LINK) $(KOKKOS_SRC) $(KOKKOS_HEADERS)
ar cr libkokkos.a $(KOKKOS_OBJ_LINK)
ranlib libkokkos.a
KOKKOS_LINK_DEPENDS=libkokkos.a
diff --git a/lib/kokkos/Makefile.targets b/lib/kokkos/Makefile.targets
index a48a5f6eb..54cacb741 100644
--- a/lib/kokkos/Makefile.targets
+++ b/lib/kokkos/Makefile.targets
@@ -1,62 +1,63 @@
Kokkos_UnorderedMap_impl.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
Kokkos_Core.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Core.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Core.cpp
Kokkos_CPUDiscovery.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_CPUDiscovery.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_CPUDiscovery.cpp
Kokkos_Error.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Error.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Error.cpp
Kokkos_ExecPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_ExecPolicy.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_ExecPolicy.cpp
Kokkos_HostSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostSpace.cpp
Kokkos_hwloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_hwloc.cpp
Kokkos_Serial.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp
Kokkos_Serial_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp
Kokkos_TaskQueue.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_TaskQueue.cpp
+Kokkos_HostThreadTeam.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HostThreadTeam.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostThreadTeam.cpp
Kokkos_spinwait.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_spinwait.cpp
Kokkos_Profiling_Interface.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
Kokkos_SharedAlloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
Kokkos_MemoryPool.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_MemoryPool.cpp
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
Kokkos_Cuda_Impl.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Impl.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Impl.cpp
Kokkos_CudaSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_CudaSpace.cpp
Kokkos_Cuda_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Cuda/Kokkos_Cuda_Task.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
Kokkos_ThreadsExec_base.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec_base.cpp
Kokkos_ThreadsExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Threads/Kokkos_ThreadsExec.cpp
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
-Kokkos_QthreadExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthread/Kokkos_QthreadExec.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthread/Kokkos_QthreadExec.cpp
-Kokkos_Qthread_TaskPolicy.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
- $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+Kokkos_QthreadsExec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_QthreadsExec.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_QthreadsExec.cpp
+Kokkos_Qthreads_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
+ $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
Kokkos_OpenMPexec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
Kokkos_OpenMP_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
endif
Kokkos_HBWSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
-
diff --git a/lib/kokkos/README b/lib/kokkos/README
index 7ebde23a1..257a2e5db 100644
--- a/lib/kokkos/README
+++ b/lib/kokkos/README
@@ -1,165 +1,173 @@
Kokkos implements a programming model in C++ for writing performance portable
applications targeting all major HPC platforms. For that purpose it provides
abstractions for both parallel execution of code and data management.
Kokkos is designed to target complex node architectures with N-level memory
hierarchies and multiple types of execution resources. It currently can use
OpenMP, Pthreads and CUDA as backend programming models.
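As a rough illustration of these two abstractions (a minimal sketch only, assuming
a backend with C++11 lambda support; the view name "x" and the loop bound are
arbitrary), data is managed through views and loops are expressed through parallel
patterns, and the same source compiles for whichever backend was selected:

  #include <Kokkos_Core.hpp>

  int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
      // Data management: a 1D array allocated in the default memory space.
      Kokkos::View<double*> x("x", 100);
      // Parallel execution: the body runs on the default execution space
      // (OpenMP, Pthreads, CUDA or Serial, depending on the build).
      Kokkos::parallel_for(100, KOKKOS_LAMBDA(const int i) {
        x(i) = 2.0 * i;
      });
      Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
  }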
Kokkos is licensed under standard 3-clause BSD terms of use. For specifics
see the LICENSE file contained in the repository or distribution.
The core developers of Kokkos are Carter Edwards and Christian Trott
at the Computer Science Research Institute of the Sandia National
Laboratories.
The KokkosP interface and associated tools are developed by the Application
Performance Team and Kokkos core developers at Sandia National Laboratories.
To learn more about Kokkos consider watching one of our presentations:
GTC 2015:
http://on-demand.gputechconf.com/gtc/2015/video/S5166.html
http://on-demand.gputechconf.com/gtc/2015/presentation/S5166-H-Carter-Edwards.pdf
A programming guide can be found under doc/Kokkos_PG.pdf. This is an initial version
and feedback is greatly appreciated.
A separate repository with extensive tutorial material can be found under
https://github.com/kokkos/kokkos-tutorials.
If you have a patch to contribute please feel free to issue a pull request against
the develop branch. For major contributions it is better to contact us first
for guidance.
For questions please send an email to
kokkos-users@software.sandia.gov
For non-public questions send an email to
hcedwar(at)sandia.gov and crtrott(at)sandia.gov
============================================================================
====Requirements============================================================
============================================================================
Primary tested compilers on X86 are:
GCC 4.7.2
GCC 4.8.4
GCC 4.9.2
GCC 5.1.0
+ GCC 5.2.0
Intel 14.0.4
Intel 15.0.2
Intel 16.0.1
Intel 17.0.098
+ Intel 17.1.132
Clang 3.5.2
Clang 3.6.1
+ Clang 3.7.1
+ Clang 3.8.1
Clang 3.9.0
+ PGI 17.1
Primary tested compilers on Power 8 are:
GCC 5.4.0 (OpenMP,Serial)
IBM XL 13.1.3 (OpenMP, Serial) (There is a workaround in place to avoid a compiler bug)
Primary tested compilers on Intel KNL are:
+ GCC 6.2.0
Intel 16.2.181 (with gcc 4.7.2)
Intel 17.0.098 (with gcc 4.7.2)
+ Intel 17.1.132 (with gcc 4.9.3)
+ Intel 17.2.174 (with gcc 4.9.3)
+ Intel 18.0.061 (beta) (with gcc 4.9.3)
Secondary tested compilers are:
- CUDA 7.0 (with gcc 4.7.2)
- CUDA 7.5 (with gcc 4.7.2)
+ CUDA 7.0 (with gcc 4.8.4)
+ CUDA 7.5 (with gcc 4.8.4)
CUDA 8.0 (with gcc 5.3.0 on X86 and gcc 5.4.0 on Power8)
CUDA/Clang 8.0 using Clang/Trunk compiler
Other compilers working:
X86:
- PGI 15.4
Cygwin 2.1.0 64bit with gcc 4.9.3
Known non-working combinations:
Power8:
Pthreads backend
Primary tested compilers are passing in release mode
with warnings as errors. They are also tested with a comprehensive set of
backend combinations (i.e. OpenMP, Pthreads, Serial, OpenMP+Serial, ...).
We are using the following set of flags:
GCC: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits
-Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized
Intel: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Clang: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Secondary compilers are passing without -Werror.
Other compilers are tested occasionally, in particular when pushing from develop to
master branch, without -Werror and only for a select set of backends.
============================================================================
====Getting started=========================================================
============================================================================
In the 'example/tutorial' directory you will find step by step tutorial
examples which explain many of the features of Kokkos. They work with
simple Makefiles. To build with g++ and OpenMP simply type 'make'
in the 'example/tutorial' directory. This will build all examples in the
subfolders. To change the build options refer to the Programming Guide
in the compilation section.
============================================================================
====Running Unit Tests======================================================
============================================================================
To run the unit tests create a build directory and run the following commands
KOKKOS_PATH/generate_makefile.bash
make build-test
make test
Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.
============================================================================
====Install the library=====================================================
============================================================================
To install Kokkos as a library create a build directory and run the following
KOKKOS_PATH/generate_makefile.bash --prefix=INSTALL_PATH
make lib
make install
Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.
============================================================================
====CMakeFiles==============================================================
============================================================================
The CMake files contained in this repository require Tribits and are used
for integration with Trilinos. They do not currently support a standalone
CMake build.
===========================================================================
====Kokkos and CUDA UVM====================================================
===========================================================================
Kokkos does support UVM as a specific memory space called CudaUVMSpace.
Allocations made with that space are accessible from host and device.
You can tell Kokkos to use that as the default space for Cuda allocations.
In either case UVM comes with a number of restrictions:
(i) You can't access allocations on the host while a kernel is potentially
running. This will lead to segfaults. To avoid that you either need to
call Kokkos::Cuda::fence() (or just Kokkos::fence()) after kernels, or
you can set the environment variable CUDA_LAUNCH_BLOCKING=1.
Furthermore, in multi-socket multi-GPU machines, UVM defaults to using
zero-copy allocations for technical reasons related to using multiple
GPUs from the same process. If an executable doesn't use multiple GPUs
from the same process (e.g. each MPI rank of an application uses a single
GPU [which can be the same GPU for multiple MPI ranks]) you can set
CUDA_MANAGED_FORCE_DEVICE_ALLOC=1.
This will enforce proper UVM allocations, but can lead to errors if
more than a single GPU is used by a single process.
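As a concrete illustration of restriction (i), a minimal sketch (assuming a CUDA
build with lambda support; the view name "a" and the size "n" are placeholders):

  Kokkos::View<double*, Kokkos::CudaUVMSpace> a("a", n);
  Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) { a(i) = i; });
  Kokkos::fence();       // without this (or CUDA_LAUNCH_BLOCKING=1) ...
  double first = a(0);   // ... reading a(0) on the host may segfault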
===========================================================================
====Contributing===========================================================
===========================================================================
Contributions to Kokkos are welcome. In order to do so, please open an issue
where a feature request or bug can be discussed. Then issue a pull request
with your contribution. Pull requests must be issued against the develop branch.
diff --git a/lib/kokkos/algorithms/cmake/Dependencies.cmake b/lib/kokkos/algorithms/cmake/Dependencies.cmake
index 1d71d8af3..c36b62523 100644
--- a/lib/kokkos/algorithms/cmake/Dependencies.cmake
+++ b/lib/kokkos/algorithms/cmake/Dependencies.cmake
@@ -1,5 +1,5 @@
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
- LIB_REQUIRED_PACKAGES KokkosCore
+ LIB_REQUIRED_PACKAGES KokkosCore KokkosContainers
LIB_OPTIONAL_TPLS Pthread CUDA HWLOC
TEST_OPTIONAL_TPLS CUSPARSE
)
diff --git a/lib/kokkos/algorithms/src/Kokkos_Random.hpp b/lib/kokkos/algorithms/src/Kokkos_Random.hpp
index d376173bf..bd7358236 100644
--- a/lib/kokkos/algorithms/src/Kokkos_Random.hpp
+++ b/lib/kokkos/algorithms/src/Kokkos_Random.hpp
@@ -1,1751 +1,1755 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_RANDOM_HPP
#define KOKKOS_RANDOM_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_Complex.hpp>
#include <cstdio>
#include <cstdlib>
#include <cmath>
/// \file Kokkos_Random.hpp
/// \brief Pseudorandom number generators
///
/// These generators are based on Vigna, Sebastiano (2014). "An
/// experimental exploration of Marsaglia's xorshift generators,
/// scrambled." See: http://arxiv.org/abs/1402.6246
namespace Kokkos {
/*Template functions to get equidistributed random numbers from a generator for a specific Scalar type
template<class Generator,Scalar>
struct rand{
//Max value returned by draw(Generator& gen)
KOKKOS_INLINE_FUNCTION
static Scalar max();
//Returns a value between zero and max()
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen);
//Returns a value between zero and range()
//Note: for floating point values range can be larger than max()
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen, const Scalar& range){}
//Return value between start and end
KOKKOS_INLINE_FUNCTION
static Scalar draw(Generator& gen, const Scalar& start, const Scalar& end);
};
The random number generators themselves have two components: a state-pool and the actual generator.
A state-pool manages a number of generators, so that each active thread is able to grab its own.
This allows the generation of random numbers which are independent between threads. Note that
in contrast to CuRand none of the functions of the pool (or the generator) are collectives,
i.e. all functions can be called inside conditionals.
template<class Device>
class Pool {
public:
//The Kokkos device type
typedef Device device_type;
//The actual generator type
typedef Generator<Device> generator_type;
//Default constructor: does not initialize a pool
Pool();
//Initializing constructor: calls init(seed,Device_Specific_Number);
Pool(unsigned int seed);
//Initialize the Pool with seed as the starting seed and a pool_size of num_states.
//The Random_XorShift64 generator is used in serial to initialize all states,
//thus the initialization process is platform independent and deterministic.
void init(unsigned int seed, int num_states);
//Get a generator. This will lock one of the states, guaranteeing that each thread
//will have its private generator. Note: on Cuda getting a state involves atomics,
//and is thus not deterministic!
generator_type get_state();
//Give a state back to the pool. This unlocks the state, and writes the modified
//state of the generator back to the pool.
void free_state(generator_type gen);
}
template<class Device>
class Generator {
public:
//The Kokkos device type
typedef DeviceType device_type;
//Max return values of respective [X]rand[S]() functions
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
//Init with a state and the idx with respect to pool. Note: in serial the
//Generator can be used by just giving it the necessary state arguments
KOKKOS_INLINE_FUNCTION
Generator (STATE_ARGUMENTS, int state_idx = 0);
//Draw an equidistributed uint32_t in the range (0,MAX_URAND]
KOKKOS_INLINE_FUNCTION
uint32_t urand();
//Draw an equidistributed uint64_t in the range (0,MAX_URAND64]
KOKKOS_INLINE_FUNCTION
uint64_t urand64();
//Draw an equidistributed uint32_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range);
//Draw an equidistributed uint32_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end );
//Draw an equidistributed uint64_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range);
//Draw an equidistributed uint64_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end );
//Draw an equidistributed int in the range (0,MAX_RAND]
KOKKOS_INLINE_FUNCTION
int rand();
//Draw an equidistributed int in the range (0,range]
KOKKOS_INLINE_FUNCTION
int rand(const int& range);
//Draw an equidistributed int in the range (start,end]
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end );
//Draw an equidistributed int64_t in the range (0,MAX_RAND64]
KOKKOS_INLINE_FUNCTION
int64_t rand64();
//Draw an equidistributed int64_t in the range (0,range]
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range);
//Draw an equidistributed int64_t in the range (start,end]
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end );
//Draw an equidistributed float in the range (0,1.0]
KOKKOS_INLINE_FUNCTION
float frand();
//Draw an equidistributed float in the range (0,range]
KOKKOS_INLINE_FUNCTION
float frand(const float& range);
//Draw an equidistributed float in the range (start,end]
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end );
//Draw an equidistributed double in the range (0,1.0]
KOKKOS_INLINE_FUNCTION
double drand();
//Draw an equidistributed double in the range (0,range]
KOKKOS_INLINE_FUNCTION
double drand(const double& range);
//Draw an equidistributed double in the range (start,end]
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end );
//Draw a standard normal distributed double
KOKKOS_INLINE_FUNCTION
double normal() ;
//Draw a normal distributed double with given mean and standard deviation
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0);
}
//Additional Functions:
//Fills view with random numbers in the range (0,range]
template<class ViewType, class PoolType>
void fill_random(ViewType view, PoolType pool, ViewType::value_type range);
//Fills view with random numbers in the range (start,end]
template<class ViewType, class PoolType>
void fill_random(ViewType view, PoolType pool,
ViewType::value_type start, ViewType::value_type end);
*/
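/* A minimal usage sketch of the pool/generator interface described above
   (illustrative only; it assumes the default execution space, C++11 lambda
   support, and placeholder names for the pool, view and size):

     Kokkos::Random_XorShift64_Pool<> pool(12345);
     Kokkos::View<double*> v("v", n);

     // Fill v with uniform random numbers in (0,1.0]
     Kokkos::fill_random(v, pool, 1.0);

     // Or draw numbers manually inside a kernel:
     Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) {
       auto gen = pool.get_state();
       v(i) = gen.drand();   // double in (0,1.0]
       pool.free_state(gen);
     });
*/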
template<class Generator, class Scalar>
struct rand;
template<class Generator>
struct rand<Generator,char> {
KOKKOS_INLINE_FUNCTION
static short max(){return 127;}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen)
{return short((gen.rand()&0xff+256)%256);}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const char& range)
{return char(gen.rand(range));}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const char& start, const char& end)
{return char(gen.rand(start,end));}
};
template<class Generator>
struct rand<Generator,short> {
KOKKOS_INLINE_FUNCTION
static short max(){return 32767;}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen)
{return short((gen.rand()&0xffff+65536)%32768);}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const short& range)
{return short(gen.rand(range));}
KOKKOS_INLINE_FUNCTION
static short draw(Generator& gen, const short& start, const short& end)
{return short(gen.rand(start,end));}
};
template<class Generator>
struct rand<Generator,int> {
KOKKOS_INLINE_FUNCTION
static int max(){return Generator::MAX_RAND;}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen)
{return gen.rand();}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen, const int& range)
{return gen.rand(range);}
KOKKOS_INLINE_FUNCTION
static int draw(Generator& gen, const int& start, const int& end)
{return gen.rand(start,end);}
};
template<class Generator>
struct rand<Generator,unsigned int> {
KOKKOS_INLINE_FUNCTION
static unsigned int max () {
return Generator::MAX_URAND;
}
KOKKOS_INLINE_FUNCTION
static unsigned int draw (Generator& gen) {
return gen.urand ();
}
KOKKOS_INLINE_FUNCTION
static unsigned int draw(Generator& gen, const unsigned int& range) {
return gen.urand (range);
}
KOKKOS_INLINE_FUNCTION
static unsigned int
draw (Generator& gen, const unsigned int& start, const unsigned int& end) {
return gen.urand (start, end);
}
};
template<class Generator>
struct rand<Generator,long> {
KOKKOS_INLINE_FUNCTION
static long max () {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (Generator::MAX_RAND) :
static_cast<long> (Generator::MAX_RAND64);
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand ()) :
static_cast<long> (gen.rand64 ());
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen, const long& range) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand (static_cast<int> (range))) :
static_cast<long> (gen.rand64 (range));
}
KOKKOS_INLINE_FUNCTION
static long draw (Generator& gen, const long& start, const long& end) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (long) == 4 ?
static_cast<long> (gen.rand (static_cast<int> (start),
static_cast<int> (end))) :
static_cast<long> (gen.rand64 (start, end));
}
};
template<class Generator>
struct rand<Generator,unsigned long> {
KOKKOS_INLINE_FUNCTION
static unsigned long max () {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (Generator::MAX_URAND) :
static_cast<unsigned long> (Generator::MAX_URAND64);
}
KOKKOS_INLINE_FUNCTION
static unsigned long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand ()) :
static_cast<unsigned long> (gen.urand64 ());
}
KOKKOS_INLINE_FUNCTION
static unsigned long draw(Generator& gen, const unsigned long& range) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand (static_cast<unsigned int> (range))) :
static_cast<unsigned long> (gen.urand64 (range));
}
KOKKOS_INLINE_FUNCTION
static unsigned long
draw (Generator& gen, const unsigned long& start, const unsigned long& end) {
// FIXME (mfh 26 Oct 2014) It would be better to select the
// return value at compile time, using something like enable_if.
return sizeof (unsigned long) == 4 ?
static_cast<unsigned long> (gen.urand (static_cast<unsigned int> (start),
static_cast<unsigned int> (end))) :
static_cast<unsigned long> (gen.urand64 (start, end));
}
};
// NOTE (mfh 26 oct 2014) This is a partial specialization for long
// long, a C99 / C++11 signed type which is guaranteed to be at
// least 64 bits. Do NOT write a partial specialization for
// int64_t!!! This is just a typedef! It could be either long or
// long long. We don't know which a priori, and I've seen both.
// The types long and long long are guaranteed to differ, so it's
// always safe to specialize for both.
template<class Generator>
struct rand<Generator, long long> {
KOKKOS_INLINE_FUNCTION
static long long max () {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return Generator::MAX_RAND64;
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 ();
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen, const long long& range) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 (range);
}
KOKKOS_INLINE_FUNCTION
static long long draw (Generator& gen, const long long& start, const long long& end) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.rand64 (start, end);
}
};
// NOTE (mfh 26 oct 2014) This is a partial specialization for
// unsigned long long, a C99 / C++11 unsigned type which is
// guaranteed to be at least 64 bits. Do NOT write a partial
// specialization for uint64_t!!! This is just a typedef! It could
// be either unsigned long or unsigned long long. We don't know
// which a priori, and I've seen both. The types unsigned long and
// unsigned long long are guaranteed to differ, so it's always safe
// to specialize for both.
template<class Generator>
struct rand<Generator,unsigned long long> {
KOKKOS_INLINE_FUNCTION
static unsigned long long max () {
// FIXME (mfh 26 Oct 2014) It's legal for unsigned long long to be > 64 bits.
return Generator::MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
static unsigned long long draw (Generator& gen) {
// FIXME (mfh 26 Oct 2014) It's legal for unsigned long long to be > 64 bits.
return gen.urand64 ();
}
KOKKOS_INLINE_FUNCTION
static unsigned long long draw (Generator& gen, const unsigned long long& range) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.urand64 (range);
}
KOKKOS_INLINE_FUNCTION
static unsigned long long
draw (Generator& gen, const unsigned long long& start, const unsigned long long& end) {
// FIXME (mfh 26 Oct 2014) It's legal for long long to be > 64 bits.
return gen.urand64 (start, end);
}
};
template<class Generator>
struct rand<Generator,float> {
KOKKOS_INLINE_FUNCTION
static float max(){return 1.0f;}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen)
{return gen.frand();}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen, const float& range)
{return gen.frand(range);}
KOKKOS_INLINE_FUNCTION
static float draw(Generator& gen, const float& start, const float& end)
{return gen.frand(start,end);}
};
template<class Generator>
struct rand<Generator,double> {
KOKKOS_INLINE_FUNCTION
static double max(){return 1.0;}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen)
{return gen.drand();}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen, const double& range)
{return gen.drand(range);}
KOKKOS_INLINE_FUNCTION
static double draw(Generator& gen, const double& start, const double& end)
{return gen.drand(start,end);}
};
template<class Generator>
struct rand<Generator, Kokkos::complex<float> > {
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> max () {
return Kokkos::complex<float> (1.0, 1.0);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> draw (Generator& gen) {
const float re = gen.frand ();
const float im = gen.frand ();
return Kokkos::complex<float> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& range) {
const float re = gen.frand (real (range));
const float im = gen.frand (imag (range));
return Kokkos::complex<float> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<float> draw (Generator& gen, const Kokkos::complex<float>& start, const Kokkos::complex<float>& end) {
const float re = gen.frand (real (start), real (end));
const float im = gen.frand (imag (start), imag (end));
return Kokkos::complex<float> (re, im);
}
};
template<class Generator>
struct rand<Generator, Kokkos::complex<double> > {
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> max () {
return Kokkos::complex<double> (1.0, 1.0);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> draw (Generator& gen) {
const double re = gen.drand ();
const double im = gen.drand ();
return Kokkos::complex<double> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& range) {
const double re = gen.drand (real (range));
const double im = gen.drand (imag (range));
return Kokkos::complex<double> (re, im);
}
KOKKOS_INLINE_FUNCTION
static Kokkos::complex<double> draw (Generator& gen, const Kokkos::complex<double>& start, const Kokkos::complex<double>& end) {
const double re = gen.drand (real (start), real (end));
const double im = gen.drand (imag (start), imag (end));
return Kokkos::complex<double> (re, im);
}
};
template<class DeviceType>
class Random_XorShift64_Pool;
template<class DeviceType>
class Random_XorShift64 {
private:
uint64_t state_;
const int state_idx_;
friend class Random_XorShift64_Pool<DeviceType>;
public:
typedef DeviceType device_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffff/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffLL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift64 (uint64_t state, int state_idx = 0)
- : state_(state),state_idx_(state_idx){}
+ : state_(state==0?uint64_t(1318319):state),state_idx_(state_idx){}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
state_ ^= state_ >> 12;
state_ ^= state_ << 25;
state_ ^= state_ >> 27;
uint64_t tmp = state_ * 2685821657736338717ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
state_ ^= state_ >> 12;
state_ ^= state_ << 25;
state_ ^= state_ >> 27;
return (state_ * 2685821657736338717ULL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normally distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
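// Editorial note: the loop above rejects points outside the unit disc; for an
// accepted pair (U,V) with S = U*U + V*V < 1, U*sqrt(-2.0*log(S)/S) is a
// standard normal variate (Marsaglia's polar variant of the Box-Muller
// transform). The second independent variate, V*sqrt(-2.0*log(S)/S), is
// discarded here.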
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<class DeviceType = Kokkos::DefaultExecutionSpace>
class Random_XorShift64_Pool {
private:
typedef View<int*,DeviceType> lock_type;
typedef View<uint64_t*,DeviceType> state_data_type;
lock_type locks_;
state_data_type state_;
int num_states_;
public:
typedef Random_XorShift64<DeviceType> generator_type;
typedef DeviceType device_type;
Random_XorShift64_Pool() {
num_states_ = 0;
}
Random_XorShift64_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,DeviceType::max_hardware_threads());
}
Random_XorShift64_Pool(const Random_XorShift64_Pool& src):
locks_(src.locks_),
state_(src.state_),
num_states_(src.num_states_)
{}
Random_XorShift64_Pool operator = (const Random_XorShift64_Pool& src) {
locks_ = src.locks_;
state_ = src.state_;
num_states_ = src.num_states_;
return *this;
}
void init(uint64_t seed, int num_states) {
+ if(seed==0)
+ seed = uint64_t(1318319);
+
num_states_ = num_states;
locks_ = lock_type("Kokkos::Random_XorShift64::locks",num_states_);
state_ = state_data_type("Kokkos::Random_XorShift64::state",num_states_);
typename state_data_type::HostMirror h_state = create_mirror_view(state_);
typename lock_type::HostMirror h_lock = create_mirror_view(locks_);
// Execute on the HostMirror's default execution space.
Random_XorShift64<typename state_data_type::HostMirror::execution_space> gen(seed,0);
for(int i = 0; i < 17; i++)
gen.rand();
for(int i = 0; i < num_states_; i++) {
int n1 = gen.rand();
int n2 = gen.rand();
int n3 = gen.rand();
int n4 = gen.rand();
h_state(i) = (((static_cast<uint64_t>(n1)) & 0xffff)<<00) |
(((static_cast<uint64_t>(n2)) & 0xffff)<<16) |
(((static_cast<uint64_t>(n3)) & 0xffff)<<32) |
(((static_cast<uint64_t>(n4)) & 0xffff)<<48);
h_lock(i) = 0;
}
deep_copy(state_,h_state);
deep_copy(locks_,h_lock);
}
KOKKOS_INLINE_FUNCTION
Random_XorShift64<DeviceType> get_state() const {
const int i = DeviceType::hardware_thread_id();
return Random_XorShift64<DeviceType>(state_(i),i);
}
KOKKOS_INLINE_FUNCTION
void free_state(const Random_XorShift64<DeviceType>& state) const {
state_(state.state_idx_) = state.state_;
}
};
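// Editorial usage sketch (illustrative only): the pool hands one generator per
// thread to device code and takes its state back afterwards. Assuming a View
// 'values' of length N declared by the caller:
//
//   Kokkos::Random_XorShift64_Pool<> pool(12345 /* seed */);
//   Kokkos::parallel_for(N, KOKKOS_LAMBDA(const int i) {
//     auto gen = pool.get_state();
//     values(i) = gen.drand(0.0, 1.0);   // uniform double between 0 and 1
//     pool.free_state(gen);              // return the state for reuse
//   });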
template<class DeviceType>
class Random_XorShift1024_Pool;
template<class DeviceType>
class Random_XorShift1024 {
private:
int p_;
const int state_idx_;
uint64_t state_[16];
friend class Random_XorShift1024_Pool<DeviceType>;
public:
typedef Random_XorShift1024_Pool<DeviceType> pool_type;
typedef DeviceType device_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift1024 (const typename pool_type::state_data_type& state, int p, int state_idx = 0):
p_(p),state_idx_(state_idx){
for(int i=0 ; i<16; i++)
state_[i] = state(state_idx,i);
}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
uint64_t state_0 = state_[ p_ ];
uint64_t state_1 = state_[ p_ = ( p_ + 1 ) & 15 ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
uint64_t tmp = ( state_[ p_ ] = state_0 ^ state_1 ) * 1181783497276652981ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
uint64_t state_0 = state_[ p_ ];
uint64_t state_1 = state_[ p_ = ( p_ + 1 ) & 15 ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
return (( state_[ p_ ] = state_0 ^ state_1 ) * 1181783497276652981LL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normally distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<class DeviceType = Kokkos::DefaultExecutionSpace>
class Random_XorShift1024_Pool {
private:
typedef View<int*,DeviceType> int_view_type;
typedef View<uint64_t*[16],DeviceType> state_data_type;
int_view_type locks_;
state_data_type state_;
int_view_type p_;
int num_states_;
friend class Random_XorShift1024<DeviceType>;
public:
typedef Random_XorShift1024<DeviceType> generator_type;
typedef DeviceType device_type;
Random_XorShift1024_Pool() {
num_states_ = 0;
}
inline
Random_XorShift1024_Pool(uint64_t seed){
num_states_ = 0;
init(seed,DeviceType::max_hardware_threads());
}
Random_XorShift1024_Pool(const Random_XorShift1024_Pool& src):
locks_(src.locks_),
state_(src.state_),
p_(src.p_),
num_states_(src.num_states_)
{}
Random_XorShift1024_Pool operator = (const Random_XorShift1024_Pool& src) {
locks_ = src.locks_;
state_ = src.state_;
p_ = src.p_;
num_states_ = src.num_states_;
return *this;
}
inline
void init(uint64_t seed, int num_states) {
+ if(seed==0)
+ seed = uint64_t(1318319);
num_states_ = num_states;
-
locks_ = int_view_type("Kokkos::Random_XorShift1024::locks",num_states_);
state_ = state_data_type("Kokkos::Random_XorShift1024::state",num_states_);
p_ = int_view_type("Kokkos::Random_XorShift1024::p",num_states_);
typename state_data_type::HostMirror h_state = create_mirror_view(state_);
typename int_view_type::HostMirror h_lock = create_mirror_view(locks_);
typename int_view_type::HostMirror h_p = create_mirror_view(p_);
// Execute on the HostMirror's default execution space.
Random_XorShift64<typename state_data_type::HostMirror::execution_space> gen(seed,0);
for(int i = 0; i < 17; i++)
gen.rand();
for(int i = 0; i < num_states_; i++) {
for(int j = 0; j < 16 ; j++) {
int n1 = gen.rand();
int n2 = gen.rand();
int n3 = gen.rand();
int n4 = gen.rand();
h_state(i,j) = (((static_cast<uint64_t>(n1)) & 0xffff)<<00) |
(((static_cast<uint64_t>(n2)) & 0xffff)<<16) |
(((static_cast<uint64_t>(n3)) & 0xffff)<<32) |
(((static_cast<uint64_t>(n4)) & 0xffff)<<48);
}
h_p(i) = 0;
h_lock(i) = 0;
}
deep_copy(state_,h_state);
deep_copy(locks_,h_lock);
}
KOKKOS_INLINE_FUNCTION
Random_XorShift1024<DeviceType> get_state() const {
const int i = DeviceType::hardware_thread_id();
return Random_XorShift1024<DeviceType>(state_,p_(i),i);
};
KOKKOS_INLINE_FUNCTION
void free_state(const Random_XorShift1024<DeviceType>& state) const {
for(int i = 0; i<16; i++)
state_(state.state_idx_,i) = state.state_[i];
p_(state.state_idx_) = state.p_;
}
};
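// Editorial note: Random_XorShift1024 keeps 16 x 64-bit words of state per
// generator, giving a far longer period than Random_XorShift64 at the cost of
// more memory per stream. Usage is the same, only the pool type changes, e.g.
//
//   Kokkos::Random_XorShift1024_Pool<> pool(seed);   // seed supplied by caller
//   auto gen = pool.get_state();
//   double x = gen.normal();   // standard normal via the Marsaglia polar method
//   pool.free_state(gen);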
#if defined(KOKKOS_ENABLE_CUDA) && defined(__CUDACC__)
template<>
class Random_XorShift1024<Kokkos::Cuda> {
private:
int p_;
const int state_idx_;
uint64_t* state_;
const int stride_;
friend class Random_XorShift1024_Pool<Kokkos::Cuda>;
public:
typedef Kokkos::Cuda device_type;
typedef Random_XorShift1024_Pool<device_type> pool_type;
enum {MAX_URAND = 0xffffffffU};
enum {MAX_URAND64 = 0xffffffffffffffffULL-1};
enum {MAX_RAND = static_cast<int>(0xffffffffU/2)};
enum {MAX_RAND64 = static_cast<int64_t>(0xffffffffffffffffULL/2-1)};
KOKKOS_INLINE_FUNCTION
Random_XorShift1024 (const typename pool_type::state_data_type& state, int p, int state_idx = 0):
p_(p),state_idx_(state_idx),state_(&state(state_idx,0)),stride_(state.stride_1()){
}
KOKKOS_INLINE_FUNCTION
uint32_t urand() {
uint64_t state_0 = state_[ p_ * stride_ ];
uint64_t state_1 = state_[ (p_ = ( p_ + 1 ) & 15) * stride_ ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
uint64_t tmp = ( state_[ p_ * stride_ ] = state_0 ^ state_1 ) * 1181783497276652981ULL;
tmp = tmp>>16;
return static_cast<uint32_t>(tmp&MAX_URAND);
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64() {
uint64_t state_0 = state_[ p_ * stride_ ];
uint64_t state_1 = state_[ (p_ = ( p_ + 1 ) & 15) * stride_ ];
state_1 ^= state_1 << 31;
state_1 ^= state_1 >> 11;
state_0 ^= state_0 >> 30;
return (( state_[ p_ * stride_ ] = state_0 ^ state_1 ) * 1181783497276652981LL) - 1;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& range) {
const uint32_t max_val = (MAX_URAND/range)*range;
uint32_t tmp = urand();
while(tmp>=max_val)
tmp = urand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint32_t urand(const uint32_t& start, const uint32_t& end ) {
return urand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& range) {
const uint64_t max_val = (MAX_URAND64/range)*range;
uint64_t tmp = urand64();
while(tmp>=max_val)
tmp = urand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
uint64_t urand64(const uint64_t& start, const uint64_t& end ) {
return urand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int rand() {
return static_cast<int>(urand()/2);
}
KOKKOS_INLINE_FUNCTION
int rand(const int& range) {
const int max_val = (MAX_RAND/range)*range;
int tmp = rand();
while(tmp>=max_val)
tmp = rand();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int rand(const int& start, const int& end ) {
return rand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64() {
return static_cast<int64_t>(urand64()/2);
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& range) {
const int64_t max_val = (MAX_RAND64/range)*range;
int64_t tmp = rand64();
while(tmp>=max_val)
tmp = rand64();
return tmp%range;
}
KOKKOS_INLINE_FUNCTION
int64_t rand64(const int64_t& start, const int64_t& end ) {
return rand64(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
float frand() {
return 1.0f * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
float frand(const float& start, const float& end ) {
return frand(end-start)+start;
}
KOKKOS_INLINE_FUNCTION
double drand() {
return 1.0 * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& range) {
return range * urand64()/MAX_URAND64;
}
KOKKOS_INLINE_FUNCTION
double drand(const double& start, const double& end ) {
return drand(end-start)+start;
}
//Marsaglia polar method for drawing a standard normally distributed random number
KOKKOS_INLINE_FUNCTION
double normal() {
double S = 2.0;
double U;
while(S>=1.0) {
U = 2.0*drand() - 1.0;
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
double normal(const double& mean, const double& std_dev=1.0) {
return mean + normal()*std_dev;
}
};
template<>
inline
Random_XorShift64_Pool<Kokkos::Cuda>::Random_XorShift64_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,4*32768);
}
template<>
KOKKOS_INLINE_FUNCTION
Random_XorShift64<Kokkos::Cuda> Random_XorShift64_Pool<Kokkos::Cuda>::get_state() const {
#ifdef __CUDA_ARCH__
const int i_offset = (threadIdx.x*blockDim.y + threadIdx.y)*blockDim.z+threadIdx.z;
int i = (((blockIdx.x*gridDim.y+blockIdx.y)*gridDim.z + blockIdx.z) *
blockDim.x*blockDim.y*blockDim.z + i_offset)%num_states_;
while(Kokkos::atomic_compare_exchange(&locks_(i),0,1)) {
i+=blockDim.x*blockDim.y*blockDim.z;
if(i>=num_states_) {i = i_offset;}
}
return Random_XorShift64<Kokkos::Cuda>(state_(i),i);
#else
return Random_XorShift64<Kokkos::Cuda>(state_(0),0);
#endif
}
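// Editorial note on the device path above: get_state() derives a candidate
// state index from the thread and block indices, then tries to acquire the
// per-state lock with an atomic compare-exchange; if the slot is already taken
// it advances to another slot until a free one is found, so more threads than
// states can be in flight. free_state() writes the advanced state back and
// clears the lock.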
template<>
KOKKOS_INLINE_FUNCTION
void Random_XorShift64_Pool<Kokkos::Cuda>::free_state(const Random_XorShift64<Kokkos::Cuda> &state) const {
#ifdef __CUDA_ARCH__
state_(state.state_idx_) = state.state_;
locks_(state.state_idx_) = 0;
return;
#endif
}
template<>
inline
Random_XorShift1024_Pool<Kokkos::Cuda>::Random_XorShift1024_Pool(uint64_t seed) {
num_states_ = 0;
init(seed,4*32768);
}
template<>
KOKKOS_INLINE_FUNCTION
Random_XorShift1024<Kokkos::Cuda> Random_XorShift1024_Pool<Kokkos::Cuda>::get_state() const {
#ifdef __CUDA_ARCH__
const int i_offset = (threadIdx.x*blockDim.y + threadIdx.y)*blockDim.z+threadIdx.z;
int i = (((blockIdx.x*gridDim.y+blockIdx.y)*gridDim.z + blockIdx.z) *
blockDim.x*blockDim.y*blockDim.z + i_offset)%num_states_;
while(Kokkos::atomic_compare_exchange(&locks_(i),0,1)) {
i+=blockDim.x*blockDim.y*blockDim.z;
if(i>=num_states_) {i = i_offset;}
}
return Random_XorShift1024<Kokkos::Cuda>(state_, p_(i), i);
#else
return Random_XorShift1024<Kokkos::Cuda>(state_, p_(0), 0);
#endif
}
template<>
KOKKOS_INLINE_FUNCTION
void Random_XorShift1024_Pool<Kokkos::Cuda>::free_state(const Random_XorShift1024<Kokkos::Cuda> &state) const {
#ifdef __CUDA_ARCH__
for(int i=0; i<16; i++)
state_(state.state_idx_,i) = state.state_[i];
locks_(state.state_idx_) = 0;
return;
#endif
}
#endif
namespace Impl {
template<class ViewType, class RandomPool, int loops, int rank, class IndexType>
struct fill_random_functor_range;
template<class ViewType, class RandomPool, int loops, int rank, class IndexType>
struct fill_random_functor_begin_end;
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,1,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const IndexType& i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0()))
a(idx) = Rand::draw(gen,range);
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,2,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
a(idx,k) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,3,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
a(idx,k,l) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,4, IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
a(idx,k,l,m) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,5,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
a(idx,k,l,m,n) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,6,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
a(idx,k,l,m,n,o) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,7,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
a(idx,k,l,m,n,o,p) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_range<ViewType,RandomPool,loops,8,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type range;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_range(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type range_):
a(a_),rand_pool(rand_pool_),range(range_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
for(IndexType q=0;q<static_cast<IndexType>(a.dimension_7());q++)
a(idx,k,l,m,n,o,p,q) = Rand::draw(gen,range);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,1,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0()))
a(idx) = Rand::draw(gen,begin,end);
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,2,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
a(idx,k) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,3,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
a(idx,k,l) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,4,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
a(idx,k,l,m) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,5,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())){
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_1());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_2());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_3());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_4());o++)
a(idx,l,m,n,o) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,6,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
a(idx,k,l,m,n,o) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,7,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
a(idx,k,l,m,n,o,p) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
template<class ViewType, class RandomPool, int loops, class IndexType>
struct fill_random_functor_begin_end<ViewType,RandomPool,loops,8,IndexType>{
typedef typename ViewType::execution_space execution_space;
ViewType a;
RandomPool rand_pool;
typename ViewType::const_value_type begin,end;
typedef rand<typename RandomPool::generator_type, typename ViewType::non_const_value_type> Rand;
fill_random_functor_begin_end(ViewType a_, RandomPool rand_pool_,
typename ViewType::const_value_type begin_, typename ViewType::const_value_type end_):
a(a_),rand_pool(rand_pool_),begin(begin_),end(end_) {}
KOKKOS_INLINE_FUNCTION
void operator() (IndexType i) const {
typename RandomPool::generator_type gen = rand_pool.get_state();
for(IndexType j=0;j<loops;j++) {
const IndexType idx = i*loops+j;
if(idx<static_cast<IndexType>(a.dimension_0())) {
for(IndexType k=0;k<static_cast<IndexType>(a.dimension_1());k++)
for(IndexType l=0;l<static_cast<IndexType>(a.dimension_2());l++)
for(IndexType m=0;m<static_cast<IndexType>(a.dimension_3());m++)
for(IndexType n=0;n<static_cast<IndexType>(a.dimension_4());n++)
for(IndexType o=0;o<static_cast<IndexType>(a.dimension_5());o++)
for(IndexType p=0;p<static_cast<IndexType>(a.dimension_6());p++)
for(IndexType q=0;q<static_cast<IndexType>(a.dimension_7());q++)
a(idx,k,l,m,n,o,p,q) = Rand::draw(gen,begin,end);
}
}
rand_pool.free_state(gen);
}
};
}
template<class ViewType, class RandomPool, class IndexType = int64_t>
void fill_random(ViewType a, RandomPool g, typename ViewType::const_value_type range) {
int64_t LDA = a.dimension_0();
if(LDA>0)
parallel_for((LDA+127)/128,Impl::fill_random_functor_range<ViewType,RandomPool,128,ViewType::Rank,IndexType>(a,g,range));
}
template<class ViewType, class RandomPool, class IndexType = int64_t>
void fill_random(ViewType a, RandomPool g, typename ViewType::const_value_type begin,typename ViewType::const_value_type end ) {
int64_t LDA = a.dimension_0();
if(LDA>0)
parallel_for((LDA+127)/128,Impl::fill_random_functor_begin_end<ViewType,RandomPool,128,ViewType::Rank,IndexType>(a,g,begin,end));
}
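// Editorial usage sketch (illustrative only): fill a View with uniform random
// values drawn from a pool, either in [0,range) or in [begin,end). Assuming
// 'n' and 'seed' are supplied by the caller:
//
//   Kokkos::View<double*> v("v", n);
//   Kokkos::Random_XorShift64_Pool<> pool(seed);
//   Kokkos::fill_random(v, pool, 1.0);          // values between 0 and 1
//   Kokkos::fill_random(v, pool, -1.0, 1.0);    // values between -1 and 1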
}
#endif
diff --git a/lib/kokkos/algorithms/src/Kokkos_Sort.hpp b/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
index 5b8c65fee..237de751f 100644
--- a/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
+++ b/lib/kokkos/algorithms/src/Kokkos_Sort.hpp
@@ -1,407 +1,548 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_SORT_HPP_
#define KOKKOS_SORT_HPP_
#include <Kokkos_Core.hpp>
#include <algorithm>
namespace Kokkos {
namespace Impl {
- template<class ValuesViewType, int Rank=ValuesViewType::Rank>
+ template< class DstViewType , class SrcViewType
+ , int Rank = DstViewType::Rank >
struct CopyOp;
- template<class ValuesViewType>
- struct CopyOp<ValuesViewType,1> {
- template<class DstType, class SrcType>
+ template< class DstViewType , class SrcViewType >
+ struct CopyOp<DstViewType,SrcViewType,1> {
KOKKOS_INLINE_FUNCTION
- static void copy(DstType& dst, size_t i_dst,
- SrcType& src, size_t i_src ) {
+ static void copy(DstViewType const& dst, size_t i_dst,
+ SrcViewType const& src, size_t i_src ) {
dst(i_dst) = src(i_src);
}
};
- template<class ValuesViewType>
- struct CopyOp<ValuesViewType,2> {
- template<class DstType, class SrcType>
+ template< class DstViewType , class SrcViewType >
+ struct CopyOp<DstViewType,SrcViewType,2> {
KOKKOS_INLINE_FUNCTION
- static void copy(DstType& dst, size_t i_dst,
- SrcType& src, size_t i_src ) {
- for(int j = 0;j< (int) dst.dimension_1(); j++)
+ static void copy(DstViewType const& dst, size_t i_dst,
+ SrcViewType const& src, size_t i_src ) {
+ for(int j = 0;j< (int) dst.extent(1); j++)
dst(i_dst,j) = src(i_src,j);
}
};
- template<class ValuesViewType>
- struct CopyOp<ValuesViewType,3> {
- template<class DstType, class SrcType>
+ template< class DstViewType , class SrcViewType >
+ struct CopyOp<DstViewType,SrcViewType,3> {
KOKKOS_INLINE_FUNCTION
- static void copy(DstType& dst, size_t i_dst,
- SrcType& src, size_t i_src ) {
- for(int j = 0; j<dst.dimension_1(); j++)
- for(int k = 0; k<dst.dimension_2(); k++)
+ static void copy(DstViewType const& dst, size_t i_dst,
+ SrcViewType const& src, size_t i_src ) {
+ for(int j = 0; j<dst.extent(1); j++)
+ for(int k = 0; k<dst.extent(2); k++)
dst(i_dst,j,k) = src(i_src,j,k);
}
};
}
-template<class KeyViewType, class BinSortOp, class ExecutionSpace = typename KeyViewType::execution_space,
- class SizeType = typename KeyViewType::memory_space::size_type>
+//----------------------------------------------------------------------------
+
+template< class KeyViewType
+ , class BinSortOp
+ , class Space = typename KeyViewType::device_type
+ , class SizeType = typename KeyViewType::memory_space::size_type
+ >
class BinSort {
+public:
+ template< class DstViewType , class SrcViewType >
+ struct copy_functor {
-public:
- template<class ValuesViewType, class PermuteViewType, class CopyOp>
- struct bin_sort_sort_functor {
- typedef ExecutionSpace execution_space;
- typedef typename ValuesViewType::non_const_type values_view_type;
- typedef typename ValuesViewType::const_type const_values_view_type;
- Kokkos::View<typename values_view_type::const_data_type,typename values_view_type::array_layout,
- typename values_view_type::memory_space,Kokkos::MemoryTraits<Kokkos::RandomAccess> > values;
- values_view_type sorted_values;
- typename PermuteViewType::const_type sort_order;
- bin_sort_sort_functor(const_values_view_type values_, values_view_type sorted_values_, PermuteViewType sort_order_):
- values(values_),sorted_values(sorted_values_),sort_order(sort_order_) {}
+ typedef typename SrcViewType::const_type src_view_type ;
+
+ typedef Impl::CopyOp< DstViewType , src_view_type > copy_op ;
+
+ DstViewType dst_values ;
+ src_view_type src_values ;
+ int dst_offset ;
+
+ copy_functor( DstViewType const & dst_values_
+ , int const & dst_offset_
+ , SrcViewType const & src_values_
+ )
+ : dst_values( dst_values_ )
+ , src_values( src_values_ )
+ , dst_offset( dst_offset_ )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator() (const int& i) const {
+ // printf("copy: dst(%i) src(%i)\n",i+dst_offset,i);
+ copy_op::copy(dst_values,i+dst_offset,src_values,i);
+ }
+ };
+
+ template< class DstViewType
+ , class PermuteViewType
+ , class SrcViewType
+ >
+ struct copy_permute_functor {
+
+ // If the source is a Kokkos::View we can request constant random access;
+ // otherwise we can only use its constant type.
+
+ typedef typename std::conditional
+ < Kokkos::is_view< SrcViewType >::value
+ , Kokkos::View< typename SrcViewType::const_data_type
+ , typename SrcViewType::array_layout
+ , typename SrcViewType::device_type
+ , Kokkos::MemoryTraits<Kokkos::RandomAccess>
+ >
+ , typename SrcViewType::const_type
+ >::type src_view_type ;
+
+ typedef typename PermuteViewType::const_type perm_view_type ;
+
+ typedef Impl::CopyOp< DstViewType , src_view_type > copy_op ;
+
+ DstViewType dst_values ;
+ perm_view_type sort_order ;
+ src_view_type src_values ;
+
+ copy_permute_functor( DstViewType const & dst_values_
+ , PermuteViewType const & sort_order_
+ , SrcViewType const & src_values_
+ )
+ : dst_values( dst_values_ )
+ , sort_order( sort_order_ )
+ , src_values( src_values_ )
+ {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
- //printf("Sort: %i %i\n",i,sort_order(i));
- CopyOp::copy(sorted_values,i,values,sort_order(i));
+ // printf("copy_permute: dst(%i) src(%i)\n",i,sort_order(i));
+ copy_op::copy(dst_values,i,src_values,sort_order(i));
}
};
- typedef ExecutionSpace execution_space;
+ typedef typename Space::execution_space execution_space;
typedef BinSortOp bin_op_type;
struct bin_count_tag {};
struct bin_offset_tag {};
struct bin_binning_tag {};
struct bin_sort_bins_tag {};
public:
+
typedef SizeType size_type;
typedef size_type value_type;
- typedef Kokkos::View<size_type*, execution_space> offset_type;
- typedef Kokkos::View<const int*, execution_space> bin_count_type;
+ typedef Kokkos::View<size_type*, Space> offset_type;
+ typedef Kokkos::View<const int*, Space> bin_count_type;
+ typedef typename KeyViewType::const_type const_key_view_type ;
- typedef Kokkos::View<typename KeyViewType::const_data_type,
- typename KeyViewType::array_layout,
- typename KeyViewType::memory_space> const_key_view_type;
- typedef Kokkos::View<typename KeyViewType::const_data_type,
- typename KeyViewType::array_layout,
- typename KeyViewType::memory_space,
- Kokkos::MemoryTraits<Kokkos::RandomAccess> > const_rnd_key_view_type;
+ // If the key container is a Kokkos::View we can request constant random access;
+ // otherwise we can only use its constant type.
+
+ typedef typename std::conditional
+ < Kokkos::is_view< KeyViewType >::value
+ , Kokkos::View< typename KeyViewType::const_data_type,
+ typename KeyViewType::array_layout,
+ typename KeyViewType::device_type,
+ Kokkos::MemoryTraits<Kokkos::RandomAccess> >
+ , const_key_view_type
+ >::type const_rnd_key_view_type;
typedef typename KeyViewType::non_const_value_type non_const_key_scalar;
typedef typename KeyViewType::const_value_type const_key_scalar;
+ typedef Kokkos::View<int*, Space, Kokkos::MemoryTraits<Kokkos::Atomic> > bin_count_atomic_type ;
+
private:
+
const_key_view_type keys;
const_rnd_key_view_type keys_rnd;
public:
- BinSortOp bin_op;
- offset_type bin_offsets;
+ BinSortOp bin_op ;
+ offset_type bin_offsets ;
+ bin_count_atomic_type bin_count_atomic ;
+ bin_count_type bin_count_const ;
+ offset_type sort_order ;
- Kokkos::View<int*, ExecutionSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > bin_count_atomic;
- bin_count_type bin_count_const;
-
- offset_type sort_order;
-
- bool sort_within_bins;
+ int range_begin ;
+ int range_end ;
+ bool sort_within_bins ;
public:
- // Constructor: takes the keys, the binning_operator and optionally whether to sort within bins (default false)
- BinSort(const_key_view_type keys_, BinSortOp bin_op_,
- bool sort_within_bins_ = false)
- :keys(keys_),keys_rnd(keys_), bin_op(bin_op_) {
+ BinSort() {}
- bin_count_atomic = Kokkos::View<int*, ExecutionSpace >("Kokkos::SortImpl::BinSortFunctor::bin_count",bin_op.max_bins());
+ //----------------------------------------
+ // Constructor: takes the keys, the binning_operator and optionally whether to sort within bins (default false)
+ BinSort( const_key_view_type keys_
+ , int range_begin_
+ , int range_end_
+ , BinSortOp bin_op_
+ , bool sort_within_bins_ = false
+ )
+ : keys(keys_)
+ , keys_rnd(keys_)
+ , bin_op(bin_op_)
+ , bin_offsets()
+ , bin_count_atomic()
+ , bin_count_const()
+ , sort_order()
+ , range_begin( range_begin_ )
+ , range_end( range_end_ )
+ , sort_within_bins( sort_within_bins_ )
+ {
+ bin_count_atomic = Kokkos::View<int*, Space >("Kokkos::SortImpl::BinSortFunctor::bin_count",bin_op.max_bins());
bin_count_const = bin_count_atomic;
bin_offsets = offset_type("Kokkos::SortImpl::BinSortFunctor::bin_offsets",bin_op.max_bins());
- sort_order = offset_type("PermutationVector",keys.dimension_0());
- sort_within_bins = sort_within_bins_;
+ sort_order = offset_type("PermutationVector",range_end-range_begin);
}
+ BinSort( const_key_view_type keys_
+ , BinSortOp bin_op_
+ , bool sort_within_bins_ = false
+ )
+ : BinSort( keys_ , 0 , keys_.extent(0), bin_op_ , sort_within_bins_ ) {}
+
+ //----------------------------------------
// Create the permutation vector, the bin_offset array and the bin_count array. Can be called again if keys changed
void create_permute_vector() {
- Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_count_tag> (0,keys.dimension_0()),*this);
- Kokkos::parallel_scan(Kokkos::RangePolicy<ExecutionSpace,bin_offset_tag> (0,bin_op.max_bins()) ,*this);
+ const size_t len = range_end - range_begin ;
+ Kokkos::parallel_for (Kokkos::RangePolicy<execution_space,bin_count_tag> (0,len),*this);
+ Kokkos::parallel_scan(Kokkos::RangePolicy<execution_space,bin_offset_tag> (0,bin_op.max_bins()) ,*this);
Kokkos::deep_copy(bin_count_atomic,0);
- Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_binning_tag> (0,keys.dimension_0()),*this);
+ Kokkos::parallel_for (Kokkos::RangePolicy<execution_space,bin_binning_tag> (0,len),*this);
if(sort_within_bins)
- Kokkos::parallel_for (Kokkos::RangePolicy<ExecutionSpace,bin_sort_bins_tag>(0,bin_op.max_bins()) ,*this);
+ Kokkos::parallel_for (Kokkos::RangePolicy<execution_space,bin_sort_bins_tag>(0,bin_op.max_bins()) ,*this);
}
// Sort a view with respect to the first dimension using the permutation array
template<class ValuesViewType>
- void sort(ValuesViewType values) {
- ValuesViewType sorted_values = ValuesViewType("Copy",
- values.dimension_0(),
- values.dimension_1(),
- values.dimension_2(),
- values.dimension_3(),
- values.dimension_4(),
- values.dimension_5(),
- values.dimension_6(),
- values.dimension_7());
-
- parallel_for(values.dimension_0(),
- bin_sort_sort_functor<ValuesViewType, offset_type,
- Impl::CopyOp<ValuesViewType> >(values,sorted_values,sort_order));
-
- deep_copy(values,sorted_values);
+ void sort( ValuesViewType const & values)
+ {
+ typedef
+ Kokkos::View< typename ValuesViewType::data_type,
+ typename ValuesViewType::array_layout,
+ typename ValuesViewType::device_type >
+ scratch_view_type ;
+
+ const size_t len = range_end - range_begin ;
+
+ scratch_view_type
+ sorted_values("Scratch",
+ len,
+ values.extent(1),
+ values.extent(2),
+ values.extent(3),
+ values.extent(4),
+ values.extent(5),
+ values.extent(6),
+ values.extent(7));
+
+ {
+ copy_permute_functor< scratch_view_type /* DstViewType */
+ , offset_type /* PermuteViewType */
+ , ValuesViewType /* SrcViewType */
+ >
+ functor( sorted_values , sort_order , values );
+
+ parallel_for( Kokkos::RangePolicy<execution_space>(0,len),functor);
+ }
+
+ {
+ copy_functor< ValuesViewType , scratch_view_type >
+ functor( values , range_begin , sorted_values );
+
+ parallel_for( Kokkos::RangePolicy<execution_space>(0,len),functor);
+ }
}
// Get the permutation vector
KOKKOS_INLINE_FUNCTION
offset_type get_permute_vector() const { return sort_order;}
// Get the start offsets for each bin
KOKKOS_INLINE_FUNCTION
offset_type get_bin_offsets() const { return bin_offsets;}
// Get the count for each bin
KOKKOS_INLINE_FUNCTION
bin_count_type get_bin_count() const {return bin_count_const;}
public:
+
KOKKOS_INLINE_FUNCTION
void operator() (const bin_count_tag& tag, const int& i) const {
- bin_count_atomic(bin_op.bin(keys,i))++;
+ const int j = range_begin + i ;
+ bin_count_atomic(bin_op.bin(keys,j))++;
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_offset_tag& tag, const int& i, value_type& offset, const bool& final) const {
if(final) {
bin_offsets(i) = offset;
}
offset+=bin_count_const(i);
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_binning_tag& tag, const int& i) const {
- const int bin = bin_op.bin(keys,i);
+ const int j = range_begin + i ;
+ const int bin = bin_op.bin(keys,j);
const int count = bin_count_atomic(bin)++;
- sort_order(bin_offsets(bin) + count) = i;
+ sort_order(bin_offsets(bin) + count) = j ;
}
KOKKOS_INLINE_FUNCTION
void operator() (const bin_sort_bins_tag& tag, const int&i ) const {
bool sorted = false;
int upper_bound = bin_offsets(i)+bin_count_const(i);
while(!sorted) {
sorted = true;
int old_idx = sort_order(bin_offsets(i));
int new_idx;
for(int k=bin_offsets(i)+1; k<upper_bound; k++) {
new_idx = sort_order(k);
if(!bin_op(keys_rnd,old_idx,new_idx)) {
sort_order(k-1) = new_idx;
sort_order(k) = old_idx;
sorted = false;
} else {
old_idx = new_idx;
}
}
upper_bound--;
}
}
};
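// Editorial usage sketch (illustrative only): bin-sort a set of keys and apply
// the resulting permutation to an associated values View. keys, values, nbins,
// kmin and kmax are assumed to be provided by the caller:
//
//   typedef Kokkos::View<float*> KeyView;
//   BinOp1D<KeyView> binner(nbins, kmin, kmax);
//   BinSort<KeyView, BinOp1D<KeyView> > sorter(keys, binner, true /* sort within bins */);
//   sorter.create_permute_vector();   // build permutation, offsets and counts
//   sorter.sort(values);              // reorder 'values' along dimension 0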
+//----------------------------------------------------------------------------
+
template<class KeyViewType>
struct BinOp1D {
- const int max_bins_;
- const double mul_;
+ int max_bins_;
+ double mul_;
typename KeyViewType::const_value_type range_;
typename KeyViewType::const_value_type min_;
+ BinOp1D():max_bins_(0),mul_(0.0),
+ range_(typename KeyViewType::const_value_type()),
+ min_(typename KeyViewType::const_value_type()) {}
+
//Construct BinOp with number of bins, minimum value and maximum value
BinOp1D(int max_bins__, typename KeyViewType::const_value_type min,
typename KeyViewType::const_value_type max )
:max_bins_(max_bins__+1),mul_(1.0*max_bins__/(max-min)),range_(max-min),min_(min) {}
//Determine bin index from key value
template<class ViewType>
KOKKOS_INLINE_FUNCTION
int bin(ViewType& keys, const int& i) const {
return int(mul_*(keys(i)-min_));
}
//Return maximum bin index + 1
KOKKOS_INLINE_FUNCTION
int max_bins() const {
return max_bins_;
}
//Compare two keys within a bin; if true, new_val will be put before old_val
template<class ViewType, typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
bool operator()(ViewType& keys, iType1& i1, iType2& i2) const {
return keys(i1)<keys(i2);
}
};
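// Editorial note: with mul_ = max_bins/(max-min), a key k maps to bin
// int(mul_*(k-min)). For example, max_bins__=10, min=0.0, max=1.0 puts k=0.25
// into bin 2 and k=1.0 into bin 10, which is why the constructor stores
// max_bins__+1 as the bin count.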
template<class KeyViewType>
struct BinOp3D {
int max_bins_[3];
double mul_[3];
typename KeyViewType::non_const_value_type range_[3];
typename KeyViewType::non_const_value_type min_[3];
+ BinOp3D() {}
+
BinOp3D(int max_bins__[], typename KeyViewType::const_value_type min[],
typename KeyViewType::const_value_type max[] )
{
- max_bins_[0] = max_bins__[0]+1;
- max_bins_[1] = max_bins__[1]+1;
- max_bins_[2] = max_bins__[2]+1;
+ max_bins_[0] = max_bins__[0];
+ max_bins_[1] = max_bins__[1];
+ max_bins_[2] = max_bins__[2];
mul_[0] = 1.0*max_bins__[0]/(max[0]-min[0]);
mul_[1] = 1.0*max_bins__[1]/(max[1]-min[1]);
mul_[2] = 1.0*max_bins__[2]/(max[2]-min[2]);
range_[0] = max[0]-min[0];
range_[1] = max[1]-min[1];
range_[2] = max[2]-min[2];
min_[0] = min[0];
min_[1] = min[1];
min_[2] = min[2];
}
template<class ViewType>
KOKKOS_INLINE_FUNCTION
int bin(ViewType& keys, const int& i) const {
return int( (((int(mul_[0]*(keys(i,0)-min_[0]))*max_bins_[1]) +
int(mul_[1]*(keys(i,1)-min_[1])))*max_bins_[2]) +
int(mul_[2]*(keys(i,2)-min_[2])));
}
KOKKOS_INLINE_FUNCTION
int max_bins() const {
return max_bins_[0]*max_bins_[1]*max_bins_[2];
}
template<class ViewType, typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
bool operator()(ViewType& keys, iType1& i1 , iType2& i2) const {
if (keys(i1,0)>keys(i2,0)) return true;
else if (keys(i1,0)==keys(i2,0)) {
if (keys(i1,1)>keys(i2,1)) return true;
else if (keys(i1,1)==keys(i2,1)) {
if (keys(i1,2)>keys(i2,2)) return true;
}
}
return false;
}
};
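// Editorial note: BinOp3D flattens a 3-D bin coordinate (ix,iy,iz), each
// component computed as in BinOp1D, into the single index
// (ix*max_bins_[1] + iy)*max_bins_[2] + iz, and compares keys within a bin
// lexicographically over their three components.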
namespace Impl {
template<class ViewType>
bool try_std_sort(ViewType view) {
bool possible = true;
size_t stride[8] = { view.stride_0()
, view.stride_1()
, view.stride_2()
, view.stride_3()
, view.stride_4()
, view.stride_5()
, view.stride_6()
, view.stride_7()
};
possible = possible && std::is_same<typename ViewType::memory_space, HostSpace>::value;
possible = possible && (ViewType::Rank == 1);
possible = possible && (stride[0] == 1);
if(possible) {
- std::sort(view.ptr_on_device(),view.ptr_on_device()+view.dimension_0());
+ std::sort(view.data(),view.data()+view.extent(0));
}
return possible;
}
template<class ViewType>
struct min_max_functor {
typedef Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> minmax_scalar;
ViewType view;
min_max_functor(const ViewType& view_):view(view_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const size_t& i, minmax_scalar& minmax) const {
if(view(i) < minmax.min_val) minmax.min_val = view(i);
if(view(i) > minmax.max_val) minmax.max_val = view(i);
}
};
}
template<class ViewType>
-void sort(ViewType view, bool always_use_kokkos_sort = false) {
+void sort( ViewType const & view , bool const always_use_kokkos_sort = false)
+{
if(!always_use_kokkos_sort) {
if(Impl::try_std_sort(view)) return;
}
typedef BinOp1D<ViewType> CompType;
Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> result;
Kokkos::Experimental::MinMax<typename ViewType::non_const_value_type> reducer(result);
- parallel_reduce(Kokkos::RangePolicy<typename ViewType::execution_space>(0,view.dimension_0()),
+ parallel_reduce(Kokkos::RangePolicy<typename ViewType::execution_space>(0,view.extent(0)),
Impl::min_max_functor<ViewType>(view),reducer);
if(result.min_val == result.max_val) return;
- BinSort<ViewType, CompType> bin_sort(view,CompType(view.dimension_0()/2,result.min_val,result.max_val),true);
+ BinSort<ViewType, CompType> bin_sort(view,CompType(view.extent(0)/2,result.min_val,result.max_val),true);
bin_sort.create_permute_vector();
bin_sort.sort(view);
}
+template<class ViewType>
+void sort( ViewType view
+ , size_t const begin
+ , size_t const end
+ )
+{
+ typedef Kokkos::RangePolicy<typename ViewType::execution_space> range_policy ;
+ typedef BinOp1D<ViewType> CompType;
+
+ Kokkos::Experimental::MinMaxScalar<typename ViewType::non_const_value_type> result;
+ Kokkos::Experimental::MinMax<typename ViewType::non_const_value_type> reducer(result);
+
+ parallel_reduce( range_policy( begin , end )
+ , Impl::min_max_functor<ViewType>(view),reducer );
+
+ if(result.min_val == result.max_val) return;
+
+ BinSort<ViewType, CompType>
+ bin_sort(view,begin,end,CompType((end-begin)/2,result.min_val,result.max_val),true);
+
+ bin_sort.create_permute_vector();
+ bin_sort.sort(view);
+}
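// Editorial usage sketch (illustrative only): sort a whole View, or only a
// contiguous index range of it. Assuming a View 'a' of length n:
//
//   Kokkos::sort(a);            // full view; may fall back to std::sort on host
//   Kokkos::sort(a, 10, 100);   // sort only the entries in [10,100)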
}
#endif
diff --git a/lib/kokkos/algorithms/unit_tests/TestSort.hpp b/lib/kokkos/algorithms/unit_tests/TestSort.hpp
index 03e4fb691..61ffa6f43 100644
--- a/lib/kokkos/algorithms/unit_tests/TestSort.hpp
+++ b/lib/kokkos/algorithms/unit_tests/TestSort.hpp
@@ -1,210 +1,275 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
#ifndef TESTSORT_HPP_
#define TESTSORT_HPP_
#include <gtest/gtest.h>
#include<Kokkos_Core.hpp>
+#include<Kokkos_DynamicView.hpp>
#include<Kokkos_Random.hpp>
#include<Kokkos_Sort.hpp>
namespace Test {
namespace Impl{
template<class ExecutionSpace, class Scalar>
struct is_sorted_struct {
typedef unsigned int value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*,ExecutionSpace> keys;
is_sorted_struct(Kokkos::View<Scalar*,ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, unsigned int& count) const {
if(keys(i)>keys(i+1)) count++;
}
};
template<class ExecutionSpace, class Scalar>
struct sum {
typedef double value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*,ExecutionSpace> keys;
sum(Kokkos::View<Scalar*,ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, double& count) const {
count+=keys(i);
}
};
template<class ExecutionSpace, class Scalar>
struct bin3d_is_sorted_struct {
typedef unsigned int value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*[3],ExecutionSpace> keys;
int max_bins;
Scalar min;
Scalar max;
bin3d_is_sorted_struct(Kokkos::View<Scalar*[3],ExecutionSpace> keys_,int max_bins_,Scalar min_,Scalar max_):
keys(keys_),max_bins(max_bins_),min(min_),max(max_) {
}
KOKKOS_INLINE_FUNCTION
void operator() (int i, unsigned int& count) const {
int ix1 = int ((keys(i,0)-min)/max * max_bins);
int iy1 = int ((keys(i,1)-min)/max * max_bins);
int iz1 = int ((keys(i,2)-min)/max * max_bins);
int ix2 = int ((keys(i+1,0)-min)/max * max_bins);
int iy2 = int ((keys(i+1,1)-min)/max * max_bins);
int iz2 = int ((keys(i+1,2)-min)/max * max_bins);
if (ix1>ix2) count++;
else if(ix1==ix2) {
if (iy1>iy2) count++;
else if ((iy1==iy2) && (iz1>iz2)) count++;
}
}
};
template<class ExecutionSpace, class Scalar>
struct sum3D {
typedef double value_type;
typedef ExecutionSpace execution_space;
Kokkos::View<Scalar*[3],ExecutionSpace> keys;
sum3D(Kokkos::View<Scalar*[3],ExecutionSpace> keys_):keys(keys_) {}
KOKKOS_INLINE_FUNCTION
void operator() (int i, double& count) const {
count+=keys(i,0);
count+=keys(i,1);
count+=keys(i,2);
}
};
template<class ExecutionSpace, typename KeyType>
void test_1D_sort(unsigned int n,bool force_kokkos) {
typedef Kokkos::View<KeyType*,ExecutionSpace> KeyViewType;
KeyViewType keys("Keys",n);
// Test sorting array with all numbers equal
Kokkos::deep_copy(keys,KeyType(1));
Kokkos::sort(keys,force_kokkos);
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
Kokkos::fill_random(keys,g,Kokkos::Random_XorShift64_Pool<ExecutionSpace>::generator_type::MAX_URAND);
double sum_before = 0.0;
double sum_after = 0.0;
unsigned int sort_fails = 0;
Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys),sum_before);
Kokkos::sort(keys,force_kokkos);
Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys),sum_after);
Kokkos::parallel_reduce(n-1,is_sorted_struct<ExecutionSpace, KeyType>(keys),sort_fails);
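// Sanity check: sorting only permutes the keys, so the before/after sums
// (and hence their ratio) should agree up to floating-point round-off,
// which the epsilon tolerance below allows for.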
double ratio = sum_before/sum_after;
double epsilon = 1e-10;
unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
ASSERT_EQ(sort_fails,0);
ASSERT_EQ(equal_sum,1);
}
template<class ExecutionSpace, typename KeyType>
void test_3D_sort(unsigned int n) {
typedef Kokkos::View<KeyType*[3],ExecutionSpace > KeyViewType;
KeyViewType keys("Keys",n*n*n);
Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
Kokkos::fill_random(keys,g,100.0);
double sum_before = 0.0;
double sum_after = 0.0;
unsigned int sort_fails = 0;
Kokkos::parallel_reduce(keys.dimension_0(),sum3D<ExecutionSpace, KeyType>(keys),sum_before);
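// Grow the per-dimension bin count (a power of two) until the total number
// of bins, bin_1d^3, reaches at least a quarter of the number of keys.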
int bin_1d = 1;
while( bin_1d*bin_1d*bin_1d*4< (int) keys.dimension_0() ) bin_1d*=2;
int bin_max[3] = {bin_1d,bin_1d,bin_1d};
typename KeyViewType::value_type min[3] = {0,0,0};
typename KeyViewType::value_type max[3] = {100,100,100};
typedef Kokkos::BinOp3D< KeyViewType > BinOp;
BinOp bin_op(bin_max,min,max);
Kokkos::BinSort< KeyViewType , BinOp >
Sorter(keys,bin_op,false);
Sorter.create_permute_vector();
Sorter.template sort< KeyViewType >(keys);
Kokkos::parallel_reduce(keys.dimension_0(),sum3D<ExecutionSpace, KeyType>(keys),sum_after);
Kokkos::parallel_reduce(keys.dimension_0()-1,bin3d_is_sorted_struct<ExecutionSpace, KeyType>(keys,bin_1d,min[0],max[0]),sort_fails);
double ratio = sum_before/sum_after;
double epsilon = 1e-10;
unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
- printf("3D Sort Sum: %f %f Fails: %u\n",sum_before,sum_after,sort_fails);
+ if ( sort_fails )
+ printf("3D Sort Sum: %f %f Fails: %u\n",sum_before,sum_after,sort_fails);
+
ASSERT_EQ(sort_fails,0);
ASSERT_EQ(equal_sum,1);
}
+//----------------------------------------------------------------------------
+
+template<class ExecutionSpace, typename KeyType>
+void test_dynamic_view_sort(unsigned int n )
+{
+ typedef typename ExecutionSpace::memory_space memory_space ;
+ typedef Kokkos::Experimental::DynamicView<KeyType*,ExecutionSpace> KeyDynamicViewType;
+ typedef Kokkos::View<KeyType*,ExecutionSpace> KeyViewType;
+
+ const size_t upper_bound = 2 * n ;
+
+ typename KeyDynamicViewType::memory_pool
+ pool( memory_space() , 2 * n * sizeof(KeyType) );
+
+ KeyDynamicViewType keys("Keys",pool,upper_bound);
+
+ keys.resize_serial(n);
+
+ KeyViewType keys_view("KeysTmp", n );
+
+ // Test sorting array with all numbers equal
+ Kokkos::deep_copy(keys_view,KeyType(1));
+ Kokkos::Experimental::deep_copy(keys,keys_view);
+ Kokkos::sort(keys, 0 /* begin */ , n /* end */ );
+
+ Kokkos::Random_XorShift64_Pool<ExecutionSpace> g(1931);
+ Kokkos::fill_random(keys_view,g,Kokkos::Random_XorShift64_Pool<ExecutionSpace>::generator_type::MAX_URAND);
+
+ Kokkos::Experimental::deep_copy(keys,keys_view);
+
+ double sum_before = 0.0;
+ double sum_after = 0.0;
+ unsigned int sort_fails = 0;
+
+ Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys_view),sum_before);
+
+ Kokkos::sort(keys, 0 /* begin */ , n /* end */ );
+
+ Kokkos::Experimental::deep_copy( keys_view , keys );
+
+ Kokkos::parallel_reduce(n,sum<ExecutionSpace, KeyType>(keys_view),sum_after);
+ Kokkos::parallel_reduce(n-1,is_sorted_struct<ExecutionSpace, KeyType>(keys_view),sort_fails);
+
+ double ratio = sum_before/sum_after;
+ double epsilon = 1e-10;
+ unsigned int equal_sum = (ratio > (1.0-epsilon)) && (ratio < (1.0+epsilon)) ? 1 : 0;
+
+ if ( sort_fails != 0 || equal_sum != 1 ) {
+ std::cout << " N = " << n
+ << " ; sum_before = " << sum_before
+ << " ; sum_after = " << sum_after
+ << " ; ratio = " << ratio
+ << std::endl ;
+ }
+
+ ASSERT_EQ(sort_fails,0);
+ ASSERT_EQ(equal_sum,1);
+}
+
+//----------------------------------------------------------------------------
+
template<class ExecutionSpace, typename KeyType>
void test_sort(unsigned int N)
{
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, true);
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, false);
test_3D_sort<ExecutionSpace,KeyType>(N);
+ test_dynamic_view_sort<ExecutionSpace,KeyType>(N*N);
}
}
}
#endif /* TESTSORT_HPP_ */
diff --git a/lib/kokkos/bin/nvcc_wrapper b/lib/kokkos/bin/nvcc_wrapper
index cb206cf88..09fa5d500 100755
--- a/lib/kokkos/bin/nvcc_wrapper
+++ b/lib/kokkos/bin/nvcc_wrapper
@@ -1,284 +1,287 @@
#!/bin/bash
#
# This shell script (nvcc_wrapper) wraps both the host compiler and
# NVCC, if you are building legacy C or C++ code with CUDA enabled.
# The script remedies some differences between the interface of NVCC
# and that of the host compiler, in particular for linking.
# It also means that a legacy code doesn't need separate .cu files;
# it can just use .cpp files.
#
# Default settings: change those according to your machine. For
# example, you may have two different wrappers with either icpc
# or g++ as their back-end compiler. The defaults can be overwritten
# by using the usual arguments (e.g., -arch=sm_30 -ccbin icpc).
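#
# Illustrative example (editorial sketch, not part of the script): with the
# defaults below, a compile line such as
#   nvcc_wrapper -O3 -c my_kernels.cpp -o my_kernels.o
# is rewritten into an "nvcc ... -x cu my_kernels.cpp" command, with any
# flags nvcc does not understand forwarded to the host compiler via -Xcompiler.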
default_arch="sm_35"
#default_arch="sm_50"
#
# The default C++ compiler.
#
host_compiler=${NVCC_WRAPPER_DEFAULT_COMPILER:-"g++"}
#host_compiler="icpc"
#host_compiler="/usr/local/gcc/4.8.3/bin/g++"
#host_compiler="/usr/local/gcc/4.9.1/bin/g++"
#
# Internal variables
#
# C++ files
cpp_files=""
# Host compiler arguments
xcompiler_args=""
# Cuda (NVCC) only arguments
cuda_args=""
# Arguments for both NVCC and Host compiler
shared_args=""
# Linker arguments
xlinker_args=""
# Object files passable to NVCC
object_files=""
# Link objects for the host linker only
object_files_xlinker=""
# Shared libraries with version numbers are not handled correctly by NVCC
shared_versioned_libraries_host=""
shared_versioned_libraries=""
# Does the User set the architecture
arch_set=0
# Does the user overwrite the host compiler
ccbin_set=0
#Error code of compilation
error_code=0
# Do a dry run without actually compiling
dry_run=0
# Skip NVCC compilation and use host compiler directly
host_only=0
# Enable workaround for CUDA 6.5 for pragma ident
replace_pragma_ident=0
# Mark first host compiler argument
first_xcompiler_arg=1
temp_dir=${TMPDIR:-/tmp}
# Check if we have an optimization argument already
optimization_applied=0
#echo "Arguments: $# $@"
while [ $# -gt 0 ]
do
case $1 in
#show the executed command
--show|--nvcc-wrapper-show)
dry_run=1
;;
#run host compilation only
--host-only)
host_only=1
;;
#replace '#pragma ident' with '#ident'; this is needed to compile OpenMPI due to a configure script bug and the non-standardized behaviour of pragma with macros
--replace-pragma-ident)
replace_pragma_ident=1
;;
#handle source files to be compiled as cuda files
*.cpp|*.cxx|*.cc|*.C|*.c++|*.cu)
cpp_files="$cpp_files $1"
;;
# Ensure we only have one optimization flag because NVCC doesn't allow multiple
-O*)
if [ $optimization_applied -eq 1 ]; then
echo "nvcc_wrapper - *warning* you have set multiple optimization flags (-O*), only the first is used because nvcc can only accept a single optimization setting."
else
shared_args="$shared_args $1"
optimization_applied=1
fi
;;
#Handle shared args (valid for both nvcc and the host compiler)
-D*|-c|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
shared_args="$shared_args $1"
;;
#Handle shared args that have an argument
-o|-MT)
shared_args="$shared_args $1 $2"
shift
;;
#Handle known nvcc args
-gencode*|--dryrun|--verbose|--keep|--keep-dir*|-G|--relocatable-device-code*|-lineinfo|-expt-extended-lambda|--resource-usage|-Xptxas*)
cuda_args="$cuda_args $1"
;;
#Handle more known nvcc args
--expt-extended-lambda|--expt-relaxed-constexpr)
cuda_args="$cuda_args $1"
;;
#Handle known nvcc args that have an argument
-rdc|-maxrregcount|--default-stream)
cuda_args="$cuda_args $1 $2"
shift
;;
#Handle c++11 setting
--std=c++11|-std=c++11)
shared_args="$shared_args $1"
;;
#strip off -std=c++98 because nvcc warns about it and Tribits will place both -std=c++11 and -std=c++98
-std=c++98|--std=c++98)
;;
#strip off -pedantic because it produces endless warnings about #LINE added by the preprocessor
-pedantic|-Wpedantic|-ansi)
;;
+ #strip off -Woverloaded-virtual to avoid "cc1: warning: command line option ‘-Woverloaded-virtual’ is valid for C++/ObjC++ but not for C"
+ -Woverloaded-virtual)
+ ;;
#strip -Xcompiler because we add it
-Xcompiler)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$2"
fi
shift
;;
#strip of "-x cu" because we add that
-x)
if [[ $2 != "cu" ]]; then
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args="-x,$2"
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,-x,$2"
fi
fi
shift
;;
#Handle -ccbin (if it's not set we can set it to a default value)
-ccbin)
cuda_args="$cuda_args $1 $2"
ccbin_set=1
host_compiler=$2
shift
;;
#Handle -arch argument (if it's not set, use a default)
-arch*)
cuda_args="$cuda_args $1"
arch_set=1
;;
#Handle -Xcudafe argument
-Xcudafe)
cuda_args="$cuda_args -Xcudafe $2"
shift
;;
#Handle args that should be sent to the linker
-Wl*)
xlinker_args="$xlinker_args -Xlinker ${1:4:${#1}}"
host_linker_args="$host_linker_args ${1:4:${#1}}"
;;
#Handle object files: -x cu applies to all input files, so give them to linker, except if only linking
*.a|*.so|*.o|*.obj)
object_files="$object_files $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle object files which always need to use "-Xlinker": -x cu applies to all input files, so give them to linker, except if only linking
- *.dylib)
+ @*|*.dylib)
object_files="$object_files -Xlinker $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle shared libraries with *.so.* names which nvcc can't do.
*.so.*)
shared_versioned_libraries_host="$shared_versioned_libraries_host $1"
shared_versioned_libraries="$shared_versioned_libraries -Xlinker $1"
;;
#All other args are sent to the host compiler
*)
if [ $first_xcompiler_arg -eq 1 ]; then
xcompiler_args=$1
first_xcompiler_arg=0
else
xcompiler_args="$xcompiler_args,$1"
fi
;;
esac
shift
done
#Add default host compiler if necessary
if [ $ccbin_set -ne 1 ]; then
cuda_args="$cuda_args -ccbin $host_compiler"
fi
#Add architecture command
if [ $arch_set -ne 1 ]; then
cuda_args="$cuda_args -arch=$default_arch"
fi
#Compose compilation command
nvcc_command="nvcc $cuda_args $shared_args $xlinker_args $shared_versioned_libraries"
if [ $first_xcompiler_arg -eq 0 ]; then
nvcc_command="$nvcc_command -Xcompiler $xcompiler_args"
fi
#Compose host only command
host_command="$host_compiler $shared_args $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
#nvcc does not accept '#pragma ident SOME_MACRO_STRING' but it does accept '#ident SOME_MACRO_STRING'
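# (Editorial note: concretely, the sed below rewrites e.g. "#pragma ident SOME_MACRO_STRING"
# to "#ident SOME_MACRO_STRING" in a temporary copy of each affected file before compilation.)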
if [ $replace_pragma_ident -eq 1 ]; then
cpp_files2=""
for file in $cpp_files
do
var=`grep pragma ${file} | grep ident | grep "#"`
if [ "${#var}" -gt 0 ]
then
sed 's/#[\ \t]*pragma[\ \t]*ident/#ident/g' $file > $temp_dir/nvcc_wrapper_tmp_$file
cpp_files2="$cpp_files2 $temp_dir/nvcc_wrapper_tmp_$file"
else
cpp_files2="$cpp_files2 $file"
fi
done
cpp_files=$cpp_files2
#echo $cpp_files
fi
if [ "$cpp_files" ]; then
nvcc_command="$nvcc_command $object_files_xlinker -x cu $cpp_files"
else
nvcc_command="$nvcc_command $object_files"
fi
if [ "$cpp_files" ]; then
host_command="$host_command $object_files $cpp_files"
else
host_command="$host_command $object_files"
fi
#Print command for dryrun
if [ $dry_run -eq 1 ]; then
if [ $host_only -eq 1 ]; then
echo $host_command
else
echo $nvcc_command
fi
exit 0
fi
#Run compilation command
if [ $host_only -eq 1 ]; then
$host_command
else
$nvcc_command
fi
error_code=$?
#Report error code
exit $error_code
diff --git a/lib/kokkos/cmake/tpls/FindTPLQTHREAD.cmake b/lib/kokkos/cmake/deps/QTHREADS.cmake
similarity index 98%
rename from lib/kokkos/cmake/tpls/FindTPLQTHREAD.cmake
rename to lib/kokkos/cmake/deps/QTHREADS.cmake
index 994b72b20..c312f2590 100644
--- a/lib/kokkos/cmake/tpls/FindTPLQTHREAD.cmake
+++ b/lib/kokkos/cmake/deps/QTHREADS.cmake
@@ -1,70 +1,69 @@
# @HEADER
# ************************************************************************
#
# Trilinos: An Object-Oriented Solver Framework
# Copyright (2001) Sandia Corporation
#
#
# Copyright (2001) Sandia Corporation. Under the terms of Contract
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
# work by or on behalf of the U.S. Government. Export of this program
# may require a license from the United States Government.
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of the Corporation nor the names of the
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# NOTICE: The United States Government is granted for itself and others
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
# license in this data to reproduce, prepare derivative works, and
# perform publicly and display publicly. Beginning five (5) years from
# July 25, 2001, the United States Government is granted for itself and
# others acting on its behalf a paid-up, nonexclusive, irrevocable
# worldwide license in this data to reproduce, prepare derivative works,
# distribute copies to the public, perform publicly and display
# publicly, and to permit others to do so.
#
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
#
# ************************************************************************
# @HEADER
#-----------------------------------------------------------------------------
# Qthreads lightweight user-level threading library.
#
# Acquisition information:
# Date checked: July 2014
# Checked by: H. Carter Edwards <hcedwar AT sandia.gov>
# Source: https://code.google.com/p/qthreads
#
-TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREAD
+TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREADS
REQUIRED_HEADERS qthread.h
REQUIRED_LIBS_NAMES "qthread"
)
-
diff --git a/lib/kokkos/cmake/deps/QTHREAD.cmake b/lib/kokkos/cmake/tpls/FindTPLQTHREADS.cmake
similarity index 98%
rename from lib/kokkos/cmake/deps/QTHREAD.cmake
rename to lib/kokkos/cmake/tpls/FindTPLQTHREADS.cmake
index 994b72b20..c312f2590 100644
--- a/lib/kokkos/cmake/deps/QTHREAD.cmake
+++ b/lib/kokkos/cmake/tpls/FindTPLQTHREADS.cmake
@@ -1,70 +1,69 @@
# @HEADER
# ************************************************************************
#
# Trilinos: An Object-Oriented Solver Framework
# Copyright (2001) Sandia Corporation
#
#
# Copyright (2001) Sandia Corporation. Under the terms of Contract
# DE-AC04-94AL85000, there is a non-exclusive license for use of this
# work by or on behalf of the U.S. Government. Export of this program
# may require a license from the United States Government.
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of the Corporation nor the names of the
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# NOTICE: The United States Government is granted for itself and others
# acting on its behalf a paid-up, nonexclusive, irrevocable worldwide
# license in this data to reproduce, prepare derivative works, and
# perform publicly and display publicly. Beginning five (5) years from
# July 25, 2001, the United States Government is granted for itself and
# others acting on its behalf a paid-up, nonexclusive, irrevocable
# worldwide license in this data to reproduce, prepare derivative works,
# distribute copies to the public, perform publicly and display
# publicly, and to permit others to do so.
#
# NEITHER THE UNITED STATES GOVERNMENT, NOR THE UNITED STATES DEPARTMENT
# OF ENERGY, NOR SANDIA CORPORATION, NOR ANY OF THEIR EMPLOYEES, MAKES
# ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR
# RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OF ANY
# INFORMATION, APPARATUS, PRODUCT, OR PROCESS DISCLOSED, OR REPRESENTS
# THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS.
#
# ************************************************************************
# @HEADER
#-----------------------------------------------------------------------------
# Qthreads lightweight user-level threading library.
#
# Acquisition information:
# Date checked: July 2014
# Checked by: H. Carter Edwards <hcedwar AT sandia.gov>
# Source: https://code.google.com/p/qthreads
#
-TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREAD
+TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES( QTHREADS
REQUIRED_HEADERS qthread.h
REQUIRED_LIBS_NAMES "qthread"
)
-
diff --git a/lib/kokkos/config/kokkos_dev/config-core-all.sh b/lib/kokkos/config/kokkos_dev/config-core-all.sh
index fa588c778..d4fb25a8e 100755
--- a/lib/kokkos/config/kokkos_dev/config-core-all.sh
+++ b/lib/kokkos/config/kokkos_dev/config-core-all.sh
@@ -1,113 +1,110 @@
#!/bin/sh
#
# Copy this script, put it outside the Trilinos source directory, and
# build there.
#
#-----------------------------------------------------------------------------
# Building on 'kokkos-dev.sandia.gov' with enabled capabilities:
#
-# Cuda, OpenMP, Threads, Qthread, hwloc
+# Cuda, OpenMP, Threads, Qthreads, hwloc
#
# module loaded on 'kokkos-dev.sandia.gov' for this build
#
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
#
# The 'nvcc-wrapper' module should load a script that matches
# kokkos/config/nvcc_wrapper
#
#-----------------------------------------------------------------------------
# Source and installation directories:
TRILINOS_SOURCE_DIR=${HOME}/Trilinos
TRILINOS_INSTALL_DIR=${HOME}/TrilinosInstall/`date +%F`
CMAKE_CONFIGURE=""
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_INSTALL_PREFIX=${TRILINOS_INSTALL_DIR}"
#-----------------------------------------------------------------------------
# Debug/optimized
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_BUILD_TYPE:STRING=DEBUG"
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_BOUNDS_CHECK:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_BUILD_TYPE:STRING=RELEASE"
#-----------------------------------------------------------------------------
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_CXX_FLAGS:STRING=-Wall"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_C_COMPILER=gcc"
#-----------------------------------------------------------------------------
# Cuda using GNU, use the nvcc_wrapper to build CUDA source
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_CXX_COMPILER=g++"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D CMAKE_CXX_COMPILER=nvcc_wrapper"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_CUDA:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_CUSPARSE:BOOL=ON"
#-----------------------------------------------------------------------------
# Configure for Kokkos subpackages and tests:
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_Fortran:BOOL=OFF"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_ALL_PACKAGES:BOOL=OFF"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_EXAMPLES:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_TESTS:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosCore:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosContainers:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosAlgorithms:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_TpetraKernels:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_KokkosExample:BOOL=ON"
#-----------------------------------------------------------------------------
# Hardware locality configuration:
HWLOC_BASE_DIR="/home/projects/hwloc/1.7.1/host/gnu/4.7.3"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_HWLOC:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D HWLOC_INCLUDE_DIRS:FILEPATH=${HWLOC_BASE_DIR}/include"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D HWLOC_LIBRARY_DIRS:FILEPATH=${HWLOC_BASE_DIR}/lib"
#-----------------------------------------------------------------------------
# Pthread
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_Pthread:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_Pthread:BOOL=ON"
#-----------------------------------------------------------------------------
# OpenMP
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_OpenMP:BOOL=ON"
CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_OpenMP:BOOL=ON"
#-----------------------------------------------------------------------------
-# Qthread
+# Qthreads
-QTHREAD_BASE_DIR="/home/projects/qthreads/2014-07-08/host/gnu/4.7.3"
+QTHREADS_BASE_DIR="/home/projects/qthreads/2014-07-08/host/gnu/4.7.3"
-CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_QTHREAD:BOOL=ON"
-CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREAD_INCLUDE_DIRS:FILEPATH=${QTHREAD_BASE_DIR}/include"
-CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREAD_LIBRARY_DIRS:FILEPATH=${QTHREAD_BASE_DIR}/lib"
+CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D TPL_ENABLE_QTHREADS:BOOL=ON"
+CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREADS_INCLUDE_DIRS:FILEPATH=${QTHREADS_BASE_DIR}/include"
+CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D QTHREADS_LIBRARY_DIRS:FILEPATH=${QTHREADS_BASE_DIR}/lib"
#-----------------------------------------------------------------------------
# C++11
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Trilinos_ENABLE_CXX11:BOOL=ON"
# CMAKE_CONFIGURE="${CMAKE_CONFIGURE} -D Kokkos_ENABLE_CXX11:BOOL=ON"
#-----------------------------------------------------------------------------
#
# Remove CMake output files to force reconfigure from scratch.
#
rm -rf CMake* Trilinos* packages Dart* Testing cmake_install.cmake MakeFile*
#
echo cmake ${CMAKE_CONFIGURE} ${TRILINOS_SOURCE_DIR}
cmake ${CMAKE_CONFIGURE} ${TRILINOS_SOURCE_DIR}
-
-#-----------------------------------------------------------------------------
-
diff --git a/lib/kokkos/config/master_history.txt b/lib/kokkos/config/master_history.txt
index 446cbb021..9eaecb503 100644
--- a/lib/kokkos/config/master_history.txt
+++ b/lib/kokkos/config/master_history.txt
@@ -1,7 +1,8 @@
tag: 2.01.00 date: 07:21:2016 master: xxxxxxxx develop: fa6dfcc4
tag: 2.01.06 date: 09:02:2016 master: 9afaa87f develop: 555f1a3a
tag: 2.01.10 date: 09:27:2016 master: e4119325 develop: e6cda11e
tag: 2.02.00 date: 10:30:2016 master: 6c90a581 develop: ca3dd56e
tag: 2.02.01 date: 11:01:2016 master: 9c698c86 develop: b0072304
tag: 2.02.07 date: 12:16:2016 master: 4b4cc4ba develop: 382c0966
-tag: 2.02.15 date: 02:10:2017 master: 8c64cd93 develop: 28dea8b6
+tag: 2.02.15 date: 02:10:2017 master: 8c64cd93 develop: 28dea8b6
+tag: 2.03.00 date: 04:25:2017 master: 120d9ce7 develop: 015ba641
diff --git a/lib/kokkos/config/test_all_sandia b/lib/kokkos/config/test_all_sandia
index 2c15e951b..690960664 100755
--- a/lib/kokkos/config/test_all_sandia
+++ b/lib/kokkos/config/test_all_sandia
@@ -1,676 +1,714 @@
#!/bin/bash -e
#
# Global config
#
set -o pipefail
-# Determine current machine
+# Determine current machine.
MACHINE=""
HOSTNAME=$(hostname)
PROCESSOR=`uname -p`
if [[ "$HOSTNAME" =~ (white|ride).* ]]; then
- MACHINE=white
+ MACHINE=white
elif [[ "$HOSTNAME" =~ .*bowman.* ]]; then
- MACHINE=bowman
+ MACHINE=bowman
elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
- if [[ "$PROCESSOR" = "aarch64" ]]; then
- MACHINE=sullivan
- else
- MACHINE=shepard
- fi
+ if [[ "$PROCESSOR" = "aarch64" ]]; then
+ MACHINE=sullivan
+ else
+ MACHINE=shepard
+ fi
elif [[ "$HOSTNAME" =~ apollo ]]; then
- MACHINE=apollo
+ MACHINE=apollo
elif [ ! -z "$SEMS_MODULEFILES_ROOT" ]; then
- MACHINE=sems
+ MACHINE=sems
else
- echo "Unrecognized machine" >&2
- exit 1
+ echo "Unrecognized machine" >&2
+ exit 1
fi
GCC_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
IBM_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
ARM_GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
INTEL_BUILD_LIST="OpenMP,Pthread,Serial,OpenMP_Serial,Pthread_Serial"
CLANG_BUILD_LIST="Pthread,Serial,Pthread_Serial"
CUDA_BUILD_LIST="Cuda_OpenMP,Cuda_Pthread,Cuda_Serial"
CUDA_IBM_BUILD_LIST="Cuda_OpenMP,Cuda_Serial"
GCC_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wignored-qualifiers,-Wempty-body,-Wclobbered,-Wuninitialized"
IBM_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CLANG_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
INTEL_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CUDA_WARNING_FLAGS=""
-# Default. Machine specific can override
+# Default. Machine specific can override.
DEBUG=False
ARGS=""
CUSTOM_BUILD_LIST=""
+QTHREADS_PATH=""
DRYRUN=False
BUILD_ONLY=False
declare -i NUM_JOBS_TO_RUN_IN_PARALLEL=3
TEST_SCRIPT=False
SKIP_HWLOC=False
SPOT_CHECK=False
PRINT_HELP=False
OPT_FLAG=""
KOKKOS_OPTIONS=""
-
#
-# Handle arguments
+# Handle arguments.
#
while [[ $# > 0 ]]
do
-key="$1"
-case $key in
---kokkos-path*)
-KOKKOS_PATH="${key#*=}"
-;;
---build-list*)
-CUSTOM_BUILD_LIST="${key#*=}"
-;;
---debug*)
-DEBUG=True
-;;
---build-only*)
-BUILD_ONLY=True
-;;
---test-script*)
-TEST_SCRIPT=True
-;;
---skip-hwloc*)
-SKIP_HWLOC=True
-;;
---num*)
-NUM_JOBS_TO_RUN_IN_PARALLEL="${key#*=}"
-;;
---dry-run*)
-DRYRUN=True
-;;
---spot-check*)
-SPOT_CHECK=True
-;;
---arch*)
-ARCH_FLAG="--arch=${key#*=}"
-;;
---opt-flag*)
-OPT_FLAG="${key#*=}"
-;;
---with-cuda-options*)
-KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
-;;
---help*)
-PRINT_HELP=True
-;;
-*)
-# args, just append
-ARGS="$ARGS $1"
-;;
-esac
-shift
+ key="$1"
+
+ case $key in
+ --kokkos-path*)
+ KOKKOS_PATH="${key#*=}"
+ ;;
+ --qthreads-path*)
+ QTHREADS_PATH="${key#*=}"
+ ;;
+ --build-list*)
+ CUSTOM_BUILD_LIST="${key#*=}"
+ ;;
+ --debug*)
+ DEBUG=True
+ ;;
+ --build-only*)
+ BUILD_ONLY=True
+ ;;
+ --test-script*)
+ TEST_SCRIPT=True
+ ;;
+ --skip-hwloc*)
+ SKIP_HWLOC=True
+ ;;
+ --num*)
+ NUM_JOBS_TO_RUN_IN_PARALLEL="${key#*=}"
+ ;;
+ --dry-run*)
+ DRYRUN=True
+ ;;
+ --spot-check*)
+ SPOT_CHECK=True
+ ;;
+ --arch*)
+ ARCH_FLAG="--arch=${key#*=}"
+ ;;
+ --opt-flag*)
+ OPT_FLAG="${key#*=}"
+ ;;
+ --with-cuda-options*)
+ KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
+ ;;
+ --help*)
+ PRINT_HELP=True
+ ;;
+ *)
+ # args, just append
+ ARGS="$ARGS $1"
+ ;;
+ esac
+
+ shift
done
SCRIPT_KOKKOS_ROOT=$( cd "$( dirname "$0" )" && cd .. && pwd )
-# set kokkos path
+# Set kokkos path.
if [ -z "$KOKKOS_PATH" ]; then
- KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
+ KOKKOS_PATH=$SCRIPT_KOKKOS_ROOT
else
- # Ensure KOKKOS_PATH is abs path
- KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
+ # Ensure KOKKOS_PATH is abs path.
+ KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
fi
#
-# Machine specific config
+# Machine specific config.
#
if [ "$MACHINE" = "sems" ]; then
- source /projects/sems/modulefiles/utils/sems-modules-init.sh
+ source /projects/sems/modulefiles/utils/sems-modules-init.sh
- BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
- CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
- CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
+ BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
+ CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
+ CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG=""
- fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG=""
+ fi
if [ "$SPOT_CHECK" = "True" ]; then
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
else
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
- "gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
- "gcc/5.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.7.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.8.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
- "clang/3.9.0 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
fi
-
elif [ "$MACHINE" = "white" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=32
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=32
- BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
- IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
- CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/5.4.0"
+ BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
+ IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
+ CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/5.4.0"
- # Don't do pthread on white
- GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
+ # Don't do pthread on white.
+ GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
- "ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
- "cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
- )
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=Power8,Kepler37"
- fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
+ "ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
+ "cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
+ )
+
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=Power8,Kepler37"
+ fi
+
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "bowman" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=32
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=32
- BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
+ BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
- OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
+ OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- )
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ )
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=KNL"
- fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=KNL"
+ fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "sullivan" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=96
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=96
- BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
+ BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("gcc/5.3.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("gcc/5.3.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=ARMv8-ThunderX"
- fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=ARMv8-ThunderX"
+ fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "shepard" ]; then
- source /etc/profile.d/modules.sh
- SKIP_HWLOC=True
- export SLURM_TASKS_PER_NODE=32
+ source /etc/profile.d/modules.sh
+ SKIP_HWLOC=True
+ export SLURM_TASKS_PER_NODE=32
- BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
+ BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
- OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
+ OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
- # Format: (compiler module-list build-list exe-name warning-flag)
- COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
- )
+ # Format: (compiler module-list build-list exe-name warning-flag)
+ COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ "intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
+ )
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=HSW"
- fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=HSW"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
elif [ "$MACHINE" = "apollo" ]; then
- source /projects/sems/modulefiles/utils/sems-modules-init.sh
- module use /home/projects/modulefiles/local/x86-64
- module load kokkos-env
+ source /projects/sems/modulefiles/utils/sems-modules-init.sh
+ module use /home/projects/modulefiles/local/x86-64
+ module load kokkos-env
- module load sems-git
- module load sems-tex
- module load sems-cmake/3.5.2
- module load sems-gdb
+ module load sems-git
+ module load sems-tex
+ module load sems-cmake/3.5.2
+ module load sems-gdb
- SKIP_HWLOC=True
+ SKIP_HWLOC=True
- BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
- CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
- CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
+ BASE_MODULE_LIST="sems-env,kokkos-env,sems-<COMPILER_NAME>/<COMPILER_VERSION>,kokkos-hwloc/1.10.1/base"
+ CUDA_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/4.8.4,kokkos-hwloc/1.10.1/base"
+ CUDA8_MODULE_LIST="sems-env,kokkos-env,kokkos-<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0,kokkos-hwloc/1.10.1/base"
- CLANG_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,cuda/8.0.44"
- NVCC_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0"
+ CLANG_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,cuda/8.0.44"
+ NVCC_MODULE_LIST="sems-env,kokkos-env,sems-git,sems-cmake/3.5.2,<COMPILER_NAME>/<COMPILER_VERSION>,sems-gcc/5.3.0"
- BUILD_LIST_CUDA_NVCC="Cuda_Serial,Cuda_OpenMP"
- BUILD_LIST_CUDA_CLANG="Cuda_Serial,Cuda_Pthread"
- BUILD_LIST_CLANG="Serial,Pthread,OpenMP"
+ BUILD_LIST_CUDA_NVCC="Cuda_Serial,Cuda_OpenMP"
+ BUILD_LIST_CUDA_CLANG="Cuda_Serial,Cuda_Pthread"
+ BUILD_LIST_CLANG="Serial,Pthread,OpenMP"
if [ "$SPOT_CHECK" = "True" ]; then
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
"clang/head $CLANG_MODULE_LIST "Cuda_Pthread" clang++ $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
else
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"clang/head $CLANG_MODULE_LIST $BUILD_LIST_CUDA_CLANG clang++ $CUDA_WARNING_FLAGS"
"clang/3.9.0 $CLANG_MODULE_LIST $BUILD_LIST_CLANG clang++ $CLANG_WARNING_FLAGS"
"gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/5.3.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/6.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
fi
- if [ -z "$ARCH_FLAG" ]; then
- ARCH_FLAG="--arch=SNB,Kepler35"
- fi
- NUM_JOBS_TO_RUN_IN_PARALLEL=2
-else
- echo "Unhandled machine $MACHINE" >&2
- exit 1
-fi
+ if [ -z "$ARCH_FLAG" ]; then
+ ARCH_FLAG="--arch=SNB,Kepler35"
+ fi
+ NUM_JOBS_TO_RUN_IN_PARALLEL=2
+else
+ echo "Unhandled machine $MACHINE" >&2
+ exit 1
+fi
export OMP_NUM_THREADS=4
declare -i NUM_RESULTS_TO_KEEP=7
RESULT_ROOT_PREFIX=TestAll
if [ "$PRINT_HELP" = "True" ]; then
-echo "test_all_sandia <ARGS> <OPTIONS>:"
-echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
-echo " Defaults to root repo containing this script"
-echo "--debug: Run tests in debug. Defaults to False"
-echo "--test-script: Test this script, not Kokkos"
-echo "--skip-hwloc: Do not do hwloc tests"
-echo "--num=N: Number of jobs to run in parallel"
-echo "--spot-check: Minimal test set to issue pull request"
-echo "--dry-run: Just print what would be executed"
-echo "--build-only: Just do builds, don't run anything"
-echo "--opt-flag=FLAG: Optimization flag (default: -O3)"
-echo "--arch=ARCHITECTURE: overwrite architecture flags"
-echo "--with-cuda-options=OPT: set KOKKOS_CUDA_OPTIONS"
-echo "--build-list=BUILD,BUILD,BUILD..."
-echo " Provide a comma-separated list of builds instead of running all builds"
-echo " Valid items:"
-echo " OpenMP, Pthread, Serial, OpenMP_Serial, Pthread_Serial"
-echo " Cuda_OpenMP, Cuda_Pthread, Cuda_Serial"
-echo ""
-
-echo "ARGS: list of expressions matching compilers to test"
-echo " supported compilers sems"
-for COMPILER_DATA in "${COMPILERS[@]}"; do
+ echo "test_all_sandia <ARGS> <OPTIONS>:"
+ echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
+ echo " Defaults to root repo containing this script"
+ echo "--debug: Run tests in debug. Defaults to False"
+ echo "--test-script: Test this script, not Kokkos"
+ echo "--skip-hwloc: Do not do hwloc tests"
+ echo "--num=N: Number of jobs to run in parallel"
+ echo "--spot-check: Minimal test set to issue pull request"
+ echo "--dry-run: Just print what would be executed"
+ echo "--build-only: Just do builds, don't run anything"
+ echo "--opt-flag=FLAG: Optimization flag (default: -O3)"
+ echo "--arch=ARCHITECTURE: overwrite architecture flags"
+ echo "--with-cuda-options=OPT: set KOKKOS_CUDA_OPTIONS"
+ echo "--build-list=BUILD,BUILD,BUILD..."
+ echo " Provide a comma-separated list of builds instead of running all builds"
+ echo " Valid items:"
+ echo " OpenMP, Pthread, Qthreads, Serial, OpenMP_Serial, Pthread_Serial"
+ echo " Qthreads_Serial, Cuda_OpenMP, Cuda_Pthread, Cuda_Serial"
+ echo ""
+
+ echo "ARGS: list of expressions matching compilers to test"
+ echo " supported compilers sems"
+ for COMPILER_DATA in "${COMPILERS[@]}"; do
ARR=($COMPILER_DATA)
COMPILER=${ARR[0]}
echo " $COMPILER"
-done
-echo ""
-
-echo "Examples:"
-echo " Run all tests"
-echo " % test_all_sandia"
-echo ""
-echo " Run all gcc tests"
-echo " % test_all_sandia gcc"
-echo ""
-echo " Run all gcc/4.7.2 and all intel tests"
-echo " % test_all_sandia gcc/4.7.2 intel"
-echo ""
-echo " Run all tests in debug"
-echo " % test_all_sandia --debug"
-echo ""
-echo " Run gcc/4.7.2 and only do OpenMP and OpenMP_Serial builds"
-echo " % test_all_sandia gcc/4.7.2 --build-list=OpenMP,OpenMP_Serial"
-echo ""
-echo "If you want to kill the tests, do:"
-echo " hit ctrl-z"
-echo " % kill -9 %1"
-echo
-exit 0
+ done
+ echo ""
+
+ echo "Examples:"
+ echo " Run all tests"
+ echo " % test_all_sandia"
+ echo ""
+ echo " Run all gcc tests"
+ echo " % test_all_sandia gcc"
+ echo ""
+ echo " Run all gcc/4.7.2 and all intel tests"
+ echo " % test_all_sandia gcc/4.7.2 intel"
+ echo ""
+ echo " Run all tests in debug"
+ echo " % test_all_sandia --debug"
+ echo ""
+ echo " Run gcc/4.7.2 and only do OpenMP and OpenMP_Serial builds"
+ echo " % test_all_sandia gcc/4.7.2 --build-list=OpenMP,OpenMP_Serial"
+ echo ""
+ echo "If you want to kill the tests, do:"
+ echo " hit ctrl-z"
+ echo " % kill -9 %1"
+ echo
+ exit 0
fi
-# set build type
+# Set build type.
if [ "$DEBUG" = "True" ]; then
- BUILD_TYPE=debug
+ BUILD_TYPE=debug
else
- BUILD_TYPE=release
+ BUILD_TYPE=release
fi
-# If no args provided, do all compilers
+# If no args provided, do all compilers.
if [ -z "$ARGS" ]; then
- ARGS='?'
+ ARGS='?'
fi
-# Process args to figure out which compilers to test
+# Process args to figure out which compilers to test.
COMPILERS_TO_TEST=""
+
for ARG in $ARGS; do
- for COMPILER_DATA in "${COMPILERS[@]}"; do
- ARR=($COMPILER_DATA)
- COMPILER=${ARR[0]}
- if [[ "$COMPILER" = $ARG* ]]; then
- if [[ "$COMPILERS_TO_TEST" != *${COMPILER}* ]]; then
- COMPILERS_TO_TEST="$COMPILERS_TO_TEST $COMPILER"
- else
- echo "Tried to add $COMPILER twice"
- fi
- fi
- done
+ for COMPILER_DATA in "${COMPILERS[@]}"; do
+ ARR=($COMPILER_DATA)
+ COMPILER=${ARR[0]}
+
+ if [[ "$COMPILER" = $ARG* ]]; then
+ if [[ "$COMPILERS_TO_TEST" != *${COMPILER}* ]]; then
+ COMPILERS_TO_TEST="$COMPILERS_TO_TEST $COMPILER"
+ else
+ echo "Tried to add $COMPILER twice"
+ fi
+ fi
+ done
done
+# Check if Qthreads build requested.
+HAVE_QTHREADS_BUILD="False"
+if [ -n "$CUSTOM_BUILD_LIST" ]; then
+ if [[ "$CUSTOM_BUILD_LIST" = *Qthreads* ]]; then
+ HAVE_QTHREADS_BUILD="True"
+ fi
+else
+ for COMPILER_DATA in "${COMPILERS[@]}"; do
+ ARR=($COMPILER_DATA)
+ BUILD_LIST=${ARR[2]}
+ if [[ "$BUILD_LIST" = *Qthreads* ]]; then
+ HAVE_QTHREADS_BUILD="True"
+ fi
+ done
+fi
+
+# Ensure Qthreads path is set if Qthreads build is requested.
+if [ "$HAVE_QTHREADS_BUILD" = "True" ]; then
+ if [ -z "$QTHREADS_PATH" ]; then
+ echo "Need to supply Qthreads path (--qthreads-path) when testing Qthreads backend." >&2
+ exit 1
+ else
+ # Strip trailing slashes from path.
+ QTHREADS_PATH=$(echo $QTHREADS_PATH | sed 's/\/*$//')
+ fi
+fi
+
#
-# Functions
+# Functions.
#
# get_compiler_name <COMPILER>
get_compiler_name() {
- echo $1 | cut -d/ -f1
+ echo $1 | cut -d/ -f1
}
# get_compiler_version <COMPILER>
get_compiler_version() {
- echo $1 | cut -d/ -f2
+ echo $1 | cut -d/ -f2
}
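# For example (illustrative): "get_compiler_name gcc/5.3.0" prints "gcc",
# and "get_compiler_version gcc/5.3.0" prints "5.3.0".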
-# Do not call directly
+# Do not call directly.
get_compiler_data() {
- local compiler=$1
- local item=$2
- local compiler_name=$(get_compiler_name $compiler)
- local compiler_vers=$(get_compiler_version $compiler)
-
- local compiler_data
- for compiler_data in "${COMPILERS[@]}" ; do
- local arr=($compiler_data)
- if [ "$compiler" = "${arr[0]}" ]; then
- echo "${arr[$item]}" | tr , ' ' | sed -e "s/<COMPILER_NAME>/$compiler_name/g" -e "s/<COMPILER_VERSION>/$compiler_vers/g"
- return 0
- fi
- done
-
- # Not found
- echo "Unreconized compiler $compiler" >&2
- exit 1
+ local compiler=$1
+ local item=$2
+ local compiler_name=$(get_compiler_name $compiler)
+ local compiler_vers=$(get_compiler_version $compiler)
+
+ local compiler_data
+ for compiler_data in "${COMPILERS[@]}" ; do
+ local arr=($compiler_data)
+
+ if [ "$compiler" = "${arr[0]}" ]; then
+ echo "${arr[$item]}" | tr , ' ' | sed -e "s/<COMPILER_NAME>/$compiler_name/g" -e "s/<COMPILER_VERSION>/$compiler_vers/g"
+ return 0
+ fi
+ done
+
+ # Not found.
+ echo "Unreconized compiler $compiler" >&2
+ exit 1
}
#
# For all getters, usage: <GETTER> <COMPILER>
#
get_compiler_modules() {
- get_compiler_data $1 1
+ get_compiler_data $1 1
}
get_compiler_build_list() {
- get_compiler_data $1 2
+ get_compiler_data $1 2
}
get_compiler_exe_name() {
- get_compiler_data $1 3
+ get_compiler_data $1 3
}
get_compiler_warning_flags() {
- get_compiler_data $1 4
+ get_compiler_data $1 4
}
run_cmd() {
- echo "RUNNING: $*"
- if [ "$DRYRUN" != "True" ]; then
- eval "$* 2>&1"
- fi
+ echo "RUNNING: $*"
+ if [ "$DRYRUN" != "True" ]; then
+ eval "$* 2>&1"
+ fi
}
# report_and_log_test_result <SUCCESS> <DESC> <COMMENT>
report_and_log_test_result() {
- # Use sane var names
- local success=$1; local desc=$2; local comment=$3;
+ # Use sane var names.
+ local success=$1; local desc=$2; local comment=$3;
- if [ "$success" = "0" ]; then
- echo " PASSED $desc"
- echo $comment > $PASSED_DIR/$desc
- else
- # For failures, comment should be the name of the phase that failed
- echo " FAILED $desc" >&2
- echo $comment > $FAILED_DIR/$desc
- cat ${desc}.${comment}.log
- fi
+ if [ "$success" = "0" ]; then
+ echo " PASSED $desc"
+ echo $comment > $PASSED_DIR/$desc
+ else
+ # For failures, comment should be the name of the phase that failed.
+ echo " FAILED $desc" >&2
+ echo $comment > $FAILED_DIR/$desc
+ cat ${desc}.${comment}.log
+ fi
}
setup_env() {
- local compiler=$1
- local compiler_modules=$(get_compiler_modules $compiler)
-
- module purge
-
- local mod
- for mod in $compiler_modules; do
- echo "Loading module $mod"
- module load $mod 2>&1
- # It is ridiculously hard to check for the success of a loaded
- # module. Module does not return error codes and piping to grep
- # causes module to run in a subshell.
- module list 2>&1 | grep "$mod" >& /dev/null || return 1
- done
-
- return 0
+ local compiler=$1
+ local compiler_modules=$(get_compiler_modules $compiler)
+
+ module purge
+
+ local mod
+ for mod in $compiler_modules; do
+ echo "Loading module $mod"
+ module load $mod 2>&1
+ # It is ridiculously hard to check for the success of a loaded
+ # module. Module does not return error codes and piping to grep
+ # causes module to run in a subshell.
+ module list 2>&1 | grep "$mod" >& /dev/null || return 1
+ done
+
+ return 0
}
# single_build_and_test <COMPILER> <BUILD> <BUILD_TYPE>
single_build_and_test() {
- # Use sane var names
- local compiler=$1; local build=$2; local build_type=$3;
+ # Use sane var names.
+ local compiler=$1; local build=$2; local build_type=$3;
+
+ # Set up env.
+ mkdir -p $ROOT_DIR/$compiler/"${build}-$build_type"
+ cd $ROOT_DIR/$compiler/"${build}-$build_type"
+ local desc=$(echo "${compiler}-${build}-${build_type}" | sed 's:/:-:g')
+ setup_env $compiler >& ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
- # set up env
- mkdir -p $ROOT_DIR/$compiler/"${build}-$build_type"
- cd $ROOT_DIR/$compiler/"${build}-$build_type"
- local desc=$(echo "${compiler}-${build}-${build_type}" | sed 's:/:-:g')
- setup_env $compiler >& ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
+ # Set up flags.
+ local compiler_warning_flags=$(get_compiler_warning_flags $compiler)
+ local compiler_exe=$(get_compiler_exe_name $compiler)
- # Set up flags
- local compiler_warning_flags=$(get_compiler_warning_flags $compiler)
- local compiler_exe=$(get_compiler_exe_name $compiler)
+ if [[ "$build_type" = hwloc* ]]; then
+ local extra_args=--with-hwloc=$(dirname $(dirname $(which hwloc-info)))
+ fi
+ if [[ "$build" = *Qthreads* ]]; then
if [[ "$build_type" = hwloc* ]]; then
- local extra_args=--with-hwloc=$(dirname $(dirname $(which hwloc-info)))
+ local extra_args="$extra_args --qthreads-path=${QTHREADS_PATH}_hwloc"
+ else
+ local extra_args="$extra_args --qthreads-path=$QTHREADS_PATH"
fi
+ fi
- if [[ "$OPT_FLAG" = "" ]]; then
- OPT_FLAG="-O3"
- fi
+ if [[ "$OPT_FLAG" = "" ]]; then
+ OPT_FLAG="-O3"
+ fi
- if [[ "$build_type" = *debug* ]]; then
- local extra_args="$extra_args --debug"
- local cxxflags="-g $compiler_warning_flags"
- else
- local cxxflags="$OPT_FLAG $compiler_warning_flags"
- fi
+ if [[ "$build_type" = *debug* ]]; then
+ local extra_args="$extra_args --debug"
+ local cxxflags="-g $compiler_warning_flags"
+ else
+ local cxxflags="$OPT_FLAG $compiler_warning_flags"
+ fi
- if [[ "$compiler" == cuda* ]]; then
- cxxflags="--keep --keep-dir=$(pwd) $cxxflags"
- export TMPDIR=$(pwd)
- fi
+ if [[ "$KOKKOS_CUDA_OPTIONS" != "" ]]; then
+ local extra_args="$extra_args $KOKKOS_CUDA_OPTIONS"
+ fi
- if [[ "$KOKKOS_CUDA_OPTIONS" != "" ]]; then
- local extra_args="$extra_args $KOKKOS_CUDA_OPTIONS"
- fi
+ echo " Starting job $desc"
- echo " Starting job $desc"
+ local comment="no_comment"
- local comment="no_comment"
+ if [ "$TEST_SCRIPT" = "True" ]; then
+ local rand=$[ 1 + $[ RANDOM % 10 ]]
+ sleep $rand
- if [ "$TEST_SCRIPT" = "True" ]; then
- local rand=$[ 1 + $[ RANDOM % 10 ]]
- sleep $rand
- if [ $rand -gt 5 ]; then
- run_cmd ls fake_problem >& ${desc}.configure.log || { report_and_log_test_result 1 $desc configure && return 0; }
- fi
- else
- run_cmd ${KOKKOS_PATH}/generate_makefile.bash --with-devices=$build $ARCH_FLAG --compiler=$(which $compiler_exe) --cxxflags=\"$cxxflags\" $extra_args &>> ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
- local -i build_start_time=$(date +%s)
- run_cmd make build-test >& ${desc}.build.log || { report_and_log_test_result 1 ${desc} build && return 0; }
- local -i build_end_time=$(date +%s)
- comment="build_time=$(($build_end_time-$build_start_time))"
- if [[ "$BUILD_ONLY" == False ]]; then
- run_cmd make test >& ${desc}.test.log || { report_and_log_test_result 1 ${desc} test && return 0; }
- local -i run_end_time=$(date +%s)
- comment="$comment run_time=$(($run_end_time-$build_end_time))"
- fi
+ if [ $rand -gt 5 ]; then
+ run_cmd ls fake_problem >& ${desc}.configure.log || { report_and_log_test_result 1 $desc configure && return 0; }
fi
+ else
+ run_cmd ${KOKKOS_PATH}/generate_makefile.bash --with-devices=$build $ARCH_FLAG --compiler=$(which $compiler_exe) --cxxflags=\"$cxxflags\" $extra_args &>> ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
+ local -i build_start_time=$(date +%s)
+ run_cmd make build-test >& ${desc}.build.log || { report_and_log_test_result 1 ${desc} build && return 0; }
+ local -i build_end_time=$(date +%s)
+ comment="build_time=$(($build_end_time-$build_start_time))"
+
+ if [[ "$BUILD_ONLY" == False ]]; then
+ run_cmd make test >& ${desc}.test.log || { report_and_log_test_result 1 ${desc} test && return 0; }
+ local -i run_end_time=$(date +%s)
+ comment="$comment run_time=$(($run_end_time-$build_end_time))"
+ fi
+ fi
- report_and_log_test_result 0 $desc "$comment"
+ report_and_log_test_result 0 $desc "$comment"
- return 0
+ return 0
}
# wait_for_jobs <NUM-JOBS>
wait_for_jobs() {
- local -i max_jobs=$1
- local -i num_active_jobs=$(jobs | wc -l)
- while [ $num_active_jobs -ge $max_jobs ]
- do
- sleep 1
- num_active_jobs=$(jobs | wc -l)
- jobs >& /dev/null
- done
+ local -i max_jobs=$1
+ local -i num_active_jobs=$(jobs | wc -l)
+ while [ $num_active_jobs -ge $max_jobs ]
+ do
+ sleep 1
+ num_active_jobs=$(jobs | wc -l)
+ jobs >& /dev/null
+ done
}
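# A minimal throttling sketch (compiler/build names are assumed, for illustration
# only): block until fewer than $NUM_JOBS_TO_RUN_IN_PARALLEL builds are active,
# then launch another one in the background:
#   wait_for_jobs $NUM_JOBS_TO_RUN_IN_PARALLEL
#   single_build_and_test gcc/4.8.4 OpenMP release &
# run_in_background below wraps exactly this pattern.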
# run_in_background <COMPILER> <BUILD> <BUILD_TYPE>
run_in_background() {
- local compiler=$1
-
- local -i num_jobs=$NUM_JOBS_TO_RUN_IN_PARALLEL
- # don't override command line input
- # if [[ "$BUILD_ONLY" == True ]]; then
- # num_jobs=8
- # else
- if [[ "$compiler" == cuda* ]]; then
- num_jobs=1
- fi
- # fi
- wait_for_jobs $num_jobs
-
- single_build_and_test $* &
+ local compiler=$1
+
+ local -i num_jobs=$NUM_JOBS_TO_RUN_IN_PARALLEL
+ # Don't override command line input.
+ # if [[ "$BUILD_ONLY" == True ]]; then
+ # num_jobs=8
+ # else
+ if [[ "$compiler" == cuda* ]]; then
+ num_jobs=1
+ fi
+ # fi
+ wait_for_jobs $num_jobs
+
+ single_build_and_test $* &
}
# build_and_test_all <COMPILER>
build_and_test_all() {
- # Get compiler data
- local compiler=$1
- if [ -z "$CUSTOM_BUILD_LIST" ]; then
- local compiler_build_list=$(get_compiler_build_list $compiler)
- else
- local compiler_build_list=$(echo "$CUSTOM_BUILD_LIST" | tr , ' ')
- fi
+ # Get compiler data.
+ local compiler=$1
+ if [ -z "$CUSTOM_BUILD_LIST" ]; then
+ local compiler_build_list=$(get_compiler_build_list $compiler)
+ else
+ local compiler_build_list=$(echo "$CUSTOM_BUILD_LIST" | tr , ' ')
+ fi
- # do builds
- local build
- for build in $compiler_build_list
- do
- run_in_background $compiler $build $BUILD_TYPE
+ # Do builds.
+ local build
+ for build in $compiler_build_list
+ do
+ run_in_background $compiler $build $BUILD_TYPE
- # If not cuda, do a hwloc test too
- if [[ "$compiler" != cuda* && "$SKIP_HWLOC" == False ]]; then
- run_in_background $compiler $build "hwloc-$BUILD_TYPE"
- fi
- done
+ # If not cuda, do a hwloc test too.
+ if [[ "$compiler" != cuda* && "$SKIP_HWLOC" == False ]]; then
+ run_in_background $compiler $build "hwloc-$BUILD_TYPE"
+ fi
+ done
- return 0
+ return 0
}
get_test_root_dir() {
- local existing_results=$(find . -maxdepth 1 -name "$RESULT_ROOT_PREFIX*" | sort)
- local -i num_existing_results=$(echo $existing_results | tr ' ' '\n' | wc -l)
- local -i num_to_delete=${num_existing_results}-${NUM_RESULTS_TO_KEEP}
+ local existing_results=$(find . -maxdepth 1 -name "$RESULT_ROOT_PREFIX*" | sort)
+ local -i num_existing_results=$(echo $existing_results | tr ' ' '\n' | wc -l)
+ local -i num_to_delete=${num_existing_results}-${NUM_RESULTS_TO_KEEP}
- if [ $num_to_delete -gt 0 ]; then
- /bin/rm -rf $(echo $existing_results | tr ' ' '\n' | head -n $num_to_delete)
- fi
+ if [ $num_to_delete -gt 0 ]; then
+ /bin/rm -rf $(echo $existing_results | tr ' ' '\n' | head -n $num_to_delete)
+ fi
- echo $(pwd)/${RESULT_ROOT_PREFIX}_$(date +"%Y-%m-%d_%H.%M.%S")
+ echo $(pwd)/${RESULT_ROOT_PREFIX}_$(date +"%Y-%m-%d_%H.%M.%S")
}
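# Sketch of what this yields (prefix and timestamp are examples only):
#   ./${RESULT_ROOT_PREFIX}_2017-05-04_16.02.33
# after pruning all but the newest $NUM_RESULTS_TO_KEEP existing result directories.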
wait_summarize_and_exit() {
- wait_for_jobs 1
-
- echo "#######################################################"
- echo "PASSED TESTS"
- echo "#######################################################"
-
- local passed_test
- for passed_test in $(\ls -1 $PASSED_DIR | sort)
- do
- echo $passed_test $(cat $PASSED_DIR/$passed_test)
- done
-
- echo "#######################################################"
- echo "FAILED TESTS"
- echo "#######################################################"
-
- local failed_test
- local -i rv=0
- for failed_test in $(\ls -1 $FAILED_DIR | sort)
- do
- echo $failed_test "("$(cat $FAILED_DIR/$failed_test)" failed)"
- rv=$rv+1
- done
-
- exit $rv
+ wait_for_jobs 1
+
+ echo "#######################################################"
+ echo "PASSED TESTS"
+ echo "#######################################################"
+
+ local passed_test
+ for passed_test in $(\ls -1 $PASSED_DIR | sort)
+ do
+ echo $passed_test $(cat $PASSED_DIR/$passed_test)
+ done
+
+ echo "#######################################################"
+ echo "FAILED TESTS"
+ echo "#######################################################"
+
+ local failed_test
+ local -i rv=0
+ for failed_test in $(\ls -1 $FAILED_DIR | sort)
+ do
+ echo $failed_test "("$(cat $FAILED_DIR/$failed_test)" failed)"
+ rv=$rv+1
+ done
+
+ exit $rv
}
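# Note: the script's exit status is the count of failed tests accumulated in
# $FAILED_DIR, so an exit status of 0 means every configure/build/test step passed.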
#
-# Main
+# Main.
#
ROOT_DIR=$(get_test_root_dir)
mkdir -p $ROOT_DIR
cd $ROOT_DIR
PASSED_DIR=$ROOT_DIR/results/passed
FAILED_DIR=$ROOT_DIR/results/failed
mkdir -p $PASSED_DIR
mkdir -p $FAILED_DIR
echo "Going to test compilers: " $COMPILERS_TO_TEST
for COMPILER in $COMPILERS_TO_TEST; do
- echo "Testing compiler $COMPILER"
- build_and_test_all $COMPILER
+ echo "Testing compiler $COMPILER"
+ build_and_test_all $COMPILER
done
wait_summarize_and_exit
diff --git a/lib/kokkos/containers/src/Kokkos_DynamicView.hpp b/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
index 3277c007d..53e0eab69 100644
--- a/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
+++ b/lib/kokkos/containers/src/Kokkos_DynamicView.hpp
@@ -1,494 +1,591 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_DYNAMIC_VIEW_HPP
#define KOKKOS_DYNAMIC_VIEW_HPP
#include <cstdio>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
namespace Kokkos {
namespace Experimental {
/** \brief Dynamic views are restricted to rank-one and no layout.
* Subviews are not allowed.
*/
template< typename DataType , typename ... P >
class DynamicView : public Kokkos::ViewTraits< DataType , P ... >
{
public:
- typedef ViewTraits< DataType , P ... > traits ;
+ typedef Kokkos::ViewTraits< DataType , P ... > traits ;
private:
template< class , class ... > friend class DynamicView ;
typedef Kokkos::Experimental::Impl::SharedAllocationTracker track_type ;
static_assert( traits::rank == 1 && traits::rank_dynamic == 1
, "DynamicView must be rank-one" );
static_assert( std::is_trivial< typename traits::value_type >::value &&
std::is_same< typename traits::specialize , void >::value
, "DynamicView must have trivial data type" );
template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space
{ KOKKOS_FORCEINLINE_FUNCTION static void check() {} };
template< class Space > struct verify_space<Space,false>
{ KOKKOS_FORCEINLINE_FUNCTION static void check()
{ Kokkos::abort("Kokkos::DynamicView ERROR: attempt to access inaccessible memory space"); };
};
public:
typedef Kokkos::Experimental::MemoryPool< typename traits::device_type > memory_pool ;
private:
memory_pool m_pool ;
track_type m_track ;
typename traits::value_type ** m_chunks ;
unsigned m_chunk_shift ;
unsigned m_chunk_mask ;
unsigned m_chunk_max ;
public:
//----------------------------------------------------------------------
/** \brief Compatible view of array of scalar types */
typedef DynamicView< typename traits::data_type ,
typename traits::device_type >
array_type ;
/** \brief Compatible view of const data type */
typedef DynamicView< typename traits::const_data_type ,
typename traits::device_type >
const_type ;
/** \brief Compatible view of non-const data type */
typedef DynamicView< typename traits::non_const_data_type ,
typename traits::device_type >
non_const_type ;
/** \brief Must be accessible everywhere */
typedef DynamicView HostMirror ;
//----------------------------------------------------------------------
enum { Rank = 1 };
- KOKKOS_INLINE_FUNCTION constexpr size_t size() const
+ KOKKOS_INLINE_FUNCTION
+ size_t size() const noexcept
{
- return
- Kokkos::Impl::MemorySpaceAccess
- < Kokkos::Impl::ActiveExecutionMemorySpace
- , typename traits::memory_space
- >::accessible
- ? // Runtime size is at the end of the chunk pointer array
- (*reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max ))
- << m_chunk_shift
- : 0 ;
+ uintptr_t n = 0 ;
+
+ if ( Kokkos::Impl::MemorySpaceAccess
+ < Kokkos::Impl::ActiveExecutionMemorySpace
+ , typename traits::memory_space
+ >::accessible ) {
+ n = *reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max );
+ }
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ else {
+ Kokkos::Impl::DeepCopy< Kokkos::HostSpace
+ , typename traits::memory_space
+ , Kokkos::HostSpace::execution_space >
+ ( & n
+ , reinterpret_cast<const uintptr_t*>( m_chunks + m_chunk_max )
+ , sizeof(uintptr_t) );
+ }
+#endif
+ return n << m_chunk_shift ;
}
template< typename iType >
- KOKKOS_INLINE_FUNCTION constexpr
+ KOKKOS_INLINE_FUNCTION
size_t extent( const iType & r ) const
{ return r == 0 ? size() : 1 ; }
template< typename iType >
- KOKKOS_INLINE_FUNCTION constexpr
+ KOKKOS_INLINE_FUNCTION
size_t extent_int( const iType & r ) const
{ return r == 0 ? size() : 1 ; }
- KOKKOS_INLINE_FUNCTION constexpr size_t dimension_0() const { return size(); }
+ KOKKOS_INLINE_FUNCTION size_t dimension_0() const { return size(); }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_1() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_2() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_3() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_4() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_5() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_6() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t dimension_7() const { return 1 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_0() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_1() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_2() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_3() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_4() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_5() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_6() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr size_t stride_7() const { return 0 ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION void stride( iType * const s ) const { *s = 0 ; }
//----------------------------------------------------------------------
// Range span is the span which contains all members.
typedef typename traits::value_type & reference_type ;
typedef typename traits::value_type * pointer_type ;
enum { reference_type_is_lvalue_reference = std::is_lvalue_reference< reference_type >::value };
KOKKOS_INLINE_FUNCTION constexpr bool span_is_contiguous() const { return false ; }
KOKKOS_INLINE_FUNCTION constexpr size_t span() const { return 0 ; }
KOKKOS_INLINE_FUNCTION constexpr pointer_type data() const { return 0 ; }
//----------------------------------------
template< typename I0 , class ... Args >
KOKKOS_INLINE_FUNCTION
reference_type operator()( const I0 & i0 , const Args & ... args ) const
{
static_assert( Kokkos::Impl::are_integral<I0,Args...>::value
, "Indices must be integral type" );
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
// Which chunk is being indexed.
const uintptr_t ic = uintptr_t( i0 >> m_chunk_shift );
typename traits::value_type * volatile * const ch = m_chunks + ic ;
// Do bounds checking if enabled or if the chunk pointer is zero.
// If not bounds checking then we assume a non-zero pointer is valid.
#if ! defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
if ( 0 == *ch )
#endif
{
// Verify that allocation of the requested chunk is in progress.
// The allocated chunk counter is m_chunks[ m_chunk_max ]
const uintptr_t n =
*reinterpret_cast<uintptr_t volatile *>( m_chunks + m_chunk_max );
if ( n <= ic ) {
Kokkos::abort("Kokkos::DynamicView array bounds error");
}
// Allocation of this chunk is in progress
// so wait for allocation to complete.
while ( 0 == *ch );
}
return (*ch)[ i0 & m_chunk_mask ];
}
//----------------------------------------
/** \brief Resizing in parallel only increases the array size,
 *         never decreases.
*/
KOKKOS_INLINE_FUNCTION
void resize_parallel( size_t n ) const
{
typedef typename traits::value_type value_type ;
DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
if ( m_chunk_max < NC ) {
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
printf("DynamicView::resize_parallel(%lu) m_chunk_max(%u) NC(%lu)\n"
, n , m_chunk_max , NC );
#endif
Kokkos::abort("DynamicView::resize_parallel exceeded maximum size");
}
typename traits::value_type * volatile * const ch = m_chunks ;
// The allocated chunk counter is m_chunks[ m_chunk_max ]
uintptr_t volatile * const pc =
reinterpret_cast<uintptr_t volatile*>( m_chunks + m_chunk_max );
// Potentially concurrent iteration of allocation to the required size.
for ( uintptr_t jc = *pc ; jc < NC ; ) {
// Claim the 'jc' chunk to-be-allocated index
const uintptr_t jc_try = jc ;
// Jump iteration to the chunk counter.
jc = atomic_compare_exchange( pc , jc_try , jc_try + 1 );
if ( jc_try == jc ) {
ch[jc_try] = reinterpret_cast<value_type*>(
m_pool.allocate( sizeof(value_type) << m_chunk_shift ));
Kokkos::memory_fence();
}
}
}
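  // Device-side usage sketch (team shape assumed), mirroring TestDynamicView in
  // the containers unit tests: one thread per team grows the view, then the
  // whole team synchronizes before indexing into the newly allocated chunks.
  //
  //   if ( team_member.team_rank() == 0 ) { a.resize_parallel( n ); }
  //   team_member.team_barrier();
  //   if ( i < n ) a( i ) = value ;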
  /** \brief  Resizing in serial can grow or shrink the array size. */
+ template< typename IntType >
inline
- void resize_serial( size_t n )
+ typename std::enable_if
+ < std::is_integral<IntType>::value &&
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace
+ , typename traits::memory_space
+ >::accessible
+ >::type
+ resize_serial( IntType const & n )
{
- DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
+ typedef typename traits::value_type value_type ;
+ typedef value_type * pointer_type ;
const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
if ( m_chunk_max < NC ) {
Kokkos::abort("DynamicView::resize_serial exceeded maximum size");
}
uintptr_t * const pc =
reinterpret_cast<uintptr_t*>( m_chunks + m_chunk_max );
if ( *pc < NC ) {
while ( *pc < NC ) {
- m_chunks[*pc] =
- m_pool.allocate( sizeof(traits::value_type) << m_chunk_shift );
+ m_chunks[*pc] = reinterpret_cast<pointer_type>
+ ( m_pool.allocate( sizeof(value_type) << m_chunk_shift ) );
++*pc ;
}
}
else {
while ( NC + 1 <= *pc ) {
--*pc ;
m_pool.deallocate( m_chunks[*pc]
- , sizeof(traits::value_type) << m_chunk_shift );
+ , sizeof(value_type) << m_chunk_shift );
m_chunks[*pc] = 0 ;
}
}
}
+ //----------------------------------------
+
+ struct ResizeSerial {
+ memory_pool m_pool ;
+ typename traits::value_type ** m_chunks ;
+ uintptr_t * m_pc ;
+ uintptr_t m_nc ;
+ unsigned m_chunk_shift ;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( int ) const
+ {
+ typedef typename traits::value_type value_type ;
+ typedef value_type * pointer_type ;
+
+ if ( *m_pc < m_nc ) {
+ while ( *m_pc < m_nc ) {
+ m_chunks[*m_pc] = reinterpret_cast<pointer_type>
+ ( m_pool.allocate( sizeof(value_type) << m_chunk_shift ) );
+ ++*m_pc ;
+ }
+ }
+ else {
+ while ( m_nc + 1 <= *m_pc ) {
+ --*m_pc ;
+ m_pool.deallocate( m_chunks[*m_pc]
+ , sizeof(value_type) << m_chunk_shift );
+ m_chunks[*m_pc] = 0 ;
+ }
+ }
+ }
+
+ ResizeSerial( memory_pool const & arg_pool
+ , typename traits::value_type ** arg_chunks
+ , uintptr_t * arg_pc
+ , uintptr_t arg_nc
+ , unsigned arg_chunk_shift
+ )
+ : m_pool( arg_pool )
+ , m_chunks( arg_chunks )
+ , m_pc( arg_pc )
+ , m_nc( arg_nc )
+ , m_chunk_shift( arg_chunk_shift )
+ {}
+ };
+
+ template< typename IntType >
+ inline
+ typename std::enable_if
+ < std::is_integral<IntType>::value &&
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace
+ , typename traits::memory_space
+ >::accessible
+ >::type
+ resize_serial( IntType const & n )
+ {
+ const uintptr_t NC = ( n + m_chunk_mask ) >> m_chunk_shift ;
+
+ if ( m_chunk_max < NC ) {
+ Kokkos::abort("DynamicView::resize_serial exceeded maximum size");
+ }
+
+ // Must dispatch kernel
+
+ typedef Kokkos::RangePolicy< typename traits::execution_space > Range ;
+
+ uintptr_t * const pc =
+ reinterpret_cast<uintptr_t*>( m_chunks + m_chunk_max );
+
+ Kokkos::Impl::ParallelFor<ResizeSerial,Range>
+ closure( ResizeSerial( m_pool, m_chunks, pc, NC, m_chunk_shift )
+ , Range(0,1) );
+
+ closure.execute();
+
+ traits::execution_space::fence();
+ }
+
//----------------------------------------------------------------------
~DynamicView() = default ;
DynamicView() = default ;
DynamicView( DynamicView && ) = default ;
DynamicView( const DynamicView & ) = default ;
DynamicView & operator = ( DynamicView && ) = default ;
DynamicView & operator = ( const DynamicView & ) = default ;
template< class RT , class ... RP >
- KOKKOS_INLINE_FUNCTION
DynamicView( const DynamicView<RT,RP...> & rhs )
: m_pool( rhs.m_pool )
, m_track( rhs.m_track )
- , m_chunks( rhs.m_chunks )
+ , m_chunks( (typename traits::value_type **) rhs.m_chunks )
, m_chunk_shift( rhs.m_chunk_shift )
, m_chunk_mask( rhs.m_chunk_mask )
, m_chunk_max( rhs.m_chunk_max )
{
+ typedef typename DynamicView<RT,RP...>::traits SrcTraits ;
+ typedef Kokkos::Impl::ViewMapping< traits , SrcTraits , void > Mapping ;
+ static_assert( Mapping::is_assignable , "Incompatible DynamicView copy construction" );
}
//----------------------------------------------------------------------
struct Destroy {
memory_pool m_pool ;
typename traits::value_type ** m_chunks ;
unsigned m_chunk_max ;
bool m_destroy ;
// Initialize or destroy array of chunk pointers.
// Two entries beyond the max chunks are allocation counters.
KOKKOS_INLINE_FUNCTION
void operator()( unsigned i ) const
{
if ( m_destroy && i < m_chunk_max && 0 != m_chunks[i] ) {
m_pool.deallocate( m_chunks[i] , m_pool.get_min_block_size() );
}
m_chunks[i] = 0 ;
}
void execute( bool arg_destroy )
{
typedef Kokkos::RangePolicy< typename traits::execution_space > Range ;
m_destroy = arg_destroy ;
Kokkos::Impl::ParallelFor<Destroy,Range>
closure( *this , Range(0, m_chunk_max + 1) );
closure.execute();
traits::execution_space::fence();
}
void construct_shared_allocation()
{ execute( false ); }
void destroy_shared_allocation()
{ execute( true ); }
Destroy() = default ;
Destroy( Destroy && ) = default ;
Destroy( const Destroy & ) = default ;
Destroy & operator = ( Destroy && ) = default ;
Destroy & operator = ( const Destroy & ) = default ;
Destroy( const memory_pool & arg_pool
, typename traits::value_type ** arg_chunk
, const unsigned arg_chunk_max )
: m_pool( arg_pool )
, m_chunks( arg_chunk )
, m_chunk_max( arg_chunk_max )
, m_destroy( false )
{}
};
/**\brief Allocation constructor
*
* Memory is allocated in chunks from the memory pool.
* The chunk size conforms to the memory pool's chunk size.
* A maximum size is required in order to allocate a
* chunk-pointer array.
*/
explicit inline
DynamicView( const std::string & arg_label
, const memory_pool & arg_pool
, const size_t arg_size_max )
: m_pool( arg_pool )
, m_track()
, m_chunks(0)
// The memory pool chunk is guaranteed to be a power of two
, m_chunk_shift(
Kokkos::Impl::integral_power_of_two(
m_pool.get_min_block_size()/sizeof(typename traits::value_type)) )
, m_chunk_mask( ( 1 << m_chunk_shift ) - 1 )
, m_chunk_max( ( arg_size_max + m_chunk_mask ) >> m_chunk_shift )
{
- DynamicView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check();
-
// A functor to deallocate all of the chunks upon final destruction
typedef typename traits::memory_space memory_space ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< memory_space , Destroy > record_type ;
// Allocate chunk pointers and allocation counter
record_type * const record =
record_type::allocate( memory_space()
, arg_label
, ( sizeof(pointer_type) * ( m_chunk_max + 1 ) ) );
m_chunks = reinterpret_cast<pointer_type*>( record->data() );
record->m_destroy = Destroy( m_pool , m_chunks , m_chunk_max );
// Initialize to zero
record->m_destroy.construct_shared_allocation();
m_track.assign_allocated_record_to_uninitialized( record );
}
};
} // namespace Experimental
} // namespace Kokkos
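// A minimal construction sketch (element type and sizes assumed), following the
// pattern used by TestDynamicView in the containers unit tests:
//
//   typedef Kokkos::DefaultExecutionSpace                          Space ;
//   typedef Kokkos::Experimental::MemoryPool< Space::device_type > pool_type ;
//   typedef Kokkos::Experimental::DynamicView< int * , Space >     view_type ;
//
//   pool_type pool( Space::memory_space() , max_elems * sizeof(int) * 1.2 );
//   view_type a( "A" , pool , max_elems );  // label, pool, maximum extent
//   a.resize_serial( n );                   // host-side grow or shrink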
namespace Kokkos {
namespace Experimental {
template< class T , class ... P >
inline
typename Kokkos::Experimental::DynamicView<T,P...>::HostMirror
create_mirror_view( const Kokkos::Experimental::DynamicView<T,P...> & src )
{
return src ;
}
template< class T , class ... DP , class ... SP >
inline
void deep_copy( const View<T,DP...> & dst
, const DynamicView<T,SP...> & src
)
{
typedef View<T,DP...> dst_type ;
typedef DynamicView<T,SP...> src_type ;
typedef typename ViewTraits<T,DP...>::execution_space dst_execution_space ;
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
if ( DstExecCanAccessSrc ) {
// Copying data between views in accessible memory spaces, where the data is
// either non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
template< class T , class ... DP , class ... SP >
inline
void deep_copy( const DynamicView<T,DP...> & dst
, const View<T,SP...> & src
)
{
typedef DynamicView<T,DP...> dst_type ;
typedef View<T,SP...> src_type ;
typedef typename ViewTraits<T,DP...>::execution_space dst_execution_space ;
typedef typename ViewTraits<T,SP...>::memory_space src_memory_space ;
enum { DstExecCanAccessSrc =
Kokkos::Impl::SpaceAccessibility< dst_execution_space , src_memory_space >::accessible };
if ( DstExecCanAccessSrc ) {
// Copying data between views in accessible memory spaces, where the data is
// either non-contiguous or of incompatible shape.
Kokkos::Experimental::Impl::ViewRemap< dst_type , src_type >( dst , src );
}
else {
Kokkos::Impl::throw_runtime_exception("deep_copy given views that would require a temporary allocation");
}
}
} // namespace Experimental
} // namespace Kokkos
#endif /* #ifndef KOKKOS_DYNAMIC_VIEW_HPP */
diff --git a/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp b/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
index 8646d2779..193f1bc33 100644
--- a/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
+++ b/lib/kokkos/containers/src/Kokkos_UnorderedMap.hpp
@@ -1,848 +1,849 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_UnorderedMap.hpp
/// \brief Declaration and definition of Kokkos::UnorderedMap.
///
/// This header file declares and defines Kokkos::UnorderedMap and its
/// related nonmember functions.
#ifndef KOKKOS_UNORDERED_MAP_HPP
#define KOKKOS_UNORDERED_MAP_HPP
#include <Kokkos_Core.hpp>
#include <Kokkos_Functional.hpp>
#include <Kokkos_Bitset.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_UnorderedMap_impl.hpp>
#include <iostream>
#include <stdint.h>
#include <stdexcept>
namespace Kokkos {
enum { UnorderedMapInvalidIndex = ~0u };
/// \brief First element of the return value of UnorderedMap::insert().
///
/// Inserting an element into an UnorderedMap is not guaranteed to
/// succeed. There are three possible conditions:
/// <ol>
/// <li> <tt>INSERT_FAILED</tt>: The insert failed. This usually
/// means that the UnorderedMap ran out of space. </li>
/// <li> <tt>INSERT_SUCCESS</tt>: The insert succeeded, and the key
/// did <i>not</i> exist in the table before. </li>
/// <li> <tt>INSERT_EXISTING</tt>: The insert succeeded, and the key
/// <i>did</i> exist in the table before. The new value was
/// ignored and the old value was left in place. </li>
/// </ol>
class UnorderedMapInsertResult
{
private:
enum Status{
SUCCESS = 1u << 31
, EXISTING = 1u << 30
, FREED_EXISTING = 1u << 29
, LIST_LENGTH_MASK = ~(SUCCESS | EXISTING | FREED_EXISTING)
};
public:
/// Did the map successfully insert the key/value pair
KOKKOS_FORCEINLINE_FUNCTION
bool success() const { return (m_status & SUCCESS); }
/// Was the key already present in the map
KOKKOS_FORCEINLINE_FUNCTION
bool existing() const { return (m_status & EXISTING); }
/// Did the map fail to insert the key due to insufficient capacity
KOKKOS_FORCEINLINE_FUNCTION
bool failed() const { return m_index == UnorderedMapInvalidIndex; }
/// Did the map lose a race to insert a duplicate key/value pair,
/// so that an index was claimed and then had to be released
KOKKOS_FORCEINLINE_FUNCTION
bool freed_existing() const { return (m_status & FREED_EXISTING); }
/// How many iterations through the insert loop did it take before the
/// map returned
KOKKOS_FORCEINLINE_FUNCTION
uint32_t list_position() const { return (m_status & LIST_LENGTH_MASK); }
/// Index where the key can be found as long as the insert did not fail
KOKKOS_FORCEINLINE_FUNCTION
uint32_t index() const { return m_index; }
KOKKOS_FORCEINLINE_FUNCTION
UnorderedMapInsertResult()
: m_index(UnorderedMapInvalidIndex)
, m_status(0)
{}
KOKKOS_FORCEINLINE_FUNCTION
void increment_list_position()
{
m_status += (list_position() < LIST_LENGTH_MASK) ? 1u : 0u;
}
KOKKOS_FORCEINLINE_FUNCTION
void set_existing(uint32_t i, bool arg_freed_existing)
{
m_index = i;
m_status = EXISTING | (arg_freed_existing ? FREED_EXISTING : 0u) | list_position();
}
KOKKOS_FORCEINLINE_FUNCTION
void set_success(uint32_t i)
{
m_index = i;
m_status = SUCCESS | list_position();
}
private:
uint32_t m_index;
uint32_t m_status;
};
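// A minimal sketch of consuming this result inside a parallel kernel
// (map type, key and value are assumed for illustration):
//
//   Kokkos::UnorderedMap<int,double> map( capacity_hint );
//   UnorderedMapInsertResult r = map.insert( key , value );
//   if      ( r.success()  ) { /* new key stored at r.index() */ }
//   else if ( r.existing() ) { /* key was present; old value kept */ }
//   else if ( r.failed()   ) { /* out of space; rehash() on the host */ }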
/// \class UnorderedMap
/// \brief Thread-safe, performance-portable lookup table.
///
/// This class provides a lookup table. In terms of functionality,
/// this class compares to std::unordered_map (new in C++11).
/// "Unordered" means that keys are not stored in any particular
/// order, unlike (for example) std::map. "Thread-safe" means that
/// lookups, insertion, and deletion are safe to call by multiple
/// threads in parallel. "Performance-portable" means that parallel
/// performance of these operations is reasonable, on multiple
/// hardware platforms. Platforms on which performance has been
/// tested include conventional Intel x86 multicore processors, Intel
/// Xeon Phi ("MIC"), and NVIDIA GPUs.
///
/// Parallel performance portability entails design decisions that
/// might differ from one's expectation for a sequential interface.
/// This particularly affects insertion of single elements. In an
/// interface intended for sequential use, insertion might reallocate
/// memory if the original allocation did not suffice to hold the new
/// element. In this class, insertion does <i>not</i> reallocate
/// memory. This means that it might fail. insert() returns an enum
/// which indicates whether the insert failed. There are three
/// possible conditions:
/// <ol>
/// <li> <tt>INSERT_FAILED</tt>: The insert failed. This usually
/// means that the UnorderedMap ran out of space. </li>
/// <li> <tt>INSERT_SUCCESS</tt>: The insert succeeded, and the key
/// did <i>not</i> exist in the table before. </li>
/// <li> <tt>INSERT_EXISTING</tt>: The insert succeeded, and the key
/// <i>did</i> exist in the table before. The new value was
/// ignored and the old value was left in place. </li>
/// </ol>
///
/// \tparam Key Type of keys of the lookup table. If \c const, users
/// are not allowed to add or remove keys, though they are allowed
/// to change values. In that case, the implementation may make
/// optimizations specific to the <tt>Device</tt>. For example, if
/// <tt>Device</tt> is \c Cuda, it may use texture fetches to access
/// keys.
///
/// \tparam Value Type of values stored in the lookup table. You may use
/// \c void here, in which case the table will be a set of keys. If
/// \c const, users are not allowed to change entries.
/// In that case, the implementation may make
/// optimizations specific to the \c Device, such as using texture
/// fetches to access values.
///
/// \tparam Device The Kokkos Device type.
///
/// \tparam Hasher Definition of the hash function for instances of
/// <tt>Key</tt>. The default will calculate a bitwise hash.
///
/// \tparam EqualTo Definition of the equality function for instances of
/// <tt>Key</tt>. The default will do a bitwise equality comparison.
///
template < typename Key
, typename Value
, typename Device = Kokkos::DefaultExecutionSpace
, typename Hasher = pod_hash<typename Impl::remove_const<Key>::type>
, typename EqualTo = pod_equal_to<typename Impl::remove_const<Key>::type>
>
class UnorderedMap
{
private:
typedef typename ViewTraits<Key,Device,void,void>::host_mirror_space host_mirror_space ;
public:
//! \name Public types and constants
//@{
//key_types
typedef Key declared_key_type;
typedef typename Impl::remove_const<declared_key_type>::type key_type;
typedef typename Impl::add_const<key_type>::type const_key_type;
//value_types
typedef Value declared_value_type;
typedef typename Impl::remove_const<declared_value_type>::type value_type;
typedef typename Impl::add_const<value_type>::type const_value_type;
- typedef Device execution_space;
+ typedef Device device_type;
+ typedef typename Device::execution_space execution_space;
typedef Hasher hasher_type;
typedef EqualTo equal_to_type;
typedef uint32_t size_type;
//map_types
- typedef UnorderedMap<declared_key_type,declared_value_type,execution_space,hasher_type,equal_to_type> declared_map_type;
- typedef UnorderedMap<key_type,value_type,execution_space,hasher_type,equal_to_type> insertable_map_type;
- typedef UnorderedMap<const_key_type,value_type,execution_space,hasher_type,equal_to_type> modifiable_map_type;
- typedef UnorderedMap<const_key_type,const_value_type,execution_space,hasher_type,equal_to_type> const_map_type;
+ typedef UnorderedMap<declared_key_type,declared_value_type,device_type,hasher_type,equal_to_type> declared_map_type;
+ typedef UnorderedMap<key_type,value_type,device_type,hasher_type,equal_to_type> insertable_map_type;
+ typedef UnorderedMap<const_key_type,value_type,device_type,hasher_type,equal_to_type> modifiable_map_type;
+ typedef UnorderedMap<const_key_type,const_value_type,device_type,hasher_type,equal_to_type> const_map_type;
static const bool is_set = std::is_same<void,value_type>::value;
static const bool has_const_key = std::is_same<const_key_type,declared_key_type>::value;
static const bool has_const_value = is_set || std::is_same<const_value_type,declared_value_type>::value;
static const bool is_insertable_map = !has_const_key && (is_set || !has_const_value);
static const bool is_modifiable_map = has_const_key && !has_const_value;
static const bool is_const_map = has_const_key && has_const_value;
typedef UnorderedMapInsertResult insert_result;
typedef UnorderedMap<Key,Value,host_mirror_space,Hasher,EqualTo> HostMirror;
typedef Impl::UnorderedMapHistogram<const_map_type> histogram_type;
//@}
private:
enum { invalid_index = ~static_cast<size_type>(0) };
typedef typename Impl::if_c< is_set, int, declared_value_type>::type impl_value_type;
typedef typename Impl::if_c< is_insertable_map
- , View< key_type *, execution_space>
- , View< const key_type *, execution_space, MemoryTraits<RandomAccess> >
+ , View< key_type *, device_type>
+ , View< const key_type *, device_type, MemoryTraits<RandomAccess> >
>::type key_type_view;
typedef typename Impl::if_c< is_insertable_map || is_modifiable_map
- , View< impl_value_type *, execution_space>
- , View< const impl_value_type *, execution_space, MemoryTraits<RandomAccess> >
+ , View< impl_value_type *, device_type>
+ , View< const impl_value_type *, device_type, MemoryTraits<RandomAccess> >
>::type value_type_view;
typedef typename Impl::if_c< is_insertable_map
- , View< size_type *, execution_space>
- , View< const size_type *, execution_space, MemoryTraits<RandomAccess> >
+ , View< size_type *, device_type>
+ , View< const size_type *, device_type, MemoryTraits<RandomAccess> >
>::type size_type_view;
typedef typename Impl::if_c< is_insertable_map
, Bitset< execution_space >
, ConstBitset< execution_space>
>::type bitset_type;
enum { modified_idx = 0, erasable_idx = 1, failed_insert_idx = 2 };
enum { num_scalars = 3 };
- typedef View< int[num_scalars], LayoutLeft, execution_space> scalars_view;
+ typedef View< int[num_scalars], LayoutLeft, device_type> scalars_view;
public:
//! \name Public member functions
//@{
UnorderedMap()
: m_bounded_insert()
, m_hasher()
, m_equal_to()
, m_size()
, m_available_indexes()
, m_hash_lists()
, m_next_index()
, m_keys()
, m_values()
, m_scalars()
{}
/// \brief Constructor
///
/// \param capacity_hint [in] Initial guess of how many unique keys will be inserted into the map
/// \param hash [in] Hasher function for \c Key instances. The
/// default value usually suffices.
UnorderedMap( size_type capacity_hint, hasher_type hasher = hasher_type(), equal_to_type equal_to = equal_to_type() )
: m_bounded_insert(true)
, m_hasher(hasher)
, m_equal_to(equal_to)
, m_size()
, m_available_indexes(calculate_capacity(capacity_hint))
, m_hash_lists(ViewAllocateWithoutInitializing("UnorderedMap hash list"), Impl::find_hash_size(capacity()))
, m_next_index(ViewAllocateWithoutInitializing("UnorderedMap next index"), capacity()+1) // +1 so that the *_at functions can always return a valid reference
, m_keys("UnorderedMap keys",capacity()+1)
, m_values("UnorderedMap values",(is_set? 1 : capacity()+1))
, m_scalars("UnorderedMap scalars")
{
if (!is_insertable_map) {
throw std::runtime_error("Cannot construct a non-insertable (i.e. const key_type) unordered_map");
}
Kokkos::deep_copy(m_hash_lists, invalid_index);
Kokkos::deep_copy(m_next_index, invalid_index);
}
void reset_failed_insert_flag()
{
reset_flag(failed_insert_idx);
}
histogram_type get_histogram()
{
return histogram_type(*this);
}
//! Clear all entries in the table.
void clear()
{
m_bounded_insert = true;
if (capacity() == 0) return;
m_available_indexes.clear();
Kokkos::deep_copy(m_hash_lists, invalid_index);
Kokkos::deep_copy(m_next_index, invalid_index);
{
const key_type tmp = key_type();
Kokkos::deep_copy(m_keys,tmp);
}
if (is_set){
const impl_value_type tmp = impl_value_type();
Kokkos::deep_copy(m_values,tmp);
}
{
Kokkos::deep_copy(m_scalars, 0);
}
}
/// \brief Change the capacity of the map
///
/// If there are no failed inserts the current size of the map will
/// be used as a lower bound for the input capacity.
/// If the map is not empty and does not have failed inserts
/// and the capacity changes then the current data is copied
/// into the resized / rehashed map.
///
/// This is <i>not</i> a device function; it may <i>not</i> be
/// called in a parallel kernel.
bool rehash(size_type requested_capacity = 0)
{
const bool bounded_insert = (capacity() == 0) || (size() == 0u);
return rehash(requested_capacity, bounded_insert );
}
bool rehash(size_type requested_capacity, bool bounded_insert)
{
if(!is_insertable_map) return false;
const size_type curr_size = size();
requested_capacity = (requested_capacity < curr_size) ? curr_size : requested_capacity;
insertable_map_type tmp(requested_capacity, m_hasher, m_equal_to);
if (curr_size) {
tmp.m_bounded_insert = false;
Impl::UnorderedMapRehash<insertable_map_type> f(tmp,*this);
f.apply();
}
tmp.m_bounded_insert = bounded_insert;
*this = tmp;
return true;
}
/// \brief The number of entries in the table.
///
/// This method has undefined behavior when erasable() is true.
///
/// Note that this is not a device function; it cannot be called in
/// a parallel kernel. The value is not stored as a variable; it
/// must be computed.
size_type size() const
{
if( capacity() == 0u ) return 0u;
if (modified()) {
m_size = m_available_indexes.count();
reset_flag(modified_idx);
}
return m_size;
}
/// \brief The current number of failed insert() calls.
///
/// This is <i>not</i> a device function; it may <i>not</i> be
/// called in a parallel kernel. The value is not stored as a
/// variable; it must be computed.
bool failed_insert() const
{
return get_flag(failed_insert_idx);
}
bool erasable() const
{
return is_insertable_map ? get_flag(erasable_idx) : false;
}
bool begin_erase()
{
bool result = !erasable();
if (is_insertable_map && result) {
execution_space::fence();
set_flag(erasable_idx);
execution_space::fence();
}
return result;
}
bool end_erase()
{
bool result = erasable();
if (is_insertable_map && result) {
execution_space::fence();
Impl::UnorderedMapErase<declared_map_type> f(*this);
f.apply();
execution_space::fence();
reset_flag(erasable_idx);
}
return result;
}
/// \brief The maximum number of entries that the table can hold.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_FORCEINLINE_FUNCTION
size_type capacity() const
{ return m_available_indexes.size(); }
/// \brief The number of hash table "buckets."
///
/// This is different than the number of entries that the table can
/// hold. Each key hashes to an index in [0, hash_capacity() - 1].
/// That index can hold zero or more entries. This class decides
/// what hash_capacity() should be, given the user's upper bound on
/// the number of entries the table must be able to hold.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
size_type hash_capacity() const
{ return m_hash_lists.dimension_0(); }
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel. As discussed in the class documentation, it need not
/// succeed. The return value tells you if it did.
///
/// \param k [in] The key to attempt to insert.
/// \param v [in] The corresponding value to attempt to insert. If
/// using this class as a set (with Value = void), then you need not
/// provide this value.
KOKKOS_INLINE_FUNCTION
insert_result insert(key_type const& k, impl_value_type const&v = impl_value_type()) const
{
insert_result result;
if ( !is_insertable_map || capacity() == 0u || m_scalars((int)erasable_idx) ) {
return result;
}
if ( !m_scalars((int)modified_idx) ) {
m_scalars((int)modified_idx) = true;
}
int volatile & failed_insert_ref = m_scalars((int)failed_insert_idx) ;
const size_type hash_value = m_hasher(k);
const size_type hash_list = hash_value % m_hash_lists.dimension_0();
size_type * curr_ptr = & m_hash_lists[ hash_list ];
size_type new_index = invalid_index ;
// Force integer multiply to long
size_type index_hint = static_cast<size_type>( (static_cast<double>(hash_list) * capacity()) / m_hash_lists.dimension_0());
size_type find_attempts = 0;
enum { bounded_find_attempts = 32u };
const size_type max_attempts = (m_bounded_insert && (bounded_find_attempts < m_available_indexes.max_hint()) ) ?
bounded_find_attempts :
m_available_indexes.max_hint();
bool not_done = true ;
#if defined( __MIC__ )
#pragma noprefetch
#endif
while ( not_done ) {
// Continue searching the unordered list for this key;
// the list is only appended to during the insert phase.
// Need volatile_load as other threads may be appending.
size_type curr = volatile_load(curr_ptr);
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
#if defined( __MIC__ )
#pragma noprefetch
#endif
while ( curr != invalid_index && ! m_equal_to( volatile_load(&m_keys[curr]), k) ) {
result.increment_list_position();
index_hint = curr;
curr_ptr = &m_next_index[curr];
curr = volatile_load(curr_ptr);
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
}
//------------------------------------------------------------
// If key already present then return that index.
if ( curr != invalid_index ) {
const bool free_existing = new_index != invalid_index;
if ( free_existing ) {
// Previously claimed an unused entry that was not inserted.
// Release this unused entry immediately.
if (!m_available_indexes.reset(new_index) ) {
printf("Unable to free existing\n");
}
}
result.set_existing(curr, free_existing);
not_done = false ;
}
//------------------------------------------------------------
// Key is not currently in the map.
// If the thread has claimed an entry try to insert now.
else {
//------------------------------------------------------------
// If have not already claimed an unused entry then do so now.
if (new_index == invalid_index) {
bool found = false;
// use the hash_list as the flag for the search direction
Kokkos::tie(found, index_hint) = m_available_indexes.find_any_unset_near( index_hint, hash_list );
// found an index and this thread set it
if ( !found && ++find_attempts >= max_attempts ) {
failed_insert_ref = true;
not_done = false ;
}
else if (m_available_indexes.set(index_hint) ) {
new_index = index_hint;
// Set key and value
KOKKOS_NONTEMPORAL_PREFETCH_STORE(&m_keys[new_index]);
m_keys[new_index] = k ;
if (!is_set) {
KOKKOS_NONTEMPORAL_PREFETCH_STORE(&m_values[new_index]);
m_values[new_index] = v ;
}
// Do not proceed until key and value are updated in global memory
memory_fence();
}
}
else if (failed_insert_ref) {
not_done = false;
}
// Attempt to append claimed entry into the list.
// Another thread may also be trying to append the same list so protect with atomic.
if ( new_index != invalid_index &&
curr == atomic_compare_exchange(curr_ptr, static_cast<size_type>(invalid_index), new_index) ) {
// Succeeded in appending
result.set_success(new_index);
not_done = false ;
}
}
} // while ( not_done )
return result ;
}
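  // Summary of the loop above: (1) walk this bucket's linked list looking for
  // the key; (2) if it is absent, claim a free index from m_available_indexes,
  // write the key/value, and memory_fence(); (3) atomically append the claimed
  // index to the list, releasing it again if another thread won the race.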
KOKKOS_INLINE_FUNCTION
bool erase(key_type const& k) const
{
bool result = false;
if(is_insertable_map && 0u < capacity() && m_scalars((int)erasable_idx)) {
if ( ! m_scalars((int)modified_idx) ) {
m_scalars((int)modified_idx) = true;
}
size_type index = find(k);
if (valid_at(index)) {
m_available_indexes.reset(index);
result = true;
}
}
return result;
}
/// \brief Find the given key \c k, if it exists in the table.
///
/// \return If the key exists in the table, the index of the
/// value corresponding to that key; otherwise, an invalid index.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
size_type find( const key_type & k) const
{
size_type curr = 0u < capacity() ? m_hash_lists( m_hasher(k) % m_hash_lists.dimension_0() ) : invalid_index ;
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
while (curr != invalid_index && !m_equal_to( m_keys[curr], k) ) {
KOKKOS_NONTEMPORAL_PREFETCH_LOAD(&m_keys[curr != invalid_index ? curr : 0]);
curr = m_next_index[curr];
}
return curr;
}
/// \brief Does the key exist in the map
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_INLINE_FUNCTION
bool exists( const key_type & k) const
{
return valid_at(find(k));
}
/// \brief Get the value with \c i as its direct index.
///
/// \param i [in] Index directly into the array of entries.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
///
/// 'const value_type' via Cuda texture fetch must return by value.
KOKKOS_FORCEINLINE_FUNCTION
typename Impl::if_c< (is_set || has_const_value), impl_value_type, impl_value_type &>::type
value_at(size_type i) const
{
return m_values[ is_set ? 0 : (i < capacity() ? i : capacity()) ];
}
/// \brief Get the key with \c i as its direct index.
///
/// \param i [in] Index directly into the array of entries.
///
/// This <i>is</i> a device function; it may be called in a parallel
/// kernel.
KOKKOS_FORCEINLINE_FUNCTION
key_type key_at(size_type i) const
{
return m_keys[ i < capacity() ? i : capacity() ];
}
KOKKOS_FORCEINLINE_FUNCTION
bool valid_at(size_type i) const
{
return m_available_indexes.test(i);
}
template <typename SKey, typename SValue>
UnorderedMap( UnorderedMap<SKey,SValue,Device,Hasher,EqualTo> const& src,
typename Impl::enable_if< Impl::UnorderedMapCanAssign<declared_key_type,declared_value_type,SKey,SValue>::value,int>::type = 0
)
: m_bounded_insert(src.m_bounded_insert)
, m_hasher(src.m_hasher)
, m_equal_to(src.m_equal_to)
, m_size(src.m_size)
, m_available_indexes(src.m_available_indexes)
, m_hash_lists(src.m_hash_lists)
, m_next_index(src.m_next_index)
, m_keys(src.m_keys)
, m_values(src.m_values)
, m_scalars(src.m_scalars)
{}
template <typename SKey, typename SValue>
typename Impl::enable_if< Impl::UnorderedMapCanAssign<declared_key_type,declared_value_type,SKey,SValue>::value
,declared_map_type & >::type
operator=( UnorderedMap<SKey,SValue,Device,Hasher,EqualTo> const& src)
{
m_bounded_insert = src.m_bounded_insert;
m_hasher = src.m_hasher;
m_equal_to = src.m_equal_to;
m_size = src.m_size;
m_available_indexes = src.m_available_indexes;
m_hash_lists = src.m_hash_lists;
m_next_index = src.m_next_index;
m_keys = src.m_keys;
m_values = src.m_values;
m_scalars = src.m_scalars;
return *this;
}
template <typename SKey, typename SValue, typename SDevice>
typename Impl::enable_if< std::is_same< typename Impl::remove_const<SKey>::type, key_type>::value &&
std::is_same< typename Impl::remove_const<SValue>::type, value_type>::value
>::type
create_copy_view( UnorderedMap<SKey, SValue, SDevice, Hasher,EqualTo> const& src)
{
if (m_hash_lists.ptr_on_device() != src.m_hash_lists.ptr_on_device()) {
insertable_map_type tmp;
tmp.m_bounded_insert = src.m_bounded_insert;
tmp.m_hasher = src.m_hasher;
tmp.m_equal_to = src.m_equal_to;
tmp.m_size = src.size();
tmp.m_available_indexes = bitset_type( src.capacity() );
tmp.m_hash_lists = size_type_view( ViewAllocateWithoutInitializing("UnorderedMap hash list"), src.m_hash_lists.dimension_0() );
tmp.m_next_index = size_type_view( ViewAllocateWithoutInitializing("UnorderedMap next index"), src.m_next_index.dimension_0() );
tmp.m_keys = key_type_view( ViewAllocateWithoutInitializing("UnorderedMap keys"), src.m_keys.dimension_0() );
tmp.m_values = value_type_view( ViewAllocateWithoutInitializing("UnorderedMap values"), src.m_values.dimension_0() );
tmp.m_scalars = scalars_view("UnorderedMap scalars");
Kokkos::deep_copy(tmp.m_available_indexes, src.m_available_indexes);
- typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, typename SDevice::memory_space > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< typename device_type::memory_space, typename SDevice::memory_space > raw_deep_copy;
raw_deep_copy(tmp.m_hash_lists.ptr_on_device(), src.m_hash_lists.ptr_on_device(), sizeof(size_type)*src.m_hash_lists.dimension_0());
raw_deep_copy(tmp.m_next_index.ptr_on_device(), src.m_next_index.ptr_on_device(), sizeof(size_type)*src.m_next_index.dimension_0());
raw_deep_copy(tmp.m_keys.ptr_on_device(), src.m_keys.ptr_on_device(), sizeof(key_type)*src.m_keys.dimension_0());
if (!is_set) {
raw_deep_copy(tmp.m_values.ptr_on_device(), src.m_values.ptr_on_device(), sizeof(impl_value_type)*src.m_values.dimension_0());
}
raw_deep_copy(tmp.m_scalars.ptr_on_device(), src.m_scalars.ptr_on_device(), sizeof(int)*num_scalars );
*this = tmp;
}
}
//@}
private: // private member functions
bool modified() const
{
return get_flag(modified_idx);
}
void set_flag(int flag) const
{
- typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, Kokkos::HostSpace > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< typename device_type::memory_space, Kokkos::HostSpace > raw_deep_copy;
const int true_ = true;
raw_deep_copy(m_scalars.ptr_on_device() + flag, &true_, sizeof(int));
}
void reset_flag(int flag) const
{
- typedef Kokkos::Impl::DeepCopy< typename execution_space::memory_space, Kokkos::HostSpace > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< typename device_type::memory_space, Kokkos::HostSpace > raw_deep_copy;
const int false_ = false;
raw_deep_copy(m_scalars.ptr_on_device() + flag, &false_, sizeof(int));
}
bool get_flag(int flag) const
{
- typedef Kokkos::Impl::DeepCopy< Kokkos::HostSpace, typename execution_space::memory_space > raw_deep_copy;
+ typedef Kokkos::Impl::DeepCopy< Kokkos::HostSpace, typename device_type::memory_space > raw_deep_copy;
int result = false;
raw_deep_copy(&result, m_scalars.ptr_on_device() + flag, sizeof(int));
return result;
}
static uint32_t calculate_capacity(uint32_t capacity_hint)
{
// increase by ~16% and round up to the nearest multiple of 128
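// e.g. capacity_hint = 1000 : 7*1000/6 = 1166, rounded up to 10*128 = 1280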
return capacity_hint ? ((static_cast<uint32_t>(7ull*capacity_hint/6u) + 127u)/128u)*128u : 128u;
}
private: // private members
bool m_bounded_insert;
hasher_type m_hasher;
equal_to_type m_equal_to;
mutable size_type m_size;
bitset_type m_available_indexes;
size_type_view m_hash_lists;
size_type_view m_next_index;
key_type_view m_keys;
value_type_view m_values;
scalars_view m_scalars;
template <typename KKey, typename VValue, typename DDevice, typename HHash, typename EEqualTo>
friend class UnorderedMap;
template <typename UMap>
friend struct Impl::UnorderedMapErase;
template <typename UMap>
friend struct Impl::UnorderedMapHistogram;
template <typename UMap>
friend struct Impl::UnorderedMapPrint;
};
// Specialization of deep_copy for two UnorderedMap objects.
template < typename DKey, typename DT, typename DDevice
, typename SKey, typename ST, typename SDevice
, typename Hasher, typename EqualTo >
inline void deep_copy( UnorderedMap<DKey, DT, DDevice, Hasher, EqualTo> & dst
, const UnorderedMap<SKey, ST, SDevice, Hasher, EqualTo> & src )
{
dst.create_copy_view(src);
}
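// Typical use (sketch; map types assumed): mirror a device-resident map on the
// host for inspection.
//   UnorderedMap<Key,Value,Device>::HostMirror host_map ;
//   Kokkos::deep_copy( host_map , device_map );   // forwards to create_copy_view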
} // namespace Kokkos
#endif //KOKKOS_UNORDERED_MAP_HPP
diff --git a/lib/kokkos/containers/unit_tests/CMakeLists.txt b/lib/kokkos/containers/unit_tests/CMakeLists.txt
index b9d860f32..0c59c616d 100644
--- a/lib/kokkos/containers/unit_tests/CMakeLists.txt
+++ b/lib/kokkos/containers/unit_tests/CMakeLists.txt
@@ -1,40 +1,51 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
-SET(SOURCES
- UnitTestMain.cpp
- TestCuda.cpp
- )
-
SET(LIBRARIES kokkoscore)
IF(Kokkos_ENABLE_Pthread)
- LIST( APPEND SOURCES
- TestThreads.cpp
+TRIBITS_ADD_EXECUTABLE_AND_TEST(
+ UnitTest_Threads
+ SOURCES TestThreads.cpp UnitTestMain.cpp
+ COMM serial mpi
+ NUM_MPI_PROCS 1
+ FAIL_REGULAR_EXPRESSION " FAILED "
+ TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Serial)
- LIST( APPEND SOURCES
- TestSerial.cpp
+TRIBITS_ADD_EXECUTABLE_AND_TEST(
+ UnitTest_Serial
+ SOURCES TestSerial.cpp UnitTestMain.cpp
+ COMM serial mpi
+ NUM_MPI_PROCS 1
+ FAIL_REGULAR_EXPRESSION " FAILED "
+ TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_OpenMP)
- LIST( APPEND SOURCES
- TestOpenMP.cpp
+TRIBITS_ADD_EXECUTABLE_AND_TEST(
+ UnitTest_OpenMP
+ SOURCES TestOpenMP.cpp UnitTestMain.cpp
+ COMM serial mpi
+ NUM_MPI_PROCS 1
+ FAIL_REGULAR_EXPRESSION " FAILED "
+ TESTONLYLIBS kokkos_gtest
)
ENDIF()
-
+IF(Kokkos_ENABLE_Cuda)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
- UnitTest
- SOURCES ${SOURCES}
+ UnitTest_Cuda
+ SOURCES TestCuda.cpp UnitTestMain.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
-
+ENDIF()
+
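# Net effect of this change (summary): instead of one combined UnitTest target,
# each enabled backend (Threads, Serial, OpenMP, Cuda) now builds and registers
# its own UnitTest_<Backend> executable.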
diff --git a/lib/kokkos/containers/unit_tests/TestDynamicView.hpp b/lib/kokkos/containers/unit_tests/TestDynamicView.hpp
index 7e3ca005f..beb07bd79 100644
--- a/lib/kokkos/containers/unit_tests/TestDynamicView.hpp
+++ b/lib/kokkos/containers/unit_tests/TestDynamicView.hpp
@@ -1,168 +1,171 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_DYNAMICVIEW_HPP
#define KOKKOS_TEST_DYNAMICVIEW_HPP
#include <gtest/gtest.h>
#include <iostream>
#include <cstdlib>
#include <cstdio>
#include <Kokkos_Core.hpp>
#include <Kokkos_DynamicView.hpp>
#include <impl/Kokkos_Timer.hpp>
namespace Test {
template< typename Scalar , class Space >
struct TestDynamicView
{
typedef typename Space::execution_space execution_space ;
typedef typename Space::memory_space memory_space ;
typedef Kokkos::Experimental::MemoryPool<typename Space::device_type> memory_pool_type;
typedef Kokkos::Experimental::DynamicView<Scalar*,Space> view_type;
+ typedef typename view_type::const_type const_view_type ;
typedef typename Kokkos::TeamPolicy<execution_space>::member_type member_type ;
typedef double value_type;
struct TEST {};
struct VERIFY {};
view_type a;
const unsigned total_size ;
TestDynamicView( const view_type & arg_a , const unsigned arg_total )
: a(arg_a), total_size( arg_total ) {}
KOKKOS_INLINE_FUNCTION
void operator() ( const TEST , member_type team_member, double& value) const
{
const unsigned int team_idx = team_member.league_rank() * team_member.team_size();
if ( team_member.team_rank() == 0 ) {
unsigned n = team_idx + team_member.team_size();
if ( total_size < n ) n = total_size ;
a.resize_parallel( n );
if ( a.extent(0) < n ) {
Kokkos::abort("GrowTest TEST failed resize_parallel");
}
}
// Make sure resize is done for all team members:
team_member.team_barrier();
const unsigned int val = team_idx + team_member.team_rank();
if ( val < total_size ) {
value += val ;
a( val ) = val ;
}
}
KOKKOS_INLINE_FUNCTION
void operator() ( const VERIFY , member_type team_member, double& value) const
{
const unsigned int val =
team_member.team_rank() +
team_member.league_rank() * team_member.team_size();
if ( val < total_size ) {
if ( val != a(val) ) {
Kokkos::abort("GrowTest VERIFY failed resize_parallel");
}
value += a(val);
}
}
static void run( unsigned arg_total_size )
{
typedef Kokkos::TeamPolicy<execution_space,TEST> TestPolicy ;
typedef Kokkos::TeamPolicy<execution_space,VERIFY> VerifyPolicy ;
// printf("TestDynamicView::run(%d) construct memory pool\n",arg_total_size);
memory_pool_type pool( memory_space() , arg_total_size * sizeof(Scalar) * 1.2 );
// printf("TestDynamicView::run(%d) construct dynamic view\n",arg_total_size);
view_type da("A",pool,arg_total_size);
+ const_view_type ca(da);
+
// printf("TestDynamicView::run(%d) construct test functor\n",arg_total_size);
TestDynamicView functor(da,arg_total_size);
const unsigned team_size = TestPolicy::team_size_recommended(functor);
const unsigned league_size = ( arg_total_size + team_size - 1 ) / team_size ;
double reference = 0;
double result = 0;
// printf("TestDynamicView::run(%d) run functor test\n",arg_total_size);
Kokkos::parallel_reduce( TestPolicy(league_size,team_size) , functor , reference);
execution_space::fence();
// printf("TestDynamicView::run(%d) run functor verify\n",arg_total_size);
Kokkos::parallel_reduce( VerifyPolicy(league_size,team_size) , functor , result );
execution_space::fence();
// printf("TestDynamicView::run(%d) done\n",arg_total_size);
}
};
} // namespace Test
#endif /* #ifndef KOKKOS_TEST_DYNAMICVIEW_HPP */
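The added const_view_type typedef and the ca(da) construction above are the point of this change: a DynamicView can be re-wrapped through its const_type so read-only code cannot write it. A minimal host-side sketch of the same pattern, assuming the default execution space (labels and sizes are placeholders, not part of the patch):

// Illustrative sketch: DynamicView backed by a memory pool, aliased as const.
#include <Kokkos_Core.hpp>
#include <Kokkos_DynamicView.hpp>

void const_dynamic_view_example( const unsigned n )
{
  typedef Kokkos::DefaultExecutionSpace                          Space;
  typedef Kokkos::Experimental::MemoryPool< Space::device_type > pool_type;
  typedef Kokkos::Experimental::DynamicView< double*, Space >    view_type;
  typedef view_type::const_type                                  const_view_type;

  pool_type pool( Space::memory_space(), n * sizeof(double) * 2 );
  view_type a( "A", pool, n );     // growable up to n entries
  const_view_type ca( a );         // read-only alias of the same allocation
  (void) ca;
}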
diff --git a/lib/kokkos/core/cmake/Dependencies.cmake b/lib/kokkos/core/cmake/Dependencies.cmake
index ae9a20c50..8d9872725 100644
--- a/lib/kokkos/core/cmake/Dependencies.cmake
+++ b/lib/kokkos/core/cmake/Dependencies.cmake
@@ -1,6 +1,6 @@
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
- LIB_OPTIONAL_TPLS Pthread CUDA HWLOC QTHREAD DLlib
+ LIB_OPTIONAL_TPLS Pthread CUDA HWLOC QTHREADS DLlib
TEST_OPTIONAL_TPLS CUSPARSE
)
-TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib)
\ No newline at end of file
+TRIBITS_TPL_TENTATIVELY_ENABLE(DLlib)
diff --git a/lib/kokkos/core/cmake/KokkosCore_config.h.in b/lib/kokkos/core/cmake/KokkosCore_config.h.in
index 9359b5a32..a71e60f20 100644
--- a/lib/kokkos/core/cmake/KokkosCore_config.h.in
+++ b/lib/kokkos/core/cmake/KokkosCore_config.h.in
@@ -1,67 +1,67 @@
#ifndef KOKKOS_CORE_CONFIG_H
#define KOKKOS_CORE_CONFIG_H
/* The trivial 'src/build_common.sh' creates a config
* that must stay in sync with this file.
*/
#cmakedefine KOKKOS_FOR_SIERRA
#if !defined( KOKKOS_FOR_SIERRA )
#cmakedefine KOKKOS_HAVE_MPI
#cmakedefine KOKKOS_HAVE_CUDA
// mfh 16 Sep 2014: If passed in on the command line, that overrides
// any value of KOKKOS_USE_CUDA_UVM here. Doing this should prevent build
// warnings like this one:
//
// packages/kokkos/core/src/KokkosCore_config.h:13:1: warning: "KOKKOS_USE_CUDA_UVM" redefined
//
// At some point, we should edit the test-build scripts in
// Trilinos/cmake/ctest/drivers/perseus/, and take
// -DKOKKOS_USE_CUDA_UVM from the command-line arguments there. I
// hesitate to do that now, because I'm not sure if all the files are
// including KokkosCore_config.h (or a header file that includes it) like
// they should.
#if ! defined(KOKKOS_USE_CUDA_UVM)
#cmakedefine KOKKOS_USE_CUDA_UVM
#endif // ! defined(KOKKOS_USE_CUDA_UVM)
#cmakedefine KOKKOS_HAVE_PTHREAD
#cmakedefine KOKKOS_HAVE_SERIAL
-#cmakedefine KOKKOS_HAVE_QTHREAD
+#cmakedefine KOKKOS_HAVE_QTHREADS
#cmakedefine KOKKOS_HAVE_Winthread
#cmakedefine KOKKOS_HAVE_OPENMP
#cmakedefine KOKKOS_HAVE_HWLOC
#cmakedefine KOKKOS_HAVE_DEBUG
#cmakedefine KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
#cmakedefine KOKKOS_HAVE_CXX11
#cmakedefine KOKKOS_HAVE_CUSPARSE
#cmakedefine KOKKOS_ENABLE_PROFILING_INTERNAL
#ifdef KOKKOS_ENABLE_PROFILING_INTERNAL
#define KOKKOS_ENABLE_PROFILING 1
#else
#define KOKKOS_ENABLE_PROFILING 0
#endif
#cmakedefine KOKKOS_HAVE_CUDA_RDC
#ifdef KOKKOS_HAVE_CUDA_RDC
#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1
#endif
#cmakedefine KOKKOS_HAVE_CUDA_LAMBDA
#ifdef KOKKOS_HAVE_CUDA_LAMBDA
#define KOKKOS_CUDA_USE_LAMBDA 1
#endif
// Don't forbid users from defining this macro on the command line,
// but still make sure that CMake logic can control its definition.
#if ! defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
#cmakedefine KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
#endif // KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#cmakedefine KOKKOS_USING_DEPRECATED_VIEW
#endif // KOKKOS_FOR_SIERRA
#endif // KOKKOS_CORE_CONFIG_H
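Note that the generated header turns KOKKOS_ENABLE_PROFILING into an explicit 0/1 value instead of leaving it undefined, so downstream code can test it with a plain #if. A minimal sketch of that consumption (the function is a placeholder, not part of the patch):

// Illustrative sketch: guarding optional profiling hooks on the 0/1 macro.
#include <KokkosCore_config.h>

void emit_region_marker_if_enabled()
{
#if KOKKOS_ENABLE_PROFILING
  // profiling interface compiled in: a region marker could be pushed here
#else
  // profiling compiled out: nothing to do
#endif
}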
diff --git a/lib/kokkos/core/perf_test/Makefile b/lib/kokkos/core/perf_test/Makefile
index 85f869971..3a0ad2d4c 100644
--- a/lib/kokkos/core/perf_test/Makefile
+++ b/lib/kokkos/core/perf_test/Makefile
@@ -1,63 +1,62 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../tpls/gtest
vpath %.cpp ${KOKKOS_PATH}/core/perf_test
default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
CXX = g++
endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test
TEST_TARGETS =
TARGETS =
OBJ_PERF = PerfTestHost.o PerfTestCuda.o PerfTestMain.o gtest-all.o
TARGETS += KokkosCore_PerformanceTest
TEST_TARGETS += test-performance
OBJ_ATOMICS = test_atomic.o
TARGETS += KokkosCore_PerformanceTest_Atomics
TEST_TARGETS += test-atomic
KokkosCore_PerformanceTest: $(OBJ_PERF) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest
KokkosCore_PerformanceTest_Atomics: $(OBJ_ATOMICS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ATOMICS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_Atomics
test-performance: KokkosCore_PerformanceTest
./KokkosCore_PerformanceTest
test-atomic: KokkosCore_PerformanceTest_Atomics
./KokkosCore_PerformanceTest_Atomics
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
-
diff --git a/lib/kokkos/core/perf_test/PerfTestCuda.cpp b/lib/kokkos/core/perf_test/PerfTestCuda.cpp
index 7386ecef2..65ce61fb5 100644
--- a/lib/kokkos/core/perf_test/PerfTestCuda.cpp
+++ b/lib/kokkos/core/perf_test/PerfTestCuda.cpp
@@ -1,189 +1,199 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
#include <impl/Kokkos_Timer.hpp>
+#include <PerfTestMDRange.hpp>
+
#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>
namespace Test {
class cuda : public ::testing::Test {
protected:
static void SetUpTestCase() {
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
}
static void TearDownTestCase() {
Kokkos::Cuda::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
};
+//TEST_F( cuda, mdrange_lr ) {
+// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutRight>( 5, 8, "Kokkos::Cuda" )) );
+//}
+
+//TEST_F( cuda, mdrange_ll ) {
+// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutLeft>( 5, 8, "Kokkos::Cuda" )) );
+//}
+
TEST_F( cuda, hexgrad )
{
EXPECT_NO_THROW( run_test_hexgrad< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}
TEST_F( cuda, gramschmidt )
{
EXPECT_NO_THROW( run_test_gramschmidt< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}
namespace {
template <typename T>
struct TextureFetch
{
typedef Kokkos::View< T *, Kokkos::CudaSpace> array_type;
typedef Kokkos::View< const T *, Kokkos::CudaSpace, Kokkos::MemoryRandomAccess> const_array_type;
typedef Kokkos::View< int *, Kokkos::CudaSpace> index_array_type;
typedef Kokkos::View< const int *, Kokkos::CudaSpace> const_index_array_type;
struct FillArray
{
array_type m_array;
FillArray( const array_type & array )
: m_array(array)
{}
void apply() const
{
Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i) const { m_array(i) = i; }
};
struct RandomIndexes
{
index_array_type m_indexes;
typename index_array_type::HostMirror m_host_indexes;
RandomIndexes( const index_array_type & indexes)
: m_indexes(indexes)
, m_host_indexes(Kokkos::create_mirror(m_indexes))
{}
void apply() const
{
Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::HostSpace::execution_space,int>(0,m_host_indexes.dimension_0()), *this);
//random shuffle
Kokkos::HostSpace::execution_space::fence();
std::random_shuffle(m_host_indexes.ptr_on_device(), m_host_indexes.ptr_on_device() + m_host_indexes.dimension_0());
Kokkos::deep_copy(m_indexes,m_host_indexes);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i) const { m_host_indexes(i) = i; }
};
struct RandomReduce
{
const_array_type m_array;
const_index_array_type m_indexes;
RandomReduce( const const_array_type & array, const const_index_array_type & indexes)
: m_array(array)
, m_indexes(indexes)
{}
void apply(T & reduce) const
{
Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this, reduce);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i, T & reduce) const
{ reduce += m_array(m_indexes(i)); }
};
static void run(int size, double & reduce_time, T &reduce)
{
array_type array("array",size);
index_array_type indexes("indexes",size);
{ FillArray f(array); f.apply(); }
{ RandomIndexes f(indexes); f.apply(); }
Kokkos::Cuda::fence();
Kokkos::Timer timer;
for (int j=0; j<10; ++j) {
RandomReduce f(array,indexes);
f.apply(reduce);
}
Kokkos::Cuda::fence();
reduce_time = timer.seconds();
}
};
} // unnamed namespace
TEST_F( cuda, texture_double )
{
printf("Random reduce of double through texture fetch\n");
for (int i=1; i<=26; ++i) {
int size = 1<<i;
double time = 0;
double reduce = 0;
TextureFetch<double>::run(size,time,reduce);
printf(" time = %1.3e size = 2^%d\n", time, i);
}
}
} // namespace Test
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
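The TextureFetch benchmark above relies on one trait: a const View annotated with Kokkos::MemoryRandomAccess, which lets the CUDA backend service gathered loads through the read-only (texture) cache. A minimal sketch of that gather-reduce pattern, assuming CUDA is enabled (type and function names are placeholders, not part of the patch):

// Illustrative sketch: random-access gather reduce through a RandomAccess view.
typedef Kokkos::View< const double*, Kokkos::CudaSpace,
                      Kokkos::MemoryRandomAccess >      gather_values_type;
typedef Kokkos::View< const int*, Kokkos::CudaSpace >   gather_index_type;

double gather_sum( gather_values_type values, gather_index_type indexes )
{
  double sum = 0;
  Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::Cuda,int>( 0, indexes.dimension_0() ),
    KOKKOS_LAMBDA( const int i, double & update ) {
      update += values( indexes(i) );   // gathered load, eligible for texture cache
    }, sum );
  return sum;
}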
diff --git a/lib/kokkos/core/perf_test/PerfTestDriver.hpp b/lib/kokkos/core/perf_test/PerfTestDriver.hpp
index 7b6cfc5b5..4732c3275 100644
--- a/lib/kokkos/core/perf_test/PerfTestDriver.hpp
+++ b/lib/kokkos/core/perf_test/PerfTestDriver.hpp
@@ -1,152 +1,488 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <iostream>
#include <string>
// mfh 06 Jun 2013: This macro doesn't work like one might think it
// should. It doesn't take the template parameter DeviceType and
// print its actual type name; it just literally prints out
// "DeviceType". I've worked around this below without using the
// macro, so I'm commenting out the macro to avoid compiler complaints
// about an unused macro.
// #define KOKKOS_IMPL_MACRO_TO_STRING( X ) #X
// #define KOKKOS_MACRO_TO_STRING( X ) KOKKOS_IMPL_MACRO_TO_STRING( X )
//------------------------------------------------------------------------
namespace Test {
enum { NUMBER_OF_TRIALS = 5 };
+template< class DeviceType , class LayoutType >
+void run_test_mdrange( int exp_beg , int exp_end, const char deviceTypeName[], int range_offset = 0, int tile_offset = 0 )
+// exp_beg = 6 => 2^6 = 64 is starting range length
+{
+#define MDRANGE_PERFORMANCE_OUTPUT_VERBOSE 0
+
+ std::string label_mdrange ;
+ label_mdrange.append( "\"MDRange< double , " );
+ label_mdrange.append( deviceTypeName );
+ label_mdrange.append( " >\"" );
+
+ std::string label_range_col2 ;
+ label_range_col2.append( "\"RangeColTwo< double , " );
+ label_range_col2.append( deviceTypeName );
+ label_range_col2.append( " >\"" );
+
+ std::string label_range_col_all ;
+ label_range_col_all.append( "\"RangeColAll< double , " );
+ label_range_col_all.append( deviceTypeName );
+ label_range_col_all.append( " >\"" );
+
+ if ( std::is_same<LayoutType, Kokkos::LayoutRight>::value) {
+ std::cout << "--------------------------------------------------------------\n"
+ << "Performance tests for MDRange Layout Right"
+ << "\n--------------------------------------------------------------" << std::endl;
+ } else {
+ std::cout << "--------------------------------------------------------------\n"
+ << "Performance tests for MDRange Layout Left"
+ << "\n--------------------------------------------------------------" << std::endl;
+ }
+
+
+ for (int i = exp_beg ; i < exp_end ; ++i) {
+ const int range_length = (1<<i) + range_offset;
+
+ std::cout << "\n--------------------------------------------------------------\n"
+ << "--------------------------------------------------------------\n"
+ << "MDRange Test: range bounds: " << range_length << " , " << range_length << " , " << range_length
+ << "\n--------------------------------------------------------------\n"
+ << "--------------------------------------------------------------\n";
+// << std::endl;
+
+ int t0_min = 0, t1_min = 0, t2_min = 0;
+ double seconds_min = 0.0;
+
+ // Test 1: The MDRange in full
+ {
+ int t0 = 1, t1 = 1, t2 = 1;
+ int counter = 1;
+#if !defined(KOKKOS_HAVE_CUDA)
+ int min_bnd = 8;
+ int tfast = range_length;
+#else
+ int min_bnd = 2;
+ int tfast = 32;
+#endif
+ while ( tfast >= min_bnd ) {
+ int tmid = min_bnd;
+ while ( tmid < tfast ) {
+ t0 = min_bnd;
+ t1 = tmid;
+ t2 = tfast;
+ int t2_rev = min_bnd;
+ int t1_rev = tmid;
+ int t0_rev = tfast;
+
+#if defined(KOKKOS_HAVE_CUDA)
+ //Note: Product of tile sizes must be < 1024 for Cuda
+ if ( t0*t1*t2 >= 1024 ) {
+ printf(" Exceeded Cuda tile limits; onto next range set\n\n");
+ break;
+ }
+#endif
+
+ // Run 1 with tiles LayoutRight style
+ double seconds_1 = 0;
+ { seconds_1 = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, t0, t1, t2) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << label_mdrange
+ << " , " << t0 << " , " << t1 << " , " << t2
+ << " , " << seconds_1
+ << std::endl ;
+#endif
+
+ if ( counter == 1 ) {
+ seconds_min = seconds_1;
+ t0_min = t0;
+ t1_min = t1;
+ t2_min = t2;
+ }
+ else {
+ if ( seconds_1 < seconds_min )
+ {
+ seconds_min = seconds_1;
+ t0_min = t0;
+ t1_min = t1;
+ t2_min = t2;
+ }
+ }
+
+ // Run 2 with tiles LayoutLeft style - reverse order of tile dims
+ double seconds_1rev = 0;
+ { seconds_1rev = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, t0_rev, t1_rev, t2_rev) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << label_mdrange
+ << " , " << t0_rev << " , " << t1_rev << " , " << t2_rev
+ << " , " << seconds_1rev
+ << std::endl ;
+#endif
+
+ if ( seconds_1rev < seconds_min )
+ {
+ seconds_min = seconds_1rev;
+ t0_min = t0_rev;
+ t1_min = t1_rev;
+ t2_min = t2_rev;
+ }
+
+ ++counter;
+ tmid <<= 1;
+ } //end inner while
+ tfast >>=1;
+ } //end outer while
+
+ std::cout << "\n"
+ << "--------------------------------------------------------------\n"
+ << label_mdrange
+ << "\n Min values "
+ << "\n Range length per dim (3D): " << range_length
+ << "\n TileDims: " << t0_min << " , " << t1_min << " , " << t2_min
+ << "\n Min time: " << seconds_min
+ << "\n---------------------------------------------------------------"
+ << std::endl ;
+ } //end scope
+
+#if !defined(KOKKOS_HAVE_CUDA)
+ double seconds_min_c = 0.0;
+ int t0c_min = 0, t1c_min = 0, t2c_min = 0;
+ int counter = 1;
+ {
+ int min_bnd = 8;
+ // Test 1_c: MDRange with 0 for 'inner' tile dim; this case will utilize the full span in that direction, should be similar to Collapse<2>
+ if ( std::is_same<LayoutType, Kokkos::LayoutRight>::value ) {
+ for ( unsigned int T0 = min_bnd; T0 < static_cast<unsigned int>(range_length); T0<<=1 ) {
+ for ( unsigned int T1 = min_bnd; T1 < static_cast<unsigned int>(range_length); T1<<=1 ) {
+ double seconds_c = 0;
+ { seconds_c = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, T0, T1, 0) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << " MDRange LR with '0' tile - collapse-like \n"
+ << label_mdrange
+ << " , " << T0 << " , " << T1 << " , " << range_length
+ << " , " << seconds_c
+ << std::endl ;
+#endif
+
+ t2c_min = range_length;
+ if ( counter == 1 ) {
+ seconds_min_c = seconds_c;
+ t0c_min = T0;
+ t1c_min = T1;
+ }
+ else {
+ if ( seconds_c < seconds_min_c )
+ {
+ seconds_min_c = seconds_c;
+ t0c_min = T0;
+ t1c_min = T1;
+ }
+ }
+ ++counter;
+ }
+ }
+ }
+ else {
+ for ( unsigned int T1 = min_bnd; T1 <= static_cast<unsigned int>(range_length); T1<<=1 ) {
+ for ( unsigned int T2 = min_bnd; T2 <= static_cast<unsigned int>(range_length); T2<<=1 ) {
+ double seconds_c = 0;
+ { seconds_c = MultiDimRangePerf3D< DeviceType , double , LayoutType >::test_multi_index(range_length,range_length,range_length, 0, T1, T2) ; }
+
+#if MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+ std::cout << " MDRange LL with '0' tile - collapse-like \n"
+ << label_mdrange
+ << " , " << range_length << " , " << T1 << " , " << T2
+ << " , " << seconds_c
+ << std::endl ;
+#endif
+
+
+ t0c_min = range_length;
+ if ( counter == 1 ) {
+ seconds_min_c = seconds_c;
+ t1c_min = T1;
+ t2c_min = T2;
+ }
+ else {
+ if ( seconds_c < seconds_min_c )
+ {
+ seconds_min_c = seconds_c;
+ t1c_min = T1;
+ t2c_min = T2;
+ }
+ }
+ ++counter;
+ }
+ }
+ }
+
+ std::cout
+// << "--------------------------------------------------------------\n"
+ << label_mdrange
+ << " Collapse<2> style: "
+ << "\n Min values "
+ << "\n Range length per dim (3D): " << range_length
+ << "\n TileDims: " << t0c_min << " , " << t1c_min << " , " << t2c_min
+ << "\n Min time: " << seconds_min_c
+ << "\n---------------------------------------------------------------"
+ << std::endl ;
+ } //end scope test 2
+#endif
+
+
+ // Test 2: RangePolicy Collapse2 style
+ double seconds_2 = 0;
+ { seconds_2 = RangePolicyCollapseTwo< DeviceType , double , LayoutType >::test_index_collapse_two(range_length,range_length,range_length) ; }
+ std::cout << label_range_col2
+ << " , " << range_length
+ << " , " << seconds_2
+ << std::endl ;
+
+
+ // Test 3: RangePolicy Collapse all style - not necessary, always slow
+ /*
+ double seconds_3 = 0;
+ { seconds_3 = RangePolicyCollapseAll< DeviceType , double , LayoutType >::test_collapse_all(range_length,range_length,range_length) ; }
+ std::cout << label_range_col_all
+ << " , " << range_length
+ << " , " << seconds_3
+ << "\n---------------------------------------------------------------"
+ << std::endl ;
+ */
+
+ // Compare fastest times... will never be collapse all so ignore it
+ // seconds_min = tiled MDRange
+ // seconds_min_c = collapse<2>-like MDRange (tiledim = span for fast dim) - only for non-Cuda, else tile too long
+ // seconds_2 = collapse<2>-style RangePolicy
+ // seconds_3 = collapse<3>-style RangePolicy
+
+#if !defined(KOKKOS_HAVE_CUDA)
+ if ( seconds_min < seconds_min_c ) {
+ if ( seconds_min < seconds_2 ) {
+ std::cout << "--------------------------------------------------------------\n"
+ << " Fastest run: MDRange tiled\n"
+ << " Time: " << seconds_min
+ << " Difference: " << seconds_2 - seconds_min
+ << " Other times: \n"
+ << " MDrange collapse-like (tiledim = span on fast dim) type: " << seconds_min_c << "\n"
+ << " Collapse2 Range Policy: " << seconds_2 << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ else if ( seconds_min > seconds_2 ) {
+ std::cout << " Fastest run: Collapse2 RangePolicy\n"
+ << " Time: " << seconds_2
+ << " Difference: " << seconds_min - seconds_2
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << " MDrange collapse-like (tiledim = span on fast dim) type: " << seconds_min_c << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ }
+ else if ( seconds_min > seconds_min_c ) {
+ if ( seconds_min_c < seconds_2 ) {
+ std::cout << "--------------------------------------------------------------\n"
+ << " Fastest run: MDRange collapse-like (tiledim = span on fast dim) type\n"
+ << " Time: " << seconds_min_c
+ << " Difference: " << seconds_2 - seconds_min_c
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << " Collapse2 Range Policy: " << seconds_2 << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ else if ( seconds_min_c > seconds_2 ) {
+ std::cout << " Fastest run: Collapse2 RangePolicy\n"
+ << " Time: " << seconds_2
+ << " Difference: " << seconds_min_c - seconds_2
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << " MDrange collapse-like (tiledim = span on fast dim) type: " << seconds_min_c << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ } // end else if
+#else
+ if ( seconds_min < seconds_2 ) {
+ std::cout << "--------------------------------------------------------------\n"
+ << " Fastest run: MDRange tiled\n"
+ << " Time: " << seconds_min
+ << " Difference: " << seconds_2 - seconds_min
+ << " Other times: \n"
+ << " Collapse2 Range Policy: " << seconds_2 << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+ else if ( seconds_min > seconds_2 ) {
+ std::cout << " Fastest run: Collapse2 RangePolicy\n"
+ << " Time: " << seconds_2
+ << " Difference: " << seconds_min - seconds_2
+ << " Other times: \n"
+ << " MDrange Tiled: " << seconds_min << "\n"
+ << "\n--------------------------------------------------------------"
+ << "\n--------------------------------------------------------------"
+ //<< "\n\n"
+ << std::endl;
+ }
+#endif
+
+ } //end for
+
+#undef MDRANGE_PERFORMANCE_OUTPUT_VERBOSE
+
+}
template< class DeviceType >
void run_test_hexgrad( int exp_beg , int exp_end, const char deviceTypeName[] )
{
std::string label_hexgrad ;
label_hexgrad.append( "\"HexGrad< double , " );
// mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
// the string, not the actual name of the device type. Thus, I've
// modified the function to take the name of the device type.
//
//label_hexgrad.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
label_hexgrad.append( deviceTypeName );
label_hexgrad.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
const double seconds = HexGrad< DeviceType >::test(parallel_work_length) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= NUMBER_OF_TRIALS ;
std::cout << label_hexgrad
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
template< class DeviceType >
void run_test_gramschmidt( int exp_beg , int exp_end, const char deviceTypeName[] )
{
std::string label_gramschmidt ;
label_gramschmidt.append( "\"GramSchmidt< double , " );
// mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
// the string, not the actual name of the device type. Thus, I've
// modified the function to take the name of the device type.
//
//label_gramschmidt.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
label_gramschmidt.append( deviceTypeName );
label_gramschmidt.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
const double seconds = ModifiedGramSchmidt< double , DeviceType >::test(parallel_work_length, 32 ) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= NUMBER_OF_TRIALS ;
std::cout << label_gramschmidt
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
}
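Each timing inside run_test_mdrange above reduces to one rank-3 MDRangePolicy launch with an explicit tile shape {t0,t1,t2}; the nested while loops only enumerate which shapes are tried, and on CUDA the product t0*t1*t2 must stay under 1024 because a tile maps onto a thread block. A minimal sketch of a single timed launch, assuming the Kokkos core and timer headers are already included (views, bounds and the stencil body are placeholders, not part of the patch):

// Illustrative sketch: one tiled MDRange launch of the kind the driver times.
template< class Exec , class ViewType >
double time_one_tile_shape( ViewType A , ViewType B , int n , int t0 , int t1 , int t2 )
{
  typedef Kokkos::Experimental::MDRangePolicy<
            Exec , Kokkos::Experimental::Rank<3> > policy_type ;

  policy_type policy( {{0,0,0}} , {{n,n,n}} , {{t0,t1,t2}} );

  Kokkos::Timer timer ;
  Kokkos::Experimental::md_parallel_for( policy ,
    KOKKOS_LAMBDA( const int i , const int j , const int k ) {
      A(i,j,k) = 0.25 * ( B(i+2,j,k) + B(i,j+2,k) + B(i,j,k+2) + B(i,j,k) );
    });
  Exec::fence();
  return timer.seconds();
}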
diff --git a/lib/kokkos/core/perf_test/PerfTestHost.cpp b/lib/kokkos/core/perf_test/PerfTestHost.cpp
index 606177ca5..831d58110 100644
--- a/lib/kokkos/core/perf_test/PerfTestHost.cpp
+++ b/lib/kokkos/core/perf_test/PerfTestHost.cpp
@@ -1,115 +1,125 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_OPENMP )
typedef Kokkos::OpenMP TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::OpenMP" ;
#elif defined( KOKKOS_ENABLE_PTHREAD )
typedef Kokkos::Threads TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Threads" ;
#elif defined( KOKKOS_ENABLE_SERIAL )
typedef Kokkos::Serial TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Serial" ;
#else
# error "You must enable at least one of the following execution spaces in order to build this test: Kokkos::Threads, Kokkos::OpenMP, or Kokkos::Serial."
#endif
#include <impl/Kokkos_Timer.hpp>
+#include <PerfTestMDRange.hpp>
+
#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>
//------------------------------------------------------------------------
namespace Test {
class host : public ::testing::Test {
protected:
static void SetUpTestCase()
{
if(Kokkos::hwloc::available()) {
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
unsigned threads_count = 0 ;
threads_count = std::max( 1u , numa_count )
* std::max( 2u , cores_per_numa * threads_per_core );
TestHostDevice::initialize( threads_count );
} else {
const unsigned thread_count = 4 ;
TestHostDevice::initialize( thread_count );
}
}
static void TearDownTestCase()
{
TestHostDevice::finalize();
}
};
+//TEST_F( host, mdrange_lr ) {
+// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutRight> (5, 8, TestHostDeviceName) ) );
+//}
+
+//TEST_F( host, mdrange_ll ) {
+// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutLeft> (5, 8, TestHostDeviceName) ) );
+//}
+
TEST_F( host, hexgrad ) {
EXPECT_NO_THROW(run_test_hexgrad< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
TEST_F( host, gramschmidt ) {
EXPECT_NO_THROW(run_test_gramschmidt< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
} // namespace Test
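SetUpTestCase above sizes the host backend from the hwloc-detected topology and falls back to a fixed count only when no topology information is available. The same heuristic in isolation, assuming <Kokkos_Core.hpp> and <algorithm> are included (the 4-thread fallback mirrors the fixture; the function name is a placeholder, not part of the patch):

// Illustrative sketch: thread-count heuristic used by the test fixture.
unsigned pick_host_thread_count()
{
  if ( Kokkos::hwloc::available() ) {
    const unsigned numa_count       = Kokkos::hwloc::get_available_numa_count();
    const unsigned cores_per_numa   = Kokkos::hwloc::get_available_cores_per_numa();
    const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
    return std::max( 1u , numa_count ) * std::max( 2u , cores_per_numa * threads_per_core );
  }
  return 4u ;   // no topology information: modest fixed default
}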
diff --git a/lib/kokkos/core/perf_test/PerfTestMDRange.hpp b/lib/kokkos/core/perf_test/PerfTestMDRange.hpp
new file mode 100644
index 000000000..d910b513c
--- /dev/null
+++ b/lib/kokkos/core/perf_test/PerfTestMDRange.hpp
@@ -0,0 +1,564 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+namespace Test {
+template< class DeviceType
+ , typename ScalarType = double
+ , typename TestLayout = Kokkos::LayoutRight
+ >
+struct MultiDimRangePerf3D
+{
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
+
+ using iterate_type = Kokkos::Experimental::Iterate;
+
+ typedef Kokkos::View<ScalarType***, TestLayout, DeviceType> view_type;
+ typedef typename view_type::HostMirror host_view_type;
+
+ view_type A;
+ view_type B;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ MultiDimRangePerf3D(const view_type & A_, const view_type & B_, const long &irange_, const long &jrange_, const long &krange_)
+ : A(A_), B(B_), irange(irange_), jrange(jrange_), krange(krange_)
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long i, const long j, const long k) const
+ {
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+
+
+ struct InitZeroTag {};
+// struct InitViewTag {};
+
+ struct Init
+ {
+
+ Init(const view_type & input_, const long &irange_, const long &jrange_, const long &krange_)
+ : input(input_), irange(irange_), jrange(jrange_), krange(krange_) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long i, const long j, const long k) const
+ {
+ input(i,j,k) = 1.0;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const InitZeroTag&, const long i, const long j, const long k) const
+ {
+ input(i,j,k) = 0;
+ }
+
+ view_type input;
+ const long irange;
+ const long jrange;
+ const long krange;
+ };
+
+
+ static double test_multi_index(const unsigned int icount, const unsigned int jcount, const unsigned int kcount, const unsigned int Ti = 1, const unsigned int Tj = 1, const unsigned int Tk = 1, const long iter = 1)
+ {
+ //This test performs multidim range over all dims
+ view_type Atest("Atest", icount, jcount, kcount);
+ view_type Btest("Btest", icount+2, jcount+2, kcount+2);
+ typedef MultiDimRangePerf3D<execution_space,ScalarType,TestLayout> FunctorType;
+
+ double dt_min = 0;
+
+ // LayoutRight
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value ) {
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > policy_initA({{0,0,0}},{{icount,jcount,kcount}},{{Ti,Tj,Tk}});
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > policy_initB({{0,0,0}},{{icount+2,jcount+2,kcount+2}},{{Ti,Tj,Tk}});
+
+ typedef typename Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > MDRangeType;
+ using tile_type = typename MDRangeType::tile_type;
+ using point_type = typename MDRangeType::point_type;
+
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Right, iterate_type::Right>, execution_space > policy(point_type{{0,0,0}},point_type{{icount,jcount,kcount}},tile_type{{Ti,Tj,Tk}} );
+
+ Kokkos::Experimental::md_parallel_for( policy_initA, Init(Atest, icount, jcount, kcount) );
+ execution_space::fence();
+ Kokkos::Experimental::md_parallel_for( policy_initB, Init(Btest, icount+2, jcount+2, kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::Experimental::md_parallel_for( policy, FunctorType(Atest, Btest, icount, jcount, kcount) );
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - only the first run
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " multi Ahost = " << Ahost(l,j,k) << " expected = " << check
+ << " multi Bhost(ijk) = " << Bhost(l,j,k)
+ << " multi Bhost(l+1jk) = " << Bhost(l+1,j,k)
+ << " multi Bhost(l+2jk) = " << Bhost(l+2,j,k)
+ << " multi Bhost(ij+1k) = " << Bhost(l,j+1,k)
+ << " multi Bhost(ij+2k) = " << Bhost(l,j+2,k)
+ << " multi Bhost(ijk+1) = " << Bhost(l,j,k+1)
+ << " multi Bhost(ijk+2) = " << Bhost(l,j,k+2)
+ << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << "LR multi: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " multi: No errors!" << std::endl; }
+ }
+ } //end for
+
+ }
+ // LayoutLeft
+ else {
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3,iterate_type::Left,iterate_type::Left>, execution_space > policy_initA({{0,0,0}},{{icount,jcount,kcount}},{{Ti,Tj,Tk}});
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3,iterate_type::Left,iterate_type::Left>, execution_space > policy_initB({{0,0,0}},{{icount+2,jcount+2,kcount+2}},{{Ti,Tj,Tk}});
+
+ //typedef typename Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Left, iterate_type::Left>, execution_space > MDRangeType;
+ //using tile_type = typename MDRangeType::tile_type;
+ //using point_type = typename MDRangeType::point_type;
+ //Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Left, iterate_type::Left>, execution_space > policy(point_type{{0,0,0}},point_type{{icount,jcount,kcount}},tile_type{{Ti,Tj,Tk}} );
+ Kokkos::Experimental::MDRangePolicy<Kokkos::Experimental::Rank<3, iterate_type::Left, iterate_type::Left>, execution_space > policy({{0,0,0}},{{icount,jcount,kcount}},{{Ti,Tj,Tk}} );
+
+ Kokkos::Experimental::md_parallel_for( policy_initA, Init(Atest, icount, jcount, kcount) );
+ execution_space::fence();
+ Kokkos::Experimental::md_parallel_for( policy_initB, Init(Btest, icount+2, jcount+2, kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::Experimental::md_parallel_for( policy, FunctorType(Atest, Btest, icount, jcount, kcount) );
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - only the first run
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " multi Ahost = " << Ahost(l,j,k) << " expected = " << check
+ << " multi Bhost(ijk) = " << Bhost(l,j,k)
+ << " multi Bhost(l+1jk) = " << Bhost(l+1,j,k)
+ << " multi Bhost(l+2jk) = " << Bhost(l+2,j,k)
+ << " multi Bhost(ij+1k) = " << Bhost(l,j+1,k)
+ << " multi Bhost(ij+2k) = " << Bhost(l,j+2,k)
+ << " multi Bhost(ijk+1) = " << Bhost(l,j,k+1)
+ << " multi Bhost(ijk+2) = " << Bhost(l,j,k+2)
+ << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << " LL multi run: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " multi: No errors!" << std::endl; }
+
+ }
+ } //end for
+ }
+
+ return dt_min;
+ }
+
+};
+
+
+template< class DeviceType
+ , typename ScalarType = double
+ , typename TestLayout = Kokkos::LayoutRight
+ >
+struct RangePolicyCollapseTwo
+{
+ // RangePolicy for 3D range, but will collapse only 2 dims => like Rank<2> for multi-dim; unroll 2 dims in one-dim
+
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
+ typedef TestLayout layout;
+
+ using iterate_type = Kokkos::Experimental::Iterate;
+
+ typedef Kokkos::View<ScalarType***, TestLayout, DeviceType> view_type;
+ typedef typename view_type::HostMirror host_view_type;
+
+ view_type A;
+ view_type B;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ RangePolicyCollapseTwo(view_type & A_, const view_type & B_, const long &irange_, const long &jrange_, const long &krange_)
+ : A(A_), B(B_) , irange(irange_), jrange(jrange_), krange(krange_)
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+//id(i,j,k) = k + j*Nk + i*Nk*Nj = k + Nk*(j + i*Nj) = k + Nk*r
+//r = j + i*Nj
+ long i = int(r / jrange);
+ long j = int( r - i*jrange);
+ for (int k = 0; k < krange; ++k) {
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+//id(i,j,k) = i + j*Ni + k*Ni*Nj = i + Ni*(j + k*Nj) = i + Ni*r
+//r = j + k*Nj
+ long k = int(r / jrange);
+ long j = int( r - k*jrange);
+ for (int i = 0; i < irange; ++i) {
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ }
+ }
+
+
+ struct Init
+ {
+ view_type input;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ Init(const view_type & input_, const long &irange_, const long &jrange_, const long &krange_)
+ : input(input_), irange(irange_), jrange(jrange_), krange(krange_) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+ long i = int(r / jrange);
+ long j = int( r - i*jrange);
+ for (int k = 0; k < krange; ++k) {
+ input(i,j,k) = 1;
+ }
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+ long k = int(r / jrange);
+ long j = int( r - k*jrange);
+ for (int i = 0; i < irange; ++i) {
+ input(i,j,k) = 1;
+ }
+ }
+ }
+ };
+
+
+ static double test_index_collapse_two(const unsigned int icount, const unsigned int jcount, const unsigned int kcount, const long iter = 1)
+ {
+ // This test refers to collapsing two dims while using the RangePolicy
+ view_type Atest("Atest", icount, jcount, kcount);
+ view_type Btest("Btest", icount+2, jcount+2, kcount+2);
+ typedef RangePolicyCollapseTwo<execution_space,ScalarType,TestLayout> FunctorType;
+
+ long collapse_index_rangeA = 0;
+ long collapse_index_rangeB = 0;
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value ) {
+ collapse_index_rangeA = icount*jcount;
+ collapse_index_rangeB = (icount+2)*(jcount+2);
+// std::cout << " LayoutRight " << std::endl;
+ } else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value ) {
+ collapse_index_rangeA = kcount*jcount;
+ collapse_index_rangeB = (kcount+2)*(jcount+2);
+// std::cout << " LayoutLeft " << std::endl;
+ } else {
+ std::cout << " LayoutRight or LayoutLeft required - will pass 0 as range instead " << std::endl;
+ exit(-1);
+ }
+
+ Kokkos::RangePolicy<execution_space> policy(0, (collapse_index_rangeA) );
+ Kokkos::RangePolicy<execution_space> policy_initB(0, (collapse_index_rangeB) );
+
+ double dt_min = 0;
+
+ Kokkos::parallel_for( policy, Init(Atest,icount,jcount,kcount) );
+ execution_space::fence();
+ Kokkos::parallel_for( policy_initB, Init(Btest,icount+2,jcount+2,kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::parallel_for(policy, FunctorType(Atest, Btest, icount, jcount, kcount));
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - first iteration only
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " flat Ahost = " << Ahost(l,j,k) << " expected = " << check << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << " RP collapse2: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " RP collapse2: Pass! " << std::endl; }
+ }
+ }
+
+ return dt_min;
+ }
+
+};
+
+
+template< class DeviceType
+ , typename ScalarType = double
+ , typename TestLayout = Kokkos::LayoutRight
+ >
+struct RangePolicyCollapseAll
+{
+ // RangePolicy for 3D range, but will collapse all dims
+
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
+ typedef TestLayout layout;
+
+ typedef Kokkos::View<ScalarType***, TestLayout, DeviceType> view_type;
+ typedef typename view_type::HostMirror host_view_type;
+
+ view_type A;
+ view_type B;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ RangePolicyCollapseAll(view_type & A_, const view_type & B_, const long &irange_, const long &jrange_, const long &krange_)
+ : A(A_), B(B_), irange(irange_), jrange(jrange_), krange(krange_)
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+ long i = int(r / (jrange*krange));
+ long j = int(( r - i*jrange*krange)/krange);
+ long k = int(r - i*jrange*krange - j*krange);
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+ long k = int(r / (irange*jrange));
+ long j = int(( r - k*irange*jrange)/irange);
+ long i = int(r - k*irange*jrange - j*irange);
+ A(i,j,k) = 0.25*(ScalarType)( B(i+2,j,k) + B(i+1,j,k)
+ + B(i,j+2,k) + B(i,j+1,k)
+ + B(i,j,k+2) + B(i,j,k+1)
+ + B(i,j,k) );
+ }
+ }
+
+
+ struct Init
+ {
+ view_type input;
+ const long irange;
+ const long jrange;
+ const long krange;
+
+ Init(const view_type & input_, const long &irange_, const long &jrange_, const long &krange_)
+ : input(input_), irange(irange_), jrange(jrange_), krange(krange_) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()(const long r) const
+ {
+ if ( std::is_same<TestLayout, Kokkos::LayoutRight>::value )
+ {
+ long i = int(r / (jrange*krange));
+ long j = int(( r - i*jrange*krange)/krange);
+ long k = int(r - i*jrange*krange - j*krange);
+ input(i,j,k) = 1;
+ }
+ else if ( std::is_same<TestLayout, Kokkos::LayoutLeft>::value )
+ {
+ long k = int(r / (irange*jrange));
+ long j = int(( r - k*irange*jrange)/irange);
+ long i = int(r - k*irange*jrange - j*irange);
+ input(i,j,k) = 1;
+ }
+ }
+ };
+
+
+ static double test_collapse_all(const unsigned int icount, const unsigned int jcount, const unsigned int kcount, const long iter = 1)
+ {
+ //This test refers to collapsing all dims using the RangePolicy
+ view_type Atest("Atest", icount, jcount, kcount);
+ view_type Btest("Btest", icount+2, jcount+2, kcount+2);
+ typedef RangePolicyCollapseAll<execution_space,ScalarType,TestLayout> FunctorType;
+
+ const long flat_index_range = icount*jcount*kcount;
+ Kokkos::RangePolicy<execution_space> policy(0, flat_index_range );
+ Kokkos::RangePolicy<execution_space> policy_initB(0, (icount+2)*(jcount+2)*(kcount+2) );
+
+ double dt_min = 0;
+
+ Kokkos::parallel_for( policy, Init(Atest,icount,jcount,kcount) );
+ execution_space::fence();
+ Kokkos::parallel_for( policy_initB, Init(Btest,icount+2,jcount+2,kcount+2) );
+ execution_space::fence();
+
+ for (int i = 0; i < iter; ++i)
+ {
+ Kokkos::Timer timer;
+ Kokkos::parallel_for(policy, FunctorType(Atest, Btest, icount, jcount, kcount));
+ execution_space::fence();
+ const double dt = timer.seconds();
+ if ( 0 == i ) dt_min = dt ;
+ else dt_min = dt < dt_min ? dt : dt_min ;
+
+ //Correctness check - first iteration only
+ if ( 0 == i )
+ {
+ long numErrors = 0;
+ host_view_type Ahost("Ahost", icount, jcount, kcount);
+ Kokkos::deep_copy(Ahost, Atest);
+ host_view_type Bhost("Bhost", icount+2, jcount+2, kcount+2);
+ Kokkos::deep_copy(Bhost, Btest);
+
+ // On KNL, this may vectorize - add print statement to prevent
+ // Also, compare against epsilon, as vectorization can change bitwise answer
+ for ( long l = 0; l < static_cast<long>(icount); ++l ) {
+ for ( long j = 0; j < static_cast<long>(jcount); ++j ) {
+ for ( long k = 0; k < static_cast<long>(kcount); ++k ) {
+ ScalarType check = 0.25*(ScalarType)( Bhost(l+2,j,k) + Bhost(l+1,j,k)
+ + Bhost(l,j+2,k) + Bhost(l,j+1,k)
+ + Bhost(l,j,k+2) + Bhost(l,j,k+1)
+ + Bhost(l,j,k) );
+ if ( Ahost(l,j,k) - check != 0 ) {
+ ++numErrors;
+ std::cout << " Collapse ALL Correctness error at index: " << l << ","<<j<<","<<k<<"\n"
+ << " flat Ahost = " << Ahost(l,j,k) << " expected = " << check << std::endl;
+ //exit(-1);
+ }
+ } } }
+ if ( numErrors != 0 ) { std::cout << " RP collapse all: errors " << numErrors << " range product " << icount*jcount*kcount << " LL " << jcount*kcount << " LR " << icount*jcount << std::endl; }
+ //else { std::cout << " RP collapse all: Pass! " << std::endl; }
+ }
+ }
+
+ return dt_min;
+ }
+
+};
+
+} //end namespace Test
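The comments inside RangePolicyCollapseTwo above encode the flattening r = j + i*Nj (LayoutRight) or r = j + k*Nj (LayoutLeft), and the functor recovers the two collapsed indices by integer division. A small self-check of that round trip (dimensions are arbitrary; not part of the patch):

// Illustrative sketch: verifying the collapse-two index recovery used above.
#include <cassert>

void collapse_two_roundtrip( const long Ni , const long Nj )
{
  for ( long i = 0 ; i < Ni ; ++i ) {
    for ( long j = 0 ; j < Nj ; ++j ) {
      const long r  = j + i * Nj ;   // flat index handed to the RangePolicy
      const long ir = r / Nj ;       // recovered outer index
      const long jr = r - ir * Nj ;  // recovered inner (middle) index
      assert( ir == i && jr == j );
    }
  }
}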
diff --git a/lib/kokkos/core/src/CMakeLists.txt b/lib/kokkos/core/src/CMakeLists.txt
index 807a01ed0..492470d05 100644
--- a/lib/kokkos/core/src/CMakeLists.txt
+++ b/lib/kokkos/core/src/CMakeLists.txt
@@ -1,113 +1,111 @@
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Serial
KOKKOS_HAVE_SERIAL
"Whether to enable the Kokkos::Serial device. This device executes \"parallel\" kernels sequentially on a single CPU thread. It is enabled by default. If you disable this device, please enable at least one other CPU device, such as Kokkos::OpenMP or Kokkos::Threads."
ON
)
ASSERT_DEFINED(${PROJECT_NAME}_ENABLE_CXX11)
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_CUDA)
# Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA governs whether Kokkos allows
# use of lambdas at the outer level of parallel dispatch (that is, as
# the argument to an outer parallel_for, parallel_reduce, or
# parallel_scan). This works with non-CUDA execution spaces if C++11
# is enabled. It does not currently work with public releases of
# CUDA. If that changes, please change the default here to ON if CUDA
# and C++11 are ON.
IF (${PROJECT_NAME}_ENABLE_CXX11)
IF (${PACKAGE_NAME}_ENABLE_CUDA)
SET(Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT OFF)
ELSE ()
SET(Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT ON)
ENDIF ()
ELSE ()
SET(Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT OFF)
ENDIF ()
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA
KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
"Whether Kokkos allows use of lambdas at the outer level of parallel dispatch (that is, as the argument to an outer parallel_for, parallel_reduce, or parallel_scan). This requires C++11. It also does not currently work with public releases of CUDA. As a result, even if C++11 is enabled, this will be OFF by default if CUDA is enabled. If this option is ON, the macro KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA will be defined. For compatibility with Kokkos' Makefile build system, it is also possible to define that macro on the command line."
${Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA_DEFAULT}
)
TRIBITS_CONFIGURE_FILE(${PACKAGE_NAME}_config.h)
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
#-----------------------------------------------------------------------------
SET(TRILINOS_INCDIR ${CMAKE_INSTALL_PREFIX}/${${PROJECT_NAME}_INSTALL_INCLUDE_DIR})
#-----------------------------------------------------------------------------
SET(HEADERS_PUBLIC "")
SET(HEADERS_PRIVATE "")
SET(SOURCES "")
FILE(GLOB HEADERS_PUBLIC Kokkos*.hpp)
LIST( APPEND HEADERS_PUBLIC ${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME}_config.h )
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_IMPL impl/*.hpp)
FILE(GLOB SOURCES_IMPL impl/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_IMPL} )
LIST(APPEND SOURCES ${SOURCES_IMPL} )
INSTALL(FILES ${HEADERS_IMPL} DESTINATION ${TRILINOS_INCDIR}/impl/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_THREADS Threads/*.hpp)
FILE(GLOB SOURCES_THREADS Threads/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_THREADS} )
LIST(APPEND SOURCES ${SOURCES_THREADS} )
INSTALL(FILES ${HEADERS_THREADS} DESTINATION ${TRILINOS_INCDIR}/Threads/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_OPENMP OpenMP/*.hpp)
FILE(GLOB SOURCES_OPENMP OpenMP/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_OPENMP} )
LIST(APPEND SOURCES ${SOURCES_OPENMP} )
INSTALL(FILES ${HEADERS_OPENMP} DESTINATION ${TRILINOS_INCDIR}/OpenMP/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_CUDA Cuda/*.hpp)
FILE(GLOB SOURCES_CUDA Cuda/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_CUDA} )
LIST(APPEND SOURCES ${SOURCES_CUDA} )
INSTALL(FILES ${HEADERS_CUDA} DESTINATION ${TRILINOS_INCDIR}/Cuda/)
#-----------------------------------------------------------------------------
-FILE(GLOB HEADERS_QTHREAD Qthread/*.hpp)
-FILE(GLOB SOURCES_QTHREAD Qthread/*.cpp)
+FILE(GLOB HEADERS_QTHREADS Qthreads/*.hpp)
+FILE(GLOB SOURCES_QTHREADS Qthreads/*.cpp)
-LIST(APPEND HEADERS_PRIVATE ${HEADERS_QTHREAD} )
-LIST(APPEND SOURCES ${SOURCES_QTHREAD} )
+LIST(APPEND HEADERS_PRIVATE ${HEADERS_QTHREADS} )
+LIST(APPEND SOURCES ${SOURCES_QTHREADS} )
-INSTALL(FILES ${HEADERS_QTHREAD} DESTINATION ${TRILINOS_INCDIR}/Qthread/)
+INSTALL(FILES ${HEADERS_QTHREADS} DESTINATION ${TRILINOS_INCDIR}/Qthreads/)
#-----------------------------------------------------------------------------
TRIBITS_ADD_LIBRARY(
kokkoscore
HEADERS ${HEADERS_PUBLIC}
NOINSTALLHEADERS ${HEADERS_PRIVATE}
SOURCES ${SOURCES}
DEPLIBS
)
-
-
diff --git a/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_IterateTile.hpp b/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_IterateTile.hpp
new file mode 100644
index 000000000..e0eadb25a
--- /dev/null
+++ b/lib/kokkos/core/src/Cuda/KokkosExp_Cuda_IterateTile.hpp
@@ -0,0 +1,1300 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
+#define KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
+
+#include <iostream>
+#include <algorithm>
+#include <stdio.h>
+
+#include <Kokkos_Macros.hpp>
+
+/* only compile this file if CUDA is enabled for Kokkos */
+#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
+
+#include <utility>
+
+//#include<Cuda/Kokkos_CudaExec.hpp>
+// Including the file above leads to errors like the following:
+// /home/ndellin/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp(84): error: incomplete type is not allowed
+// As a result, recreate cuda_parallel_launch and associated code
+
+#if defined(KOKKOS_ENABLE_PROFILING)
+#include <impl/Kokkos_Profiling_Interface.hpp>
+#include <typeinfo>
+#endif
+
+namespace Kokkos { namespace Experimental { namespace Impl {
+
+// ------------------------------------------------------------------ //
+
+template< class DriverType >
+__global__
+static void cuda_parallel_launch( const DriverType driver )
+{
+ driver();
+}
+
+template< class DriverType >
+struct CudaLaunch
+{
+ inline
+ CudaLaunch( const DriverType & driver
+ , const dim3 & grid
+ , const dim3 & block
+ )
+ {
+ cuda_parallel_launch< DriverType ><<< grid , block >>>(driver);
+ }
+
+};
+
+// ------------------------------------------------------------------ //
+template< int N , typename RP , typename Functor , typename Tag >
+struct apply_impl;
+
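+// Each specialization below maps CUDA blocks to tiles of the MDRange: blockIdx.{x,y,z}
+// stride over the tiles of each dimension (with pairs of dimensions flattened together for
+// rank > 3), while threadIdx selects the point within the current tile. The per-offset
+// bounds checks guard partial tiles at the upper end of each range.
+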
+//Rank 2
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<2,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ /*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ m_func(i, j);
+ } }
+*/
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+/*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ m_func(i, j);
+ } }
+*/
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+ m_func(offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<2,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ inline __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+ if (RP::inner_direction == RP::Left) {
+ // Loop over size maxnumblocks until full range covered
+/*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ m_func(Tag(), i, j);
+ } }
+*/
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(Tag(), offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+ else {
+/*
+ index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
+ index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
+
+ for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
+ for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
+ m_func(Tag(), i, j);
+ } }
+*/
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+ m_func(Tag(), offset_0 , offset_1);
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 3
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<3,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+ m_func(offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<3,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ inline __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ inline __device__
+ void exec_range() const
+ {
+ if (RP::inner_direction == RP::Left) {
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+ m_func(Tag(), offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ else {
+ for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
+ const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
+ if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
+
+ for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
+ const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
+ if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
+ m_func(Tag(), offset_0 , offset_1 , offset_2);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 4
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<4,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
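+ // 65535 is the CUDA limit on gridDim.y and gridDim.z (and on gridDim.x for older
+ // devices); the flattened tile counts are folded so that numbl0*numbl1 stays within it.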
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<4,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ inline __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+ if (RP::inner_direction == RP::Left) {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(Tag(), offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ else {
+ const index_type temp0 = m_rp.m_tile_end[0];
+ const index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
+ const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
+ if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
+
+ for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
+ const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
+ if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 5
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<5,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3, offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3 , offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<5,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3, offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
+ const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
+ if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3 , offset_4);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+
+//Rank 6
+// Specializations for void tag type
+template< typename RP , typename Functor >
+struct apply_impl<6,RP,Functor,void >
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl4 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl5 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl4 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z % numbl4;
+ const index_type tile_id5 = blockIdx.z / numbl4;
+ const index_type thr_id4 = threadIdx.z % m_rp.m_tile[4];
+ const index_type thr_id5 = threadIdx.z / m_rp.m_tile[4];
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3, offset_4, offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl5 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl4 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl5 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z / numbl5;
+ const index_type tile_id5 = blockIdx.z % numbl5;
+ const index_type thr_id4 = threadIdx.z / m_rp.m_tile[5];
+ const index_type thr_id5 = threadIdx.z % m_rp.m_tile[5];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+ m_func(offset_0 , offset_1 , offset_2 , offset_3 , offset_4 , offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// Specializations for tag type
+template< typename RP , typename Functor , typename Tag >
+struct apply_impl<6,RP,Functor,Tag>
+{
+ using index_type = typename RP::index_type;
+
+ __device__
+ apply_impl( const RP & rp_ , const Functor & f_ )
+ : m_rp(rp_)
+ , m_func(f_)
+ {}
+
+ static constexpr index_type max_blocks = 65535;
+
+ inline __device__
+ void exec_range() const
+ {
+// LL
+ if (RP::inner_direction == RP::Left) {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl0 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl1 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl0 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x % numbl0;
+ const index_type tile_id1 = blockIdx.x / numbl0;
+ const index_type thr_id0 = threadIdx.x % m_rp.m_tile[0];
+ const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl2 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl3 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl2 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y % numbl2;
+ const index_type tile_id3 = blockIdx.y / numbl2;
+ const index_type thr_id2 = threadIdx.y % m_rp.m_tile[2];
+ const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl4 = ( temp0 <= max_blocks ? temp0 : max_blocks ) ;
+ const index_type numbl5 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl4 ) :
+ ( temp1 <= max_blocks ? temp1 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z % numbl4;
+ const index_type tile_id5 = blockIdx.z / numbl4;
+ const index_type thr_id4 = threadIdx.z % m_rp.m_tile[4];
+ const index_type thr_id5 = threadIdx.z / m_rp.m_tile[4];
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3, offset_4, offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+// LR
+ else {
+ index_type temp0 = m_rp.m_tile_end[0];
+ index_type temp1 = m_rp.m_tile_end[1];
+ const index_type numbl1 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl0 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl1 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id0 = blockIdx.x / numbl1;
+ const index_type tile_id1 = blockIdx.x % numbl1;
+ const index_type thr_id0 = threadIdx.x / m_rp.m_tile[1];
+ const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
+
+ temp0 = m_rp.m_tile_end[2];
+ temp1 = m_rp.m_tile_end[3];
+ const index_type numbl3 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl2 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl3 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id2 = blockIdx.y / numbl3;
+ const index_type tile_id3 = blockIdx.y % numbl3;
+ const index_type thr_id2 = threadIdx.y / m_rp.m_tile[3];
+ const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
+
+ temp0 = m_rp.m_tile_end[4];
+ temp1 = m_rp.m_tile_end[5];
+ const index_type numbl5 = ( temp1 <= max_blocks ? temp1 : max_blocks ) ;
+ const index_type numbl4 = ( temp0*temp1 > max_blocks ? index_type( max_blocks / numbl5 ) :
+ ( temp0 <= max_blocks ? temp0 : max_blocks ) );
+
+ const index_type tile_id4 = blockIdx.z / numbl5;
+ const index_type tile_id5 = blockIdx.z % numbl5;
+ const index_type thr_id4 = threadIdx.z / m_rp.m_tile[5];
+ const index_type thr_id5 = threadIdx.z % m_rp.m_tile[5];
+
+ for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
+ const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
+ if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
+
+ for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
+ const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
+ if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
+
+ for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
+ const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
+ if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
+
+ for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
+ const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
+ if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
+
+ for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
+ const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
+ if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
+
+ for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
+ const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
+ if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
+ m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3 , offset_4 , offset_5);
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ } //end exec_range
+
+private:
+ const RP & m_rp;
+ const Functor & m_func;
+};
+
+// ----------------------------------------------------------------------------------
+
+template < typename RP
+ , typename Functor
+ , typename Tag
+ >
+struct DeviceIterateTile
+{
+ using index_type = typename RP::index_type;
+ using array_index_type = typename RP::array_index_type;
+ using point_type = typename RP::point_type;
+
+ struct VoidDummy {};
+ typedef typename std::conditional< std::is_same<Tag, void>::value, VoidDummy, Tag>::type usable_tag;
+
+ DeviceIterateTile( const RP & rp, const Functor & func )
+ : m_rp{rp}
+ , m_func{func}
+ {}
+
+private:
+ inline __device__
+ void apply() const
+ {
+ apply_impl<RP::rank,RP,Functor,Tag>(m_rp,m_func).exec_range();
+ } //end apply
+
+public:
+
+ inline
+ __device__
+ void operator()(void) const
+ {
+ this->apply();
+ }
+
+ inline
+ void execute() const
+ {
+ const array_index_type maxblocks = 65535; // gridDim.y/z limit; gridDim.x allows more on newer archs
+ if ( RP::rank == 2 )
+ {
+ const dim3 block( m_rp.m_tile[0] , m_rp.m_tile[1] , 1);
+ const dim3 grid(
+ std::min( ( m_rp.m_upper[0] - m_rp.m_lower[0] + block.x - 1 ) / block.x , maxblocks )
+ , std::min( ( m_rp.m_upper[1] - m_rp.m_lower[1] + block.y - 1 ) / block.y , maxblocks )
+ , 1
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 3 )
+ {
+ const dim3 block( m_rp.m_tile[0] , m_rp.m_tile[1] , m_rp.m_tile[2] );
+ const dim3 grid(
+ std::min( ( m_rp.m_upper[0] - m_rp.m_lower[0] + block.x - 1 ) / block.x , maxblocks )
+ , std::min( ( m_rp.m_upper[1] - m_rp.m_lower[1] + block.y - 1 ) / block.y , maxblocks )
+ , std::min( ( m_rp.m_upper[2] - m_rp.m_lower[2] + block.z - 1 ) / block.z , maxblocks )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 4 )
+ {
+ // id0,id1 encoded within threadIdx.x; id2 to threadIdx.y; id3 to threadIdx.z
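+ // (apply_impl<4> recovers the pair as thr_id0 = threadIdx.x % m_tile[0], thr_id1 = threadIdx.x / m_tile[0]
+ // for LayoutLeft, with the mod/div roles swapped and m_tile[1] as the divisor for LayoutRight)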
+ const dim3 block( m_rp.m_tile[0]*m_rp.m_tile[1] , m_rp.m_tile[2] , m_rp.m_tile[3] );
+ const dim3 grid(
+ std::min( static_cast<index_type>( m_rp.m_tile_end[0] * m_rp.m_tile_end[1] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( ( m_rp.m_upper[2] - m_rp.m_lower[2] + block.y - 1 ) / block.y , maxblocks )
+ , std::min( ( m_rp.m_upper[3] - m_rp.m_lower[3] + block.z - 1 ) / block.z , maxblocks )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 5 )
+ {
+ // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4 to threadIdx.z
+ const dim3 block( m_rp.m_tile[0]*m_rp.m_tile[1] , m_rp.m_tile[2]*m_rp.m_tile[3] , m_rp.m_tile[4] );
+ const dim3 grid(
+ std::min( static_cast<index_type>( m_rp.m_tile_end[0] * m_rp.m_tile_end[1] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( static_cast<index_type>( m_rp.m_tile_end[2] * m_rp.m_tile_end[3] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( ( m_rp.m_upper[4] - m_rp.m_lower[4] + block.z - 1 ) / block.z , maxblocks )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else if ( RP::rank == 6 )
+ {
+ // id0,id1 encoded within threadIdx.x; id2,id3 to threadIdx.y; id4,id5 to threadIdx.z
+ const dim3 block( m_rp.m_tile[0]*m_rp.m_tile[1] , m_rp.m_tile[2]*m_rp.m_tile[3] , m_rp.m_tile[4]*m_rp.m_tile[5] );
+ const dim3 grid(
+ std::min( static_cast<index_type>( m_rp.m_tile_end[0] * m_rp.m_tile_end[1] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( static_cast<index_type>( m_rp.m_tile_end[2] * m_rp.m_tile_end[3] )
+ , static_cast<index_type>(maxblocks) )
+ , std::min( static_cast<index_type>( m_rp.m_tile_end[4] * m_rp.m_tile_end[5] )
+ , static_cast<index_type>(maxblocks) )
+ );
+ CudaLaunch< DeviceIterateTile >( *this , grid , block );
+ }
+ else
+ {
+ printf("Kokkos::MDRange Error: Exceeded rank bounds with Cuda\n");
+ Kokkos::abort("Aborting");
+ }
+
+ } //end execute
+
+protected:
+ const RP m_rp;
+ const Functor m_func;
+};
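+
+// Usage sketch (an assumption; the MDRangePolicy glue lives in other headers): the host-side
+// parallel_for specialization constructs DeviceIterateTile from the policy and functor and
+// calls execute(), which launches this object as the CUDA kernel via CudaLaunch/operator().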
+
+} } } //end namespace Kokkos::Experimental::Impl
+
+#endif
+#endif
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp b/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp
index 0a0f41686..a273db998 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp
@@ -1,318 +1,321 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDAEXEC_HPP
#define KOKKOS_CUDAEXEC_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <string>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Cuda/Kokkos_Cuda_abort.hpp>
#include <Cuda/Kokkos_Cuda_Error.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
struct CudaTraits {
enum { WarpSize = 32 /* 0x0020 */ };
enum { WarpIndexMask = 0x001f /* Mask for warpindex */ };
enum { WarpIndexShift = 5 /* WarpSize == 1 << WarpShift */ };
enum { SharedMemoryBanks = 32 /* Compute device 2.0 */ };
enum { SharedMemoryCapacity = 0x0C000 /* 48k shared / 16k L1 Cache */ };
enum { SharedMemoryUsage = 0x04000 /* 16k shared / 48k L1 Cache */ };
enum { UpperBoundGridCount = 65535 /* Hard upper bound */ };
enum { ConstantMemoryCapacity = 0x010000 /* 64k bytes */ };
enum { ConstantMemoryUsage = 0x008000 /* 32k bytes */ };
enum { ConstantMemoryCache = 0x002000 /* 8k bytes */ };
typedef unsigned long
ConstantGlobalBufferType[ ConstantMemoryUsage / sizeof(unsigned long) ];
enum { ConstantMemoryUseThreshold = 0x000200 /* 512 bytes */ };
KOKKOS_INLINE_FUNCTION static
CudaSpace::size_type warp_count( CudaSpace::size_type i )
{ return ( i + WarpIndexMask ) >> WarpIndexShift ; }
KOKKOS_INLINE_FUNCTION static
CudaSpace::size_type warp_align( CudaSpace::size_type i )
{
enum { Mask = ~CudaSpace::size_type( WarpIndexMask ) };
return ( i + WarpIndexMask ) & Mask ;
}
};
//----------------------------------------------------------------------------
CudaSpace::size_type cuda_internal_multiprocessor_count();
CudaSpace::size_type cuda_internal_maximum_warp_count();
CudaSpace::size_type cuda_internal_maximum_grid_count();
CudaSpace::size_type cuda_internal_maximum_shared_words();
CudaSpace::size_type * cuda_internal_scratch_flags( const CudaSpace::size_type size );
CudaSpace::size_type * cuda_internal_scratch_space( const CudaSpace::size_type size );
CudaSpace::size_type * cuda_internal_scratch_unified( const CudaSpace::size_type size );
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#if defined( __CUDACC__ )
/** \brief Access to constant memory on the device */
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
__device__ __constant__
extern unsigned long kokkos_impl_cuda_constant_memory_buffer[] ;
#else
__device__ __constant__
unsigned long kokkos_impl_cuda_constant_memory_buffer[ Kokkos::Impl::CudaTraits::ConstantMemoryUsage / sizeof(unsigned long) ] ;
#endif
namespace Kokkos {
namespace Impl {
struct CudaLockArraysStruct {
int* atomic;
int* scratch;
int* threadid;
+ int n;
};
}
}
__device__ __constant__
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
extern
#endif
Kokkos::Impl::CudaLockArraysStruct kokkos_impl_cuda_lock_arrays ;
#define CUDA_SPACE_ATOMIC_MASK 0x1FFFF
#define CUDA_SPACE_ATOMIC_XOR_MASK 0x15A39
namespace Kokkos {
namespace Impl {
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink = false);
}
}
namespace Kokkos {
namespace Impl {
__device__ inline
bool lock_address_cuda_space(void* ptr) {
size_t offset = size_t(ptr);
offset = offset >> 2;
offset = offset & CUDA_SPACE_ATOMIC_MASK;
return (0 == atomicCAS(&kokkos_impl_cuda_lock_arrays.atomic[offset],0,1));
}
__device__ inline
void unlock_address_cuda_space(void* ptr) {
size_t offset = size_t(ptr);
offset = offset >> 2;
offset = offset & CUDA_SPACE_ATOMIC_MASK;
atomicExch( &kokkos_impl_cuda_lock_arrays.atomic[ offset ], 0);
}
}
}
template< typename T >
inline
__device__
T * kokkos_impl_cuda_shared_memory()
{ extern __shared__ Kokkos::CudaSpace::size_type sh[]; return (T*) sh ; }
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
// See section B.17 of Cuda C Programming Guide Version 3.2
// for discussion of
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
// function qualifier which could be used to improve performance.
//----------------------------------------------------------------------------
// Maximize L1 cache and minimize shared memory:
// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1 );
// For 2.0 capability: 48 KB L1 and 16 KB shared
//----------------------------------------------------------------------------
template< class DriverType >
__global__
static void cuda_parallel_launch_constant_memory()
{
const DriverType & driver =
*((const DriverType *) kokkos_impl_cuda_constant_memory_buffer );
driver();
}
template< class DriverType >
__global__
static void cuda_parallel_launch_local_memory( const DriverType driver )
{
driver();
}
template < class DriverType ,
bool Large = ( CudaTraits::ConstantMemoryUseThreshold < sizeof(DriverType) ) >
struct CudaParallelLaunch ;
template < class DriverType >
struct CudaParallelLaunch< DriverType , true > {
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
, const dim3 & block
, const int shmem
, const cudaStream_t stream = 0 )
{
if ( grid.x && ( block.x * block.y * block.z ) ) {
if ( sizeof( Kokkos::Impl::CudaTraits::ConstantGlobalBufferType ) <
sizeof( DriverType ) ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: Functor is too large") );
}
// Fence before changing settings and copying closure
Kokkos::Cuda::fence();
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER //On Kepler the L1 has no benefit since it doesn't cache reads
else if ( shmem ) {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_constant_memory< DriverType > , cudaFuncCachePreferShared ) );
} else {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_constant_memory< DriverType > , cudaFuncCachePreferL1 ) );
}
#endif
// Copy functor to constant memory on the device
cudaMemcpyToSymbol( kokkos_impl_cuda_constant_memory_buffer , & driver , sizeof(DriverType) );
#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
#endif
// Invoke the driver function on the device
cuda_parallel_launch_constant_memory< DriverType ><<< grid , block , shmem , stream >>>();
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
Kokkos::Cuda::fence();
#endif
}
}
};
template < class DriverType >
struct CudaParallelLaunch< DriverType , false > {
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
, const dim3 & block
, const int shmem
, const cudaStream_t stream = 0 )
{
if ( grid.x && ( block.x * block.y * block.z ) ) {
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER //On Kepler the L1 has no benefit since it doesn't cache reads
else if ( shmem ) {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory< DriverType > , cudaFuncCachePreferShared ) );
} else {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory< DriverType > , cudaFuncCachePreferL1 ) );
}
#endif
#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
#endif
cuda_parallel_launch_local_memory< DriverType ><<< grid , block , shmem , stream >>>( driver );
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
Kokkos::Cuda::fence();
#endif
}
}
};
//----------------------------------------------------------------------------
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* defined( __CUDACC__ ) */
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDAEXEC_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp b/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
index 91a3c9213..303b3fa4f 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
@@ -1,914 +1,915 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <stdlib.h>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <algorithm>
#include <atomic>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <Kokkos_Core.hpp>
#include <Kokkos_Cuda.hpp>
#include <Kokkos_CudaSpace.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <impl/Kokkos_Error.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#endif
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
namespace {
static std::atomic<int> num_uvm_allocations(0) ;
cudaStream_t get_deep_copy_stream() {
static cudaStream_t s = 0;
if( s == 0) {
cudaStreamCreate ( &s );
}
return s;
}
}
DeepCopy<CudaSpace,CudaSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<HostSpace,CudaSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<CudaSpace,HostSpace,Cuda>::DeepCopy( void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpy( dst , src , n , cudaMemcpyDefault ) ); }
DeepCopy<CudaSpace,CudaSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
DeepCopy<HostSpace,CudaSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
DeepCopy<CudaSpace,HostSpace,Cuda>::DeepCopy( const Cuda & instance , void * dst , const void * src , size_t n )
{ CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , instance.cuda_stream() ) ); }
void DeepCopyAsyncCuda( void * dst , const void * src , size_t n) {
cudaStream_t s = get_deep_copy_stream();
CUDA_SAFE_CALL( cudaMemcpyAsync( dst , src , n , cudaMemcpyDefault , s ) );
cudaStreamSynchronize(s);
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
void CudaSpace::access_error()
{
const std::string msg("Kokkos::CudaSpace::access_error attempt to execute Cuda function from non-Cuda space" );
Kokkos::Impl::throw_runtime_exception( msg );
}
void CudaSpace::access_error( const void * const )
{
const std::string msg("Kokkos::CudaSpace::access_error attempt to execute Cuda function from non-Cuda space" );
Kokkos::Impl::throw_runtime_exception( msg );
}
/*--------------------------------------------------------------------------*/
bool CudaUVMSpace::available()
{
#if defined( CUDA_VERSION ) && ( 6000 <= CUDA_VERSION ) && !defined(__APPLE__)
enum { UVM_available = true };
#else
enum { UVM_available = false };
#endif
return UVM_available;
}
/*--------------------------------------------------------------------------*/
int CudaUVMSpace::number_of_allocations()
{
return Kokkos::Impl::num_uvm_allocations.load();
}
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
CudaSpace::CudaSpace()
: m_device( Kokkos::Cuda().cuda_device() )
{
}
CudaUVMSpace::CudaUVMSpace()
: m_device( Kokkos::Cuda().cuda_device() )
{
}
CudaHostPinnedSpace::CudaHostPinnedSpace()
{
}
void * CudaSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
CUDA_SAFE_CALL( cudaMalloc( &ptr, arg_alloc_size ) );
return ptr ;
}
void * CudaUVMSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
enum { max_uvm_allocations = 65536 };
- if ( arg_alloc_size > 0 )
+ if ( arg_alloc_size > 0 )
{
Kokkos::Impl::num_uvm_allocations++;
if ( Kokkos::Impl::num_uvm_allocations.load() > max_uvm_allocations ) {
Kokkos::Impl::throw_runtime_exception( "CudaUVM error: The maximum limit of UVM allocations exceeded (currently 65536)." ) ;
}
CUDA_SAFE_CALL( cudaMallocManaged( &ptr, arg_alloc_size , cudaMemAttachGlobal ) );
- }
+ }
return ptr ;
}
void * CudaHostPinnedSpace::allocate( const size_t arg_alloc_size ) const
{
void * ptr = NULL;
CUDA_SAFE_CALL( cudaHostAlloc( &ptr, arg_alloc_size , cudaHostAllocDefault ) );
return ptr ;
}
void CudaSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
} catch(...) {}
}
void CudaUVMSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
if ( arg_alloc_ptr != nullptr ) {
Kokkos::Impl::num_uvm_allocations--;
CUDA_SAFE_CALL( cudaFree( arg_alloc_ptr ) );
}
} catch(...) {}
}
void CudaHostPinnedSpace::deallocate( void * const arg_alloc_ptr , const size_t /* arg_alloc_size */ ) const
{
try {
CUDA_SAFE_CALL( cudaFreeHost( arg_alloc_ptr ) );
} catch(...) {}
}
constexpr const char* CudaSpace::name() {
return m_name;
}
constexpr const char* CudaUVMSpace::name() {
return m_name;
}
constexpr const char* CudaHostPinnedSpace::name() {
return m_name;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaSpace , void >::s_root_record ;
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::s_root_record ;
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::s_root_record ;
::cudaTextureObject_t
SharedAllocationRecord< Kokkos::CudaSpace , void >::
attach_texture_object( const unsigned sizeof_alias
, void * const alloc_ptr
, size_t const alloc_size )
{
enum { TEXTURE_BOUND_1D = 1u << 27 };
if ( ( alloc_ptr == 0 ) || ( sizeof_alias * TEXTURE_BOUND_1D <= alloc_size ) ) {
std::ostringstream msg ;
msg << "Kokkos::CudaSpace ERROR: Cannot attach texture object to"
<< " alloc_ptr(" << alloc_ptr << ")"
<< " alloc_size(" << alloc_size << ")"
<< " max_size(" << ( sizeof_alias * TEXTURE_BOUND_1D ) << ")" ;
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
::cudaTextureObject_t tex_obj ;
struct cudaResourceDesc resDesc ;
struct cudaTextureDesc texDesc ;
memset( & resDesc , 0 , sizeof(resDesc) );
memset( & texDesc , 0 , sizeof(texDesc) );
resDesc.resType = cudaResourceTypeLinear ;
resDesc.res.linear.desc = ( sizeof_alias == 4 ? cudaCreateChannelDesc< int >() :
( sizeof_alias == 8 ? cudaCreateChannelDesc< ::int2 >() :
/* sizeof_alias == 16 */ cudaCreateChannelDesc< ::int4 >() ) );
resDesc.res.linear.sizeInBytes = alloc_size ;
resDesc.res.linear.devPtr = alloc_ptr ;
CUDA_SAFE_CALL( cudaCreateTextureObject( & tex_obj , & resDesc, & texDesc, NULL ) );
return tex_obj ;
}
std::string
SharedAllocationRecord< Kokkos::CudaSpace , void >::get_label() const
{
SharedAllocationHeader header ;
Kokkos::Impl::DeepCopy< Kokkos::HostSpace , Kokkos::CudaSpace >( & header , RecordBase::head() , sizeof(SharedAllocationHeader) );
return std::string( header.m_label );
}
std::string
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_label() const
{
return std::string( RecordBase::head()->m_label );
}
std::string
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_label() const
{
return std::string( RecordBase::head()->m_label );
}
SharedAllocationRecord< Kokkos::CudaSpace , void > *
SharedAllocationRecord< Kokkos::CudaSpace , void >::
allocate( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void > *
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
allocate( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > *
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
allocate( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
{
return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
}
void
SharedAllocationRecord< Kokkos::CudaSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
void
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
void
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::CudaSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
SharedAllocationHeader header ;
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( & header , RecordBase::m_alloc_ptr , sizeof(SharedAllocationHeader) );
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::CudaSpace::name()),header.m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::fence(); //Make sure I can access the label ...
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::CudaUVMSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::CudaHostPinnedSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::CudaSpace , void >::
SharedAllocationRecord( const Kokkos::CudaSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_tex_obj( 0 )
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
SharedAllocationHeader header ;
// Fill in the Header information
header.m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( header.m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
// Copy to device memory
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( RecordBase::m_alloc_ptr , & header , sizeof(SharedAllocationHeader) );
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
SharedAllocationRecord( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_tex_obj( 0 )
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information, directly accessible via UVM
RecordBase::m_alloc_ptr->m_record = this ;
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
SharedAllocationRecord( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information, directly accessible from the host via pinned memory
RecordBase::m_alloc_ptr->m_record = this ;
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::CudaSpace , void >::
allocate_tracked( const Kokkos::CudaSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaSpace,CudaSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
void * SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
allocate_tracked( const Kokkos::CudaUVMSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaUVMSpace,CudaUVMSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
void * SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
allocate_tracked( const Kokkos::CudaHostPinnedSpace & arg_space
, const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<CudaHostPinnedSpace,CudaHostPinnedSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
//----------------------------------------------------------------------------
SharedAllocationRecord< Kokkos::CudaSpace , void > *
SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordBase = SharedAllocationRecord< void , void > ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaSpace , void > ;
#if 0
// Copy the header from the allocation
Header head ;
Header const * const head_cuda = alloc_ptr ? Header::get_header( alloc_ptr ) : (Header*) 0 ;
if ( alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , head_cuda , sizeof(SharedAllocationHeader) );
}
RecordCuda * const record = alloc_ptr ? static_cast< RecordCuda * >( head.m_record ) : (RecordCuda *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head_cuda ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
}
#else
// Iterating the list to search for the record among all allocations
// requires obtaining the root of the list and then locking the list.
RecordCuda * const record = static_cast< RecordCuda * >( RecordBase::find( & s_root_record , alloc_ptr ) );
if ( record == 0 ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record ERROR" ) );
}
#endif
return record ;
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void > *
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaUVMSpace , void > ;
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::get_record ERROR" ) );
}
return static_cast< RecordCuda * >( h->m_record );
}
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > *
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record( void * alloc_ptr )
{
using Header = SharedAllocationHeader ;
using RecordCuda = SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > ;
Header * const h = alloc_ptr ? reinterpret_cast< Header * >( alloc_ptr ) - 1 : (Header *) 0 ;
if ( ! alloc_ptr || h->m_record->m_alloc_ptr != h ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record ERROR" ) );
}
return static_cast< RecordCuda * >( h->m_record );
}
// Iterate records to print orphaned memory ...
void
SharedAllocationRecord< Kokkos::CudaSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail )
{
SharedAllocationRecord< void , void > * r = & s_root_record ;
char buffer[256] ;
SharedAllocationHeader head ;
if ( detail ) {
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
}
else {
head.m_label[0] = 0 ;
}
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
- if (sizeof(uintptr_t) == sizeof(unsigned long)) {
+ if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "Cuda addr( 0x%.12lx ) list( 0x%.12lx 0x%.12lx ) extent[ 0x%.12lx + %.8ld ] count(%d) dealloc(0x%.12lx) %s\n";
}
- else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
+ else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "Cuda addr( 0x%.12llx ) list( 0x%.12llx 0x%.12llx ) extent[ 0x%.12llx + %.8ld ] count(%d) dealloc(0x%.12llx) %s\n";
}
- snprintf( buffer , 256
+ snprintf( buffer , 256
, format_string
, reinterpret_cast<uintptr_t>( r )
, reinterpret_cast<uintptr_t>( r->m_prev )
, reinterpret_cast<uintptr_t>( r->m_next )
, reinterpret_cast<uintptr_t>( r->m_alloc_ptr )
, r->m_alloc_size
, r->m_count
, reinterpret_cast<uintptr_t>( r->m_dealloc )
, head.m_label
);
std::cout << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
else {
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
- if (sizeof(uintptr_t) == sizeof(unsigned long)) {
+ if (sizeof(uintptr_t) == sizeof(unsigned long)) {
format_string = "Cuda [ 0x%.12lx + %ld ] %s\n";
}
- else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
+ else if (sizeof(uintptr_t) == sizeof(unsigned long long)) {
format_string = "Cuda [ 0x%.12llx + %ld ] %s\n";
}
- snprintf( buffer , 256
+ snprintf( buffer , 256
, format_string
, reinterpret_cast< uintptr_t >( r->data() )
, r->size()
, head.m_label
);
}
else {
snprintf( buffer , 256 , "Cuda [ 0 + 0 ]\n" );
}
std::cout << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
}
void
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaUVMSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaUVM" , & s_root_record , detail );
}
void
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaHostPinnedSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaHostPinned" , & s_root_record , detail );
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace {
__global__ void init_lock_array_kernel_atomic() {
unsigned i = blockIdx.x*blockDim.x + threadIdx.x;
if(i<CUDA_SPACE_ATOMIC_MASK+1)
kokkos_impl_cuda_lock_arrays.atomic[i] = 0;
}
__global__ void init_lock_array_kernel_scratch_threadid(int N) {
unsigned i = blockIdx.x*blockDim.x + threadIdx.x;
if(i<N) {
kokkos_impl_cuda_lock_arrays.scratch[i] = 0;
kokkos_impl_cuda_lock_arrays.threadid[i] = 0;
}
}
}
namespace Impl {
int* atomic_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(CUDA_SPACE_ATOMIC_MASK+1));
return ptr;
}
int* scratch_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(Cuda::concurrency()));
return ptr;
}
int* threadid_lock_array_cuda_space_ptr(bool deallocate) {
static int* ptr = NULL;
if(deallocate) {
cudaFree(ptr);
ptr = NULL;
}
if(ptr==NULL && !deallocate)
cudaMalloc(&ptr,sizeof(int)*(Cuda::concurrency()));
return ptr;
}
void init_lock_arrays_cuda_space() {
static int is_initialized = 0;
if(! is_initialized) {
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
init_lock_array_kernel_atomic<<<(CUDA_SPACE_ATOMIC_MASK+255)/256,256>>>();
init_lock_array_kernel_scratch_threadid<<<(Kokkos::Cuda::concurrency()+255)/256,256>>>(Kokkos::Cuda::concurrency());
}
}
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink) {
static void* ptr = NULL;
static size_t current_size = 0;
if(current_size == 0) {
current_size = bytes;
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
}
if(bytes > current_size) {
current_size = bytes;
ptr = Kokkos::kokkos_realloc<Kokkos::CudaSpace>(ptr,current_size);
}
if((bytes < current_size) && (force_shrink)) {
current_size = bytes;
Kokkos::kokkos_free<Kokkos::CudaSpace>(ptr);
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
}
return ptr;
}
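// Illustrative note (not in the original source) on the grow-only semantics above:
//   cuda_resize_scratch_space( 1<<10 );        // allocates 1 KiB
//   cuda_resize_scratch_space( 1<<20 );        // grows (realloc) to 1 MiB
//   cuda_resize_scratch_space( 1<<10 );        // keeps 1 MiB, no shrink
//   cuda_resize_scratch_space( 1<<10 , true ); // frees and reallocates 1 KiB
// i.e. the buffer only shrinks when force_shrink is set.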
}
}
#endif // KOKKOS_ENABLE_CUDA
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
index eeea97049..44d908d10 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp
@@ -1,778 +1,779 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
/*--------------------------------------------------------------------------*/
/* Kokkos interfaces */
#include <Kokkos_Core.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <Cuda/Kokkos_Cuda_Error.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
/*--------------------------------------------------------------------------*/
/* Standard 'C' libraries */
#include <stdlib.h>
/* Standard 'C++' libraries */
#include <vector>
#include <iostream>
#include <sstream>
#include <string>
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
__device__ __constant__
unsigned long kokkos_impl_cuda_constant_memory_buffer[ Kokkos::Impl::CudaTraits::ConstantMemoryUsage / sizeof(unsigned long) ] ;
__device__ __constant__
Kokkos::Impl::CudaLockArraysStruct kokkos_impl_cuda_lock_arrays ;
#endif
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
namespace {
__global__
void query_cuda_kernel_arch( int * d_arch )
{
#if defined( __CUDA_ARCH__ )
*d_arch = __CUDA_ARCH__ ;
#else
*d_arch = 0 ;
#endif
}
/** Query which compute capability the kernel launched on the device was actually compiled for: */
int cuda_kernel_arch()
{
int * d_arch = 0 ;
cudaMalloc( (void **) & d_arch , sizeof(int) );
query_cuda_kernel_arch<<<1,1>>>( d_arch );
int arch = 0 ;
cudaMemcpy( & arch , d_arch , sizeof(int) , cudaMemcpyDefault );
cudaFree( d_arch );
return arch ;
}
bool cuda_launch_blocking()
{
const char * env = getenv("CUDA_LAUNCH_BLOCKING");
if (env == 0) return false;
return atoi(env);
}
}
void cuda_device_synchronize()
{
// static const bool launch_blocking = cuda_launch_blocking();
// if (!launch_blocking) {
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
// }
}
void cuda_internal_error_throw( cudaError e , const char * name, const char * file, const int line )
{
std::ostringstream out ;
out << name << " error( " << cudaGetErrorName(e) << "): " << cudaGetErrorString(e);
if (file) {
out << " " << file << ":" << line;
}
throw_runtime_exception( out.str() );
}
//----------------------------------------------------------------------------
// Some significant cuda device properties:
//
// cudaDeviceProp::name : Text label for device
// cudaDeviceProp::major : Device major number
// cudaDeviceProp::minor : Device minor number
// cudaDeviceProp::warpSize : number of threads per warp
// cudaDeviceProp::multiProcessorCount : number of multiprocessors
// cudaDeviceProp::sharedMemPerBlock : capacity of shared memory per block
// cudaDeviceProp::totalConstMem : capacity of constant memory
// cudaDeviceProp::totalGlobalMem : capacity of global memory
// cudaDeviceProp::maxGridSize[3] : maximum grid size
//
// Section 4.4.2.4 of the CUDA Toolkit Reference Manual
//
// struct cudaDeviceProp {
// char name[256];
// size_t totalGlobalMem;
// size_t sharedMemPerBlock;
// int regsPerBlock;
// int warpSize;
// size_t memPitch;
// int maxThreadsPerBlock;
// int maxThreadsDim[3];
// int maxGridSize[3];
// size_t totalConstMem;
// int major;
// int minor;
// int clockRate;
// size_t textureAlignment;
// int deviceOverlap;
// int multiProcessorCount;
// int kernelExecTimeoutEnabled;
// int integrated;
// int canMapHostMemory;
// int computeMode;
// int concurrentKernels;
// int ECCEnabled;
// int pciBusID;
// int pciDeviceID;
// int tccDriver;
// int asyncEngineCount;
// int unifiedAddressing;
// int memoryClockRate;
// int memoryBusWidth;
// int l2CacheSize;
// int maxThreadsPerMultiProcessor;
// };
namespace {
class CudaInternalDevices {
public:
enum { MAXIMUM_DEVICE_COUNT = 64 };
struct cudaDeviceProp m_cudaProp[ MAXIMUM_DEVICE_COUNT ] ;
int m_cudaDevCount ;
CudaInternalDevices();
static const CudaInternalDevices & singleton();
};
CudaInternalDevices::CudaInternalDevices()
{
// See 'cudaSetDeviceFlags' for host-device thread interaction
// Section 4.4.2.6 of the CUDA Toolkit Reference Manual
CUDA_SAFE_CALL (cudaGetDeviceCount( & m_cudaDevCount ) );
if(m_cudaDevCount > MAXIMUM_DEVICE_COUNT) {
Kokkos::abort("Sorry, you have more GPUs per node than we thought anybody would ever have. Please report this to github.com/kokkos/kokkos.");
}
for ( int i = 0 ; i < m_cudaDevCount ; ++i ) {
CUDA_SAFE_CALL( cudaGetDeviceProperties( m_cudaProp + i , i ) );
}
}
const CudaInternalDevices & CudaInternalDevices::singleton()
{
static CudaInternalDevices self ; return self ;
}
}
//----------------------------------------------------------------------------
class CudaInternal {
private:
CudaInternal( const CudaInternal & );
CudaInternal & operator = ( const CudaInternal & );
public:
typedef Cuda::size_type size_type ;
int m_cudaDev ;
int m_cudaArch ;
unsigned m_multiProcCount ;
unsigned m_maxWarpCount ;
unsigned m_maxBlock ;
unsigned m_maxSharedWords ;
size_type m_scratchSpaceCount ;
size_type m_scratchFlagsCount ;
size_type m_scratchUnifiedCount ;
size_type m_scratchUnifiedSupported ;
size_type m_streamCount ;
size_type * m_scratchSpace ;
size_type * m_scratchFlags ;
size_type * m_scratchUnified ;
cudaStream_t * m_stream ;
static int was_initialized;
static int was_finalized;
static CudaInternal & singleton();
int verify_is_initialized( const char * const label ) const ;
int is_initialized() const
{ return 0 != m_scratchSpace && 0 != m_scratchFlags ; }
void initialize( int cuda_device_id , int stream_count );
void finalize();
void print_configuration( std::ostream & ) const ;
~CudaInternal();
CudaInternal()
: m_cudaDev( -1 )
, m_cudaArch( -1 )
, m_multiProcCount( 0 )
, m_maxWarpCount( 0 )
, m_maxBlock( 0 )
, m_maxSharedWords( 0 )
, m_scratchSpaceCount( 0 )
, m_scratchFlagsCount( 0 )
, m_scratchUnifiedCount( 0 )
, m_scratchUnifiedSupported( 0 )
, m_streamCount( 0 )
, m_scratchSpace( 0 )
, m_scratchFlags( 0 )
, m_scratchUnified( 0 )
, m_stream( 0 )
{}
size_type * scratch_space( const size_type size );
size_type * scratch_flags( const size_type size );
size_type * scratch_unified( const size_type size );
};
int CudaInternal::was_initialized = 0;
int CudaInternal::was_finalized = 0;
//----------------------------------------------------------------------------
void CudaInternal::print_configuration( std::ostream & s ) const
{
const CudaInternalDevices & dev_info = CudaInternalDevices::singleton();
#if defined( KOKKOS_ENABLE_CUDA )
s << "macro KOKKOS_ENABLE_CUDA : defined" << std::endl ;
#endif
#if defined( CUDA_VERSION )
s << "macro CUDA_VERSION = " << CUDA_VERSION
<< " = version " << CUDA_VERSION / 1000
<< "." << ( CUDA_VERSION % 1000 ) / 10
<< std::endl ;
#endif
for ( int i = 0 ; i < dev_info.m_cudaDevCount ; ++i ) {
s << "Kokkos::Cuda[ " << i << " ] "
<< dev_info.m_cudaProp[i].name
<< " capability " << dev_info.m_cudaProp[i].major << "." << dev_info.m_cudaProp[i].minor
<< ", Total Global Memory: " << human_memory_size(dev_info.m_cudaProp[i].totalGlobalMem)
<< ", Shared Memory per Block: " << human_memory_size(dev_info.m_cudaProp[i].sharedMemPerBlock);
if ( m_cudaDev == i ) s << " : Selected" ;
s << std::endl ;
}
}
//----------------------------------------------------------------------------
CudaInternal::~CudaInternal()
{
if ( m_stream ||
m_scratchSpace ||
m_scratchFlags ||
m_scratchUnified ) {
std::cerr << "Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()"
<< std::endl ;
std::cerr.flush();
}
m_cudaDev = -1 ;
m_cudaArch = -1 ;
m_multiProcCount = 0 ;
m_maxWarpCount = 0 ;
m_maxBlock = 0 ;
m_maxSharedWords = 0 ;
m_scratchSpaceCount = 0 ;
m_scratchFlagsCount = 0 ;
m_scratchUnifiedCount = 0 ;
m_scratchUnifiedSupported = 0 ;
m_streamCount = 0 ;
m_scratchSpace = 0 ;
m_scratchFlags = 0 ;
m_scratchUnified = 0 ;
m_stream = 0 ;
}
int CudaInternal::verify_is_initialized( const char * const label ) const
{
if ( m_cudaDev < 0 ) {
std::cerr << "Kokkos::Cuda::" << label << " : ERROR device not initialized" << std::endl ;
}
return 0 <= m_cudaDev ;
}
CudaInternal & CudaInternal::singleton()
{
static CudaInternal self ;
return self ;
}
void CudaInternal::initialize( int cuda_device_id , int stream_count )
{
if ( was_finalized ) Kokkos::abort("Calling Cuda::initialize after Cuda::finalize is illegal\n");
was_initialized = 1;
if ( is_initialized() ) return;
enum { WordSize = sizeof(size_type) };
if ( ! HostSpace::execution_space::is_initialized() ) {
const std::string msg("Cuda::initialize ERROR : HostSpace::execution_space is not initialized");
throw_runtime_exception( msg );
}
const CudaInternalDevices & dev_info = CudaInternalDevices::singleton();
const bool ok_init = 0 == m_scratchSpace || 0 == m_scratchFlags ;
const bool ok_id = 0 <= cuda_device_id &&
cuda_device_id < dev_info.m_cudaDevCount ;
// Need device capability 3.0 or better
const bool ok_dev = ok_id &&
( 3 <= dev_info.m_cudaProp[ cuda_device_id ].major &&
0 <= dev_info.m_cudaProp[ cuda_device_id ].minor );
if ( ok_init && ok_dev ) {
const struct cudaDeviceProp & cudaProp =
dev_info.m_cudaProp[ cuda_device_id ];
m_cudaDev = cuda_device_id ;
CUDA_SAFE_CALL( cudaSetDevice( m_cudaDev ) );
CUDA_SAFE_CALL( cudaDeviceReset() );
Kokkos::Impl::cuda_device_synchronize();
// Query what compute capability architecture a kernel executes:
m_cudaArch = cuda_kernel_arch();
if ( m_cudaArch != cudaProp.major * 100 + cudaProp.minor * 10 ) {
std::cerr << "Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability "
<< ( m_cudaArch / 100 ) << "." << ( ( m_cudaArch % 100 ) / 10 )
<< " on device with compute capability "
<< cudaProp.major << "." << cudaProp.minor
<< " , this will likely reduce potential performance."
<< std::endl ;
}
// number of multiprocessors
m_multiProcCount = cudaProp.multiProcessorCount ;
//----------------------------------
// Maximum number of warps per block,
// capped at one warp per lane of a warp so a single warp can reduce the per-warp partial results.
// HCE 2012-February :
// Found bug in CUDA 4.1 that sometimes a kernel launch would fail
// if the thread count == 1024 and a functor is passed to the kernel.
// Copying the kernel to constant memory and then launching with
// thread count == 1024 would work fine.
//
// HCE 2012-October :
// All compute capabilities support at least 16 warps (512 threads).
// However, we have found that 8 warps typically gives better performance.
m_maxWarpCount = 8 ;
// m_maxWarpCount = cudaProp.maxThreadsPerBlock / Impl::CudaTraits::WarpSize ;
if ( Impl::CudaTraits::WarpSize < m_maxWarpCount ) {
m_maxWarpCount = Impl::CudaTraits::WarpSize ;
}
m_maxSharedWords = cudaProp.sharedMemPerBlock / WordSize ;
//----------------------------------
// Maximum number of blocks:
m_maxBlock = cudaProp.maxGridSize[0] ;
//----------------------------------
m_scratchUnifiedSupported = cudaProp.unifiedAddressing ;
if ( ! m_scratchUnifiedSupported ) {
std::cout << "Kokkos::Cuda device "
<< cudaProp.name << " capability "
<< cudaProp.major << "." << cudaProp.minor
<< " does not support unified virtual address space"
<< std::endl ;
}
//----------------------------------
// Multiblock reduction uses scratch flags for counters
// and scratch space for partial reduction values.
// Allocate some initial space. This will grow as needed.
{
const unsigned reduce_block_count = m_maxWarpCount * Impl::CudaTraits::WarpSize ;
(void) scratch_unified( 16 * sizeof(size_type) );
(void) scratch_flags( reduce_block_count * 2 * sizeof(size_type) );
(void) scratch_space( reduce_block_count * 16 * sizeof(size_type) );
}
//----------------------------------
if ( stream_count ) {
m_stream = (cudaStream_t*) ::malloc( stream_count * sizeof(cudaStream_t) );
m_streamCount = stream_count ;
for ( size_type i = 0 ; i < m_streamCount ; ++i ) m_stream[i] = 0 ;
}
}
else {
std::ostringstream msg ;
msg << "Kokkos::Cuda::initialize(" << cuda_device_id << ") FAILED" ;
if ( ! ok_init ) {
msg << " : Already initialized" ;
}
if ( ! ok_id ) {
msg << " : Device identifier out of range "
<< "[0.." << dev_info.m_cudaDevCount << "]" ;
}
else if ( ! ok_dev ) {
msg << " : Device " ;
msg << dev_info.m_cudaProp[ cuda_device_id ].major ;
msg << "." ;
msg << dev_info.m_cudaProp[ cuda_device_id ].minor ;
msg << " has insufficient capability, required 3.0 or better" ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
#ifdef KOKKOS_ENABLE_CUDA_UVM
if(!cuda_launch_blocking()) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_LAUNCH_BLOCKING=1." << std::endl;
std::cout << " The code must call Cuda::fence() after each kernel" << std::endl;
- std::cout << " or will likely crash when accessing data on the host." << std::endl;
+ std::cout << " or will likely crash when accessing data on the host." << std::endl;
}
const char * env_force_device_alloc = getenv("CUDA_MANAGED_FORCE_DEVICE_ALLOC");
bool force_device_alloc;
if (env_force_device_alloc == 0) force_device_alloc=false;
else force_device_alloc=atoi(env_force_device_alloc)!=0;
-
+
const char * env_visible_devices = getenv("CUDA_VISIBLE_DEVICES");
bool visible_devices_one=true;
if (env_visible_devices == 0) visible_devices_one=false;
-
+
if(!visible_devices_one && !force_device_alloc) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or " << std::endl;
std::cout << " setting CUDA_VISIBLE_DEVICES." << std::endl;
std::cout << " This could on multi GPU systems lead to severe performance" << std::endl;
std::cout << " penalties." << std::endl;
}
#endif
cudaThreadSetCacheConfig(cudaFuncCachePreferShared);
// Init the array used for arbitrarily sized atomics
Impl::init_lock_arrays_cuda_space();
#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
Kokkos::Impl::CudaLockArraysStruct locks;
locks.atomic = atomic_lock_array_cuda_space_ptr(false);
locks.scratch = scratch_lock_array_cuda_space_ptr(false);
locks.threadid = threadid_lock_array_cuda_space_ptr(false);
+ locks.n = Kokkos::Cuda::concurrency();
cudaMemcpyToSymbol( kokkos_impl_cuda_lock_arrays , & locks , sizeof(CudaLockArraysStruct) );
#endif
}
//----------------------------------------------------------------------------
typedef Cuda::size_type ScratchGrain[ Impl::CudaTraits::WarpSize ] ;
enum { sizeScratchGrain = sizeof(ScratchGrain) };
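// Illustrative note (not in the original source): assuming WarpSize == 32 and
// a 4-byte size_type, sizeScratchGrain == 128 bytes; a request of 200 bytes
// then rounds up to ( 200 + 128 - 1 ) / 128 == 2 grains, i.e. 256 bytes.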
Cuda::size_type *
CudaInternal::scratch_flags( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_flags") && m_scratchFlagsCount * sizeScratchGrain < size ) {
m_scratchFlagsCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaSpace()
, "InternalScratchFlags"
, ( sizeof( ScratchGrain ) * m_scratchFlagsCount ) );
Record::increment( r );
m_scratchFlags = reinterpret_cast<size_type *>( r->data() );
CUDA_SAFE_CALL( cudaMemset( m_scratchFlags , 0 , m_scratchFlagsCount * sizeScratchGrain ) );
}
return m_scratchFlags ;
}
Cuda::size_type *
CudaInternal::scratch_space( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_space") && m_scratchSpaceCount * sizeScratchGrain < size ) {
m_scratchSpaceCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaSpace()
, "InternalScratchSpace"
, ( sizeof( ScratchGrain ) * m_scratchSpaceCount ) );
Record::increment( r );
m_scratchSpace = reinterpret_cast<size_type *>( r->data() );
}
return m_scratchSpace ;
}
Cuda::size_type *
CudaInternal::scratch_unified( const Cuda::size_type size )
{
if ( verify_is_initialized("scratch_unified") &&
m_scratchUnifiedSupported && m_scratchUnifiedCount * sizeScratchGrain < size ) {
m_scratchUnifiedCount = ( size + sizeScratchGrain - 1 ) / sizeScratchGrain ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void > Record ;
Record * const r = Record::allocate( Kokkos::CudaHostPinnedSpace()
, "InternalScratchUnified"
, ( sizeof( ScratchGrain ) * m_scratchUnifiedCount ) );
Record::increment( r );
m_scratchUnified = reinterpret_cast<size_type *>( r->data() );
}
return m_scratchUnified ;
}
//----------------------------------------------------------------------------
void CudaInternal::finalize()
{
was_finalized = 1;
if ( 0 != m_scratchSpace || 0 != m_scratchFlags ) {
- atomic_lock_array_cuda_space_ptr(false);
- scratch_lock_array_cuda_space_ptr(false);
- threadid_lock_array_cuda_space_ptr(false);
+ atomic_lock_array_cuda_space_ptr(true);
+ scratch_lock_array_cuda_space_ptr(true);
+ threadid_lock_array_cuda_space_ptr(true);
if ( m_stream ) {
for ( size_type i = 1 ; i < m_streamCount ; ++i ) {
cudaStreamDestroy( m_stream[i] );
m_stream[i] = 0 ;
}
::free( m_stream );
}
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< CudaSpace > RecordCuda ;
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< CudaHostPinnedSpace > RecordHost ;
RecordCuda::decrement( RecordCuda::get_record( m_scratchFlags ) );
RecordCuda::decrement( RecordCuda::get_record( m_scratchSpace ) );
RecordHost::decrement( RecordHost::get_record( m_scratchUnified ) );
m_cudaDev = -1 ;
m_multiProcCount = 0 ;
m_maxWarpCount = 0 ;
m_maxBlock = 0 ;
m_maxSharedWords = 0 ;
m_scratchSpaceCount = 0 ;
m_scratchFlagsCount = 0 ;
m_scratchUnifiedCount = 0 ;
m_streamCount = 0 ;
m_scratchSpace = 0 ;
m_scratchFlags = 0 ;
m_scratchUnified = 0 ;
m_stream = 0 ;
}
}
//----------------------------------------------------------------------------
Cuda::size_type cuda_internal_multiprocessor_count()
{ return CudaInternal::singleton().m_multiProcCount ; }
Cuda::size_type cuda_internal_maximum_warp_count()
{ return CudaInternal::singleton().m_maxWarpCount ; }
Cuda::size_type cuda_internal_maximum_grid_count()
{ return CudaInternal::singleton().m_maxBlock ; }
Cuda::size_type cuda_internal_maximum_shared_words()
{ return CudaInternal::singleton().m_maxSharedWords ; }
Cuda::size_type * cuda_internal_scratch_space( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_space( size ); }
Cuda::size_type * cuda_internal_scratch_flags( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_flags( size ); }
Cuda::size_type * cuda_internal_scratch_unified( const Cuda::size_type size )
{ return CudaInternal::singleton().scratch_unified( size ); }
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
Cuda::size_type Cuda::detect_device_count()
{ return Impl::CudaInternalDevices::singleton().m_cudaDevCount ; }
int Cuda::concurrency() {
return 131072;
}
int Cuda::is_initialized()
{ return Impl::CudaInternal::singleton().is_initialized(); }
void Cuda::initialize( const Cuda::SelectDevice config , size_t num_instances )
{
Impl::CudaInternal::singleton().initialize( config.cuda_device_id , num_instances );
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
std::vector<unsigned>
Cuda::detect_device_arch()
{
const Impl::CudaInternalDevices & s = Impl::CudaInternalDevices::singleton();
std::vector<unsigned> output( s.m_cudaDevCount );
for ( int i = 0 ; i < s.m_cudaDevCount ; ++i ) {
output[i] = s.m_cudaProp[i].major * 100 + s.m_cudaProp[i].minor ;
}
return output ;
}
Cuda::size_type Cuda::device_arch()
{
const int dev_id = Impl::CudaInternal::singleton().m_cudaDev ;
int dev_arch = 0 ;
if ( 0 <= dev_id ) {
const struct cudaDeviceProp & cudaProp =
Impl::CudaInternalDevices::singleton().m_cudaProp[ dev_id ] ;
dev_arch = cudaProp.major * 100 + cudaProp.minor ;
}
return dev_arch ;
}
void Cuda::finalize()
{
Impl::CudaInternal::singleton().finalize();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
Cuda::Cuda()
: m_device( Impl::CudaInternal::singleton().m_cudaDev )
, m_stream( 0 )
{
Impl::CudaInternal::singleton().verify_is_initialized( "Cuda instance constructor" );
}
Cuda::Cuda( const int instance_id )
: m_device( Impl::CudaInternal::singleton().m_cudaDev )
, m_stream(
Impl::CudaInternal::singleton().verify_is_initialized( "Cuda instance constructor" )
? Impl::CudaInternal::singleton().m_stream[ instance_id % Impl::CudaInternal::singleton().m_streamCount ]
: 0 )
{}
void Cuda::print_configuration( std::ostream & s , const bool )
{ Impl::CudaInternal::singleton().print_configuration( s ); }
bool Cuda::sleep() { return false ; }
bool Cuda::wake() { return true ; }
void Cuda::fence()
{
Kokkos::Impl::cuda_device_synchronize();
}
} // namespace Kokkos
#endif // KOKKOS_ENABLE_CUDA
//----------------------------------------------------------------------------
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
index fa29d732f..56e6a3c1e 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp
@@ -1,1926 +1,1970 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_PARALLEL_HPP
#define KOKKOS_CUDA_PARALLEL_HPP
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
#include <utility>
#include <Kokkos_Parallel.hpp>
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_ReduceScan.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <Kokkos_Vectorization.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <typeinfo>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< typename Type >
struct CudaJoinFunctor {
typedef Type value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
volatile const value_type & input )
{ update += input ; }
};
class CudaTeamMember {
private:
typedef Kokkos::Cuda execution_space ;
typedef execution_space::scratch_memory_space scratch_memory_space ;
void * m_team_reduce ;
scratch_memory_space m_team_shared ;
int m_league_rank ;
int m_league_size ;
public:
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & thread_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const {
#ifdef __CUDA_ARCH__
return threadIdx.y ;
#else
return 1;
#endif
}
KOKKOS_INLINE_FUNCTION int team_size() const {
#ifdef __CUDA_ARCH__
return blockDim.y ;
#else
return 1;
#endif
}
KOKKOS_INLINE_FUNCTION void team_barrier() const {
#ifdef __CUDA_ARCH__
__syncthreads();
#endif
}
template<class ValueType>
KOKKOS_INLINE_FUNCTION void team_broadcast(ValueType& value, const int& thread_id) const {
#ifdef __CUDA_ARCH__
__shared__ ValueType sh_val;
if(threadIdx.x == 0 && threadIdx.y == thread_id) {
sh_val = value;
}
team_barrier();
value = sh_val;
team_barrier();
#endif
}
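// A minimal team_broadcast usage sketch (illustrative; 'choose_pivot' is a
// hypothetical helper, not part of this file):
//
//   int pivot = 0 ;
//   if ( member.team_rank() == 0 ) pivot = choose_pivot();
//   member.team_broadcast( pivot , 0 );   // broadcast from team_rank() == 0
//   // every thread of the team now holds the same 'pivot'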
template< class ValueType, class JoinOp >
KOKKOS_INLINE_FUNCTION
typename JoinOp::value_type team_reduce( const ValueType & value
, const JoinOp & op_in ) const {
#ifdef __CUDA_ARCH__
typedef JoinLambdaAdapter<ValueType,JoinOp> JoinOpFunctor ;
const JoinOpFunctor op(op_in);
ValueType * const base_data = (ValueType *) m_team_reduce ;
__syncthreads(); // Don't write into shared data until all threads have entered this function
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
base_data[ threadIdx.y ] = value ;
Impl::cuda_intra_block_reduce_scan<false,JoinOpFunctor,void>( op , base_data );
return base_data[ blockDim.y - 1 ];
#else
return typename JoinOp::value_type();
#endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
 * with inter-team non-deterministic accumulation ordering.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const {
#ifdef __CUDA_ARCH__
Type * const base_data = (Type *) m_team_reduce ;
__syncthreads(); // Don't write into shared data until all threads have entered this function
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
base_data[ threadIdx.y + 1 ] = value ;
Impl::cuda_intra_block_reduce_scan<true,Impl::CudaJoinFunctor<Type>,void>( Impl::CudaJoinFunctor<Type>() , base_data + 1 );
if ( global_accum ) {
if ( blockDim.y == threadIdx.y + 1 ) {
base_data[ blockDim.y ] = atomic_fetch_add( global_accum , base_data[ blockDim.y ] );
}
__syncthreads(); // Wait for atomic
base_data[ threadIdx.y ] += base_data[ blockDim.y ] ;
}
return base_data[ threadIdx.y ];
#else
return Type();
#endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const {
return this->template team_scan<Type>( value , 0 );
}
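// A minimal team_scan usage sketch (illustrative; 'count_items' is a
// hypothetical helper, not part of this file):
//
//   // Each thread contributes a per-thread count; team_scan returns the
//   // exclusive prefix sum in team_rank() ordering, i.e. the offset at
//   // which this thread may start writing its items.
//   const int my_count = count_items( member.team_rank() );
//   const int offset   = member.team_scan( my_count );
//   // The highest-rank thread recovers the team total as offset + my_count.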
//----------------------------------------
// Private for the driver
KOKKOS_INLINE_FUNCTION
CudaTeamMember( void * shared
, const int shared_begin
, const int shared_size
, void* scratch_level_1_ptr
, const int scratch_level_1_size
, const int arg_league_rank
, const int arg_league_size )
: m_team_reduce( shared )
, m_team_shared( ((char *)shared) + shared_begin , shared_size, scratch_level_1_ptr, scratch_level_1_size)
, m_league_rank( arg_league_rank )
, m_league_size( arg_league_size )
{}
};
} // namespace Impl
namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Cuda , Properties ... >: public PolicyTraits<Properties ... >
{
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
private:
enum { MAX_WARP = 8 };
int m_league_size ;
int m_team_size ;
int m_vector_length ;
int m_team_scratch_size[2] ;
int m_thread_scratch_size[2] ;
int m_chunk_size;
public:
//! Execution space of this execution policy
typedef Kokkos::Cuda execution_space ;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_vector_length = p.m_vector_length;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
int team_size_max( const FunctorType & functor )
{
int n = MAX_WARP * Impl::CudaTraits::WarpSize ;
for ( ; n ; n >>= 1 ) {
const int shmem_size =
/* for global reduce */ Impl::cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,typename traits::work_tag>( functor , n )
/* for team reduce */ + ( n + 2 ) * sizeof(double)
/* for team shared */ + Impl::FunctorTeamShmemSize< FunctorType >::value( functor , n );
if ( shmem_size < Impl::CudaTraits::SharedMemoryCapacity ) break ;
}
return n ;
}
template< class FunctorType >
static int team_size_recommended( const FunctorType & functor )
{ return team_size_max( functor ); }
template< class FunctorType >
static int team_size_recommended( const FunctorType & functor , const int vector_length)
{
int max = team_size_max( functor )/vector_length;
if(max<1) max = 1;
return max;
}
inline static
int vector_length_max()
{ return Impl::CudaTraits::WarpSize; }
//----------------------------------------
inline int vector_length() const { return m_vector_length ; }
inline int team_size() const { return m_team_size ; }
inline int league_size() const { return m_league_size ; }
inline int scratch_size(int level, int team_size_ = -1) const {
if(team_size_<0) team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level];
}
inline size_t team_scratch_size(int level) const {
return m_team_scratch_size[level];
}
inline size_t thread_scratch_size(int level) const {
return m_thread_scratch_size[level];
}
TeamPolicyInternal()
: m_league_size( 0 )
, m_team_size( 0 )
, m_vector_length( 0 )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{}
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_
, int team_size_request
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( team_size_request )
, m_vector_length( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
// Make sure total block size is permissible
if ( m_team_size * m_vector_length > 1024 ) {
Impl::throw_runtime_exception(std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. Team size x vector length must be smaller than 1024."));
}
}
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_
, const Kokkos::AUTO_t & /* team_size_request */
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( -1 )
, m_vector_length( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
}
TeamPolicyInternal( int league_size_
, int team_size_request
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( team_size_request )
, m_vector_length ( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
// Make sure total block size is permissible
if ( m_team_size * m_vector_length > 1024 ) {
Impl::throw_runtime_exception(std::string("Kokkos::TeamPolicy< Cuda > the team size is too large. Team size x vector length must be smaller than 1024."));
}
}
TeamPolicyInternal( int league_size_
, const Kokkos::AUTO_t & /* team_size_request */
, int vector_length_request = 1 )
: m_league_size( league_size_ )
, m_team_size( -1 )
, m_vector_length ( vector_length_request )
, m_team_scratch_size {0,0}
, m_thread_scratch_size {0,0}
, m_chunk_size ( 32 )
{
// Allow only power-of-two vector_length
if ( ! Kokkos::Impl::is_integral_power_of_two( vector_length_request ) ) {
Impl::throw_runtime_exception( "Requested non-power-of-two vector length for TeamPolicy.");
}
// Make sure league size is permissible
if(league_size_ >= int(Impl::cuda_internal_maximum_grid_count()))
Impl::throw_runtime_exception( "Requested too large league_size for TeamPolicy on Cuda execution space.");
}
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
typedef Kokkos::Impl::CudaTeamMember member_type ;
};
} // namespace Impl
} // namespace Kokkos
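// A minimal construction sketch for the public TeamPolicy on Cuda
// (illustrative; 'n_teams' is a placeholder). The checks above imply:
//   - vector_length must be a power of two (vector_length_max() reports the warp size),
//   - when a team size is given, team_size * vector_length may not exceed 1024,
//   - league_size must be smaller than the maximum CUDA grid count.
//
//   typedef Kokkos::TeamPolicy< Kokkos::Cuda > team_policy ;
//   const int n_teams = 1024 ;
//   team_policy policy      ( n_teams , 128 /* team_size */ , 4 /* vector_length */ );
//   team_policy policy_auto ( n_teams , Kokkos::AUTO , 4 );  // let Kokkos pick the team size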
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
const FunctorType m_functor ;
const Policy m_policy ;
ParallelFor() = delete ;
ParallelFor & operator = ( const ParallelFor & ) = delete ;
template< class TagType >
inline __device__
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member i ) const
{ m_functor( i ); }
template< class TagType >
inline __device__
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member i ) const
{ m_functor( TagType() , i ); }
public:
typedef FunctorType functor_type ;
inline
__device__
void operator()(void) const
{
const Member work_stride = blockDim.y * gridDim.x ;
const Member work_end = m_policy.end();
for ( Member
iwork = m_policy.begin() + threadIdx.y + blockDim.y * blockIdx.x ;
iwork < work_end ;
iwork += work_stride ) {
this-> template exec_range< WorkTag >( iwork );
}
}
inline
void execute() const
{
const int nwork = m_policy.end() - m_policy.begin();
const dim3 block( 1 , CudaTraits::WarpSize * cuda_internal_maximum_warp_count(), 1);
const dim3 grid( std::min( ( nwork + block.y - 1 ) / block.y , cuda_internal_maximum_grid_count() ) , 1 , 1);
CudaParallelLaunch< ParallelFor >( *this , grid , block , 0 );
}
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::Cuda
>
{
private:
typedef TeamPolicyInternal< Kokkos::Cuda , Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
public:
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
private:
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.z == 1
// shared memory utilization:
//
// [ team reduce space ]
// [ team shared space ]
//
const FunctorType m_functor ;
const size_type m_league_size ;
const size_type m_team_size ;
const size_type m_vector_size ;
const size_type m_shmem_begin ;
const size_type m_shmem_size ;
void* m_scratch_ptr[2] ;
const int m_scratch_size[2] ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const Member & member ) const
{ m_functor( member ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const Member & member ) const
{ m_functor( TagType() , member ); }
public:
__device__ inline
void operator()(void) const
{
// Iterate this block through the league
+ int threadid = 0;
+ if ( m_scratch_size[1]>0 ) {
+ __shared__ int base_thread_id;
+ if (threadIdx.x==0 && threadIdx.y==0 ) {
+ threadid = ((blockIdx.x*blockDim.z + threadIdx.z) * blockDim.x * blockDim.y) % kokkos_impl_cuda_lock_arrays.n;
+ threadid = ((threadid + blockDim.x * blockDim.y-1)/(blockDim.x * blockDim.y)) * blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid-=blockDim.x * blockDim.y;
+ int done = 0;
+ while (!done) {
+ done = (0 == atomicCAS(&kokkos_impl_cuda_lock_arrays.atomic[threadid],0,1));
+ if(!done) {
+ threadid += blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid = 0;
+ }
+ }
+ base_thread_id = threadid;
+ }
+ __syncthreads();
+ threadid = base_thread_id;
+ }
+
+
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >(
typename Policy::member_type( kokkos_impl_cuda_shared_memory<void>()
, m_shmem_begin
, m_shmem_size
- , m_scratch_ptr[1]
+ , (void*) ( ((char*)m_scratch_ptr[1]) + threadid/(blockDim.x*blockDim.y) * m_scratch_size[1])
, m_scratch_size[1]
, league_rank
, m_league_size ) );
}
}
inline
void execute() const
{
const int shmem_size_total = m_shmem_begin + m_shmem_size ;
const dim3 grid( int(m_league_size) , 1 , 1 );
const dim3 block( int(m_vector_size) , int(m_team_size) , 1 );
CudaParallelLaunch< ParallelFor >( *this, grid, block, shmem_size_total ); // copy to device and execute
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelFor >( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
, m_shmem_begin( sizeof(double) * ( m_team_size + 2 ) )
, m_shmem_size( arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( m_functor , m_team_size ) )
, m_scratch_ptr{NULL,NULL}
, m_scratch_size{arg_policy.scratch_size(0,m_team_size),arg_policy.scratch_size(1,m_team_size)}
{
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
const int shmem_size_total = m_shmem_begin + m_shmem_size ;
if ( CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelFor< Cuda > insufficient shared memory"));
}
if ( int(m_team_size) >
int(Kokkos::Impl::cuda_get_max_block_size< ParallelFor >
( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelFor< Cuda > requested too large team size."));
}
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
public:
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::value_type value_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
// Algorithmic constraints: blockDim.y (the block size) is a power of two AND blockDim.x == blockDim.z == 1
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type * m_unified_space ;
// Shall we use the shfl based reduction or not (only use it for static sized types of more than 128 bit)
enum { UseShflReduction = ((sizeof(value_type)>2*sizeof(double)) && ValueTraits::StaticValueSize) };
// Some crutch to do function overloading
private:
typedef double DummyShflReductionType;
typedef int DummySHMEMReductionType;
public:
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update ) const
{ m_functor( i , update ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update ) const
{ m_functor( TagType() , i , update ); }
__device__ inline
void operator() () const {
run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0) );
}
__device__ inline
void run(const DummySHMEMReductionType& ) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) / sizeof(size_type) );
{
reference_type value =
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , kokkos_impl_cuda_shared_memory<size_type>() + threadIdx.y * word_count.value );
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , value );
}
}
// Reduce with final value at blockDim.y - 1 location.
if ( cuda_single_inter_block_reduce_scan<false,ReducerTypeFwd,WorkTag>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
// This is the final block with the final result at the final thread's location
size_type * const shared = kokkos_impl_cuda_shared_memory<size_type>() + ( blockDim.y - 1 ) * word_count.value ;
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
for ( unsigned i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i]; }
}
}
__device__ inline
void run(const DummyShflReductionType&) const
{
value_type value;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &value);
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , value );
}
pointer_type const result = (pointer_type) (m_unified_space ? m_unified_space : m_scratch_space) ;
int max_active_thread = range.end()-range.begin() < blockDim.y ? range.end() - range.begin():blockDim.y;
max_active_thread = (max_active_thread == 0)?blockDim.y:max_active_thread;
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<ReducerTypeFwd,ValueJoin,WorkTag>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,max_active_thread)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
}
// Determine block size constrained by shared memory:
static inline
unsigned local_block_size( const FunctorType & f )
{
unsigned n = CudaTraits::WarpSize * 8 ;
while ( n && CudaTraits::SharedMemoryCapacity < cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( f , n ) ) { n >>= 1 ; }
return n ;
}
inline
void execute()
{
const int nwork = m_policy.end() - m_policy.begin();
if ( nwork ) {
const int block_size = local_block_size( m_functor );
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_size /* block_size == max block_count */ );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
// REQUIRED ( 1 , N , 1 )
const dim3 block( 1 , block_size , 1 );
// Required grid.x <= block.y
const dim3 grid( std::min( int(block.y) , int( ( nwork + block.y - 1 ) / block.y ) ) , 1 , 1 );
const int shmem = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( m_functor , block.y );
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem ); // copy to device and execute
Cuda::fence();
if ( m_result_ptr ) {
if ( m_unified_space ) {
const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
}
else {
const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
DeepCopy<HostSpace,CudaSpace>( m_result_ptr , m_scratch_space , size );
}
}
}
else {
if (m_result_ptr) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
}
}
}
template< class HostViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const HostViewType & arg_result
, typename std::enable_if<
Kokkos::is_view< HostViewType >::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
{ }
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ReducerType & reducer)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType, class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::Cuda
>
{
private:
typedef TeamPolicyInternal< Kokkos::Cuda, Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef typename ValueTraits::value_type value_type ;
public:
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
enum { UseShflReduction = (true && ValueTraits::StaticValueSize) };
private:
typedef double DummyShflReductionType;
typedef int DummySHMEMReductionType;
// Algorithmic constraints: blockDim.y is a power of two AND blockDim.z == 1
// shared memory utilization:
//
// [ global reduce space ]
// [ team reduce space ]
// [ team shared space ]
//
const FunctorType m_functor ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type * m_unified_space ;
size_type m_team_begin ;
size_type m_shmem_begin ;
size_type m_shmem_size ;
void* m_scratch_ptr[2] ;
int m_scratch_size[2] ;
const size_type m_league_size ;
const size_type m_team_size ;
const size_type m_vector_size ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const Member & member , reference_type update ) const
{ m_functor( member , update ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const Member & member , reference_type update ) const
{ m_functor( TagType() , member , update ); }
public:
__device__ inline
void operator() () const {
- run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0) );
+ int threadid = 0;
+ if ( m_scratch_size[1]>0 ) {
+ __shared__ int base_thread_id;
+ if (threadIdx.x==0 && threadIdx.y==0 ) {
+ threadid = ((blockIdx.x*blockDim.z + threadIdx.z) * blockDim.x * blockDim.y) % kokkos_impl_cuda_lock_arrays.n;
+ threadid = ((threadid + blockDim.x * blockDim.y-1)/(blockDim.x * blockDim.y)) * blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid-=blockDim.x * blockDim.y;
+ int done = 0;
+ while (!done) {
+ done = (0 == atomicCAS(&kokkos_impl_cuda_lock_arrays.atomic[threadid],0,1));
+ if(!done) {
+ threadid += blockDim.x * blockDim.y;
+ if(threadid > kokkos_impl_cuda_lock_arrays.n) threadid = 0;
+ }
+ }
+ base_thread_id = threadid;
+ }
+ __syncthreads();
+ threadid = base_thread_id;
+ }
+
+ run(Kokkos::Impl::if_c<UseShflReduction, DummyShflReductionType, DummySHMEMReductionType>::select(1,1.0), threadid );
}
__device__ inline
- void run(const DummySHMEMReductionType&) const
+ void run(const DummySHMEMReductionType&, const int& threadid) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) / sizeof(size_type) );
reference_type value =
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , kokkos_impl_cuda_shared_memory<size_type>() + threadIdx.y * word_count.value );
// Iterate this block through the league
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >
( Member( kokkos_impl_cuda_shared_memory<char>() + m_team_begin
, m_shmem_begin
, m_shmem_size
- , m_scratch_ptr[1]
+ , (void*) ( ((char*)m_scratch_ptr[1]) + threadid/(blockDim.x*blockDim.y) * m_scratch_size[1])
, m_scratch_size[1]
, league_rank
, m_league_size )
, value );
}
// Reduce with final value at blockDim.y - 1 location.
if ( cuda_single_inter_block_reduce_scan<false,FunctorType,WorkTag>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
// This is the final block with the final result at the final thread's location
size_type * const shared = kokkos_impl_cuda_shared_memory<size_type>() + ( blockDim.y - 1 ) * word_count.value ;
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
for ( unsigned i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i]; }
}
}
__device__ inline
- void run(const DummyShflReductionType&) const
+ void run(const DummyShflReductionType&, const int& threadid) const
{
value_type value;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &value);
// Iterate this block through the league
for ( int league_rank = blockIdx.x ; league_rank < m_league_size ; league_rank += gridDim.x ) {
this-> template exec_team< WorkTag >
( Member( kokkos_impl_cuda_shared_memory<char>() + m_team_begin
, m_shmem_begin
, m_shmem_size
- , m_scratch_ptr[1]
+ , (void*) ( ((char*)m_scratch_ptr[1]) + threadid/(blockDim.x*blockDim.y) * m_scratch_size[1])
, m_scratch_size[1]
, league_rank
, m_league_size )
, value );
}
pointer_type const result = (pointer_type) (m_unified_space ? m_unified_space : m_scratch_space) ;
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<FunctorType,ValueJoin,WorkTag>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,blockDim.y)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
}
inline
void execute()
{
const int nwork = m_league_size * m_team_size ;
if ( nwork ) {
const int block_count = UseShflReduction? std::min( m_league_size , size_type(1024) )
:std::min( m_league_size , m_team_size );
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) * block_count );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) );
m_unified_space = cuda_internal_scratch_unified( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) );
const dim3 block( m_vector_size , m_team_size , 1 );
const dim3 grid( block_count , 1 , 1 );
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
CudaParallelLaunch< ParallelReduce >( *this, grid, block, shmem_size_total ); // copy to device and execute
Cuda::fence();
if ( m_result_ptr ) {
if ( m_unified_space ) {
const int count = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( int i = 0 ; i < count ; ++i ) { m_result_ptr[i] = pointer_type(m_unified_space)[i] ; }
}
else {
const int size = ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) );
DeepCopy<HostSpace,CudaSpace>( m_result_ptr, m_scratch_space, size );
}
}
}
else {
if (m_result_ptr) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
}
}
}
template< class HostViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const HostViewType & arg_result
, typename std::enable_if<
Kokkos::is_view< HostViewType >::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
, m_team_begin( 0 )
, m_shmem_begin( 0 )
, m_shmem_size( 0 )
, m_scratch_ptr{NULL,NULL}
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
, m_scratch_size{
arg_policy.scratch_size(0,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
), arg_policy.scratch_size(1,( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
)}
{
// Return Init value if the number of worksets is zero
if( arg_policy.league_size() == 0) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , arg_result.ptr_on_device() );
return ;
}
m_team_begin = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( arg_functor , m_team_size );
m_shmem_begin = sizeof(double) * ( m_team_size + 2 );
m_shmem_size = arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , m_team_size );
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
m_scratch_size[0] = m_shmem_size;
m_scratch_size[1] = arg_policy.scratch_size(1,m_team_size);
// The global parallel_reduce does not support vector_length other than 1 at the moment
if( (arg_policy.vector_length() > 1) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a vector length of greater than 1 is not currently supported for CUDA for dynamic sized reduction types.");
if( (m_team_size < 32) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a team_size smaller than 32 is not currently supported with CUDA for dynamic sized reduction types.");
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
if (! Kokkos::Impl::is_integral_power_of_two( m_team_size ) && !UseShflReduction ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > bad team size"));
}
if ( CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too much L0 scratch memory"));
}
- if ( m_team_size >
- Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
- ( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length()) {
+ if ( unsigned(m_team_size) >
+ unsigned(Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
+ ( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too large team size."));
}
}
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ReducerType & reducer)
: m_functor( arg_functor )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().ptr_on_device() )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_unified_space( 0 )
, m_team_begin( 0 )
, m_shmem_begin( 0 )
, m_shmem_size( 0 )
, m_scratch_ptr{NULL,NULL}
, m_league_size( arg_policy.league_size() )
, m_team_size( 0 <= arg_policy.team_size() ? arg_policy.team_size() :
Kokkos::Impl::cuda_get_opt_block_size< ParallelReduce >( arg_functor , arg_policy.vector_length(),
arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) /
arg_policy.vector_length() )
, m_vector_size( arg_policy.vector_length() )
{
// Return Init value if the number of worksets is zero
if( arg_policy.league_size() == 0) {
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , m_result_ptr );
return ;
}
m_team_begin = UseShflReduction?0:cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( arg_functor , m_team_size );
m_shmem_begin = sizeof(double) * ( m_team_size + 2 );
m_shmem_size = arg_policy.scratch_size(0,m_team_size) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , m_team_size );
m_scratch_ptr[1] = cuda_resize_scratch_space(m_scratch_size[1]*(Cuda::concurrency()/(m_team_size*m_vector_size)));
m_scratch_size[0] = m_shmem_size;
m_scratch_size[1] = arg_policy.scratch_size(1,m_team_size);
// The global parallel_reduce does not support vector_length other than 1 at the moment
if( (arg_policy.vector_length() > 1) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a vector length of greater than 1 is not currently supported for CUDA for dynamic sized reduction types.");
if( (m_team_size < 32) && !UseShflReduction )
Impl::throw_runtime_exception( "Kokkos::parallel_reduce with a TeamPolicy using a team_size smaller than 32 is not currently supported with CUDA for dynamic sized reduction types.");
// Functor's reduce memory, team scan memory, and team shared memory depend upon team size.
const int shmem_size_total = m_team_begin + m_shmem_begin + m_shmem_size ;
if ( (! Kokkos::Impl::is_integral_power_of_two( m_team_size ) && !UseShflReduction ) ||
CudaTraits::SharedMemoryCapacity < shmem_size_total ) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > bad team size"));
}
if ( int(m_team_size) >
int(Kokkos::Impl::cuda_get_max_block_size< ParallelReduce >
( arg_functor , arg_policy.vector_length(), arg_policy.team_scratch_size(0),arg_policy.thread_scratch_size(0) ) / arg_policy.vector_length())) {
Kokkos::Impl::throw_runtime_exception(std::string("Kokkos::Impl::ParallelReduce< Cuda > requested too large team size."));
}
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Cuda
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueOps< FunctorType, WorkTag > ValueOps ;
public:
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
typedef FunctorType functor_type ;
typedef Cuda::size_type size_type ;
private:
// Algorithmic constraints:
// (a) blockDim.y is a power of two
// (b) blockDim.x == blockDim.z == 1
// (c) gridDim.x <= blockDim.y * blockDim.y
// (d) gridDim.y == gridDim.z == 1
const FunctorType m_functor ;
const Policy m_policy ;
size_type * m_scratch_space ;
size_type * m_scratch_flags ;
size_type m_final ;
template< class TagType >
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update , const bool final_result ) const
{ m_functor( i , update , final_result ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const Member & i , reference_type update , const bool final_result ) const
{ m_functor( TagType() , i , update , final_result ); }
//----------------------------------------
__device__ inline
void initial(void) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( m_functor ) / sizeof(size_type) );
size_type * const shared_value = kokkos_impl_cuda_shared_memory<size_type>() + word_count.value * threadIdx.y ;
ValueInit::init( m_functor , shared_value );
// Number of blocks is bounded so that the reduction can be limited to two passes.
// Each thread block is given an approximately equal amount of work to perform.
// Accumulate the values for this block.
// The accumulation ordering does not match the final pass, but is arithmetically equivalent.
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( Member iwork = range.begin() + threadIdx.y , iwork_end = range.end() ;
iwork < iwork_end ; iwork += blockDim.y ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_value ) , false );
}
// Reduce and scan, writing out scan of blocks' totals and block-groups' totals.
// Blocks' scan values are written to 'blockIdx.x' location.
// Block-groups' scan values are at: i = ( j * blockDim.y - 1 ) for i < gridDim.x
cuda_single_inter_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , blockIdx.x , gridDim.x , kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags );
}
//----------------------------------------
__device__ inline
void final(void) const
{
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( m_functor ) / sizeof(size_type) );
// Use shared memory as an exclusive scan: { 0 , value[0] , value[1] , value[2] , ... }
size_type * const shared_data = kokkos_impl_cuda_shared_memory<size_type>();
size_type * const shared_prefix = shared_data + word_count.value * threadIdx.y ;
size_type * const shared_accum = shared_data + word_count.value * ( blockDim.y + 1 );
// Starting value for this thread block is the previous block's total.
if ( blockIdx.x ) {
size_type * const block_total = m_scratch_space + word_count.value * ( blockIdx.x - 1 );
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) { shared_accum[i] = block_total[i] ; }
}
else if ( 0 == threadIdx.y ) {
ValueInit::init( m_functor , shared_accum );
}
const WorkRange range( m_policy , blockIdx.x , gridDim.x );
for ( typename Policy::member_type iwork_base = range.begin(); iwork_base < range.end() ; iwork_base += blockDim.y ) {
const typename Policy::member_type iwork = iwork_base + threadIdx.y ;
__syncthreads(); // Don't overwrite previous iteration values until they are used
ValueInit::init( m_functor , shared_prefix + word_count.value );
// Copy previous block's accumulation total into thread[0] prefix and inclusive scan value of this block
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) {
shared_data[i + word_count.value] = shared_data[i] = shared_accum[i] ;
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); } // Protect against large scan values.
// Call functor to accumulate inclusive scan value for this work item
if ( iwork < range.end() ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_prefix + word_count.value ) , false );
}
// Scan block values into locations shared_data[1..blockDim.y]
cuda_intra_block_reduce_scan<true,FunctorType,WorkTag>( m_functor , typename ValueTraits::pointer_type(shared_data+word_count.value) );
{
size_type * const block_total = shared_data + word_count.value * blockDim.y ;
for ( unsigned i = threadIdx.y ; i < word_count.value ; ++i ) { shared_accum[i] = block_total[i]; }
}
// Call functor with exclusive scan value
if ( iwork < range.end() ) {
this-> template exec_range< WorkTag >( iwork , ValueOps::reference( shared_prefix ) , true );
}
}
}
public:
//----------------------------------------
__device__ inline
void operator()(void) const
{
if ( ! m_final ) {
initial();
}
else {
final();
}
}
// Determine block size constrained by shared memory:
static inline
unsigned local_block_size( const FunctorType & f )
{
// blockDim.y must be a power of two: 128 (4 warps), 256 (8 warps), or 512 (16 warps)
// gridDim.x <= blockDim.y * blockDim.y
//
// 4 warps was 10% faster than 8 warps and 20% faster than 16 warps in unit testing
unsigned n = CudaTraits::WarpSize * 4 ;
while ( n && CudaTraits::SharedMemoryCapacity < cuda_single_inter_block_reduce_scan_shmem<false,FunctorType,WorkTag>( f , n ) ) { n >>= 1 ; }
return n ;
}
inline
void execute()
{
const int nwork = m_policy.end() - m_policy.begin();
if ( nwork ) {
enum { GridMaxComputeCapability_2x = 0x0ffff };
const int block_size = local_block_size( m_functor );
const int grid_max =
( block_size * block_size ) < GridMaxComputeCapability_2x ?
( block_size * block_size ) : GridMaxComputeCapability_2x ;
// At most 'max_grid' blocks:
const int max_grid = std::min( int(grid_max) , int(( nwork + block_size - 1 ) / block_size ));
// How much work per block:
const int work_per_block = ( nwork + max_grid - 1 ) / max_grid ;
// How many blocks are really needed for this much work:
const int grid_x = ( nwork + work_per_block - 1 ) / work_per_block ;
m_scratch_space = cuda_internal_scratch_space( ValueTraits::value_size( m_functor ) * grid_x );
m_scratch_flags = cuda_internal_scratch_flags( sizeof(size_type) * 1 );
const dim3 grid( grid_x , 1 , 1 );
const dim3 block( 1 , block_size , 1 ); // REQUIRED DIMENSIONS ( 1 , N , 1 )
const int shmem = ValueTraits::value_size( m_functor ) * ( block_size + 2 );
m_final = false ;
CudaParallelLaunch< ParallelScan >( *this, grid, block, shmem ); // copy to device and execute
m_final = true ;
CudaParallelLaunch< ParallelScan >( *this, grid, block, shmem ); // copy to device and execute
}
}
ParallelScan( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_scratch_space( 0 )
, m_scratch_flags( 0 )
, m_final( false )
{ }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const iType start;
const iType end;
const iType increment;
const CudaTeamMember& thread;
#ifdef __CUDA_ARCH__
__device__ inline
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
start( threadIdx.y ),
end( count ),
increment( blockDim.y ),
thread(thread_)
{}
__device__ inline
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_):
start( begin_+threadIdx.y ),
end( end_ ),
increment( blockDim.y ),
thread(thread_)
{}
#else
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count):
start( 0 ),
end( count ),
increment( 1 ),
thread(thread_)
{}
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_):
start( begin_ ),
end( end_ ),
increment( 1 ),
thread(thread_)
{}
#endif
};
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const iType start;
const iType end;
const iType increment;
#ifdef __CUDA_ARCH__
__device__ inline
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
start( threadIdx.x ),
end( count ),
increment( blockDim.x )
{}
__device__ inline
ThreadVectorRangeBoundariesStruct (const iType& count):
start( threadIdx.x ),
end( count ),
increment( blockDim.x )
{}
#else
KOKKOS_INLINE_FUNCTION
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count):
start( 0 ),
end( count ),
increment( 1 )
{}
KOKKOS_INLINE_FUNCTION
ThreadVectorRangeBoundariesStruct (const iType& count):
start( 0 ),
end( count ),
increment( 1 )
{}
#endif
};
} // namespace Impl
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType & count ) {
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, count );
}
template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
Impl::CudaTeamMember >
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType1 & begin, const iType2 & end ) {
typedef typename std::common_type< iType1, iType2 >::type iType;
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::CudaTeamMember> PerTeam(const Impl::CudaTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::CudaTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::CudaTeamMember> PerThread(const Impl::CudaTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::CudaTeamMember>(thread);
}
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries, const Lambda& lambda) {
#ifdef __CUDA_ARCH__
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
#endif
}
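// A minimal usage sketch of the team-level parallel_for (illustrative;
// 'member', 'row_sum', and 'N' are placeholders for a TeamPolicy<Cuda>
// functor's member handle and user data):
//
//   Kokkos::parallel_for( Kokkos::TeamThreadRange( member , N ) ,
//                         [&] ( const int i ) {
//     row_sum( i ) = 0 ;   // iterations 0..N-1 are spread over the team's threads
//   });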
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
#ifdef __CUDA_ARCH__
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
Impl::cuda_intra_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
{ dst+=src; });
Impl::cuda_inter_warp_reduction(result,[&] (ValueType& dst, const ValueType& src)
{ dst+=src; });
#endif
}
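// A minimal usage sketch of the team-level summation (illustrative;
// 'a', 'b', and 'N' are placeholder views/extents):
//
//   double team_sum = 0 ;
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , N ) ,
//                            [&] ( const int i , double & val ) {
//     val += a( i ) * b( i ) ;
//   }, team_sum );   // the summed contributions of the team end up in team_sum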
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
#ifdef __CUDA_ARCH__
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
Impl::cuda_intra_warp_reduction(result, join );
Impl::cuda_inter_warp_reduction(result, join );
init_result = result;
#endif
}
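// A minimal usage sketch of the join-based overload, here a team-wide max
// (illustrative; 'a' and 'N' are placeholders). Per the comment above, the
// initial value of the result must be the identity of the join operation:
//
//   double team_max = -1.0e300 ;   // identity element for max over doubles (sketch)
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , N ) ,
//                            [&] ( const int i , double & val ) {
//     if ( a( i ) > val ) val = a( i ) ;
//   },
//   [] ( double & dst , const double & src ) {   // join
//     if ( src > dst ) dst = src ;
//   }, team_max );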
} //namespace Kokkos
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef __CUDA_ARCH__
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
#endif
}
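// A minimal usage sketch of the vector-level parallel_for as the innermost
// loop of a team/vector nest (illustrative; 'A', 'scale', 'nrow', and 'ncol'
// are placeholders):
//
//   Kokkos::parallel_for( Kokkos::TeamThreadRange( member , nrow ) ,
//                         [&] ( const int row ) {
//     Kokkos::parallel_for( Kokkos::ThreadVectorRange( member , ncol ) ,
//                           [&] ( const int col ) {
//       A( row , col ) *= scale ;   // columns are spread over the vector lanes
//     });
//   });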
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
+/** \brief Intra-thread vector parallel_reduce.
*
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
+ * Calls lambda(iType i, ValueType & val) for each i=[0..N).
+ *
+ * The range [0..N) is mapped to all vector lanes of
+ * the calling thread and a reduction of val is performed using +=
+ * and output into result.
+ *
+ * The identity value for the += operator is assumed to be the default
+ * constructed value.
+ */
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
- loop_boundaries, const Lambda & lambda, ValueType& result) {
+void parallel_reduce
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
+ const & loop_boundaries
+ , Lambda const & lambda
+ , ValueType & result )
+{
#ifdef __CUDA_ARCH__
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
- if (loop_boundaries.increment > 1)
- result += shfl_down(result, 1,loop_boundaries.increment);
- if (loop_boundaries.increment > 2)
- result += shfl_down(result, 2,loop_boundaries.increment);
- if (loop_boundaries.increment > 4)
- result += shfl_down(result, 4,loop_boundaries.increment);
- if (loop_boundaries.increment > 8)
- result += shfl_down(result, 8,loop_boundaries.increment);
- if (loop_boundaries.increment > 16)
- result += shfl_down(result, 16,loop_boundaries.increment);
-
- result = shfl(result,0,loop_boundaries.increment);
+ Impl::cuda_intra_warp_vector_reduce(
+ Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > >( & result ) );
+
#endif
}
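// Illustrative sketch (not part of the library): a vector-lane sum producing a
// per-thread result.  'team', 'N', and 'row' are placeholders.
//
//   double row_sum = 0;
//   Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, N ),
//     [&]( const int j, double & partial ) { partial += row(j); }, row_sum );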
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
+/** \brief Intra-thread vector parallel_reduce.
*
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
+ * Calls lambda(iType i, ValueType & val) for each i=[0..N).
+ *
+ * The range [0..N) is mapped to all vector lanes of
+ * the calling thread and a reduction of val is performed
+ * using JoinType::operator()(ValueType& val, const ValueType& update)
+ * and output into result.
+ *
+ * The input value of result must be the identity value for the
+ * reduction operation; e.g., ( 0 , += ) or ( 1 , *= ).
+ */
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
+void parallel_reduce
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
+ const & loop_boundaries
+ , Lambda const & lambda
+ , JoinType const & join
+ , ValueType & result )
+{
#ifdef __CUDA_ARCH__
- ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
- if (loop_boundaries.increment > 1)
- join( result, shfl_down(result, 1,loop_boundaries.increment));
- if (loop_boundaries.increment > 2)
- join( result, shfl_down(result, 2,loop_boundaries.increment));
- if (loop_boundaries.increment > 4)
- join( result, shfl_down(result, 4,loop_boundaries.increment));
- if (loop_boundaries.increment > 8)
- join( result, shfl_down(result, 8,loop_boundaries.increment));
- if (loop_boundaries.increment > 16)
- join( result, shfl_down(result, 16,loop_boundaries.increment));
-
- init_result = shfl(result,0,loop_boundaries.increment);
+ Impl::cuda_intra_warp_vector_reduce(
+ Impl::Reducer< ValueType , JoinType >( join , & result ) );
+
#endif
}
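// Illustrative sketch (not part of the library): the join-based overload with a
// product reduction.  As noted above, 'prod' must start at the identity of the
// join (1 for multiplication).  'ProdJoin', 'team', 'N', and 'x' are placeholders.
//
//   struct ProdJoin {
//     KOKKOS_INLINE_FUNCTION
//     void operator()( double & dst, const double & src ) const { dst *= src; }
//   };
//   double prod = 1.0;
//   Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, N ),
//     [&]( const int j, double & val ) { val *= x(j); }, ProdJoin(), prod );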
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries, const FunctorType & lambda) {
#ifdef __CUDA_ARCH__
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
const int VectorLength = blockDim.x;
iType loop_bound = ((loop_boundaries.end+VectorLength-1)/VectorLength) * VectorLength;
for(int _i = threadIdx.x; _i < loop_bound; _i += VectorLength) {
value_type val = value_type();
if(_i<loop_boundaries.end)
lambda(_i , val , false);
value_type tmp = val;
value_type result_i;
if(threadIdx.x%VectorLength == 0)
result_i = tmp;
if (VectorLength > 1) {
const value_type tmp2 = shfl_up(tmp, 1,VectorLength);
if(threadIdx.x > 0)
tmp+=tmp2;
}
if(threadIdx.x%VectorLength == 1)
result_i = tmp;
if (VectorLength > 3) {
const value_type tmp2 = shfl_up(tmp, 2,VectorLength);
if(threadIdx.x > 1)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 2) &&
(threadIdx.x%VectorLength < 4))
result_i = tmp;
if (VectorLength > 7) {
const value_type tmp2 = shfl_up(tmp, 4,VectorLength);
if(threadIdx.x > 3)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 4) &&
(threadIdx.x%VectorLength < 8))
result_i = tmp;
if (VectorLength > 15) {
const value_type tmp2 = shfl_up(tmp, 8,VectorLength);
if(threadIdx.x > 7)
tmp+=tmp2;
}
if ((threadIdx.x%VectorLength >= 8) &&
(threadIdx.x%VectorLength < 16))
result_i = tmp;
if (VectorLength > 31) {
const value_type tmp2 = shfl_up(tmp, 16,VectorLength);
if(threadIdx.x > 15)
tmp+=tmp2;
}
if (threadIdx.x%VectorLength >= 16)
result_i = tmp;
val = scan_val + result_i - val;
scan_val += shfl(tmp,VectorLength-1,VectorLength);
if(_i<loop_boundaries.end)
lambda(_i , val , true);
}
#endif
}
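// Illustrative sketch (not part of the library): an exclusive prefix sum over
// vector lanes, following the convention above of adding the contribution on
// every pass regardless of 'final'.  'team', 'N', 'in', and 'out' are placeholders.
//
//   Kokkos::parallel_scan( Kokkos::ThreadVectorRange( team, N ),
//     [&]( const int j, int & update, const bool final ) {
//       if ( final ) out(j) = update;   // 'update' is the exclusive prefix sum at j
//       update += in(j);                // contribute whether or not this is the final pass
//     } );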
}
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda();
#endif
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) lambda();
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda(val);
val = shfl(val,0,blockDim.x);
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
#endif
}
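// Illustrative sketch (not part of the library): executing a lambda once per
// thread (PerThread) or once per team (PerTeam), optionally broadcasting a
// value.  'team' is a placeholder team member.
//
//   int flag = 0;
//   Kokkos::single( Kokkos::PerThread( team ), [&]( int & v ) { v = 1; }, flag );
//   // 'flag' is now 1 on every vector lane of the calling thread.
//   Kokkos::single( Kokkos::PerTeam( team ), [&]() { /* one thread of the team */ } );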
}
namespace Kokkos {
namespace Impl {
template< class FunctorType, class ExecPolicy, class ValueType , class Tag = typename ExecPolicy::work_tag>
struct CudaFunctorAdapter {
const FunctorType f;
typedef ValueType value_type;
CudaFunctorAdapter(const FunctorType& f_):f(f_) {}
__device__ inline
void operator() (typename ExecPolicy::work_tag, const typename ExecPolicy::member_type& i, ValueType& val) const {
    // TODO: insert a static_assert (via decltype) that ValueType equals the third argument type of FunctorType::operator()
f(typename ExecPolicy::work_tag(), i,val);
}
};
template< class FunctorType, class ExecPolicy, class ValueType >
struct CudaFunctorAdapter<FunctorType,ExecPolicy,ValueType,void> {
const FunctorType f;
typedef ValueType value_type;
CudaFunctorAdapter(const FunctorType& f_):f(f_) {}
__device__ inline
void operator() (const typename ExecPolicy::member_type& i, ValueType& val) const {
    // TODO: insert a static_assert (via decltype) that ValueType equals the second argument type of FunctorType::operator()
f(i,val);
}
__device__ inline
void operator() (typename ExecPolicy::member_type& i, ValueType& val) const {
    // TODO: insert a static_assert (via decltype) that ValueType equals the second argument type of FunctorType::operator()
f(i,val);
}
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasInit {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasInit<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::init ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasJoin {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasJoin<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::join ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasFinal {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasFinal<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::final ) >::type > {
enum {value = true};
};
template< class FunctorType, class Enable = void>
struct ReduceFunctorHasShmemSize {
enum {value = false};
};
template< class FunctorType>
struct ReduceFunctorHasShmemSize<FunctorType, typename Impl::enable_if< 0 < sizeof( & FunctorType::team_shmem_size ) >::type > {
enum {value = true};
};
template< class FunctorType, bool Enable =
( FunctorDeclaresValueType<FunctorType,void>::value) ||
( ReduceFunctorHasInit<FunctorType>::value ) ||
( ReduceFunctorHasJoin<FunctorType>::value ) ||
( ReduceFunctorHasFinal<FunctorType>::value ) ||
( ReduceFunctorHasShmemSize<FunctorType>::value )
>
struct IsNonTrivialReduceFunctor {
enum {value = false};
};
template< class FunctorType>
struct IsNonTrivialReduceFunctor<FunctorType, true> {
enum {value = true};
};
template<class FunctorType, class ResultType, class Tag, bool Enable = IsNonTrivialReduceFunctor<FunctorType>::value >
struct FunctorReferenceType {
typedef ResultType& reference_type;
};
template<class FunctorType, class ResultType, class Tag>
struct FunctorReferenceType<FunctorType, ResultType, Tag, true> {
typedef typename Kokkos::Impl::FunctorValueTraits< FunctorType ,Tag >::reference_type reference_type;
};
template< class FunctorTypeIn, class ExecPolicy, class ValueType>
struct ParallelReduceFunctorType<FunctorTypeIn,ExecPolicy,ValueType,Cuda> {
enum {FunctorHasValueType = IsNonTrivialReduceFunctor<FunctorTypeIn>::value };
typedef typename Kokkos::Impl::if_c<FunctorHasValueType, FunctorTypeIn, Impl::CudaFunctorAdapter<FunctorTypeIn,ExecPolicy,ValueType> >::type functor_type;
static functor_type functor(const FunctorTypeIn& functor_in) {
return Impl::if_c<FunctorHasValueType,FunctorTypeIn,functor_type>::select(functor_in,functor_type(functor_in));
}
};
}
} // namespace Kokkos
#endif /* defined( __CUDACC__ ) */
#endif /* #ifndef KOKKOS_CUDA_PARALLEL_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
index ad9cca26c..79b3867ba 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_ReduceScan.hpp
@@ -1,444 +1,595 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_REDUCESCAN_HPP
#define KOKKOS_CUDA_REDUCESCAN_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
#include <utility>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Cuda/Kokkos_Cuda_Vectorization.hpp>
+
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
+//----------------------------------------------------------------------------
+
+template< typename T >
+__device__ inline
+void cuda_shfl( T & out , T const & in , int lane ,
+ typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
+{
+ *reinterpret_cast<int*>(&out) =
+ __shfl( *reinterpret_cast<int const *>(&in) , lane , width );
+}
+
+template< typename T >
+__device__ inline
+void cuda_shfl( T & out , T const & in , int lane ,
+ typename std::enable_if
+ < ( sizeof(int) < sizeof(T) ) && ( 0 == ( sizeof(T) % sizeof(int) ) )
+ , int >::type width )
+{
+ enum : int { N = sizeof(T) / sizeof(int) };
+
+ for ( int i = 0 ; i < N ; ++i ) {
+ reinterpret_cast<int*>(&out)[i] =
+ __shfl( reinterpret_cast<int const *>(&in)[i] , lane , width );
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename T >
+__device__ inline
+void cuda_shfl_down( T & out , T const & in , int delta ,
+ typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
+{
+ *reinterpret_cast<int*>(&out) =
+ __shfl_down( *reinterpret_cast<int const *>(&in) , delta , width );
+}
+
+template< typename T >
+__device__ inline
+void cuda_shfl_down( T & out , T const & in , int delta ,
+ typename std::enable_if
+ < ( sizeof(int) < sizeof(T) ) && ( 0 == ( sizeof(T) % sizeof(int) ) )
+ , int >::type width )
+{
+ enum : int { N = sizeof(T) / sizeof(int) };
+
+ for ( int i = 0 ; i < N ; ++i ) {
+ reinterpret_cast<int*>(&out)[i] =
+ __shfl_down( reinterpret_cast<int const *>(&in)[i] , delta , width );
+ }
+}
+//----------------------------------------------------------------------------
-//Shfl based reductions
+template< typename T >
+__device__ inline
+void cuda_shfl_up( T & out , T const & in , int delta ,
+ typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
+{
+ *reinterpret_cast<int*>(&out) =
+ __shfl_up( *reinterpret_cast<int const *>(&in) , delta , width );
+}
+
+template< typename T >
+__device__ inline
+void cuda_shfl_up( T & out , T const & in , int delta ,
+ typename std::enable_if
+ < ( sizeof(int) < sizeof(T) ) && ( 0 == ( sizeof(T) % sizeof(int) ) )
+ , int >::type width )
+{
+ enum : int { N = sizeof(T) / sizeof(int) };
+
+ for ( int i = 0 ; i < N ; ++i ) {
+ reinterpret_cast<int*>(&out)[i] =
+ __shfl_up( reinterpret_cast<int const *>(&in)[i] , delta , width );
+ }
+}
+
+//----------------------------------------------------------------------------
+/** \brief Reduce within a warp over blockDim.x, the "vector" dimension.
+ *
+ * This will be called within a nested, intra-team parallel operation.
+ * Use shuffle operations to avoid conflicts with shared memory usage.
+ *
+ * Requires:
+ * blockDim.x is power of 2
+ * blockDim.x <= 32 (one warp)
+ *
+ * Cannot use "butterfly" pattern because floating point
+ * addition is non-associative. Therefore, must broadcast
+ * the final result.
+ */
+template< class Reducer >
+__device__ inline
+void cuda_intra_warp_vector_reduce( Reducer const & reducer )
+{
+ static_assert(
+ std::is_reference< typename Reducer::reference_type >::value , "" );
+
+ if ( 1 < blockDim.x ) {
+
+ typename Reducer::value_type tmp ;
+
+ for ( int i = blockDim.x ; ( i >>= 1 ) ; ) {
+
+ cuda_shfl_down( tmp , reducer.reference() , i , blockDim.x );
+
+ if ( threadIdx.x < i ) { reducer.join( reducer.data() , & tmp ); }
+ }
+
+ // Broadcast from root "lane" to all other "lanes"
+
+ cuda_shfl( reducer.reference() , reducer.reference() , 0 , blockDim.x );
+ }
+}
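+// Illustrative trace (not part of the original patch): with blockDim.x == 4 and
+// lane values [a, b, c, d] the loop above performs
+//   i = 2 : lane0 = a+c , lane1 = b+d
+//   i = 1 : lane0 = (a+c)+(b+d)
+// and the final shuffle broadcasts lane0's total to all four lanes.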
+
+/** \brief Inclusive scan over blockDim.x, the "vector" dimension.
+ *
+ * This will be called within a nested, intra-team parallel operation.
+ * Use shuffle operations to avoid conflicts with shared memory usage.
+ *
+ * Algorithm is concurrent bottom-up reductions in triangular pattern
+ * where each CUDA thread is the root of a reduction tree from the
+ * zeroth CUDA thread to itself.
+ *
+ * Requires:
+ * blockDim.x is power of 2
+ * blockDim.x <= 32 (one warp)
+ */
+template< typename ValueType >
+__device__ inline
+void cuda_intra_warp_vector_inclusive_scan( ValueType & local )
+{
+ ValueType tmp ;
+
+ // Bottom up:
+ // [t] += [t-1] if t >= 1
+ // [t] += [t-2] if t >= 2
+ // [t] += [t-4] if t >= 4
+ // ...
+
+ for ( int i = 1 ; i < blockDim.x ; i <<= 1 ) {
+
+ cuda_shfl_up( tmp , local , i , blockDim.x );
+
+ if ( i <= threadIdx.x ) { local += tmp ; }
+ }
+}
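+// Illustrative trace (not part of the original patch): with blockDim.x == 4
+// and lane values [a, b, c, d] the two scan steps above produce
+//   i = 1 : [a, a+b, b+c, c+d]
+//   i = 2 : [a, a+b, a+b+c, a+b+c+d]
+// i.e. each lane t ends holding the inclusive sum of lanes 0..t.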
+
+//----------------------------------------------------------------------------
/*
* Algorithmic constraints:
 * (a) threads with the same threadIdx.y have the same value
* (b) blockDim.x == power of two
* (c) blockDim.z == 1
*/
template< class ValueType , class JoinOp>
__device__
inline void cuda_intra_warp_reduction( ValueType& result,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
unsigned int shift = 1;
//Reduce over values from threads with different threadIdx.y
while(blockDim.x * shift < 32 ) {
const ValueType tmp = shfl_down(result, blockDim.x*shift,32u);
    //Only join if the upper thread is active (this allows a non-power-of-two blockDim.y)
if(threadIdx.y + shift < max_active_thread)
join(result , tmp);
shift*=2;
}
result = shfl(result,0,32);
}
template< class ValueType , class JoinOp>
__device__
inline void cuda_inter_warp_reduction( ValueType& value,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
#define STEP_WIDTH 4
- __shared__ char sh_result[sizeof(ValueType)*STEP_WIDTH];
+ // Depending on the ValueType, __shared__ memory must be aligned up to 8-byte boundaries.
+ // The reason not to use ValueType directly is that, for types with constructors, it
+ // could lead to race conditions.
+ __shared__ double sh_result[(sizeof(ValueType)+7)/8*STEP_WIDTH];
ValueType* result = (ValueType*) & sh_result;
const unsigned step = 32 / blockDim.x;
unsigned shift = STEP_WIDTH;
const int id = threadIdx.y%step==0?threadIdx.y/step:65000;
if(id < STEP_WIDTH ) {
result[id] = value;
}
__syncthreads();
while (shift<=max_active_thread/step) {
if(shift<=id && shift+STEP_WIDTH>id && threadIdx.x==0) {
join(result[id%STEP_WIDTH],value);
}
__syncthreads();
shift+=STEP_WIDTH;
}
value = result[0];
for(int i = 1; (i*step<max_active_thread) && i<STEP_WIDTH; i++)
join(value,result[i]);
}
template< class ValueType , class JoinOp>
__device__
inline void cuda_intra_block_reduction( ValueType& value,
const JoinOp& join,
const int max_active_thread = blockDim.y) {
cuda_intra_warp_reduction(value,join,max_active_thread);
cuda_inter_warp_reduction(value,join,max_active_thread);
}
template< class FunctorType , class JoinOp , class ArgTag = void >
__device__
bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgTag >::reference_type value,
typename FunctorValueTraits< FunctorType , ArgTag >::reference_type neutral,
const JoinOp& join,
Cuda::size_type * const m_scratch_space,
typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type const result,
Cuda::size_type * const m_scratch_flags,
const int max_active_thread = blockDim.y) {
#ifdef __CUDA_ARCH__
typedef typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type pointer_type;
typedef typename FunctorValueTraits< FunctorType , ArgTag >::value_type value_type;
//Do the intra-block reduction with shfl operations and static shared memory
cuda_intra_block_reduction(value,join,max_active_thread);
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
//One thread in the block writes block result to global scratch_memory
if(id == 0 ) {
pointer_type global = ((pointer_type) m_scratch_space) + blockIdx.x;
*global = value;
}
  //One warp of the last block performs the inter-block reduction by loading the block values from global scratch_memory
bool last_block = false;
__syncthreads();
if ( id < 32 ) {
Cuda::size_type count;
//Figure out whether this is the last block
if(id == 0)
count = Kokkos::atomic_fetch_add(m_scratch_flags,1);
count = Kokkos::shfl(count,0,32);
//Last block does the inter block reduction
if( count == gridDim.x - 1) {
//set flag back to zero
if(id == 0)
*m_scratch_flags = 0;
last_block = true;
value = neutral;
pointer_type const volatile global = (pointer_type) m_scratch_space ;
//Reduce all global values with splitting work over threads in one warp
const int step_size = blockDim.x*blockDim.y < 32 ? blockDim.x*blockDim.y : 32;
for(int i=id; i<gridDim.x; i+=step_size) {
value_type tmp = global[i];
join(value, tmp);
}
      //Perform shfl reductions within the warp; only join if the contribution is valid (allows gridDim.x to be non power of two and < 32)
if (blockDim.x*blockDim.y > 1) {
value_type tmp = Kokkos::shfl_down(value, 1,32);
if( id + 1 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 2) {
value_type tmp = Kokkos::shfl_down(value, 2,32);
if( id + 2 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 4) {
value_type tmp = Kokkos::shfl_down(value, 4,32);
if( id + 4 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 8) {
value_type tmp = Kokkos::shfl_down(value, 8,32);
if( id + 8 < gridDim.x )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 16) {
value_type tmp = Kokkos::shfl_down(value, 16,32);
if( id + 16 < gridDim.x )
join(value, tmp);
}
}
}
//The last block has in its thread=0 the global reduction value through "value"
return last_block;
#else
return true;
#endif
}
//----------------------------------------------------------------------------
// See section B.17 of Cuda C Programming Guide Version 3.2
// for discussion of
// __launch_bounds__(maxThreadsPerBlock,minBlocksPerMultiprocessor)
// function qualifier which could be used to improve performance.
//----------------------------------------------------------------------------
// Maximize shared memory and minimize L1 cache:
// cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared );
// For 2.0 capability: 48 KB shared and 16 KB L1
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
/*
* Algorithmic constraints:
* (a) blockDim.y is a power of two
* (b) blockDim.y <= 512
* (c) blockDim.x == blockDim.z == 1
*/
template< bool DoScan , class FunctorType , class ArgTag >
__device__
void cuda_intra_block_reduce_scan( const FunctorType & functor ,
const typename FunctorValueTraits< FunctorType , ArgTag >::pointer_type base_data )
{
typedef FunctorValueTraits< FunctorType , ArgTag > ValueTraits ;
typedef FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
const unsigned value_count = ValueTraits::value_count( functor );
const unsigned BlockSizeMask = blockDim.y - 1 ;
// Must have power of two thread count
if ( BlockSizeMask & blockDim.y ) { Kokkos::abort("Cuda::cuda_intra_block_scan requires power-of-two blockDim"); }
#define BLOCK_REDUCE_STEP( R , TD , S ) \
if ( ! ( R & ((1<<(S+1))-1) ) ) { ValueJoin::join( functor , TD , (TD - (value_count<<S)) ); }
#define BLOCK_SCAN_STEP( TD , N , S ) \
if ( N == (1<<S) ) { ValueJoin::join( functor , TD , (TD - (value_count<<S))); }
const unsigned rtid_intra = threadIdx.y ^ BlockSizeMask ;
const pointer_type tdata_intra = base_data + value_count * threadIdx.y ;
{ // Intra-warp reduction:
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,0)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,1)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,2)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,3)
BLOCK_REDUCE_STEP(rtid_intra,tdata_intra,4)
}
__syncthreads(); // Wait for all warps to reduce
{ // Inter-warp reduce-scan by a single warp to avoid extra synchronizations
const unsigned rtid_inter = ( threadIdx.y ^ BlockSizeMask ) << CudaTraits::WarpIndexShift ;
if ( rtid_inter < blockDim.y ) {
const pointer_type tdata_inter = base_data + value_count * ( rtid_inter ^ BlockSizeMask );
if ( (1<<5) < BlockSizeMask ) { BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,5) }
if ( (1<<6) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,6) }
if ( (1<<7) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,7) }
if ( (1<<8) < BlockSizeMask ) { __threadfence_block(); BLOCK_REDUCE_STEP(rtid_inter,tdata_inter,8) }
if ( DoScan ) {
int n = ( rtid_inter & 32 ) ? 32 : (
( rtid_inter & 64 ) ? 64 : (
( rtid_inter & 128 ) ? 128 : (
( rtid_inter & 256 ) ? 256 : 0 )));
if ( ! ( rtid_inter + n < blockDim.y ) ) n = 0 ;
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,8)
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,7)
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,6)
__threadfence_block(); BLOCK_SCAN_STEP(tdata_inter,n,5)
}
}
}
__syncthreads(); // Wait for inter-warp reduce-scan to complete
if ( DoScan ) {
int n = ( rtid_intra & 1 ) ? 1 : (
( rtid_intra & 2 ) ? 2 : (
( rtid_intra & 4 ) ? 4 : (
( rtid_intra & 8 ) ? 8 : (
( rtid_intra & 16 ) ? 16 : 0 ))));
if ( ! ( rtid_intra + n < blockDim.y ) ) n = 0 ;
#ifdef KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
BLOCK_SCAN_STEP(tdata_intra,n,4) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,3) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,2) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,1) __syncthreads();//__threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,0) __syncthreads();
#else
BLOCK_SCAN_STEP(tdata_intra,n,4) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,3) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,2) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,1) __threadfence_block();
BLOCK_SCAN_STEP(tdata_intra,n,0) __threadfence_block();
#endif
}
#undef BLOCK_SCAN_STEP
#undef BLOCK_REDUCE_STEP
}
//----------------------------------------------------------------------------
/**\brief Input value-per-thread starting at 'shared_data'.
* Reduction value at last thread's location.
*
* If 'DoScan' then write blocks' scan values and block-groups' scan values.
*
* Global reduce result is in the last threads' 'shared_data' location.
*/
template< bool DoScan , class FunctorType , class ArgTag >
__device__
bool cuda_single_inter_block_reduce_scan( const FunctorType & functor ,
const Cuda::size_type block_id ,
const Cuda::size_type block_count ,
Cuda::size_type * const shared_data ,
Cuda::size_type * const global_data ,
Cuda::size_type * const global_flags )
{
typedef Cuda::size_type size_type ;
typedef FunctorValueTraits< FunctorType , ArgTag > ValueTraits ;
typedef FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
typedef FunctorValueInit< FunctorType , ArgTag > ValueInit ;
typedef FunctorValueOps< FunctorType , ArgTag > ValueOps ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
// '__ffs' = position of the least significant bit set to 1.
// 'blockDim.y' is guaranteed to be a power of two so this
// is the integral shift value that can replace an integral divide.
const unsigned BlockSizeShift = __ffs( blockDim.y ) - 1 ;
const unsigned BlockSizeMask = blockDim.y - 1 ;
// Must have power of two thread count
if ( BlockSizeMask & blockDim.y ) { Kokkos::abort("Cuda::cuda_single_inter_block_reduce_scan requires power-of-two blockDim"); }
const integral_nonzero_constant< size_type , ValueTraits::StaticValueSize / sizeof(size_type) >
word_count( ValueTraits::value_size( functor ) / sizeof(size_type) );
// Reduce the accumulation for the entire block.
cuda_intra_block_reduce_scan<false,FunctorType,ArgTag>( functor , pointer_type(shared_data) );
{
// Write accumulation total to global scratch space.
// Accumulation total is the last thread's data.
size_type * const shared = shared_data + word_count.value * BlockSizeMask ;
size_type * const global = global_data + word_count.value * block_id ;
#if (__CUDA_ARCH__ < 500)
for ( size_type i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i] ; }
#else
for ( size_type i = 0 ; i < word_count.value ; i += 1 ) { global[i] = shared[i] ; }
#endif
}
// Contributing blocks note that their contribution has been completed via an atomic-increment flag
// If this block is not the last block to contribute to this group then the block is done.
const bool is_last_block =
! __syncthreads_or( threadIdx.y ? 0 : ( 1 + atomicInc( global_flags , block_count - 1 ) < block_count ) );
if ( is_last_block ) {
const size_type b = ( long(block_count) * long(threadIdx.y) ) >> BlockSizeShift ;
const size_type e = ( long(block_count) * long( threadIdx.y + 1 ) ) >> BlockSizeShift ;
{
void * const shared_ptr = shared_data + word_count.value * threadIdx.y ;
reference_type shared_value = ValueInit::init( functor , shared_ptr );
for ( size_type i = b ; i < e ; ++i ) {
ValueJoin::join( functor , shared_ptr , global_data + word_count.value * i );
}
}
cuda_intra_block_reduce_scan<DoScan,FunctorType,ArgTag>( functor , pointer_type(shared_data) );
if ( DoScan ) {
size_type * const shared_value = shared_data + word_count.value * ( threadIdx.y ? threadIdx.y - 1 : blockDim.y );
if ( ! threadIdx.y ) { ValueInit::init( functor , shared_value ); }
// Join previous inclusive scan value to each member
for ( size_type i = b ; i < e ; ++i ) {
size_type * const global_value = global_data + word_count.value * i ;
ValueJoin::join( functor , shared_value , global_value );
ValueOps ::copy( functor , global_value , shared_value );
}
}
}
return is_last_block ;
}
// Size in bytes required for inter block reduce or scan
template< bool DoScan , class FunctorType , class ArgTag >
inline
unsigned cuda_single_inter_block_reduce_scan_shmem( const FunctorType & functor , const unsigned BlockSize )
{
return ( BlockSize + 2 ) * Impl::FunctorValueTraits< FunctorType , ArgTag >::value_size( functor );
}
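// Illustrative sketch (not part of the library): for a 256-thread block reducing
// a single double (value_size == 8 bytes) the formula above requires
// ( 256 + 2 ) * 8 = 2064 bytes of shared memory per block.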
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( __CUDACC__ ) */
#endif /* KOKKOS_CUDA_REDUCESCAN_HPP */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
index c96b8b7d4..cf3e55d50 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
@@ -1,179 +1,179 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::Cuda > ;
//----------------------------------------------------------------------------
__device__
void TaskQueueSpecialization< Kokkos::Cuda >::driver
( TaskQueueSpecialization< Kokkos::Cuda >::queue_type * const queue )
{
using Member = TaskExec< Kokkos::Cuda > ;
using Queue = TaskQueue< Kokkos::Cuda > ;
using task_root_type = TaskBase< Kokkos::Cuda , void , void > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member single_exec( 1 );
Member team_exec( blockDim.y );
const int warp_lane = threadIdx.x + threadIdx.y * blockDim.x ;
union {
task_root_type * ptr ;
int raw[2] ;
} task ;
// Loop until all queues are empty and no tasks in flight
do {
// Each team lead attempts to acquire either a thread team task
      // or a collection of single-thread tasks for the team.
if ( 0 == warp_lane ) {
task.ptr = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
// Loop by priority and then type
for ( int i = 0 ; i < Queue::NumQueue && end == task.ptr ; ++i ) {
for ( int j = 0 ; j < 2 && end == task.ptr ; ++j ) {
- task.ptr = Queue::pop_task( & queue->m_ready[i][j] );
+ task.ptr = Queue::pop_ready_task( & queue->m_ready[i][j] );
}
}
#if 0
printf("TaskQueue<Cuda>::driver(%d,%d) task(%lx)\n",threadIdx.z,blockIdx.x
, uintptr_t(task.ptr));
#endif
}
// shuffle broadcast
task.raw[0] = __shfl( task.raw[0] , 0 );
task.raw[1] = __shfl( task.raw[1] , 0 );
if ( 0 == task.ptr ) break ; // 0 == queue->m_ready_count
if ( end != task.ptr ) {
if ( task_root_type::TaskTeam == task.ptr->m_task_type ) {
// Thread Team Task
(*task.ptr->m_apply)( task.ptr , & team_exec );
}
else if ( 0 == threadIdx.y ) {
// Single Thread Task
(*task.ptr->m_apply)( task.ptr , & single_exec );
}
if ( 0 == warp_lane ) {
queue->complete( task.ptr );
}
}
} while(1);
}
namespace {
__global__
void cuda_task_queue_execute( TaskQueue< Kokkos::Cuda > * queue )
{ TaskQueueSpecialization< Kokkos::Cuda >::driver( queue ); }
}
void TaskQueueSpecialization< Kokkos::Cuda >::execute
( TaskQueue< Kokkos::Cuda > * const queue )
{
const int warps_per_block = 4 ;
const dim3 grid( Kokkos::Impl::cuda_internal_multiprocessor_count() , 1 , 1 );
const dim3 block( 1 , Kokkos::Impl::CudaTraits::WarpSize , warps_per_block );
const int shared = 0 ;
const cudaStream_t stream = 0 ;
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
#if 0
printf("cuda_task_queue_execute before\n");
#endif
// Query the stack size, in bytes:
//
// size_t stack_size = 0 ;
// CUDA_SAFE_CALL( cudaDeviceGetLimit( & stack_size , cudaLimitStackSize ) );
//
// If not large enough then set the stack size, in bytes:
//
// CUDA_SAFE_CALL( cudaDeviceSetLimit( cudaLimitStackSize , stack_size ) );
cuda_task_queue_execute<<< grid , block , shared , stream >>>( queue );
CUDA_SAFE_CALL( cudaGetLastError() );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
#if 0
printf("cuda_task_queue_execute after\n");
#endif
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG ) */
diff --git a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
index 479294f30..a13e37837 100644
--- a/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
+++ b/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Task.hpp
@@ -1,523 +1,546 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_CUDA_TASK_HPP
#define KOKKOS_IMPL_CUDA_TASK_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
template< typename TaskType >
__global__
void set_cuda_task_base_apply_function_pointer
( TaskBase<Kokkos::Cuda,void,void>::function_type * ptr )
{ *ptr = TaskType::apply ; }
}
+template< class > class TaskExec ;
+
template<>
class TaskQueueSpecialization< Kokkos::Cuda >
{
public:
using execution_space = Kokkos::Cuda ;
using memory_space = Kokkos::CudaUVMSpace ;
using queue_type = TaskQueue< execution_space > ;
+ using member_type = TaskExec< Kokkos::Cuda > ;
static
void iff_single_thread_recursive_execute( queue_type * const ) {}
__device__
static void driver( queue_type * const );
static
void execute( queue_type * const );
- template< typename FunctorType >
+ template< typename TaskType >
static
- void proc_set_apply( TaskBase<execution_space,void,void>::function_type * ptr )
+ typename TaskType::function_type
+ get_function_pointer()
{
- using TaskType = TaskBase< execution_space
- , typename FunctorType::value_type
- , FunctorType > ;
+ using function_type = typename TaskType::function_type ;
+
+ function_type * const ptr =
+ (function_type*) cuda_internal_scratch_unified( sizeof(function_type) );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
set_cuda_task_base_apply_function_pointer<TaskType><<<1,1>>>(ptr);
CUDA_SAFE_CALL( cudaGetLastError() );
CUDA_SAFE_CALL( cudaDeviceSynchronize() );
+
+ return *ptr ;
}
};
extern template class TaskQueue< Kokkos::Cuda > ;
//----------------------------------------------------------------------------
/**\brief Impl::TaskExec<Cuda> is the TaskScheduler<Cuda>::member_type
* passed to tasks running in a Cuda space.
*
* Cuda thread blocks for tasking are dimensioned:
* blockDim.x == vector length
* blockDim.y == team size
* blockDim.z == number of teams
* where
* blockDim.x * blockDim.y == WarpSize
*
* Both single thread and thread team tasks are run by a full Cuda warp.
* A single thread task is called by warp lane #0 and the remaining
* lanes of the warp are idle.
*/
template<>
class TaskExec< Kokkos::Cuda >
{
private:
TaskExec( TaskExec && ) = delete ;
TaskExec( TaskExec const & ) = delete ;
TaskExec & operator = ( TaskExec && ) = delete ;
TaskExec & operator = ( TaskExec const & ) = delete ;
friend class Kokkos::Impl::TaskQueue< Kokkos::Cuda > ;
friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::Cuda > ;
const int m_team_size ;
__device__
TaskExec( int arg_team_size = blockDim.y )
: m_team_size( arg_team_size ) {}
public:
#if defined( __CUDA_ARCH__ )
__device__ void team_barrier() { /* __threadfence_block(); */ }
__device__ int team_rank() const { return threadIdx.y ; }
__device__ int team_size() const { return m_team_size ; }
#else
__host__ void team_barrier() {}
__host__ int team_rank() const { return 0 ; }
__host__ int team_size() const { return 0 ; }
#endif
};
//----------------------------------------------------------------------------
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType, TaskExec< Kokkos::Cuda > >
{
typedef iType index_type;
const iType start ;
const iType end ;
const iType increment ;
const TaskExec< Kokkos::Cuda > & thread;
#if defined( __CUDA_ARCH__ )
__device__ inline
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count)
: start( threadIdx.y )
, end(arg_count)
, increment( blockDim.y )
, thread(arg_thread)
{}
__device__ inline
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread
, const iType & arg_start
, const iType & arg_end
)
: start( arg_start + threadIdx.y )
, end( arg_end)
, increment( blockDim.y )
, thread( arg_thread )
{}
#else
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count);
TeamThreadRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread
, const iType & arg_start
, const iType & arg_end
);
#endif
};
//----------------------------------------------------------------------------
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType, TaskExec< Kokkos::Cuda > >
{
typedef iType index_type;
const iType start ;
const iType end ;
const iType increment ;
const TaskExec< Kokkos::Cuda > & thread;
#if defined( __CUDA_ARCH__ )
__device__ inline
ThreadVectorRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count)
: start( threadIdx.x )
, end(arg_count)
, increment( blockDim.x )
, thread(arg_thread)
{}
#else
ThreadVectorRangeBoundariesStruct
( const TaskExec< Kokkos::Cuda > & arg_thread, const iType& arg_count);
#endif
};
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
namespace Kokkos {
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread, const iType & count )
{
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >( thread, count );
}
template<typename iType1, typename iType2>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct
< typename std::common_type<iType1,iType2>::type
, Impl::TaskExec< Kokkos::Cuda > >
TeamThreadRange( const Impl::TaskExec< Kokkos::Cuda > & thread
, const iType1 & begin, const iType2 & end )
{
typedef typename std::common_type< iType1, iType2 >::type iType;
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Cuda > >(
thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >
ThreadVectorRange( const Impl::TaskExec< Kokkos::Cuda > & thread
, const iType & count )
{
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >(thread,count);
}
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
 * The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.
*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::Cuda > >& loop_boundaries
, const Lambda& lambda
)
{
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i);
}
}
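// Illustrative sketch (not part of the library): using the range factories and
// the task-level parallel_for above from inside a task's operator().  'member'
// is the TaskExec passed to the task and 'N' is a placeholder extent.
//
//   KOKKOS_INLINE_FUNCTION
//   void operator()( member_type & member ) {
//     Kokkos::parallel_for( Kokkos::TeamThreadRange( member, N ),
//       [&]( const int i ) { /* one call per i, spread over the task's team */ } );
//   }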
// reduce across corresponding lanes between team members within warp
// assume stride*team_size == warp_size
template< typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void strided_shfl_warp_reduction
(const JoinType& join,
ValueType& val,
int team_size,
int stride)
{
for (int lane_delta=(team_size*stride)>>1; lane_delta>=stride; lane_delta>>=1) {
join(val, Kokkos::shfl_down(val, lane_delta, team_size*stride));
}
}
// multiple within-warp non-strided reductions
template< typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void multi_shfl_warp_reduction
(const JoinType& join,
ValueType& val,
int vec_length)
{
for (int lane_delta=vec_length>>1; lane_delta; lane_delta>>=1) {
join(val, Kokkos::shfl_down(val, lane_delta, vec_length));
}
}
// broadcast within warp
template< class ValueType >
KOKKOS_INLINE_FUNCTION
ValueType shfl_warp_broadcast
(ValueType& val,
int src_lane,
int width)
{
return Kokkos::shfl(val, src_lane, width);
}
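// Illustrative trace (not part of the library): with team_size == 4 and
// stride == 8 (four team members spanning one 32-lane warp) the strided
// reduction above uses lane_delta = 16 then 8, so lane L ends up joining the
// values from lanes L, L+8, L+16, and L+24 -- i.e. the corresponding vector
// lane of every team member.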
// all-reduce across corresponding vector lanes between team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const JoinType& join,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
strided_shfl_warp_reduction<ValueType, JoinType>(
join,
initialized_result,
loop_boundaries.thread.team_size(),
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
// all-reduce across corresponding vector lanes between team members within warp
// if no join() provided, use sum
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result) {
//TODO what is the point of creating this temporary?
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
strided_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { val1 += val2; },
initialized_result,
loop_boundaries.thread.team_size(),
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
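// Illustrative sketch (not part of the library): a sum over a team-thread range
// from inside a task.  'member', 'N', and 'data' are placeholders.
//
//   double total = 0;
//   Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, N ),
//     [&]( const int i, double & partial ) { partial += data(i); }, total );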
// all-reduce within team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const JoinType& join,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
multi_shfl_warp_reduction<ValueType, JoinType>(join, initialized_result, blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
// all-reduce within team members within warp
// if no join() provided, use sum
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
ValueType& initialized_result) {
ValueType result = initialized_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
initialized_result = result;
//initialized_result = multi_shfl_warp_reduction(
multi_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { val1 += val2; },
initialized_result,
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
// scan across corresponding vector lanes between team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
-template< typename ValueType, typename iType, class Lambda >
+template< typename iType, class Closure >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
- const Lambda & lambda) {
+ const Closure & closure )
+{
+ // Extract value_type from closure
- ValueType accum = 0 ;
- ValueType val, y, local_total;
+ using value_type =
+ typename Kokkos::Impl::FunctorAnalysis
+ < Kokkos::Impl::FunctorPatternInterface::SCAN
+ , void
+ , Closure >::value_type ;
+
+ value_type accum = 0 ;
+ value_type val, y, local_total;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
val = 0;
- lambda(i,val,false);
+ closure(i,val,false);
// intra-blockDim.y exclusive scan on 'val'
// accum = accumulated, sum in total for this iteration
// INCLUSIVE scan
for( int offset = blockDim.x ; offset < Impl::CudaTraits::WarpSize ; offset <<= 1 ) {
y = Kokkos::shfl_up(val, offset, Impl::CudaTraits::WarpSize);
if(threadIdx.y*blockDim.x >= offset) { val += y; }
}
// pass accum to all threads
- local_total = shfl_warp_broadcast<ValueType>(val,
+ local_total = shfl_warp_broadcast<value_type>(val,
threadIdx.x+Impl::CudaTraits::WarpSize-blockDim.x,
Impl::CudaTraits::WarpSize);
// make EXCLUSIVE scan by shifting values over one
val = Kokkos::shfl_up(val, blockDim.x, Impl::CudaTraits::WarpSize);
if ( threadIdx.y == 0 ) { val = 0 ; }
val += accum;
- lambda(i,val,true);
+ closure(i,val,true);
accum += local_total;
}
}
// scan within team member (vector) within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
-template< typename iType, class Lambda, typename ValueType >
+template< typename iType, class Closure >
KOKKOS_INLINE_FUNCTION
void parallel_scan
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
- const Lambda & lambda)
+ const Closure & closure )
{
- ValueType accum = 0 ;
- ValueType val, y, local_total;
+ // Extract value_type from closure
+
+ using value_type =
+ typename Kokkos::Impl::FunctorAnalysis
+ < Kokkos::Impl::FunctorPatternInterface::SCAN
+ , void
+ , Closure >::value_type ;
+
+ value_type accum = 0 ;
+ value_type val, y, local_total;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
val = 0;
- lambda(i,val,false);
+ closure(i,val,false);
// intra-blockDim.x exclusive scan on 'val'
// accum = accumulated, sum in total for this iteration
// INCLUSIVE scan
for( int offset = 1 ; offset < blockDim.x ; offset <<= 1 ) {
y = Kokkos::shfl_up(val, offset, blockDim.x);
if(threadIdx.x >= offset) { val += y; }
}
// pass accum to all threads
- local_total = shfl_warp_broadcast<ValueType>(val, blockDim.x-1, blockDim.x);
+ local_total = shfl_warp_broadcast<value_type>(val, blockDim.x-1, blockDim.x);
// make EXCLUSIVE scan by shifting values over one
val = Kokkos::shfl_up(val, 1, blockDim.x);
if ( threadIdx.x == 0 ) { val = 0 ; }
val += accum;
- lambda(i,val,true);
+ closure(i,val,true);
accum += local_total;
}
}
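// Illustrative sketch (not part of the library): an exclusive prefix sum over a
// vector range inside a task, following the final-pass convention used by the
// scans above.  'member', 'N', 'in', and 'out' are placeholders.
//
//   Kokkos::parallel_scan( Kokkos::ThreadVectorRange( member, N ),
//     [&]( const int j, int & update, const bool final ) {
//       if ( final ) out(j) = update;
//       update += in(j);
//     } );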
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_CUDA_TASK_HPP */
diff --git a/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp b/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp
index 4e1ce855c..a450ca36a 100644
--- a/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp
+++ b/lib/kokkos/core/src/KokkosExp_MDRangePolicy.hpp
@@ -1,611 +1,477 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_EXP_MD_RANGE_POLICY_HPP
#define KOKKOS_CORE_EXP_MD_RANGE_POLICY_HPP
+#include <initializer_list>
+
+#include<impl/KokkosExp_Host_IterateTile.hpp>
#include <Kokkos_ExecPolicy.hpp>
#include <Kokkos_Parallel.hpp>
-#include <initializer_list>
-#if defined(KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION) && defined(KOKKOS_ENABLE_PRAGMA_IVDEP) && !defined(__CUDA_ARCH__)
-#define KOKKOS_IMPL_MDRANGE_IVDEP
+#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
+#include<Cuda/KokkosExp_Cuda_IterateTile.hpp>
#endif
namespace Kokkos { namespace Experimental {
+// ------------------------------------------------------------------ //
+
enum class Iterate
{
Default, // Default for the device
Left, // Left indices stride fastest
Right, // Right indices stride fastest
- Flat, // Do not tile, only valid for inner direction
};
template <typename ExecSpace>
struct default_outer_direction
{
using type = Iterate;
+ #if defined( KOKKOS_ENABLE_CUDA)
+ static constexpr Iterate value = Iterate::Left;
+ #else
static constexpr Iterate value = Iterate::Right;
+ #endif
};
template <typename ExecSpace>
struct default_inner_direction
{
using type = Iterate;
+ #if defined( KOKKOS_ENABLE_CUDA)
+ static constexpr Iterate value = Iterate::Left;
+ #else
static constexpr Iterate value = Iterate::Right;
+ #endif
};
// Iteration Pattern
template < unsigned N
, Iterate OuterDir = Iterate::Default
, Iterate InnerDir = Iterate::Default
>
struct Rank
{
static_assert( N != 0u, "Kokkos Error: rank 0 undefined");
static_assert( N != 1u, "Kokkos Error: rank 1 is not a multi-dimensional range");
- static_assert( N < 4u, "Kokkos Error: Unsupported rank...");
+ static_assert( N < 7u, "Kokkos Error: Unsupported rank...");
using iteration_pattern = Rank<N, OuterDir, InnerDir>;
static constexpr int rank = N;
static constexpr Iterate outer_direction = OuterDir;
static constexpr Iterate inner_direction = InnerDir;
};
-
// multi-dimensional iteration pattern
template <typename... Properties>
struct MDRangePolicy
+ : public Kokkos::Impl::PolicyTraits<Properties ...>
{
+ using traits = Kokkos::Impl::PolicyTraits<Properties ...>;
using range_policy = RangePolicy<Properties...>;
- static_assert( !std::is_same<range_policy,void>::value
+ using impl_range_policy = RangePolicy< typename traits::execution_space
+ , typename traits::schedule_type
+ , typename traits::index_type
+ > ;
+
+ static_assert( !std::is_same<typename traits::iteration_pattern,void>::value
, "Kokkos Error: MD iteration pattern not defined" );
- using iteration_pattern = typename range_policy::iteration_pattern;
- using work_tag = typename range_policy::work_tag;
+ using iteration_pattern = typename traits::iteration_pattern;
+ using work_tag = typename traits::work_tag;
static constexpr int rank = iteration_pattern::rank;
static constexpr int outer_direction = static_cast<int> (
- (iteration_pattern::outer_direction != Iterate::Default && iteration_pattern::outer_direction != Iterate::Flat)
+ (iteration_pattern::outer_direction != Iterate::Default)
? iteration_pattern::outer_direction
- : default_outer_direction< typename range_policy::execution_space>::value );
+ : default_outer_direction< typename traits::execution_space>::value );
static constexpr int inner_direction = static_cast<int> (
iteration_pattern::inner_direction != Iterate::Default
? iteration_pattern::inner_direction
- : default_inner_direction< typename range_policy::execution_space>::value ) ;
+ : default_inner_direction< typename traits::execution_space>::value ) ;
// Ugly ugly workaround intel 14 not handling scoped enum correctly
- static constexpr int Flat = static_cast<int>( Iterate::Flat );
static constexpr int Right = static_cast<int>( Iterate::Right );
-
-
- using size_type = typename range_policy::index_type;
- using index_type = typename std::make_signed<size_type>::type;
-
-
- template <typename I>
- MDRangePolicy( std::initializer_list<I> upper_corner )
+ static constexpr int Left = static_cast<int>( Iterate::Left );
+
+ using index_type = typename traits::index_type;
+ using array_index_type = long;
+ using point_type = Kokkos::Array<array_index_type,rank>; //was index_type
+ using tile_type = Kokkos::Array<array_index_type,rank>;
+ // If point_type or tile_type is not templated on a signed integral type (if it is unsigned),
+ // then a user who passes an initializer_list of runtime-determined values of a
+ // signed integral type that are not const will receive a compiler error due
+ // to an invalid case for implicit conversion -
+ // "conversion from integer or unscoped enumeration type to integer type that cannot represent all values of the original, except where source is a constant expression whose value can be stored exactly in the target type"
+ // This would require the user to either pass a matching index_type parameter
+ // as template parameter to the MDRangePolicy or static_cast the individual values
+
+ MDRangePolicy( point_type const& lower, point_type const& upper, tile_type const& tile = tile_type{} )
+ : m_lower(lower)
+ , m_upper(upper)
+ , m_tile(tile)
+ , m_num_tiles(1)
{
- static_assert( std::is_integral<I>::value, "Kokkos Error: corner defined with non-integral type" );
-
- // TODO check size of lists equal to rank
- // static_asserts on initializer_list.size() require c++14
-
- //static_assert( upper_corner.size() == rank, "Kokkos Error: upper_corner has incorrect rank" );
-
- const auto u = upper_corner.begin();
-
- m_num_tiles = 1;
- for (int i=0; i<rank; ++i) {
- m_offset[i] = static_cast<index_type>(0);
- m_dim[i] = static_cast<index_type>(u[i]);
- if (inner_direction != Flat) {
- // default tile size to 4
- m_tile[i] = 4;
- } else {
- m_tile[i] = 1;
+ // Host
+ if ( true
+ #if defined(KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename traits::execution_space, Kokkos::Cuda >::value
+ #endif
+ )
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = upper[i] - lower[i];
+ if ( m_tile[i] <= 0 ) {
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = span;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
}
- m_tile_dim[i] = (m_dim[i] + (m_tile[i] - 1)) / m_tile[i];
- m_num_tiles *= m_tile_dim[i];
}
- }
-
- template <typename IA, typename IB>
- MDRangePolicy( std::initializer_list<IA> corner_a
- , std::initializer_list<IB> corner_b
- )
- {
- static_assert( std::is_integral<IA>::value, "Kokkos Error: corner A defined with non-integral type" );
- static_assert( std::is_integral<IB>::value, "Kokkos Error: corner B defined with non-integral type" );
-
- // TODO check size of lists equal to rank
- // static_asserts on initializer_list.size() require c++14
- //static_assert( corner_a.size() == rank, "Kokkos Error: corner_a has incorrect rank" );
- //static_assert( corner_b.size() == rank, "Kokkos Error: corner_b has incorrect rank" );
-
-
- using A = typename std::make_signed<IA>::type;
- using B = typename std::make_signed<IB>::type;
-
- const auto a = [=](int i) { return static_cast<A>(corner_a.begin()[i]); };
- const auto b = [=](int i) { return static_cast<B>(corner_b.begin()[i]); };
-
- m_num_tiles = 1;
- for (int i=0; i<rank; ++i) {
- m_offset[i] = static_cast<index_type>(a(i) <= b(i) ? a(i) : b(i));
- m_dim[i] = static_cast<index_type>(a(i) <= b(i) ? b(i) - a(i) : a(i) - b(i));
- if (inner_direction != Flat) {
- // default tile size to 4
- m_tile[i] = 4;
- } else {
- m_tile[i] = 1;
+ #if defined(KOKKOS_ENABLE_CUDA)
+ else // Cuda
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = upper[i] - lower[i];
+ if ( m_tile[i] <= 0 ) {
+ // TODO: determine what is a good default tile size for cuda
+ // may be rank dependent
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = 16;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
+ }
+ index_type total_tile_size_check = 1;
+ for (int i=0; i<rank; ++i) {
+ total_tile_size_check *= m_tile[i];
+ }
+ if ( total_tile_size_check >= 1024 ) { // improve this check - 1024,1024,64 max per dim (Kepler), but product num_threads < 1024; more restrictions pending register limit
+ printf(" Tile dimensions exceed Cuda limits\n");
+ Kokkos::abort(" Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
+ //Kokkos::Impl::throw_runtime_exception( " Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
}
- m_tile_dim[i] = (m_dim[i] + (m_tile[i] - 1)) / m_tile[i];
- m_num_tiles *= m_tile_dim[i];
- }
- }
-
- template <typename IA, typename IB, typename T>
- MDRangePolicy( std::initializer_list<IA> corner_a
- , std::initializer_list<IB> corner_b
- , std::initializer_list<T> tile
- )
- {
- static_assert( std::is_integral<IA>::value, "Kokkos Error: corner A defined with non-integral type" );
- static_assert( std::is_integral<IB>::value, "Kokkos Error: corner B defined with non-integral type" );
- static_assert( std::is_integral<T>::value, "Kokkos Error: tile defined with non-integral type" );
- static_assert( inner_direction != Flat, "Kokkos Error: tiling not support with flat iteration" );
-
- // TODO check size of lists equal to rank
- // static_asserts on initializer_list.size() require c++14
- //static_assert( corner_a.size() == rank, "Kokkos Error: corner_a has incorrect rank" );
- //static_assert( corner_b.size() == rank, "Kokkos Error: corner_b has incorrect rank" );
- //static_assert( tile.size() == rank, "Kokkos Error: tile has incorrect rank" );
-
- using A = typename std::make_signed<IA>::type;
- using B = typename std::make_signed<IB>::type;
-
- const auto a = [=](int i) { return static_cast<A>(corner_a.begin()[i]); };
- const auto b = [=](int i) { return static_cast<B>(corner_b.begin()[i]); };
- const auto t = tile.begin();
-
- m_num_tiles = 1;
- for (int i=0; i<rank; ++i) {
- m_offset[i] = static_cast<index_type>(a(i) <= b(i) ? a(i) : b(i));
- m_dim[i] = static_cast<index_type>(a(i) <= b(i) ? b(i) - a(i) : a(i) - b(i));
- m_tile[i] = static_cast<int>(t[i] > (T)0 ? t[i] : (T)1 );
- m_tile_dim[i] = (m_dim[i] + (m_tile[i] - 1)) / m_tile[i];
- m_num_tiles *= m_tile_dim[i];
}
+ #endif
}
- index_type m_offset[rank];
- index_type m_dim[rank];
- int m_tile[rank];
- index_type m_tile_dim[rank];
- size_type m_num_tiles; // product of tile dims
-};
-
-namespace Impl {
-// Serial, Threads, OpenMP
-// use enable_if to overload for Cuda
-template < typename MDRange, typename Functor, typename Enable = void >
-struct MDForFunctor
-{
- using work_tag = typename MDRange::work_tag;
- using index_type = typename MDRange::index_type;
- using size_type = typename MDRange::size_type;
-
- MDRange m_range;
- Functor m_func;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange const& range, Functor const& f )
- : m_range(range)
- , m_func( f )
- {}
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange const& range, Functor && f )
- : m_range(range)
- , m_func( std::forward<Functor>(f) )
- {}
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange && range, Functor const& f )
- : m_range( std::forward<MDRange>(range) )
- , m_func( f )
- {}
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDRange && range, Functor && f )
- : m_range( std::forward<MDRange>(range) )
- , m_func( std::forward<Functor>(f) )
- {}
-
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDForFunctor const& ) = default;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor& operator=( MDForFunctor const& ) = default;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor( MDForFunctor && ) = default;
-
- KOKKOS_INLINE_FUNCTION
- MDForFunctor& operator=( MDForFunctor && ) = default;
-
- // Rank-2, Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
+ template < typename LT , typename UT , typename TT = array_index_type >
+ MDRangePolicy( std::initializer_list<LT> const& lower, std::initializer_list<UT> const& upper, std::initializer_list<TT> const& tile = {} )
{
- if ( MDRange::outer_direction == MDRange::Right ) {
- m_func( m_range.m_offset[0] + ( t / m_range.m_dim[1] )
- , m_range.m_offset[1] + ( t % m_range.m_dim[1] ) );
- } else {
- m_func( m_range.m_offset[0] + ( t % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( t / m_range.m_dim[0] ) );
+#if 0
+ // This should work and would reduce code duplication, but it is not yet extensively tested
+ point_type lower_tmp, upper_tmp;
+ tile_type tile_tmp;
+ for ( auto i = 0; i < rank; ++i ) {
+ lower_tmp[i] = static_cast<array_index_type>(lower.begin()[i]);
+ upper_tmp[i] = static_cast<array_index_type>(upper.begin()[i]);
+ tile_tmp[i] = static_cast<array_index_type>(tile.begin()[i]);
}
- }
- // Rank-2, Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- if ( MDRange::outer_direction == MDRange::Right ) {
- m_func( work_tag{}, m_range.m_offset[0] + ( t / m_range.m_dim[1] )
- , m_range.m_offset[1] + ( t % m_range.m_dim[1] ) );
- } else {
- m_func( work_tag{}, m_range.m_offset[0] + ( t % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( t / m_range.m_dim[0] ) );
- }
- }
+ MDRangePolicy( lower_tmp, upper_tmp, tile_tmp );
- // Rank-2, Not Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- index_type t0, t1;
- if ( MDRange::outer_direction == MDRange::Right ) {
- t0 = t / m_range.m_tile_dim[1];
- t1 = t % m_range.m_tile_dim[1];
- } else {
- t0 = t % m_range.m_tile_dim[0];
- t1 = t / m_range.m_tile_dim[0];
- }
+#else
+ if(m_lower.size()!=rank || m_upper.size() != rank)
+ Kokkos::abort("MDRangePolicy: Constructor initializer lists have wrong size");
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i1=b1; i1<e1; ++i1) {
- m_func( i0, i1 );
- }}
- } else {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( i0, i1 );
- }}
+ for ( auto i = 0; i < rank; ++i ) {
+ m_lower[i] = static_cast<array_index_type>(lower.begin()[i]);
+ m_upper[i] = static_cast<array_index_type>(upper.begin()[i]);
+ if(tile.size()==rank)
+ m_tile[i] = static_cast<array_index_type>(tile.begin()[i]);
+ else
+ m_tile[i] = 0;
}
- }
- // Rank-2, Not Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 2
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- work_tag tag;
-
- index_type t0, t1;
- if ( MDRange::outer_direction == MDRange::Right ) {
- t0 = t / m_range.m_tile_dim[1];
- t1 = t % m_range.m_tile_dim[1];
- } else {
- t0 = t % m_range.m_tile_dim[0];
- t1 = t / m_range.m_tile_dim[0];
- }
+ m_num_tiles = 1;
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i1=b1; i1<e1; ++i1) {
- m_func( tag, i0, i1 );
- }}
- } else {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( tag, i0, i1 );
- }}
- }
- }
- //---------------------------------------------------------------------------
-
- // Rank-3, Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- if ( MDRange::outer_direction == MDRange::Right ) {
- const int64_t tmp_prod = m_range.m_dim[1]*m_range.m_dim[2];
- m_func( m_range.m_offset[0] + ( t / tmp_prod )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[2] )
- , m_range.m_offset[2] + ( (t % tmp_prod) % m_range.m_dim[2] )
- );
- } else {
- const int64_t tmp_prod = m_range.m_dim[0]*m_range.m_dim[1];
- m_func( m_range.m_offset[0] + ( (t % tmp_prod) % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[0] )
- , m_range.m_offset[2] + ( t / tmp_prod )
- );
+ // Host
+ if ( true
+ #if defined(KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename traits::execution_space, Kokkos::Cuda >::value
+ #endif
+ )
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = m_upper[i] - m_lower[i];
+ if ( m_tile[i] <= 0 ) {
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = span;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
+ }
}
- }
-
- // Rank-3, Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction == MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- if ( MDRange::outer_direction == MDRange::Right ) {
- const int64_t tmp_prod = m_range.m_dim[1]*m_range.m_dim[2];
- m_func( work_tag{}
- , m_range.m_offset[0] + ( t / tmp_prod )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[2] )
- , m_range.m_offset[2] + ( (t % tmp_prod) % m_range.m_dim[2] )
- );
- } else {
- const int64_t tmp_prod = m_range.m_dim[0]*m_range.m_dim[1];
- m_func( work_tag{}
- , m_range.m_offset[0] + ( (t % tmp_prod) % m_range.m_dim[0] )
- , m_range.m_offset[1] + ( (t % tmp_prod) / m_range.m_dim[0] )
- , m_range.m_offset[2] + ( t / tmp_prod )
- );
+ #if defined(KOKKOS_ENABLE_CUDA)
+ else // Cuda
+ {
+ index_type span;
+ for (int i=0; i<rank; ++i) {
+ span = m_upper[i] - m_lower[i];
+ if ( m_tile[i] <= 0 ) {
+ // TODO: determine what is a good default tile size for cuda
+ // may be rank dependent
+ if ( (inner_direction == Right && (i < rank-1))
+ || (inner_direction == Left && (i > 0)) )
+ {
+ m_tile[i] = 2;
+ }
+ else {
+ m_tile[i] = 16;
+ }
+ }
+ m_tile_end[i] = static_cast<index_type>((span + m_tile[i] - 1) / m_tile[i]);
+ m_num_tiles *= m_tile_end[i];
+ }
+ index_type total_tile_size_check = 1;
+ for (int i=0; i<rank; ++i) {
+ total_tile_size_check *= m_tile[i];
+ }
+ if ( total_tile_size_check >= 1024 ) { // improve this check - 1024,1024,64 max per dim (Kepler), but product num_threads < 1024; more restrictions pending register limit
+ printf(" Tile dimensions exceed Cuda limits\n");
+ Kokkos::abort(" Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
+ //Kokkos::Impl::throw_runtime_exception( " Cuda ExecSpace Error: MDRange tile dims exceed maximum number of threads per block - choose smaller tile dims");
+ }
}
+ #endif
+#endif
}
- // Rank-3, Not Flat, No Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- index_type t0, t1, t2;
- if ( MDRange::outer_direction == MDRange::Right ) {
- const index_type tmp_prod = ( m_range.m_tile_dim[1]*m_range.m_tile_dim[2]);
- t0 = t / tmp_prod;
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[2];
- t2 = ( t % tmp_prod ) % m_range.m_tile_dim[2];
- } else {
- const index_type tmp_prod = ( m_range.m_tile_dim[0]*m_range.m_tile_dim[1]);
- t0 = ( t % tmp_prod ) % m_range.m_tile_dim[0];
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[0];
- t2 = t / tmp_prod;
- }
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
- const index_type b2 = t2 * m_range.m_tile[2] + m_range.m_offset[2];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
- const index_type e2 = b2 + m_range.m_tile[2] <= (m_range.m_dim[2] + m_range.m_offset[2] ) ? b2 + m_range.m_tile[2] : ( m_range.m_dim[2] + m_range.m_offset[2] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i2=b2; i2<e2; ++i2) {
- m_func( i0, i1, i2 );
- }}}
- } else {
- for (int i2=b2; i2<e2; ++i2) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( i0, i1, i2 );
- }}}
- }
- }
+ point_type m_lower;
+ point_type m_upper;
+ tile_type m_tile;
+ point_type m_tile_end;
+ index_type m_num_tiles;
+};
+// ------------------------------------------------------------------ //
- // Rank-3, Not Flat, Tag
- template <typename Idx>
- KOKKOS_FORCEINLINE_FUNCTION
- typename std::enable_if<( std::is_integral<Idx>::value
- && !std::is_same<void, work_tag>::value
- && MDRange::rank == 3
- && MDRange::inner_direction != MDRange::Flat
- )>::type
- operator()(Idx t) const
- {
- work_tag tag;
-
- index_type t0, t1, t2;
- if ( MDRange::outer_direction == MDRange::Right ) {
- const index_type tmp_prod = ( m_range.m_tile_dim[1]*m_range.m_tile_dim[2]);
- t0 = t / tmp_prod;
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[2];
- t2 = ( t % tmp_prod ) % m_range.m_tile_dim[2];
- } else {
- const index_type tmp_prod = ( m_range.m_tile_dim[0]*m_range.m_tile_dim[1]);
- t0 = ( t % tmp_prod ) % m_range.m_tile_dim[0];
- t1 = ( t % tmp_prod ) / m_range.m_tile_dim[0];
- t2 = t / tmp_prod;
- }
+// ------------------------------------------------------------------ //
+//md_parallel_for
+// ------------------------------------------------------------------ //
+template <typename MDRange, typename Functor, typename Enable = void>
+void md_parallel_for( MDRange const& range
+ , Functor const& f
+ , const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::MDFunctor<MDRange, Functor, void> g(range, f);
- const index_type b0 = t0 * m_range.m_tile[0] + m_range.m_offset[0];
- const index_type b1 = t1 * m_range.m_tile[1] + m_range.m_offset[1];
- const index_type b2 = t2 * m_range.m_tile[2] + m_range.m_offset[2];
-
- const index_type e0 = b0 + m_range.m_tile[0] <= (m_range.m_dim[0] + m_range.m_offset[0] ) ? b0 + m_range.m_tile[0] : ( m_range.m_dim[0] + m_range.m_offset[0] );
- const index_type e1 = b1 + m_range.m_tile[1] <= (m_range.m_dim[1] + m_range.m_offset[1] ) ? b1 + m_range.m_tile[1] : ( m_range.m_dim[1] + m_range.m_offset[1] );
- const index_type e2 = b2 + m_range.m_tile[2] <= (m_range.m_dim[2] + m_range.m_offset[2] ) ? b2 + m_range.m_tile[2] : ( m_range.m_dim[2] + m_range.m_offset[2] );
-
- if ( MDRange::inner_direction == MDRange::Right ) {
- for (int i0=b0; i0<e0; ++i0) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i2=b2; i2<e2; ++i2) {
- m_func( tag, i0, i1, i2 );
- }}}
- } else {
- for (int i2=b2; i2<e2; ++i2) {
- for (int i1=b1; i1<e1; ++i1) {
- #if defined(KOKKOS_IMPL_MDRANGE_IVDEP)
- #pragma ivdep
- #endif
- for (int i0=b0; i0<e0; ++i0) {
- m_func( tag, i0, i1, i2 );
- }}}
- }
- }
-};
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
+
+ Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+}
+template <typename MDRange, typename Functor>
+void md_parallel_for( const std::string& str
+ , MDRange const& range
+ , Functor const& f
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::MDFunctor<MDRange, Functor, void> g(range, f);
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
-} // namespace Impl
+ Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+}
+// Cuda specialization
+#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
+template <typename MDRange, typename Functor>
+void md_parallel_for( const std::string& str
+ , MDRange const& range
+ , Functor const& f
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
+ closure.execute();
+}
template <typename MDRange, typename Functor>
void md_parallel_for( MDRange const& range
, Functor const& f
, const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
)
{
- Impl::MDForFunctor<MDRange, Functor> g(range, f);
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
+ closure.execute();
+}
+#endif
+// ------------------------------------------------------------------ //
- using range_policy = typename MDRange::range_policy;
+// ------------------------------------------------------------------ //
+//md_parallel_reduce
+// ------------------------------------------------------------------ //
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( MDRange const& range
+ , Functor const& f
+ , ValueType & v
+ , const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::MDFunctor<MDRange, Functor, ValueType> g(range, f, v);
- Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
+ Kokkos::parallel_reduce( str, range_policy(0, range.m_num_tiles).set_chunk_size(1), g, v );
}
-template <typename MDRange, typename Functor>
-void md_parallel_for( const std::string& str
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( const std::string& str
, MDRange const& range
, Functor const& f
+ , ValueType & v
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && !std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
)
{
- Impl::MDForFunctor<MDRange, Functor> g(range, f);
+ Impl::MDFunctor<MDRange, Functor, ValueType> g(range, f, v);
- using range_policy = typename MDRange::range_policy;
+ //using range_policy = typename MDRange::range_policy;
+ using range_policy = typename MDRange::impl_range_policy;
- Kokkos::parallel_for( range_policy(0, range.m_num_tiles).set_chunk_size(1), g, str );
+ Kokkos::parallel_reduce( str, range_policy(0, range.m_num_tiles).set_chunk_size(1), g, v );
}
+// Cuda - parallel_reduce not implemented yet
+/*
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( MDRange const& range
+ , Functor const& f
+ , ValueType & v
+ , const std::string& str = ""
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f, v);
+ closure.execute();
+}
+
+template <typename MDRange, typename Functor, typename ValueType>
+void md_parallel_reduce( const std::string& str
+ , MDRange const& range
+ , Functor const& f
+ , ValueType & v
+ , typename std::enable_if<( true
+ #if defined( KOKKOS_ENABLE_CUDA)
+ && std::is_same< typename MDRange::range_policy::execution_space, Kokkos::Cuda>::value
+ #endif
+ ) >::type* = 0
+ )
+{
+ Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f, v);
+ closure.execute();
+}
+*/
+
}} // namespace Kokkos::Experimental
#endif //KOKKOS_CORE_EXP_MD_RANGE_POLICY_HPP
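The rewritten MDRangePolicy above accepts lower and upper corners plus an optional tile either as Kokkos::Array values or as initializer lists, derives per-dimension tile counts, and md_parallel_for then drives the tiled iteration for the policy's execution space. The following is a minimal usage sketch, assuming a host execution space and a build where this header is on the include path; the extents N0/N1, the view a, the tile sizes and the lambda are illustrative and not taken from the patch.

#include <Kokkos_Core.hpp>
#include <KokkosExp_MDRangePolicy.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using Kokkos::Experimental::MDRangePolicy;
    using Kokkos::Experimental::Rank;

    const int N0 = 100, N1 = 80;              // illustrative extents
    Kokkos::View<double**> a("a", N0, N1);

    // Rank-2 policy: lower corner, upper corner, and an optional tile.
    // Per the comment in the header, runtime signed indices convert cleanly;
    // unsigned index types may require explicit casts.
    MDRangePolicy< Rank<2> > policy( {0, 0}, {N0, N1}, {8, 8} );

    // md_parallel_for iterates the tiled 2-D range and calls the functor
    // with one index per rank.
    Kokkos::Experimental::md_parallel_for(policy,
      KOKKOS_LAMBDA(const int i, const int j) {
        a(i, j) = static_cast<double>(i) * j;
      });
  }
  Kokkos::finalize();
  return 0;
}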
diff --git a/lib/kokkos/core/src/Kokkos_Array.hpp b/lib/kokkos/core/src/Kokkos_Array.hpp
index 8deb5142c..abb263b7c 100644
--- a/lib/kokkos/core/src/Kokkos_Array.hpp
+++ b/lib/kokkos/core/src/Kokkos_Array.hpp
@@ -1,302 +1,315 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_ARRAY_HPP
#define KOKKOS_ARRAY_HPP
#include <type_traits>
#include <algorithm>
#include <limits>
#include <cstddef>
namespace Kokkos {
/**\brief Derived from the C++17 'std::array'.
* Dropping the iterator interface.
*/
template< class T = void
, size_t N = ~size_t(0)
, class Proxy = void
>
struct Array {
-private:
- T m_elem[N];
+public:
+ /**
+ * The elements of this C array shall not be accessed directly. The data
+ * member has to be declared public to enable aggregate initialization as for
+ * std::array. We mark it as private in the documentation.
+ * @private
+ */
+ T m_internal_implementation_private_member_data[N];
public:
typedef T & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef T value_type ;
typedef T * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION static constexpr size_type size() { return N ; }
KOKKOS_INLINE_FUNCTION static constexpr bool empty(){ return false ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
reference operator[]( const iType & i )
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
- return m_elem[i];
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
+ return m_internal_implementation_private_member_data[i];
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
const_reference operator[]( const iType & i ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
- return m_elem[i];
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
+ return m_internal_implementation_private_member_data[i];
}
- KOKKOS_INLINE_FUNCTION pointer data() { return & m_elem[0] ; }
- KOKKOS_INLINE_FUNCTION const_pointer data() const { return & m_elem[0] ; }
+ KOKKOS_INLINE_FUNCTION pointer data()
+ {
+ return & m_internal_implementation_private_member_data[0];
+ }
+ KOKKOS_INLINE_FUNCTION const_pointer data() const
+ {
+ return & m_internal_implementation_private_member_data[0];
+ }
- ~Array() = default ;
- Array() = default ;
- Array( const Array & ) = default ;
- Array & operator = ( const Array & ) = default ;
+ // Do not default unless move and move-assignment are also defined
+ // ~Array() = default ;
+ // Array() = default ;
+ // Array( const Array & ) = default ;
+ // Array & operator = ( const Array & ) = default ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && ) = default ;
// Array & operator = ( Array && ) = default ;
};
template< class T , class Proxy >
struct Array<T,0,Proxy> {
public:
typedef typename std::add_const<T>::type & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef typename std::add_const<T>::type value_type ;
typedef typename std::add_const<T>::type * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION static constexpr size_type size() { return 0 ; }
KOKKOS_INLINE_FUNCTION static constexpr bool empty() { return true ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
value_type operator[]( const iType & )
{
- static_assert( std::is_integral<iType>::value , "Must be integer argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integer argument" );
return value_type();
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
value_type operator[]( const iType & ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integer argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integer argument" );
return value_type();
}
KOKKOS_INLINE_FUNCTION pointer data() { return pointer(0) ; }
KOKKOS_INLINE_FUNCTION const_pointer data() const { return const_pointer(0); }
~Array() = default ;
Array() = default ;
Array( const Array & ) = default ;
Array & operator = ( const Array & ) = default ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && ) = default ;
// Array & operator = ( Array && ) = default ;
};
template<>
struct Array<void,~size_t(0),void>
{
struct contiguous {};
struct strided {};
};
template< class T >
struct Array< T , ~size_t(0) , Array<>::contiguous >
{
private:
T * m_elem ;
size_t m_size ;
public:
typedef T & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef T value_type ;
typedef T * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION constexpr size_type size() const { return m_size ; }
KOKKOS_INLINE_FUNCTION constexpr bool empty() const { return 0 != m_size ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
reference operator[]( const iType & i )
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i];
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
const_reference operator[]( const iType & i ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i];
}
KOKKOS_INLINE_FUNCTION pointer data() { return m_elem ; }
KOKKOS_INLINE_FUNCTION const_pointer data() const { return m_elem ; }
~Array() = default ;
Array() = delete ;
Array( const Array & rhs ) = delete ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && rhs ) = default ;
// Array & operator = ( Array && rhs ) = delete ;
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
template< size_t N , class P >
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array<T,N,P> & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
KOKKOS_INLINE_FUNCTION constexpr Array( pointer arg_ptr , size_type arg_size , size_type = 0 )
: m_elem(arg_ptr), m_size(arg_size) {}
};
template< class T >
struct Array< T , ~size_t(0) , Array<>::strided >
{
private:
T * m_elem ;
size_t m_size ;
size_t m_stride ;
public:
typedef T & reference ;
typedef typename std::add_const<T>::type & const_reference ;
typedef size_t size_type ;
typedef ptrdiff_t difference_type ;
typedef T value_type ;
typedef T * pointer ;
typedef typename std::add_const<T>::type * const_pointer ;
KOKKOS_INLINE_FUNCTION constexpr size_type size() const { return m_size ; }
KOKKOS_INLINE_FUNCTION constexpr bool empty() const { return 0 != m_size ; }
template< typename iType >
KOKKOS_INLINE_FUNCTION
reference operator[]( const iType & i )
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i*m_stride];
}
template< typename iType >
KOKKOS_INLINE_FUNCTION
const_reference operator[]( const iType & i ) const
{
- static_assert( std::is_integral<iType>::value , "Must be integral argument" );
+ static_assert( ( std::is_integral<iType>::value || std::is_enum<iType>::value ) , "Must be integral argument" );
return m_elem[i*m_stride];
}
KOKKOS_INLINE_FUNCTION pointer data() { return m_elem ; }
KOKKOS_INLINE_FUNCTION const_pointer data() const { return m_elem ; }
~Array() = default ;
Array() = delete ;
Array( const Array & ) = delete ;
// Some supported compilers are not sufficiently C++11 compliant
// for default move constructor and move assignment operator.
// Array( Array && rhs ) = default ;
// Array & operator = ( Array && rhs ) = delete ;
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
template< size_t N , class P >
KOKKOS_INLINE_FUNCTION
Array & operator = ( const Array<T,N,P> & rhs )
{
const size_t n = std::min( m_size , rhs.size() );
for ( size_t i = 0 ; i < n ; ++i ) m_elem[i] = rhs[i] ;
return *this ;
}
KOKKOS_INLINE_FUNCTION constexpr Array( pointer arg_ptr , size_type arg_size , size_type arg_stride )
: m_elem(arg_ptr), m_size(arg_size), m_stride(arg_stride) {}
};
} // namespace Kokkos
#endif /* #ifndef KOKKOS_ARRAY_HPP */
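The Kokkos_Array.hpp change exposes the element storage as a public member so that Kokkos::Array is an aggregate (as std::array is) and widens the operator[] static_assert to accept enum as well as integral index types. A small host-only sketch of what this now permits; the Axis enum and the values are illustrative.

#include <cstdio>
#include <Kokkos_Core.hpp>   // brings in Kokkos_Array.hpp

enum Axis { X = 0, Y = 1, Z = 2 };   // illustrative enum index type

int main() {
  // Aggregate initialization, enabled by the now-public data member.
  Kokkos::Array<double, 3> p = {{ 1.0, 2.0, 0.0 }};

  // Indexing with an enum compiles because the static_assert now also
  // accepts std::is_enum index types.
  p[Z] = p[X] + p[Y];

  std::printf("p = (%g, %g, %g)\n", p[0], p[1], p[2]);  // p = (1, 2, 3)
  return 0;
}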
diff --git a/lib/kokkos/core/src/Kokkos_Concepts.hpp b/lib/kokkos/core/src/Kokkos_Concepts.hpp
index 3f9bdea40..cfcdabf95 100644
--- a/lib/kokkos/core/src/Kokkos_Concepts.hpp
+++ b/lib/kokkos/core/src/Kokkos_Concepts.hpp
@@ -1,342 +1,343 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_CONCEPTS_HPP
#define KOKKOS_CORE_CONCEPTS_HPP
#include <type_traits>
// Needed for 'is_space<S>::host_mirror_space'
#include <Kokkos_Core_fwd.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
//Schedules for Execution Policies
struct Static {};
struct Dynamic {};
//Schedule Wrapper Type
template<class T>
struct Schedule
{
static_assert( std::is_same<T,Static>::value
|| std::is_same<T,Dynamic>::value
, "Kokkos: Invalid Schedule<> type."
);
using schedule_type = Schedule ;
using type = T;
};
//Specify Iteration Index Type
template<typename T>
struct IndexType
{
static_assert(std::is_integral<T>::value,"Kokkos: Invalid IndexType<>.");
using index_type = IndexType ;
using type = T;
};
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
#define KOKKOS_IMPL_IS_CONCEPT( CONCEPT ) \
template< typename T > struct is_ ## CONCEPT { \
private: \
template< typename , typename = std::true_type > struct have : std::false_type {}; \
template< typename U > struct have<U,typename std::is_same<U,typename U:: CONCEPT >::type> : std::true_type {}; \
public: \
enum { value = is_ ## CONCEPT::template have<T>::value }; \
};
// Public concept:
KOKKOS_IMPL_IS_CONCEPT( memory_space )
KOKKOS_IMPL_IS_CONCEPT( memory_traits )
KOKKOS_IMPL_IS_CONCEPT( execution_space )
KOKKOS_IMPL_IS_CONCEPT( execution_policy )
KOKKOS_IMPL_IS_CONCEPT( array_layout )
+KOKKOS_IMPL_IS_CONCEPT( reducer )
namespace Impl {
// For backward compatibility:
using Kokkos::is_memory_space ;
using Kokkos::is_memory_traits ;
using Kokkos::is_execution_space ;
using Kokkos::is_execution_policy ;
using Kokkos::is_array_layout ;
// Implementation concept:
KOKKOS_IMPL_IS_CONCEPT( iteration_pattern )
KOKKOS_IMPL_IS_CONCEPT( schedule_type )
KOKKOS_IMPL_IS_CONCEPT( index_type )
}
#undef KOKKOS_IMPL_IS_CONCEPT
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
template< class ExecutionSpace , class MemorySpace >
struct Device {
static_assert( Kokkos::is_execution_space<ExecutionSpace>::value
, "Execution space is not valid" );
static_assert( Kokkos::is_memory_space<MemorySpace>::value
, "Memory space is not valid" );
typedef ExecutionSpace execution_space;
typedef MemorySpace memory_space;
typedef Device<execution_space,memory_space> device_type;
};
template< typename T >
struct is_space {
private:
template< typename , typename = void >
struct exe : std::false_type { typedef void space ; };
template< typename , typename = void >
struct mem : std::false_type { typedef void space ; };
template< typename , typename = void >
struct dev : std::false_type { typedef void space ; };
template< typename U >
struct exe<U,typename std::conditional<true,void,typename U::execution_space>::type>
: std::is_same<U,typename U::execution_space>::type
{ typedef typename U::execution_space space ; };
template< typename U >
struct mem<U,typename std::conditional<true,void,typename U::memory_space>::type>
: std::is_same<U,typename U::memory_space>::type
{ typedef typename U::memory_space space ; };
template< typename U >
struct dev<U,typename std::conditional<true,void,typename U::device_type>::type>
: std::is_same<U,typename U::device_type>::type
{ typedef typename U::device_type space ; };
typedef typename is_space::template exe<T> is_exe ;
typedef typename is_space::template mem<T> is_mem ;
typedef typename is_space::template dev<T> is_dev ;
public:
enum { value = is_exe::value || is_mem::value || is_dev::value };
typedef typename is_exe::space execution_space ;
typedef typename is_mem::space memory_space ;
// For backward compatibility, deprecated in favor of
// Kokkos::Impl::HostMirror<S>::host_mirror_space
typedef typename std::conditional
< std::is_same< memory_space , Kokkos::HostSpace >::value
#if defined( KOKKOS_ENABLE_CUDA )
|| std::is_same< memory_space , Kokkos::CudaUVMSpace >::value
|| std::is_same< memory_space , Kokkos::CudaHostPinnedSpace >::value
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
, memory_space
, Kokkos::HostSpace
>::type host_memory_space ;
#if defined( KOKKOS_ENABLE_CUDA )
typedef typename std::conditional
< std::is_same< execution_space , Kokkos::Cuda >::value
, Kokkos::DefaultHostExecutionSpace , execution_space
>::type host_execution_space ;
#else
typedef execution_space host_execution_space ;
#endif
typedef typename std::conditional
< std::is_same< execution_space , host_execution_space >::value &&
std::is_same< memory_space , host_memory_space >::value
, T , Kokkos::Device< host_execution_space , host_memory_space >
>::type host_mirror_space ;
};
// For backward compatibility
namespace Impl {
using Kokkos::is_space ;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/**\brief Access relationship between DstMemorySpace and SrcMemorySpace
*
* The default case can assume accessibility for the same space.
* Specializations must be defined for different memory spaces.
*/
template< typename DstMemorySpace , typename SrcMemorySpace >
struct MemorySpaceAccess {
static_assert( Kokkos::is_memory_space< DstMemorySpace >::value &&
Kokkos::is_memory_space< SrcMemorySpace >::value
, "template arguments must be memory spaces" );
/**\brief Can a View (or pointer) to memory in SrcMemorySpace
* be assigned to a View (or pointer) to memory marked DstMemorySpace.
*
* 1. DstMemorySpace::execution_space == SrcMemorySpace::execution_space
* 2. All execution spaces that can access DstMemorySpace can also access
* SrcMemorySpace.
*/
enum { assignable = std::is_same<DstMemorySpace,SrcMemorySpace>::value };
/**\brief For all DstExecSpace::memory_space == DstMemorySpace
* DstExecSpace can access SrcMemorySpace.
*/
enum { accessible = assignable };
/**\brief Does a DeepCopy capability exist
* to DstMemorySpace from SrcMemorySpace
*/
enum { deepcopy = assignable };
};
/**\brief Can AccessSpace access MemorySpace ?
*
* Requires:
* Kokkos::is_space< AccessSpace >::value
* Kokkos::is_memory_space< MemorySpace >::value
*
* Can AccessSpace::execution_space access MemorySpace ?
* enum : bool { accessible };
*
* Is View<AccessSpace::memory_space> assignable from View<MemorySpace> ?
* enum : bool { assignable };
*
 * If ! accessible, then through which intercessory memory space
 * should memory be deep copied so that
 * AccessSpace::execution_space
 * can get access.
* When AccessSpace::memory_space == Kokkos::HostSpace
* then space is the View host mirror space.
*/
template< typename AccessSpace , typename MemorySpace >
struct SpaceAccessibility {
private:
static_assert( Kokkos::is_space< AccessSpace >::value
, "template argument #1 must be a Kokkos space" );
static_assert( Kokkos::is_memory_space< MemorySpace >::value
, "template argument #2 must be a Kokkos memory space" );
// The input AccessSpace may be a Device<ExecSpace,MemSpace>
// verify that it is a valid combination of spaces.
static_assert( Kokkos::Impl::MemorySpaceAccess
< typename AccessSpace::execution_space::memory_space
, typename AccessSpace::memory_space
>::accessible
, "template argument #1 is an invalid space" );
typedef Kokkos::Impl::MemorySpaceAccess
< typename AccessSpace::execution_space::memory_space , MemorySpace >
exe_access ;
typedef Kokkos::Impl::MemorySpaceAccess
< typename AccessSpace::memory_space , MemorySpace >
mem_access ;
public:
/**\brief Can AccessSpace::execution_space access MemorySpace ?
*
* Default based upon memory space accessibility.
* Specialization required for other relationships.
*/
enum { accessible = exe_access::accessible };
/**\brief Can assign to AccessSpace from MemorySpace ?
*
* Default based upon memory space accessibility.
* Specialization required for other relationships.
*/
enum { assignable =
is_memory_space< AccessSpace >::value && mem_access::assignable };
/**\brief Can deep copy to AccessSpace::memory_Space from MemorySpace ? */
enum { deepcopy = mem_access::deepcopy };
// What intercessory space for AccessSpace::execution_space
// to be able to access MemorySpace?
// If same memory space or not accessible use the AccessSpace
// else construct a device with execution space and memory space.
typedef typename std::conditional
< std::is_same<typename AccessSpace::memory_space,MemorySpace>::value ||
! exe_access::accessible
, AccessSpace
, Kokkos::Device< typename AccessSpace::execution_space , MemorySpace >
>::type space ;
};
}} // namespace Kokkos::Impl
//----------------------------------------------------------------------------
#endif // KOKKOS_CORE_CONCEPTS_HPP
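Kokkos_Concepts.hpp builds its is_* detection traits with the KOKKOS_IMPL_IS_CONCEPT macro, and the hunk above simply adds an is_reducer trait alongside the existing ones. A compile-time sketch of how such traits and SpaceAccessibility are typically queried, assuming a default host-only build; SpaceAccessibility lives in the Impl namespace in this version, so querying it directly here is for illustration only.

#include <Kokkos_Core.hpp>

// Compile-time queries only; nothing runs at execution time.
static_assert(Kokkos::is_execution_space<Kokkos::DefaultHostExecutionSpace>::value,
              "the default host execution space models the execution-space concept");
static_assert(Kokkos::is_memory_space<Kokkos::HostSpace>::value,
              "HostSpace models the memory-space concept");
static_assert(Kokkos::Impl::SpaceAccessibility<Kokkos::DefaultHostExecutionSpace,
                                               Kokkos::HostSpace>::accessible,
              "a host execution space can access HostSpace");

int main() { return 0; }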
diff --git a/lib/kokkos/core/src/Kokkos_Core.hpp b/lib/kokkos/core/src/Kokkos_Core.hpp
index 6d92f4bf6..16c1bce90 100644
--- a/lib/kokkos/core/src/Kokkos_Core.hpp
+++ b/lib/kokkos/core/src/Kokkos_Core.hpp
@@ -1,162 +1,169 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_HPP
#define KOKKOS_CORE_HPP
//----------------------------------------------------------------------------
// Include the execution space header files for the enabled execution spaces.
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_SERIAL )
#include <Kokkos_Serial.hpp>
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
#include <Kokkos_OpenMP.hpp>
#endif
+#if defined( KOKKOS_ENABLE_QTHREADS )
+#include <Kokkos_Qthreads.hpp>
+#endif
+
#if defined( KOKKOS_ENABLE_PTHREAD )
#include <Kokkos_Threads.hpp>
#endif
#if defined( KOKKOS_ENABLE_CUDA )
#include <Kokkos_Cuda.hpp>
#endif
#include <Kokkos_MemoryPool.hpp>
#include <Kokkos_Pair.hpp>
#include <Kokkos_Array.hpp>
#include <Kokkos_View.hpp>
#include <Kokkos_Vectorization.hpp>
#include <Kokkos_Atomic.hpp>
#include <Kokkos_hwloc.hpp>
#include <Kokkos_Timer.hpp>
#include <Kokkos_Complex.hpp>
+#include <iosfwd>
//----------------------------------------------------------------------------
namespace Kokkos {
struct InitArguments {
int num_threads;
int num_numa;
int device_id;
InitArguments() {
num_threads = -1;
num_numa = -1;
device_id = -1;
}
};
void initialize(int& narg, char* arg[]);
void initialize(const InitArguments& args = InitArguments());
/** \brief Finalize the spaces that were initialized via Kokkos::initialize */
void finalize();
/** \brief Finalize all known execution spaces */
void finalize_all();
void fence();
+/** \brief Print "Bill of Materials" */
+void print_configuration( std::ostream & , const bool detail = false );
+
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/* Allocate memory from a memory space.
* The allocation is tracked in Kokkos memory tracking system, so
* leaked memory can be identified.
*/
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_malloc( const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
allocate_tracked( MemorySpace() , arg_alloc_label , arg_alloc_size );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_malloc( const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
allocate_tracked( MemorySpace() , "no-label" , arg_alloc_size );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void kokkos_free( void * arg_alloc )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
deallocate_tracked( arg_alloc );
}
template< class Space = typename Kokkos::DefaultExecutionSpace::memory_space >
inline
void * kokkos_realloc( void * arg_alloc , const size_t arg_alloc_size )
{
typedef typename Space::memory_space MemorySpace ;
return Impl::SharedAllocationRecord< MemorySpace >::
reallocate_tracked( arg_alloc , arg_alloc_size );
}
} // namespace Kokkos
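// Illustrative sketch (not part of this diff): tracked raw allocation in the
// default memory space; the label and sizes are arbitrary examples.
#include <Kokkos_Core.hpp>

void example_tracked_allocation() {
  void* p = Kokkos::kokkos_malloc( "my-buffer", 1024 );  // labeled, tracked allocation
  p = Kokkos::kokkos_realloc( p, 2048 );                 // grow the tracked allocation
  Kokkos::kokkos_free( p );                              // release it so no leak is reported
}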
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif
-
diff --git a/lib/kokkos/core/src/Kokkos_Core_fwd.hpp b/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
index e7e6a49d3..4029bf599 100644
--- a/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
+++ b/lib/kokkos/core/src/Kokkos_Core_fwd.hpp
@@ -1,248 +1,259 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_FWD_HPP
#define KOKKOS_CORE_FWD_HPP
//----------------------------------------------------------------------------
// Kokkos_Macros.hpp does introspection on configuration options
// and compiler environment then sets a collection of #define macros.
#include <Kokkos_Macros.hpp>
#include <impl/Kokkos_Utilities.hpp>
//----------------------------------------------------------------------------
// A 64-bit build (8-byte pointers) is assumed throughout the code base.
static_assert( sizeof(void*) == 8
, "Kokkos assumes 64-bit build; i.e., 8-byte pointers" );
//----------------------------------------------------------------------------
namespace Kokkos {
struct AUTO_t {
KOKKOS_INLINE_FUNCTION
- constexpr const AUTO_t & operator()() const { return *this ; }
+ constexpr const AUTO_t & operator()() const { return *this; }
};
namespace {
/**\brief Token to indicate that a parameter's value is to be automatically selected */
constexpr AUTO_t AUTO = Kokkos::AUTO_t();
}
struct InvalidType {};
-}
+} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Forward declarations for class inter-relationships
namespace Kokkos {
-class HostSpace ; ///< Memory space for main process and CPU execution spaces
+class HostSpace; ///< Memory space for main process and CPU execution spaces
#ifdef KOKKOS_ENABLE_HBWSPACE
namespace Experimental {
-class HBWSpace ; /// Memory space for hbw_malloc from memkind (e.g. for KNL processor)
+class HBWSpace; /// Memory space for hbw_malloc from memkind (e.g. for KNL processor)
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
-class Serial ; ///< Execution space main process on CPU
-#endif // defined( KOKKOS_ENABLE_SERIAL )
+class Serial; ///< Execution space main process on CPU.
+#endif
+
+#if defined( KOKKOS_ENABLE_QTHREADS )
+class Qthreads; ///< Execution space with Qthreads back-end.
+#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
-class Threads ; ///< Execution space with pthreads back-end
+class Threads; ///< Execution space with pthreads back-end.
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
-class OpenMP ; ///< OpenMP execution space
+class OpenMP; ///< OpenMP execution space.
#endif
#if defined( KOKKOS_ENABLE_CUDA )
-class CudaSpace ; ///< Memory space on Cuda GPU
-class CudaUVMSpace ; ///< Memory space on Cuda GPU with UVM
-class CudaHostPinnedSpace ; ///< Memory space on Host accessible to Cuda GPU
-class Cuda ; ///< Execution space for Cuda GPU
+class CudaSpace; ///< Memory space on Cuda GPU
+class CudaUVMSpace; ///< Memory space on Cuda GPU with UVM
+class CudaHostPinnedSpace; ///< Memory space on Host accessible to Cuda GPU
+class Cuda; ///< Execution space for Cuda GPU
#endif
template<class ExecutionSpace, class MemorySpace>
struct Device;
+
} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Set the default execution space.
/// Define Kokkos::DefaultExecutionSpace as per configuration option
/// or chosen from the enabled execution spaces in the following order:
/// Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Serial
namespace Kokkos {
-#if defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
- typedef Cuda DefaultExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef OpenMP DefaultExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Threads DefaultExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
- typedef Serial DefaultExecutionSpace ;
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
+ typedef Cuda DefaultExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
+ typedef OpenMP DefaultExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
+ typedef Threads DefaultExecutionSpace;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Qthreads DefaultExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
+ typedef Serial DefaultExecutionSpace;
#else
-# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads."
+# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
#endif
-#if defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef OpenMP DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Threads DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
- typedef Serial DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_OPENMP )
- typedef OpenMP DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_PTHREAD )
- typedef Threads DefaultHostExecutionSpace ;
-#elif defined ( KOKKOS_ENABLE_SERIAL )
- typedef Serial DefaultHostExecutionSpace ;
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
+ typedef OpenMP DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
+ typedef Threads DefaultHostExecutionSpace;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Qthreads DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
+ typedef Serial DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_OPENMP )
+ typedef OpenMP DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_PTHREAD )
+ typedef Threads DefaultHostExecutionSpace;
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// typedef Qthreads DefaultHostExecutionSpace;
+#elif defined( KOKKOS_ENABLE_SERIAL )
+ typedef Serial DefaultHostExecutionSpace;
#else
-# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads."
+# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
#endif
} // namespace Kokkos
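// Illustrative sketch (not part of this diff): the selected defaults can be
// inspected, e.g. by printing their type names at run time.
#include <Kokkos_Core.hpp>
#include <iostream>
#include <typeinfo>

void report_default_spaces() {
  std::cout << "DefaultExecutionSpace:     " << typeid( Kokkos::DefaultExecutionSpace ).name()     << '\n';
  std::cout << "DefaultHostExecutionSpace: " << typeid( Kokkos::DefaultHostExecutionSpace ).name() << '\n';
}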
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
// Detect the active execution space and define its memory space.
// This is used to verify whether a running kernel can access
// a given memory space.
namespace Kokkos {
+
namespace Impl {
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) && defined (KOKKOS_ENABLE_CUDA)
-typedef Kokkos::CudaSpace ActiveExecutionMemorySpace ;
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA ) && defined( KOKKOS_ENABLE_CUDA )
+typedef Kokkos::CudaSpace ActiveExecutionMemorySpace;
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-typedef Kokkos::HostSpace ActiveExecutionMemorySpace ;
+typedef Kokkos::HostSpace ActiveExecutionMemorySpace;
#else
-typedef void ActiveExecutionMemorySpace ;
+typedef void ActiveExecutionMemorySpace;
#endif
-template< class ActiveSpace , class MemorySpace >
+template< class ActiveSpace, class MemorySpace >
struct VerifyExecutionCanAccessMemorySpace {
enum {value = 0};
};
template< class Space >
-struct VerifyExecutionCanAccessMemorySpace< Space , Space >
+struct VerifyExecutionCanAccessMemorySpace< Space, Space >
{
enum {value = 1};
KOKKOS_INLINE_FUNCTION static void verify(void) {}
KOKKOS_INLINE_FUNCTION static void verify(const void *) {}
};
} // namespace Impl
+
} // namespace Kokkos
-#define KOKKOS_RESTRICT_EXECUTION_TO_DATA( DATA_SPACE , DATA_PTR ) \
+#define KOKKOS_RESTRICT_EXECUTION_TO_DATA( DATA_SPACE, DATA_PTR ) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \
- Kokkos::Impl::ActiveExecutionMemorySpace , DATA_SPACE >::verify( DATA_PTR )
+ Kokkos::Impl::ActiveExecutionMemorySpace, DATA_SPACE >::verify( DATA_PTR )
#define KOKKOS_RESTRICT_EXECUTION_TO_( DATA_SPACE ) \
Kokkos::Impl::VerifyExecutionCanAccessMemorySpace< \
- Kokkos::Impl::ActiveExecutionMemorySpace , DATA_SPACE >::verify()
+ Kokkos::Impl::ActiveExecutionMemorySpace, DATA_SPACE >::verify()
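// Illustrative sketch (not part of this diff): a function can assert that the
// data it touches lives in a memory space reachable from wherever it executes.
// The function name is hypothetical.
#include <Kokkos_Core.hpp>

void scale_on_host( double* x, const int n ) {
  KOKKOS_RESTRICT_EXECUTION_TO_DATA( Kokkos::HostSpace, x );  // verify HostSpace is accessible here
  for ( int i = 0; i < n; ++i ) x[i] *= 2.0;
}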
//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
namespace Kokkos {
void fence();
}
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
template< class Functor
, class Policy
, class EnableFunctor = void
- , class EnablePolicy = void
+ , class EnablePolicy = void
>
struct FunctorPolicyExecutionSpace;
//----------------------------------------------------------------------------
/// \class ParallelFor
/// \brief Implementation of the ParallelFor operator that has a
/// partial specialization for the device.
///
/// This is an implementation detail of parallel_for. Users should
/// skip this and go directly to the nonmember function parallel_for.
-template< class FunctorType , class ExecPolicy , class ExecutionSpace =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
- > class ParallelFor ;
+template< class FunctorType, class ExecPolicy, class ExecutionSpace =
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType, ExecPolicy >::execution_space
+ > class ParallelFor;
/// \class ParallelReduce
/// \brief Implementation detail of parallel_reduce.
///
/// This is an implementation detail of parallel_reduce. Users should
/// skip this and go directly to the nonmember function parallel_reduce.
-template< class FunctorType , class ExecPolicy , class ReducerType = InvalidType, class ExecutionSpace =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
- > class ParallelReduce ;
+template< class FunctorType, class ExecPolicy, class ReducerType = InvalidType, class ExecutionSpace =
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType, ExecPolicy >::execution_space
+ > class ParallelReduce;
/// \class ParallelScan
/// \brief Implementation detail of parallel_scan.
///
/// This is an implementation detail of parallel_scan. Users should
/// skip this and go directly to the documentation of the nonmember
/// template function Kokkos::parallel_scan.
-template< class FunctorType , class ExecPolicy , class ExecutionSapce =
- typename Impl::FunctorPolicyExecutionSpace< FunctorType , ExecPolicy >::execution_space
- > class ParallelScan ;
+template< class FunctorType, class ExecPolicy, class ExecutionSpace =
+ typename Impl::FunctorPolicyExecutionSpace< FunctorType, ExecPolicy >::execution_space
+ > class ParallelScan;
-}}
-#endif /* #ifndef KOKKOS_CORE_FWD_HPP */
+} // namespace Impl
+
+} // namespace Kokkos
+#endif /* #ifndef KOKKOS_CORE_FWD_HPP */
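// Illustrative sketch (not part of this diff): users dispatch work through the
// nonmember functions rather than the ParallelFor/Reduce/Scan classes above.
// Assumes Kokkos::initialize() has been called.
#include <Kokkos_Core.hpp>

void example_parallel_for( const int n ) {
  Kokkos::View<double*> x( "x", n );
  Kokkos::parallel_for( n, KOKKOS_LAMBDA( const int i ) {
    x( i ) = 2.0 * i;   // View is captured by value; each index is touched by one iteration
  } );
}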
diff --git a/lib/kokkos/core/src/Kokkos_Cuda.hpp b/lib/kokkos/core/src/Kokkos_Cuda.hpp
index afccdb6c5..433cac5e5 100644
--- a/lib/kokkos/core/src/Kokkos_Cuda.hpp
+++ b/lib/kokkos/core/src/Kokkos_Cuda.hpp
@@ -1,304 +1,304 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_HPP
#define KOKKOS_CUDA_HPP
#include <Kokkos_Core_fwd.hpp>
// If CUDA execution space is enabled then use this header file.
#if defined( KOKKOS_ENABLE_CUDA )
#include <iosfwd>
#include <vector>
#include <Kokkos_CudaSpace.hpp>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
-#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class CudaExec ;
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/// \class Cuda
/// \brief Kokkos Execution Space that uses CUDA to run on GPUs.
///
/// An "execution space" represents a parallel execution model. It tells Kokkos
/// how to parallelize the execution of kernels in a parallel_for or
/// parallel_reduce. For example, the Threads execution space uses Pthreads or
/// C++11 threads on a CPU, the OpenMP execution space uses the OpenMP language
/// extensions, and the Serial execution space executes "parallel" kernels
/// sequentially. The Cuda execution space uses NVIDIA's CUDA programming
/// model to execute kernels in parallel on GPUs.
class Cuda {
public:
//! \name Type declarations that all Kokkos execution spaces must provide.
//@{
//! Tag this class as a kokkos execution space
typedef Cuda execution_space ;
#if defined( KOKKOS_ENABLE_CUDA_UVM )
//! This execution space's preferred memory space.
typedef CudaUVMSpace memory_space ;
#else
//! This execution space's preferred memory space.
typedef CudaSpace memory_space ;
#endif
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
//! The size_type best suited for this execution space.
typedef memory_space::size_type size_type ;
//! This execution space's preferred array layout.
typedef LayoutLeft array_layout ;
//!
typedef ScratchMemorySpace< Cuda > scratch_memory_space ;
//@}
//--------------------------------------------------
//! \name Functions that all Kokkos devices must implement.
//@{
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
KOKKOS_INLINE_FUNCTION static int in_parallel() {
#if defined( __CUDA_ARCH__ )
return true;
#else
return false;
#endif
}
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
* not ready for work. This may consume less resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence();
//! Free any resources being consumed by the device.
static void finalize();
//! Has been initialized
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
//! Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
//@}
//--------------------------------------------------
//! \name Cuda space instances
~Cuda() {}
Cuda();
explicit Cuda( const int instance_id );
Cuda( Cuda && ) = default ;
Cuda( const Cuda & ) = default ;
Cuda & operator = ( Cuda && ) = default ;
Cuda & operator = ( const Cuda & ) = default ;
//--------------------------------------------------------------------------
//! \name Device-specific functions
//@{
struct SelectDevice {
int cuda_device_id ;
SelectDevice() : cuda_device_id(0) {}
explicit SelectDevice( int id ) : cuda_device_id( id ) {}
};
//! Initialize, telling the CUDA run-time library which device to use.
static void initialize( const SelectDevice = SelectDevice()
, const size_t num_instances = 1 );
/// \brief Cuda device architecture of the selected device.
///
/// This matches the __CUDA_ARCH__ specification.
static size_type device_arch();
//! Query device count.
static size_type detect_device_count();
/** \brief Detect the available devices and their architecture
* as defined by the __CUDA_ARCH__ specification.
*/
static std::vector<unsigned> detect_device_arch();
cudaStream_t cuda_stream() const { return m_stream ; }
int cuda_device() const { return m_device ; }
//@}
//--------------------------------------------------------------------------
private:
cudaStream_t m_stream ;
int m_device ;
};
} // namespace Kokkos
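// Illustrative sketch (not part of this diff), meaningful only when
// KOKKOS_ENABLE_CUDA is defined: explicit device selection and fencing through
// the static interface declared above. Device id 0 is a placeholder.
#include <Kokkos_Core.hpp>

void example_cuda_lifecycle() {
#if defined( KOKKOS_ENABLE_CUDA )
  Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice( 0 ) );
  // ... dispatch kernels on Kokkos::Cuda ...
  Kokkos::Cuda::fence();      // block until all dispatched functors complete
  Kokkos::Cuda::finalize();   // release device resources
#endif
}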
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
struct MemorySpaceAccess
< Kokkos::CudaSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
#if defined( KOKKOS_ENABLE_CUDA_UVM )
// If forcing use of UVM everywhere
// then must assume that CudaUVMSpace
// can be a stand-in for CudaSpace.
// This will fail when a strange host-side execution space
// defines CudaUVMSpace as its preferred memory space.
template<>
struct MemorySpaceAccess
< Kokkos::CudaUVMSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
#endif
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::CudaSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { value = true };
KOKKOS_INLINE_FUNCTION static void verify( void ) { }
KOKKOS_INLINE_FUNCTION static void verify( const void * ) { }
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::HostSpace
, Kokkos::Cuda::scratch_memory_space
>
{
enum { value = false };
inline static void verify( void ) { CudaSpace::access_error(); }
inline static void verify( const void * p ) { CudaSpace::access_error(p); }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_View.hpp>
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
#include <Cuda/Kokkos_Cuda_Task.hpp>
+#include <KokkosExp_MDRangePolicy.hpp>
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_HBWSpace.hpp b/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
index d6bf8dcdf..fc39ce0e5 100644
--- a/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
+++ b/lib/kokkos/core/src/Kokkos_HBWSpace.hpp
@@ -1,337 +1,352 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_HBWSPACE_HPP
#define KOKKOS_HBWSPACE_HPP
-
#include <Kokkos_HostSpace.hpp>
/*--------------------------------------------------------------------------*/
+
#ifdef KOKKOS_ENABLE_HBWSPACE
namespace Kokkos {
+
namespace Experimental {
+
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_array_hbw_space();
/// \brief Acquire a lock for the address
///
/// This function tries to acquire the lock for the hash value derived
/// from the provided ptr. If the lock is successfully acquired the
/// function returns true. Otherwise it returns false.
-bool lock_address_hbw_space(void* ptr);
+bool lock_address_hbw_space( void* ptr );
/// \brief Release lock for the address
///
/// This function releases the lock for the hash value derived
/// from the provided ptr. This function should only be called
/// after previously successfully acquiring a lock with
/// lock_address.
-void unlock_address_hbw_space(void* ptr);
+void unlock_address_hbw_space( void* ptr );
} // namespace Impl
-} // neamspace Experimental
+
+} // namespace Experimental
+
} // namespace Kokkos
namespace Kokkos {
+
namespace Experimental {
/// \class HBWSpace
/// \brief Memory management for host memory.
///
/// HBWSpace is a memory space that governs host memory. "Host"
/// memory means the usual CPU-accessible memory.
class HBWSpace {
public:
-
//! Tag this class as a kokkos memory space
- typedef HBWSpace memory_space ;
- typedef size_t size_type ;
+ typedef HBWSpace memory_space;
+ typedef size_t size_type;
/// \typedef execution_space
/// \brief Default execution space for this memory space.
///
/// Every memory space has a default execution space. This is
/// useful for things like initializing a View (which happens in
/// parallel using the View's default execution space).
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_PTHREAD )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_SERIAL )
- typedef Kokkos::Serial execution_space ;
+ typedef Kokkos::Serial execution_space;
#else
-# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
+# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qhreads, or Kokkos::Serial. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
#endif
//! This memory space preferred device_type
- typedef Kokkos::Device<execution_space,memory_space> device_type;
+ typedef Kokkos::Device< execution_space, memory_space > device_type;
/*--------------------------------*/
/* Functions unique to the HBWSpace */
static int in_parallel();
static void register_in_parallel( int (*)() );
/*--------------------------------*/
/**\brief Default memory space instance */
HBWSpace();
- HBWSpace( const HBWSpace & rhs ) = default ;
- HBWSpace & operator = ( const HBWSpace & ) = default ;
- ~HBWSpace() = default ;
+ HBWSpace( const HBWSpace & rhs ) = default;
+ HBWSpace & operator = ( const HBWSpace & ) = default;
+ ~HBWSpace() = default;
/**\brief Non-default memory space instance to choose allocation mechanism, if available */
- enum AllocationMechanism { STD_MALLOC , POSIX_MEMALIGN , POSIX_MMAP , INTEL_MM_ALLOC };
+ enum AllocationMechanism { STD_MALLOC, POSIX_MEMALIGN, POSIX_MMAP, INTEL_MM_ALLOC };
explicit
HBWSpace( const AllocationMechanism & );
/**\brief Allocate untracked memory in the space */
- void * allocate( const size_t arg_alloc_size ) const ;
+ void * allocate( const size_t arg_alloc_size ) const;
/**\brief Deallocate untracked memory in the space */
- void deallocate( void * const arg_alloc_ptr
- , const size_t arg_alloc_size ) const ;
+ void deallocate( void * const arg_alloc_ptr
+ , const size_t arg_alloc_size ) const;
/**\brief Return Name of the MemorySpace */
static constexpr const char* name();
private:
- AllocationMechanism m_alloc_mech ;
+ AllocationMechanism m_alloc_mech;
static constexpr const char* m_name = "HBW";
- friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > ;
+ friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace, void >;
};
} // namespace Experimental
+
} // namespace Kokkos
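// Illustrative sketch (not part of this diff), assuming a memkind-enabled
// build (KOKKOS_ENABLE_HBWSPACE): untracked allocation straight from the
// high-bandwidth memory space.
#include <Kokkos_HBWSpace.hpp>
#include <cstddef>

void example_hbw_untracked( const size_t nbytes ) {
#if defined( KOKKOS_ENABLE_HBWSPACE )
  Kokkos::Experimental::HBWSpace space;
  void* p = space.allocate( nbytes );   // untracked: the caller owns the pointer
  space.deallocate( p, nbytes );        // the size must match the allocation
#endif
}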
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
template<>
-class SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >
- : public SharedAllocationRecord< void , void >
+class SharedAllocationRecord< Kokkos::Experimental::HBWSpace, void >
+ : public SharedAllocationRecord< void, void >
{
private:
- friend Kokkos::Experimental::HBWSpace ;
+ friend Kokkos::Experimental::HBWSpace;
- typedef SharedAllocationRecord< void , void > RecordBase ;
+ typedef SharedAllocationRecord< void, void > RecordBase;
- SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
- SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
+ SharedAllocationRecord( const SharedAllocationRecord & ) = delete;
+ SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete;
static void deallocate( RecordBase * );
/**\brief Root record for tracked allocations from this HBWSpace instance */
- static RecordBase s_root_record ;
+ static RecordBase s_root_record;
- const Kokkos::Experimental::HBWSpace m_space ;
+ const Kokkos::Experimental::HBWSpace m_space;
protected:
~SharedAllocationRecord();
- SharedAllocationRecord() = default ;
+ SharedAllocationRecord() = default;
- SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_label
- , const size_t arg_alloc_size
- , const RecordBase::function_type arg_dealloc = & deallocate
+ SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
+ , const std::string & arg_label
+ , const size_t arg_alloc_size
+ , const RecordBase::function_type arg_dealloc = & deallocate
);
public:
inline
std::string get_label() const
{
return std::string( RecordBase::head()->m_label );
}
KOKKOS_INLINE_FUNCTION static
- SharedAllocationRecord * allocate( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_label
- , const size_t arg_alloc_size
+ SharedAllocationRecord * allocate( const Kokkos::Experimental::HBWSpace & arg_space
+ , const std::string & arg_label
+ , const size_t arg_alloc_size
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
+ return new SharedAllocationRecord( arg_space, arg_label, arg_alloc_size );
#else
- return (SharedAllocationRecord *) 0 ;
+ return (SharedAllocationRecord *) 0;
#endif
}
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_label
- , const size_t arg_alloc_size );
+ , const std::string & arg_label
+ , const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
-
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
- static void print_records( std::ostream & , const Kokkos::Experimental::HBWSpace & , bool detail = false );
+ static void print_records( std::ostream &, const Kokkos::Experimental::HBWSpace &, bool detail = false );
};
} // namespace Impl
-} // namespace Kokkos
+} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
-static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::Experimental::HBWSpace >::assignable , "" );
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::Experimental::HBWSpace, Kokkos::Experimental::HBWSpace >::assignable, "" );
template<>
-struct MemorySpaceAccess< Kokkos::HostSpace , Kokkos::Experimental::HBWSpace > {
+struct MemorySpaceAccess< Kokkos::HostSpace, Kokkos::Experimental::HBWSpace > {
enum { assignable = true };
enum { accessible = true };
enum { deepcopy = true };
};
template<>
-struct MemorySpaceAccess< Kokkos::Experimental::HBWSpace , Kokkos::HostSpace> {
+struct MemorySpaceAccess< Kokkos::Experimental::HBWSpace, Kokkos::HostSpace > {
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = true };
};
-}}
+} // namespace Impl
+
+} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Impl {
+namespace Impl {
-template<class ExecutionSpace>
-struct DeepCopy<Experimental::HBWSpace,Experimental::HBWSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< Experimental::HBWSpace, Experimental::HBWSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
-template<class ExecutionSpace>
-struct DeepCopy<HostSpace,Experimental::HBWSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< HostSpace, Experimental::HBWSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
-template<class ExecutionSpace>
-struct DeepCopy<Experimental::HBWSpace,HostSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< Experimental::HBWSpace, HostSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
} // namespace Impl
+
} // namespace Kokkos
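// Illustrative sketch (not part of this diff): at the user level these
// specializations are exercised through Kokkos::deep_copy between Views
// living in HostSpace and HBWSpace.
#include <Kokkos_Core.hpp>
#include <Kokkos_HBWSpace.hpp>

void example_hbw_deep_copy( const int n ) {
#if defined( KOKKOS_ENABLE_HBWSPACE )
  Kokkos::View<double*, Kokkos::HostSpace>              host( "host", n );
  Kokkos::View<double*, Kokkos::Experimental::HBWSpace> hbw ( "hbw",  n );
  Kokkos::deep_copy( hbw, host );   // HostSpace -> HBWSpace (plain memcpy underneath)
  Kokkos::deep_copy( host, hbw );   // HBWSpace -> HostSpace
#endif
}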
namespace Kokkos {
+
namespace Impl {
template<>
-struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace , Kokkos::Experimental::HBWSpace >
+struct VerifyExecutionCanAccessMemorySpace< Kokkos::HostSpace, Kokkos::Experimental::HBWSpace >
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
template<>
-struct VerifyExecutionCanAccessMemorySpace< Kokkos::Experimental::HBWSpace , Kokkos::HostSpace >
+struct VerifyExecutionCanAccessMemorySpace< Kokkos::Experimental::HBWSpace, Kokkos::HostSpace >
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
+
} // namespace Kokkos
#endif
-#endif /* #define KOKKOS_HBWSPACE_HPP */
+#endif // #define KOKKOS_HBWSPACE_HPP
diff --git a/lib/kokkos/core/src/Kokkos_HostSpace.hpp b/lib/kokkos/core/src/Kokkos_HostSpace.hpp
index e79de462b..82006665c 100644
--- a/lib/kokkos/core/src/Kokkos_HostSpace.hpp
+++ b/lib/kokkos/core/src/Kokkos_HostSpace.hpp
@@ -1,317 +1,318 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_HOSTSPACE_HPP
#define KOKKOS_HOSTSPACE_HPP
#include <cstring>
#include <string>
#include <iosfwd>
#include <typeinfo>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Concepts.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_SharedAlloc.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
+
namespace Impl {
/// \brief Initialize lock array for arbitrary size atomics.
///
/// Arbitrary atomics are implemented using a hash table of locks
/// where the hash value is derived from the address of the
/// object for which an atomic operation is performed.
/// This function initializes the locks to zero (unset).
void init_lock_array_host_space();
/// \brief Acquire a lock for the address
///
/// This function tries to acquire the lock for the hash value derived
/// from the provided ptr. If the lock is successfully acquired the
/// function returns true. Otherwise it returns false.
bool lock_address_host_space(void* ptr);
/// \brief Release lock for the address
///
/// This function releases the lock for the hash value derived
/// from the provided ptr. This function should only be called
/// after previously successfully acquiring a lock with
/// lock_address.
-void unlock_address_host_space(void* ptr);
+void unlock_address_host_space( void* ptr );
} // namespace Impl
+
} // namespace Kokkos
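// Illustrative sketch (not part of this diff): these lock functions are an
// internal detail backing atomics on arbitrary types; a correct caller always
// pairs a successful lock with an unlock of the same address.
#include <Kokkos_Core.hpp>

void example_host_lock( void* ptr ) {
  while ( !Kokkos::Impl::lock_address_host_space( ptr ) ) { /* spin until acquired */ }
  // ... perform the non-atomic read-modify-write protected by the lock ...
  Kokkos::Impl::unlock_address_host_space( ptr );
}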
namespace Kokkos {
/// \class HostSpace
/// \brief Memory management for host memory.
///
/// HostSpace is a memory space that governs host memory. "Host"
/// memory means the usual CPU-accessible memory.
class HostSpace {
public:
-
//! Tag this class as a kokkos memory space
- typedef HostSpace memory_space ;
- typedef size_t size_type ;
+ typedef HostSpace memory_space;
+ typedef size_t size_type;
/// \typedef execution_space
/// \brief Default execution space for this memory space.
///
/// Every memory space has a default execution space. This is
/// useful for things like initializing a View (which happens in
/// parallel using the View's default execution space).
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_OPENMP )
- typedef Kokkos::OpenMP execution_space ;
+ typedef Kokkos::OpenMP execution_space;
#elif defined( KOKKOS_ENABLE_PTHREAD )
- typedef Kokkos::Threads execution_space ;
+ typedef Kokkos::Threads execution_space;
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// typedef Kokkos::Qthreads execution_space;
#elif defined( KOKKOS_ENABLE_SERIAL )
- typedef Kokkos::Serial execution_space ;
+ typedef Kokkos::Serial execution_space;
#else
-# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Serial, or Kokkos::Threads. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
+# error "At least one of the following host execution spaces must be defined: Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial. You might be seeing this message if you disabled the Kokkos::Serial device explicitly using the Kokkos_ENABLE_Serial:BOOL=OFF CMake option, but did not enable any of the other host execution space devices."
#endif
//! This memory space preferred device_type
- typedef Kokkos::Device<execution_space,memory_space> device_type;
+ typedef Kokkos::Device< execution_space, memory_space > device_type;
/*--------------------------------*/
/* Functions unique to the HostSpace */
static int in_parallel();
static void register_in_parallel( int (*)() );
/*--------------------------------*/
/**\brief Default memory space instance */
HostSpace();
- HostSpace( HostSpace && rhs ) = default ;
- HostSpace( const HostSpace & rhs ) = default ;
- HostSpace & operator = ( HostSpace && ) = default ;
- HostSpace & operator = ( const HostSpace & ) = default ;
- ~HostSpace() = default ;
+ HostSpace( HostSpace && rhs ) = default;
+ HostSpace( const HostSpace & rhs ) = default;
+ HostSpace & operator = ( HostSpace && ) = default;
+ HostSpace & operator = ( const HostSpace & ) = default;
+ ~HostSpace() = default;
/**\brief Non-default memory space instance to choose allocation mechanism, if available */
- enum AllocationMechanism { STD_MALLOC , POSIX_MEMALIGN , POSIX_MMAP , INTEL_MM_ALLOC };
+ enum AllocationMechanism { STD_MALLOC, POSIX_MEMALIGN, POSIX_MMAP, INTEL_MM_ALLOC };
explicit
HostSpace( const AllocationMechanism & );
/**\brief Allocate untracked memory in the space */
- void * allocate( const size_t arg_alloc_size ) const ;
+ void * allocate( const size_t arg_alloc_size ) const;
/**\brief Deallocate untracked memory in the space */
- void deallocate( void * const arg_alloc_ptr
- , const size_t arg_alloc_size ) const ;
+ void deallocate( void * const arg_alloc_ptr
+ , const size_t arg_alloc_size ) const;
/**\brief Return Name of the MemorySpace */
static constexpr const char* name();
private:
-
- AllocationMechanism m_alloc_mech ;
+ AllocationMechanism m_alloc_mech;
static constexpr const char* m_name = "Host";
- friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > ;
+ friend class Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace, void >;
};
} // namespace Kokkos
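// Illustrative sketch (not part of this diff): a View placed explicitly in
// HostSpace is filled in parallel using HostSpace::execution_space.
// Assumes Kokkos::initialize() has been called.
#include <Kokkos_Core.hpp>

void example_host_view( const int n ) {
  Kokkos::View<double*, Kokkos::HostSpace> v( "v", n );
  Kokkos::parallel_for(
    Kokkos::RangePolicy< Kokkos::HostSpace::execution_space >( 0, n ),
    KOKKOS_LAMBDA( const int i ) { v( i ) = 1.0; } );
}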
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-namespace Impl {
-static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::HostSpace >::assignable , "" );
+namespace Impl {
+static_assert( Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::HostSpace >::assignable, "" );
template< typename S >
struct HostMirror {
private:
-
// If input execution space can access HostSpace then keep it.
// Example: Kokkos::OpenMP can access, Kokkos::Cuda cannot
enum { keep_exe = Kokkos::Impl::MemorySpaceAccess
- < typename S::execution_space::memory_space , Kokkos::HostSpace >
- ::accessible };
+ < typename S::execution_space::memory_space, Kokkos::HostSpace >::accessible };
// If HostSpace can access memory space then keep it.
// Example: Cannot access Kokkos::CudaSpace, can access Kokkos::CudaUVMSpace
enum { keep_mem = Kokkos::Impl::MemorySpaceAccess
- < Kokkos::HostSpace , typename S::memory_space >::accessible };
+ < Kokkos::HostSpace, typename S::memory_space >::accessible };
public:
typedef typename std::conditional
< keep_exe && keep_mem /* Can keep whole space */
, S
, typename std::conditional
< keep_mem /* Can keep memory space, use default Host execution space */
, Kokkos::Device< Kokkos::HostSpace::execution_space
, typename S::memory_space >
, Kokkos::HostSpace
>::type
- >::type Space ;
+ >::type Space;
};
} // namespace Impl
+
} // namespace Kokkos
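// Illustrative sketch (not part of this diff): the HostMirror selection above
// is the kind of logic behind Kokkos::create_mirror_view, which yields a
// host-accessible staging View for a possibly device-resident one.
#include <Kokkos_Core.hpp>

void example_mirror( const int n ) {
  Kokkos::View<double*> d( "d", n );          // default (possibly device) memory space
  auto h = Kokkos::create_mirror_view( d );   // host-accessible mirror of d
  Kokkos::deep_copy( h, d );                  // device -> host
}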
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
template<>
-class SharedAllocationRecord< Kokkos::HostSpace , void >
- : public SharedAllocationRecord< void , void >
+class SharedAllocationRecord< Kokkos::HostSpace, void >
+ : public SharedAllocationRecord< void, void >
{
private:
+ friend Kokkos::HostSpace;
- friend Kokkos::HostSpace ;
-
- typedef SharedAllocationRecord< void , void > RecordBase ;
+ typedef SharedAllocationRecord< void, void > RecordBase;
- SharedAllocationRecord( const SharedAllocationRecord & ) = delete ;
- SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete ;
+ SharedAllocationRecord( const SharedAllocationRecord & ) = delete;
+ SharedAllocationRecord & operator = ( const SharedAllocationRecord & ) = delete;
static void deallocate( RecordBase * );
/**\brief Root record for tracked allocations from this HostSpace instance */
- static RecordBase s_root_record ;
+ static RecordBase s_root_record;
- const Kokkos::HostSpace m_space ;
+ const Kokkos::HostSpace m_space;
protected:
-
~SharedAllocationRecord();
- SharedAllocationRecord() = default ;
+ SharedAllocationRecord() = default;
SharedAllocationRecord( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const RecordBase::function_type arg_dealloc = & deallocate
);
public:
inline
std::string get_label() const
- {
- return std::string( RecordBase::head()->m_label );
- }
+ {
+ return std::string( RecordBase::head()->m_label );
+ }
KOKKOS_INLINE_FUNCTION static
SharedAllocationRecord * allocate( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
)
- {
+ {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- return new SharedAllocationRecord( arg_space , arg_label , arg_alloc_size );
+ return new SharedAllocationRecord( arg_space, arg_label, arg_alloc_size );
#else
- return (SharedAllocationRecord *) 0 ;
+ return (SharedAllocationRecord *) 0;
#endif
- }
+ }
+
/**\brief Allocate tracked memory in the space */
static
void * allocate_tracked( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size );
/**\brief Reallocate tracked memory in the space */
static
void * reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size );
/**\brief Deallocate tracked memory in the space */
static
void deallocate_tracked( void * const arg_alloc_ptr );
-
static SharedAllocationRecord * get_record( void * arg_alloc_ptr );
- static void print_records( std::ostream & , const Kokkos::HostSpace & , bool detail = false );
+ static void print_records( std::ostream &, const Kokkos::HostSpace &, bool detail = false );
};
} // namespace Impl
+
} // namespace Kokkos
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
+
namespace Impl {
-template< class DstSpace, class SrcSpace, class ExecutionSpace = typename DstSpace::execution_space> struct DeepCopy ;
+template< class DstSpace, class SrcSpace, class ExecutionSpace = typename DstSpace::execution_space > struct DeepCopy;
-template<class ExecutionSpace>
-struct DeepCopy<HostSpace,HostSpace,ExecutionSpace> {
- DeepCopy( void * dst , const void * src , size_t n ) {
- memcpy( dst , src , n );
+template< class ExecutionSpace >
+struct DeepCopy< HostSpace, HostSpace, ExecutionSpace > {
+ DeepCopy( void * dst, const void * src, size_t n ) {
+ memcpy( dst, src, n );
}
- DeepCopy( const ExecutionSpace& exec, void * dst , const void * src , size_t n ) {
+
+ DeepCopy( const ExecutionSpace& exec, void * dst, const void * src, size_t n ) {
exec.fence();
- memcpy( dst , src , n );
+ memcpy( dst, src, n );
}
};
} // namespace Impl
-} // namespace Kokkos
-
-#endif /* #define KOKKOS_HOSTSPACE_HPP */
+} // namespace Kokkos
+#endif // #define KOKKOS_HOSTSPACE_HPP
diff --git a/lib/kokkos/core/src/Kokkos_Macros.hpp b/lib/kokkos/core/src/Kokkos_Macros.hpp
index 52845b9e0..c138b08c9 100644
--- a/lib/kokkos/core/src/Kokkos_Macros.hpp
+++ b/lib/kokkos/core/src/Kokkos_Macros.hpp
@@ -1,493 +1,468 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_MACROS_HPP
#define KOKKOS_MACROS_HPP
//----------------------------------------------------------------------------
-/** Pick up configure/build options via #define macros:
+/** Pick up configure / build options via #define macros:
*
* KOKKOS_ENABLE_CUDA Kokkos::Cuda execution and memory spaces
* KOKKOS_ENABLE_PTHREAD Kokkos::Threads execution space
- * KOKKOS_ENABLE_QTHREAD Kokkos::Qthread execution space
- * KOKKOS_ENABLE_OPENMP Kokkos::OpenMP execution space
- * KOKKOS_ENABLE_HWLOC HWLOC library is available
- * KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK insert array bounds checks, is expensive!
- *
- * KOKKOS_ENABLE_MPI negotiate MPI/execution space interactions
- *
- * KOKKOS_ENABLE_CUDA_UVM Use CUDA UVM for Cuda memory space
+ * KOKKOS_ENABLE_QTHREADS Kokkos::Qthreads execution space
+ * KOKKOS_ENABLE_OPENMP Kokkos::OpenMP execution space
+ * KOKKOS_ENABLE_HWLOC HWLOC library is available.
+ * KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK Insert array bounds checks, is expensive!
+ * KOKKOS_ENABLE_MPI Negotiate MPI/execution space interactions.
+ * KOKKOS_ENABLE_CUDA_UVM Use CUDA UVM for Cuda memory space.
*/
#ifndef KOKKOS_DONT_INCLUDE_CORE_CONFIG_H
-#include <KokkosCore_config.h>
+ #include <KokkosCore_config.h>
#endif
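// Illustrative sketch (not part of this diff), assuming <Kokkos_Macros.hpp> or
// <Kokkos_Core.hpp> has been included first: application code can branch on the
// configuration macros picked up above, e.g. to report the active backend.
#include <cstdio>

inline void report_backend() {
#if defined( KOKKOS_ENABLE_CUDA )
  std::printf( "Kokkos was built with the CUDA backend\n" );
#elif defined( KOKKOS_ENABLE_OPENMP )
  std::printf( "Kokkos was built with the OpenMP backend\n" );
#elif defined( KOKKOS_ENABLE_PTHREAD )
  std::printf( "Kokkos was built with the Pthreads backend\n" );
#else
  std::printf( "Kokkos was built with a serial host backend only\n" );
#endif
}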
#include <impl/Kokkos_OldMacros.hpp>
//----------------------------------------------------------------------------
/** Pick up compiler specific #define macros:
*
* Macros for known compilers evaluate to an integral version value
*
* KOKKOS_COMPILER_NVCC
* KOKKOS_COMPILER_GNU
* KOKKOS_COMPILER_INTEL
* KOKKOS_COMPILER_IBM
* KOKKOS_COMPILER_CRAYC
* KOKKOS_COMPILER_APPLECC
* KOKKOS_COMPILER_CLANG
* KOKKOS_COMPILER_PGI
*
 * Macros for which compiler extension to use for atomics on intrinsic types
*
* KOKKOS_ENABLE_CUDA_ATOMICS
* KOKKOS_ENABLE_GNU_ATOMICS
* KOKKOS_ENABLE_INTEL_ATOMICS
* KOKKOS_ENABLE_OPENMP_ATOMICS
*
- * A suite of 'KOKKOS_HAVE_PRAGMA_...' are defined for internal use.
+ * A suite of 'KOKKOS_ENABLE_PRAGMA_...' macros is defined for internal use.
*
* Macros for marking functions to run in an execution space:
*
* KOKKOS_FUNCTION
* KOKKOS_INLINE_FUNCTION request compiler to inline
* KOKKOS_FORCEINLINE_FUNCTION force compiler to inline, use with care!
*/
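// Illustrative sketch (not part of this diff): a functor whose call operator
// is marked so it can be compiled for, and dispatched to, any enabled space.
#include <Kokkos_Core.hpp>

struct FillIota {
  Kokkos::View<int*> v;   // captured by value into the parallel dispatch

  KOKKOS_INLINE_FUNCTION
  void operator()( const int i ) const { v( i ) = i; }
};
// e.g.  Kokkos::parallel_for( n, FillIota{ v } );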
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_CUDA ) && defined( __CUDACC__ )
+ // Compiling with a CUDA compiler.
+ //
+ // Include <cuda.h> to pick up the CUDA_VERSION macro defined as:
+ // CUDA_VERSION = ( MAJOR_VERSION * 1000 ) + ( MINOR_VERSION * 10 )
+ //
+ // When generating device code the __CUDA_ARCH__ macro is defined as:
+ // __CUDA_ARCH__ = ( MAJOR_CAPABILITY * 100 ) + ( MINOR_CAPABILITY * 10 )
+
+ #include <cuda_runtime.h>
+ #include <cuda.h>
+
+ #if !defined( CUDA_VERSION )
+ #error "#include <cuda.h> did not define CUDA_VERSION."
+ #endif
-/* Compiling with a CUDA compiler.
- *
- * Include <cuda.h> to pick up the CUDA_VERSION macro defined as:
- * CUDA_VERSION = ( MAJOR_VERSION * 1000 ) + ( MINOR_VERSION * 10 )
- *
- * When generating device code the __CUDA_ARCH__ macro is defined as:
- * __CUDA_ARCH__ = ( MAJOR_CAPABILITY * 100 ) + ( MINOR_CAPABILITY * 10 )
- */
+ #if ( CUDA_VERSION < 7000 )
+ // CUDA supports C++11 in device code starting with version 7.0.
+ // This includes auto type and device code internal lambdas.
+ #error "Cuda version 7.0 or greater required."
+ #endif
-#include <cuda_runtime.h>
-#include <cuda.h>
+ #if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ < 300 )
+ // Compiling with CUDA compiler for device code.
+ #error "Cuda device capability >= 3.0 is required."
+ #endif
-#if ! defined( CUDA_VERSION )
-#error "#include <cuda.h> did not define CUDA_VERSION"
-#endif
+ #ifdef KOKKOS_ENABLE_CUDA_LAMBDA
+ #if ( CUDA_VERSION < 7050 )
+ // CUDA supports C++11 lambdas generated in host code to be given
+ // to the device starting with version 7.5. But the release candidate (7.5.6)
+ // still identifies as 7.0.
+ #error "Cuda version 7.5 or greater required for host-to-device Lambda support."
+ #endif
-#if ( CUDA_VERSION < 7000 )
-// CUDA supports C++11 in device code starting with
-// version 7.0. This includes auto type and device code internal
-// lambdas.
-#error "Cuda version 7.0 or greater required"
-#endif
+ #if ( CUDA_VERSION < 8000 ) && defined( __NVCC__ )
+ #define KOKKOS_LAMBDA [=]__device__
+ #else
+ #define KOKKOS_LAMBDA [=]__host__ __device__
-#if defined( __CUDA_ARCH__ ) && ( __CUDA_ARCH__ < 300 )
-/* Compiling with CUDA compiler for device code. */
-#error "Cuda device capability >= 3.0 is required"
-#endif
+ #if defined( KOKKOS_ENABLE_CXX1Z )
+ #define KOKKOS_CLASS_LAMBDA [=,*this] __host__ __device__
+ #endif
+ #endif
-#ifdef KOKKOS_ENABLE_CUDA_LAMBDA
-#if ( CUDA_VERSION < 7050 )
- // CUDA supports C++11 lambdas generated in host code to be given
- // to the device starting with version 7.5. But the release candidate (7.5.6)
- // still identifies as 7.0
- #error "Cuda version 7.5 or greater required for host-to-device Lambda support"
-#endif
-#if ( CUDA_VERSION < 8000 ) && defined(__NVCC__)
- #define KOKKOS_LAMBDA [=]__device__
-#else
- #define KOKKOS_LAMBDA [=]__host__ __device__
- #if defined( KOKKOS_ENABLE_CXX1Z )
- #define KOKKOS_CLASS_LAMBDA [=,*this] __host__ __device__
+ #define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
#endif
-#endif
-#define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
-#endif
-#endif /* #if defined( KOKKOS_ENABLE_CUDA ) && defined( __CUDACC__ ) */
+#endif // #if defined( KOKKOS_ENABLE_CUDA ) && defined( __CUDACC__ )
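For reference on the encodings above: CUDA 7.5 reports CUDA_VERSION == 7050 and CUDA 8.0 reports 8000, while a compute-capability 3.0 device reports __CUDA_ARCH__ == 300, which is what the minimum-version checks compare against. A minimal sketch of KOKKOS_LAMBDA as defined here (illustrative; assumes Kokkos has been initialized):

    #include <Kokkos_Core.hpp>

    void example_fill( Kokkos::View<double*> v ) {
      // Under KOKKOS_ENABLE_CUDA_LAMBDA this expands to [=]__device__ (nvcc older
      // than 8.0) or [=]__host__ __device__; host-only builds fall back to a plain [=].
      Kokkos::parallel_for( v.size(), KOKKOS_LAMBDA( const int i ) {
        v(i) = 1.0;
      });
    }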
-
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
// Cuda version 8.0 still needs the functor wrapper
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA /* && (CUDA_VERSION < 8000) */ ) && defined(__NVCC__)
+ #if /* ( CUDA_VERSION < 8000 ) && */ defined( __NVCC__ )
#define KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
#endif
#endif
-/*--------------------------------------------------------------------------*/
-/* Language info: C++, CUDA, OPENMP */
+//----------------------------------------------------------------------------
+// Language info: C++, CUDA, OPENMP
#if defined( KOKKOS_ENABLE_CUDA )
// Compiling Cuda code to 'ptx'
#define KOKKOS_FORCEINLINE_FUNCTION __device__ __host__ __forceinline__
#define KOKKOS_INLINE_FUNCTION __device__ __host__ inline
#define KOKKOS_FUNCTION __device__ __host__
-#endif /* #if defined( __CUDA_ARCH__ ) */
+#endif // #if defined( __CUDA_ARCH__ )
#if defined( _OPENMP )
+ // Compiling with OpenMP.
+ // The value of _OPENMP is an integer value YYYYMM
+ // where YYYY and MM are the year and month designation
+ // of the supported OpenMP API version.
+#endif // #if defined( _OPENMP )
- /* Compiling with OpenMP.
- * The value of _OPENMP is an integer value YYYYMM
- * where YYYY and MM are the year and month designation
- * of the supported OpenMP API version.
- */
-
-#endif /* #if defined( _OPENMP ) */
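For example, a compiler supporting OpenMP 4.0 (released July 2013) reports _OPENMP == 201307; a sketch of a feature check built on that encoding:

    #if defined( _OPENMP ) && ( _OPENMP >= 201307 )
      // OpenMP 4.0 or newer is available in this translation unit.
    #endif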
-
-/*--------------------------------------------------------------------------*/
-/* Mapping compiler built-ins to KOKKOS_COMPILER_*** macros */
+//----------------------------------------------------------------------------
+// Mapping compiler built-ins to KOKKOS_COMPILER_*** macros
#if defined( __NVCC__ )
// NVIDIA compiler is being used.
// Code is parsed and separated into host and device code.
// Host code is compiled again with another compiler.
// Device code is compile to 'ptx'.
#define KOKKOS_COMPILER_NVCC __NVCC__
-
#else
-#if ! defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
- #if !defined (KOKKOS_ENABLE_CUDA) // Compiling with clang for Cuda does not work with LAMBDAs either
- // CUDA (including version 6.5) does not support giving lambdas as
- // arguments to global functions. Thus its not currently possible
- // to dispatch lambdas from the host.
- #define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
+ #if !defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+ #if !defined( KOKKOS_ENABLE_CUDA ) // Compiling with clang for Cuda does not work with LAMBDAs either
+ // CUDA (including version 6.5) does not support giving lambdas as
+ // arguments to global functions. Thus it's not currently possible
+ // to dispatch lambdas from the host.
+ #define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA 1
#endif
#endif
-#endif /* #if defined( __NVCC__ ) */
+#endif // #if defined( __NVCC__ )
-#if !defined (KOKKOS_LAMBDA)
+#if !defined( KOKKOS_LAMBDA )
#define KOKKOS_LAMBDA [=]
#endif
-#if defined( KOKKOS_ENABLE_CXX1Z ) && !defined (KOKKOS_CLASS_LAMBDA)
+#if defined( KOKKOS_ENABLE_CXX1Z ) && !defined( KOKKOS_CLASS_LAMBDA )
#define KOKKOS_CLASS_LAMBDA [=,*this]
#endif
-//#if ! defined( __CUDA_ARCH__ ) /* Not compiling Cuda code to 'ptx'. */
+//#if !defined( __CUDA_ARCH__ ) // Not compiling Cuda code to 'ptx'.
-/* Intel compiler for host code */
+// Intel compiler for host code.
#if defined( __INTEL_COMPILER )
#define KOKKOS_COMPILER_INTEL __INTEL_COMPILER
#elif defined( __ICC )
// Old define
#define KOKKOS_COMPILER_INTEL __ICC
#elif defined( __ECC )
// Very old define
#define KOKKOS_COMPILER_INTEL __ECC
#endif
-/* CRAY compiler for host code */
+// CRAY compiler for host code
#if defined( _CRAYC )
#define KOKKOS_COMPILER_CRAYC _CRAYC
#endif
#if defined( __IBMCPP__ )
// IBM C++
#define KOKKOS_COMPILER_IBM __IBMCPP__
#elif defined( __IBMC__ )
#define KOKKOS_COMPILER_IBM __IBMC__
#endif
#if defined( __APPLE_CC__ )
#define KOKKOS_COMPILER_APPLECC __APPLE_CC__
#endif
-#if defined (__clang__) && !defined (KOKKOS_COMPILER_INTEL)
+#if defined( __clang__ ) && !defined( KOKKOS_COMPILER_INTEL )
#define KOKKOS_COMPILER_CLANG __clang_major__*100+__clang_minor__*10+__clang_patchlevel__
#endif
-#if ! defined( __clang__ ) && ! defined( KOKKOS_COMPILER_INTEL ) &&defined( __GNUC__ )
+#if !defined( __clang__ ) && !defined( KOKKOS_COMPILER_INTEL ) && defined( __GNUC__ )
#define KOKKOS_COMPILER_GNU __GNUC__*100+__GNUC_MINOR__*10+__GNUC_PATCHLEVEL__
+
#if ( 472 > KOKKOS_COMPILER_GNU )
#error "Compiling with GCC version earlier than 4.7.2 is not supported."
#endif
#endif
-#if defined( __PGIC__ ) && ! defined( __GNUC__ )
+#if defined( __PGIC__ ) && !defined( __GNUC__ )
#define KOKKOS_COMPILER_PGI __PGIC__*100+__PGIC_MINOR__*10+__PGIC_PATCHLEVEL__
+
#if ( 1540 > KOKKOS_COMPILER_PGI )
#error "Compiling with PGI version earlier than 15.4 is not supported."
#endif
#endif
-//#endif /* #if ! defined( __CUDA_ARCH__ ) */
+//#endif // #if !defined( __CUDA_ARCH__ )
-/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
-/* Intel compiler macros */
+//----------------------------------------------------------------------------
+// Intel compiler macros
#if defined( KOKKOS_COMPILER_INTEL )
-
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
- #define KOKKOS_ENABLE_PRAGMA_IVDEP 1
#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
#define KOKKOS_ENABLE_PRAGMA_SIMD 1
+ #if ( __INTEL_COMPILER > 1400 )
+ #define KOKKOS_ENABLE_PRAGMA_IVDEP 1
+ #endif
+
#define KOKKOS_RESTRICT __restrict__
#ifndef KOKKOS_ALIGN
- #define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
+ #define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
#endif
#ifndef KOKKOS_ALIGN_PTR
- #define KOKKOS_ALIGN_PTR(size) __attribute__((align_value(size)))
+ #define KOKKOS_ALIGN_PTR(size) __attribute__((align_value(size)))
#endif
#ifndef KOKKOS_ALIGN_SIZE
- #define KOKKOS_ALIGN_SIZE 64
+ #define KOKKOS_ALIGN_SIZE 64
#endif
#if ( 1400 > KOKKOS_COMPILER_INTEL )
#if ( 1300 > KOKKOS_COMPILER_INTEL )
#error "Compiling with Intel version earlier than 13.0 is not supported. Official minimal version is 14.0."
#else
#warning "Compiling with Intel version 13.x probably works but is not officially supported. Official minimal version is 14.0."
#endif
#endif
- #if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( _WIN32 )
+
+ #if !defined( KOKKOS_ENABLE_ASM ) && !defined( _WIN32 )
#define KOKKOS_ENABLE_ASM 1
#endif
- #if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
- #if !defined (_WIN32)
+ #if !defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #if !defined( _WIN32 )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#else
#define KOKKOS_FORCEINLINE_FUNCTION inline
#endif
#endif
#if defined( __MIC__ )
// Compiling for Xeon Phi
#endif
-
#endif
-/*--------------------------------------------------------------------------*/
-/* Cray compiler macros */
+//----------------------------------------------------------------------------
+// Cray compiler macros
#if defined( KOKKOS_COMPILER_CRAYC )
-
-
#endif
-/*--------------------------------------------------------------------------*/
-/* IBM Compiler macros */
+//----------------------------------------------------------------------------
+// IBM Compiler macros
#if defined( KOKKOS_COMPILER_IBM )
-
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
//#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
-
#endif
-/*--------------------------------------------------------------------------*/
-/* CLANG compiler macros */
+//----------------------------------------------------------------------------
+// CLANG compiler macros
#if defined( KOKKOS_COMPILER_CLANG )
-
//#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
//#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
- #if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #if !defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#endif
-
#endif
-/*--------------------------------------------------------------------------*/
-/* GNU Compiler macros */
+//----------------------------------------------------------------------------
+// GNU Compiler macros
#if defined( KOKKOS_COMPILER_GNU )
-
//#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
//#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
//#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
- #if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #if !defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#endif
- #if ! defined( KOKKOS_ENABLE_ASM ) && ! defined( __PGIC__ ) && \
- ( defined( __amd64 ) || \
- defined( __amd64__ ) || \
- defined( __x86_64 ) || \
- defined( __x86_64__ ) )
+ #if !defined( KOKKOS_ENABLE_ASM ) && !defined( __PGIC__ ) && \
+ ( defined( __amd64 ) || defined( __amd64__ ) || \
+ defined( __x86_64 ) || defined( __x86_64__ ) )
#define KOKKOS_ENABLE_ASM 1
#endif
-
#endif
-/*--------------------------------------------------------------------------*/
+//----------------------------------------------------------------------------
#if defined( KOKKOS_COMPILER_PGI )
-
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
//#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
//#define KOKKOS_ENABLE_PRAGMA_SIMD 1
-
#endif
-/*--------------------------------------------------------------------------*/
+//----------------------------------------------------------------------------
#if defined( KOKKOS_COMPILER_NVCC )
-
- #if defined(__CUDA_ARCH__ )
+ #if defined( __CUDA_ARCH__ )
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
#endif
-
#endif
//----------------------------------------------------------------------------
-/** Define function marking macros if compiler specific macros are undefined: */
+// Define function marking macros if compiler specific macros are undefined:
-#if ! defined( KOKKOS_FORCEINLINE_FUNCTION )
-#define KOKKOS_FORCEINLINE_FUNCTION inline
+#if !defined( KOKKOS_FORCEINLINE_FUNCTION )
+ #define KOKKOS_FORCEINLINE_FUNCTION inline
#endif
-#if ! defined( KOKKOS_INLINE_FUNCTION )
-#define KOKKOS_INLINE_FUNCTION inline
+#if !defined( KOKKOS_INLINE_FUNCTION )
+ #define KOKKOS_INLINE_FUNCTION inline
#endif
-#if ! defined( KOKKOS_FUNCTION )
-#define KOKKOS_FUNCTION /**/
+#if !defined( KOKKOS_FUNCTION )
+ #define KOKKOS_FUNCTION /**/
#endif
-
//----------------------------------------------------------------------------
-///** Define empty macro for restrict if necessary: */
+// Define empty macro for restrict if necessary:
-#if ! defined(KOKKOS_RESTRICT)
-#define KOKKOS_RESTRICT
+#if !defined( KOKKOS_RESTRICT )
+ #define KOKKOS_RESTRICT
#endif
//----------------------------------------------------------------------------
-/** Define Macro for alignment: */
-#if ! defined KOKKOS_ALIGN_SIZE
-#define KOKKOS_ALIGN_SIZE 16
-#endif
+// Define Macro for alignment:
-#if ! defined(KOKKOS_ALIGN)
-#define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
+#if !defined KOKKOS_ALIGN_SIZE
+ #define KOKKOS_ALIGN_SIZE 16
#endif
-#if ! defined(KOKKOS_ALIGN_PTR)
-#define KOKKOS_ALIGN_PTR(size) __attribute__((aligned(size)))
+#if !defined( KOKKOS_ALIGN )
+ #define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
#endif
-//----------------------------------------------------------------------------
-/** Determine the default execution space for parallel dispatch.
- * There is zero or one default execution space specified.
- */
-
-#if 1 < ( ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA ) ? 1 : 0 ) + \
- ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP ) ? 1 : 0 ) + \
- ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS ) ? 1 : 0 ) + \
- ( defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL ) ? 1 : 0 ) )
-
-#error "More than one KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_* specified" ;
-
+#if !defined( KOKKOS_ALIGN_PTR )
+ #define KOKKOS_ALIGN_PTR(size) __attribute__((aligned(size)))
#endif
-/** If default is not specified then chose from enabled execution spaces.
- * Priority: CUDA, OPENMP, THREADS, SERIAL
- */
-#if defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
-#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
-#elif defined ( KOKKOS_ENABLE_CUDA )
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
-#elif defined ( KOKKOS_ENABLE_OPENMP )
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
-#elif defined ( KOKKOS_ENABLE_PTHREAD )
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
+//----------------------------------------------------------------------------
+// Determine the default execution space for parallel dispatch.
+// There is zero or one default execution space specified.
+
+#if 1 < ( ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS ) ? 1 : 0 ) + \
+ ( defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL ) ? 1 : 0 ) )
+ #error "More than one KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_* specified."
+#endif
+
+// If default is not specified then chose from enabled execution spaces.
+// Priority: CUDA, OPENMP, THREADS, QTHREADS, SERIAL
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
+//#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
+#elif defined( KOKKOS_ENABLE_CUDA )
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
+#elif defined( KOKKOS_ENABLE_OPENMP )
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
+#elif defined( KOKKOS_ENABLE_PTHREAD )
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
+//#elif defined( KOKKOS_ENABLE_QTHREADS )
+// #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS
#else
-#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
+ #define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
#endif
//----------------------------------------------------------------------------
-/** Determine for what space the code is being compiled: */
+// Determine for what space the code is being compiled:
-#if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) && defined (KOKKOS_ENABLE_CUDA)
-#define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
+#if defined( __CUDACC__ ) && defined( __CUDA_ARCH__ ) && defined( KOKKOS_ENABLE_CUDA )
+ #define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA
#else
-#define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
+ #define KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
#endif
-//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#if ( defined( _POSIX_C_SOURCE ) && _POSIX_C_SOURCE >= 200112L ) || \
( defined( _XOPEN_SOURCE ) && _XOPEN_SOURCE >= 600 )
-#if defined(KOKKOS_ENABLE_PERFORMANCE_POSIX_MEMALIGN)
-#define KOKKOS_ENABLE_POSIX_MEMALIGN 1
-#endif
+ #if defined( KOKKOS_ENABLE_PERFORMANCE_POSIX_MEMALIGN )
+ #define KOKKOS_ENABLE_POSIX_MEMALIGN 1
+ #endif
#endif
//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-/**Enable Profiling by default**/
+// Enable Profiling by default
#ifndef KOKKOS_ENABLE_PROFILING
-#define KOKKOS_ENABLE_PROFILING 1
+ #define KOKKOS_ENABLE_PROFILING 1
#endif
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #ifndef KOKKOS_MACROS_HPP */
-
+#endif // #ifndef KOKKOS_MACROS_HPP
diff --git a/lib/kokkos/core/src/Kokkos_MemoryPool.hpp b/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
index 2d45926e7..eadad10b4 100644
--- a/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
+++ b/lib/kokkos/core/src/Kokkos_MemoryPool.hpp
@@ -1,1558 +1,1559 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_MEMORYPOOL_HPP
#define KOKKOS_MEMORYPOOL_HPP
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_Atomic.hpp>
#include <impl/Kokkos_BitOps.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_SharedAlloc.hpp>
#include <limits>
#include <algorithm>
#include <chrono>
// How should errors be handled? In general, production code should return a
// value indicating failure so the user can decide how the error is handled.
// While the code is experimental, it can abort instead. If KOKKOS_ENABLE_MEMPOOL_PRINTERR is
// defined, the code will abort with an error message. Otherwise, the code will
// return with a value indicating failure when possible, or do nothing instead.
//#define KOKKOS_ENABLE_MEMPOOL_PRINTERR
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
//#define KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
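A sketch of how one of these debugging switches would be enabled (illustrative; the macro must be defined before this header is processed, e.g. on the compile line or in the including translation unit):

    // compile line:  -DKOKKOS_ENABLE_MEMPOOL_PRINTERR
    // or, equivalently, before the include:
    #define KOKKOS_ENABLE_MEMPOOL_PRINTERR
    #include <Kokkos_MemoryPool.hpp>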
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace MempoolImpl {
template < typename T, typename ExecutionSpace >
struct initialize_array {
typedef ExecutionSpace execution_space;
typedef typename ExecutionSpace::size_type size_type;
T * m_data;
T m_value;
initialize_array( T * d, size_t size, T v ) : m_data( d ), m_value( v )
{
Kokkos::parallel_for( size, *this );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const { m_data[i] = m_value; }
};
template <typename Bitset>
struct bitset_count
{
typedef typename Bitset::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef typename Bitset::size_type value_type;
typedef typename Bitset::word_type word_type;
word_type * m_words;
value_type & m_result;
bitset_count( word_type * w, value_type num_words, value_type & r )
: m_words( w ), m_result( r )
{
parallel_reduce( num_words, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & count ) const
{
count += Kokkos::Impl::bit_count( m_words[i] );
}
};
template < typename Device >
class Bitset {
public:
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space memory_space;
typedef unsigned word_type;
typedef unsigned size_type;
typedef Kokkos::Impl::DeepCopy< memory_space, Kokkos::HostSpace > raw_deep_copy;
// Define some constants.
enum {
// Size of bitset word. Should be 32.
WORD_SIZE = sizeof(word_type) * CHAR_BIT,
LG_WORD_SIZE = Kokkos::Impl::integral_power_of_two( WORD_SIZE ),
WORD_MASK = WORD_SIZE - 1
};
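With 32-bit words these constants evaluate to WORD_SIZE == 32, LG_WORD_SIZE == 5 and WORD_MASK == 31, so a global bit index splits into a word index and an in-word position exactly as the member functions below compute them. A small standalone sketch of that arithmetic (values chosen for illustration):

    #include <cstdio>

    int main() {
      const unsigned LG_WORD_SIZE = 5, WORD_MASK = 31;
      unsigned i        = 70;                        // some bit index
      unsigned word_pos = i >> LG_WORD_SIZE;         // 70 / 32  == 2
      unsigned mask     = 1u << ( i & WORD_MASK );   // 70 % 32 == 6  ->  mask 0x40
      std::printf( "bit %u -> word %u, mask 0x%x\n", i, word_pos, mask );
      return 0;
    }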
private:
word_type * m_words;
size_type m_size;
size_type m_num_words;
word_type m_last_word_mask;
public:
~Bitset() = default;
Bitset() = default;
Bitset( Bitset && ) = default;
Bitset( const Bitset & ) = default;
Bitset & operator = ( Bitset && ) = default;
Bitset & operator = ( const Bitset & ) = default;
void init( void * w, size_type s )
{
// Assumption: The size of the memory pointed to by w is a multiple of
// sizeof(word_type).
m_words = reinterpret_cast<word_type*>( w );
m_size = s;
m_num_words = ( s + WORD_SIZE - 1 ) >> LG_WORD_SIZE;
m_last_word_mask = m_size & WORD_MASK ? ( word_type(1) << ( m_size & WORD_MASK ) ) - 1 : 0;
reset();
}
size_type size() const { return m_size; }
size_type count() const
{
size_type val = 0;
bitset_count< Bitset > bc( m_words, m_num_words, val );
return val;
}
void set()
{
// Set all the bits.
initialize_array< word_type, execution_space > ia( m_words, m_num_words, ~word_type(0) );
if ( m_last_word_mask ) {
// Clear the unused bits in the last block.
raw_deep_copy( m_words + ( m_num_words - 1 ), &m_last_word_mask, sizeof(word_type) );
}
}
void reset()
{
initialize_array< word_type, execution_space > ia( m_words, m_num_words, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
bool test( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word_type mask = word_type(1) << ( i & WORD_MASK );
return word & mask;
}
KOKKOS_FORCEINLINE_FUNCTION
bool set( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
return !( atomic_fetch_or( &m_words[ word_pos ], mask ) & mask );
}
KOKKOS_FORCEINLINE_FUNCTION
bool reset( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
return atomic_fetch_and( &m_words[ word_pos ], ~mask ) & mask;
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
fetch_word_set( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
Kokkos::pair<bool, word_type> result;
result.second = atomic_fetch_or( &m_words[ word_pos ], mask );
result.first = !( result.second & mask );
return result;
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
fetch_word_reset( size_type i ) const
{
size_type word_pos = i >> LG_WORD_SIZE;
word_type mask = word_type(1) << ( i & WORD_MASK );
Kokkos::pair<bool, word_type> result;
result.second = atomic_fetch_and( &m_words[ word_pos ], ~mask );
result.first = result.second & mask;
return result;
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
set_any_in_word( size_type & pos ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
// Loop until there are no more unset bits in the word.
while ( ~word ) {
// Find the first unset bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( ~word );
// Try to set the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_or( &m_words[ word_pos ], mask );
if ( !( word & mask ) ) {
// Successfully set the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
}
// Didn't find a free bit in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
set_any_in_word( size_type & pos, word_type word_mask ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word = ( ~word ) & word_mask;
// Loop until there are no more unset bits in the word.
while ( word ) {
// Find the first unset bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to set the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_or( &m_words[ word_pos ], mask );
if ( !( word & mask ) ) {
// Successfully set the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
word = ( ~word ) & word_mask;
}
// Didn't find a free bit in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
reset_any_in_word( size_type & pos ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
// Loop until there are no more set bits in the word.
while ( word ) {
// Find the first set bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to reset the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
if ( word & mask ) {
// Successfully reset the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
}
// Didn't find a set bit to reset in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
KOKKOS_FORCEINLINE_FUNCTION
Kokkos::pair< bool, word_type >
reset_any_in_word( size_type & pos, word_type word_mask ) const
{
size_type word_pos = pos >> LG_WORD_SIZE;
word_type word = volatile_load( &m_words[ word_pos ] );
word = word & word_mask;
// Loop until there are no more set bits in the word.
while ( word ) {
// Find the first set bit in the word.
size_type bit = Kokkos::Impl::bit_scan_forward( word );
// Try to reset the bit.
word_type mask = word_type(1) << bit;
word = atomic_fetch_and( &m_words[ word_pos ], ~mask );
if ( word & mask ) {
// Successfully reset the bit.
pos = ( word_pos << LG_WORD_SIZE ) + bit;
return Kokkos::pair<bool, word_type>( true, word );
}
word = word & word_mask;
}
// Didn't find a set bit to reset in this word.
return Kokkos::pair<bool, word_type>( false, word_type(0) );
}
};
template < typename UInt32View, typename BSHeaderView, typename SBHeaderView,
typename MempoolBitset >
struct create_histogram {
typedef typename UInt32View::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef Kokkos::pair< double, uint32_t > value_type;
size_t m_start;
UInt32View m_page_histogram;
BSHeaderView m_blocksize_info;
SBHeaderView m_sb_header;
MempoolBitset m_sb_blocks;
size_t m_lg_max_sb_blocks;
uint32_t m_lg_min_block_size;
uint32_t m_blocks_per_page;
value_type & m_result;
create_histogram( size_t start, size_t end, UInt32View ph, BSHeaderView bsi,
SBHeaderView sbh, MempoolBitset sbb, size_t lmsb,
uint32_t lmbs, uint32_t bpp, value_type & r )
: m_start( start ), m_page_histogram( ph ), m_blocksize_info( bsi ),
m_sb_header( sbh ), m_sb_blocks( sbb ), m_lg_max_sb_blocks( lmsb ),
m_lg_min_block_size( lmbs ), m_blocks_per_page( bpp ), m_result( r )
{
Kokkos::parallel_reduce( end - start, *this, m_result );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{
v.first = 0.0;
v.second = 0;
}
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{
dst.first += src.first;
dst.second += src.second;
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
size_type i2 = i + m_start;
uint32_t lg_block_size = m_sb_header(i2).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size != 0 ) {
uint32_t block_size_id = lg_block_size - m_lg_min_block_size;
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
uint32_t total_allocated_blocks = 0;
for ( uint32_t j = 0; j < pages_per_sb; ++j ) {
unsigned start_pos = ( i2 << m_lg_max_sb_blocks ) + j * m_blocks_per_page;
unsigned end_pos = start_pos + m_blocks_per_page;
uint32_t page_allocated_blocks = 0;
for ( unsigned k = start_pos; k < end_pos; ++k ) {
page_allocated_blocks += m_sb_blocks.test( k );
}
total_allocated_blocks += page_allocated_blocks;
atomic_increment( &m_page_histogram(page_allocated_blocks) );
}
r.first += double(total_allocated_blocks) / blocks_per_sb;
r.second += blocks_per_sb;
}
}
};
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
template < typename UInt32View, typename SBHeaderView, typename MempoolBitset >
struct count_allocated_blocks {
typedef typename UInt32View::execution_space execution_space;
typedef typename execution_space::size_type size_type;
UInt32View m_num_allocated_blocks;
SBHeaderView m_sb_header;
MempoolBitset m_sb_blocks;
size_t m_sb_size;
size_t m_lg_max_sb_blocks;
count_allocated_blocks( size_t num_sb, UInt32View nab, SBHeaderView sbh,
MempoolBitset sbb, size_t sbs, size_t lmsb )
: m_num_allocated_blocks( nab ), m_sb_header( sbh ),
m_sb_blocks( sbb ), m_sb_size( sbs ), m_lg_max_sb_blocks( lmsb )
{
Kokkos::parallel_for( num_sb, *this );
execution_space::fence();
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
uint32_t lg_block_size = m_sb_header(i).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size != 0 ) {
// Count the allocated blocks in the superblock.
uint32_t blocks_per_sb = lg_block_size > 0 ? m_sb_size >> lg_block_size : 0;
unsigned start_pos = i << m_lg_max_sb_blocks;
unsigned end_pos = start_pos + blocks_per_sb;
uint32_t count = 0;
for ( unsigned j = start_pos; j < end_pos; ++j ) {
count += m_sb_blocks.test( j );
}
m_num_allocated_blocks(i) = count;
}
}
};
#endif
}
/// \class MemoryPool
/// \brief Bitset based memory manager for pools of same-sized chunks of memory.
/// \tparam Device Kokkos device that gives the execution and memory space the
/// allocator will be used in.
///
/// MemoryPool is a memory space that can be on host or device. It provides a
/// pool memory allocator for fast allocation of same-sized chunks of memory.
/// The memory is only accessible on the host / device this allocator is
/// associated with.
///
/// This allocator is based on ideas from the following GPU allocators:
/// Halloc (https://github.com/canonizer/halloc).
/// ScatterAlloc (https://github.com/ComputationalRadiationPhysics/scatteralloc)
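A minimal usage sketch against the interface documented in this class (illustrative only; sizes and the kernel are made up and error handling is omitted): the pool is constructed on the host from a memory-space instance and a total size, and allocate() / deallocate() are then callable from device code through a by-value copy of the pool.

    #include <Kokkos_Core.hpp>
    #include <Kokkos_MemoryPool.hpp>

    void example_pool_use() {
      using device_type = Kokkos::DefaultExecutionSpace::device_type;
      using pool_type   = Kokkos::Experimental::MemoryPool< device_type >;

      // 64 MB pool with the default 2^20-byte superblocks.
      pool_type pool( device_type::memory_space(), 64ul << 20 );

      Kokkos::parallel_for( 128, KOKKOS_LAMBDA( const int /*i*/ ) {
        void * p = pool.allocate( 256 );       // grab one 256-byte block
        if ( p ) pool.deallocate( p, 256 );    // and return it to the pool
      });
    }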
template < typename Device >
class MemoryPool {
private:
// The allocator uses superblocks. A superblock is divided into pages, and a
// page is divided into blocks. A block is the chunk of memory that is given
// out by the allocator. A page always has a number of blocks equal to the
// size of the word used by the bitset. Thus, the pagesize can vary between
// superblocks as it is based on the block size of the superblock. The
// allocator supports all powers of 2 from MIN_BLOCK_SIZE to the size of a
// superblock as block sizes.
// Superblocks are divided into 4 categories:
// 1. empty - is completely empty; there are no active allocations
// 2. partfull - partially full; there are some active allocations
// 3. full - full enough with active allocations that new allocations
// will likely fail
// 4. active - is currently the active superblock for a block size
//
// An inactive superblock is one that is empty, partfull, or full.
//
// New allocations occur only from an active superblock. If a superblock is
// made inactive after an allocation request is made to it but before the
// allocation request is fulfilled, the allocation will still be attempted
// from that superblock. Deallocations can occur to partfull, full, or
// active superblocks. Superblocks move between categories as allocations
// and deallocations happen. Superblocks all start empty.
//
// Here are the possible moves between categories:
// empty -> active During allocation, there is no active superblock
// or the active superblock is full.
// active -> full During allocation, the full threshold of the
// superblock is reached when increasing the fill
// level.
// full -> partfull During deallocation, the full threshold of the
// superblock is crossed when decreasing the fill
// level.
// partfull -> empty Deallocation of the last allocated block of an
// inactive superblock.
// partfull -> active During allocation, the active superblock is full.
//
// When a new active superblock is needed, partfull superblocks of the same
// block size are chosen over empty superblocks.
//
// The empty and partfull superblocks are tracked using bitsets that represent
// the superblocks in those respective categories. Empty superblocks use a
// single bitset, while partfull superblocks use a bitset per block size
// (contained sequentially in a single bitset). Active superblocks are
// tracked by the active superblocks array. Full superblocks aren't tracked
// at all.
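To make the geometry concrete, a worked sketch with the default sizes used further below (numbers are illustrative, not additional behavior): a 2^20-byte superblock serving 256-byte blocks holds 2^20 / 2^8 = 4096 blocks, and since a page always covers one 32-bit bitset word it is divided into 4096 / 32 = 128 pages; the same superblock serving 2^15-byte blocks holds only 32 blocks, i.e. a single page.

    #include <cstddef>

    constexpr std::size_t sb_size       = std::size_t(1) << 20;          // 1 MiB superblock
    constexpr std::size_t lg_block_size = 8;                             // 256-byte blocks
    constexpr std::size_t blocks_per_sb = sb_size >> lg_block_size;      // 4096
    constexpr std::size_t pages_per_sb  = ( blocks_per_sb + 31 ) >> 5;   // 128
    static_assert( blocks_per_sb == 4096 && pages_per_sb == 128, "worked example" );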
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space backend_memory_space;
typedef Device device_type;
typedef MempoolImpl::Bitset< device_type > MempoolBitset;
// Define some constants.
enum {
MIN_BLOCK_SIZE = 64,
LG_MIN_BLOCK_SIZE = Kokkos::Impl::integral_power_of_two( MIN_BLOCK_SIZE ),
MAX_BLOCK_SIZES = 31 - LG_MIN_BLOCK_SIZE + 1,
// Size of bitset word.
BLOCKS_PER_PAGE = MempoolBitset::WORD_SIZE,
LG_BLOCKS_PER_PAGE = MempoolBitset::LG_WORD_SIZE,
INVALID_SUPERBLOCK = ~uint32_t(0),
SUPERBLOCK_LOCK = ~uint32_t(0) - 1,
MAX_TRIES = 32 // Cap on the number of pages searched
// before an allocation returns empty.
};
public:
// Stores information about each superblock.
struct SuperblockHeader {
uint32_t m_full_pages;
uint32_t m_empty_pages;
uint32_t m_lg_block_size;
uint32_t m_is_active;
KOKKOS_FUNCTION
SuperblockHeader() :
m_full_pages(0), m_empty_pages(0), m_lg_block_size(0), m_is_active(false) {}
};
// Stores information about each block size.
struct BlockSizeHeader {
uint32_t m_blocks_per_sb;
uint32_t m_pages_per_sb;
uint32_t m_sb_full_level;
uint32_t m_page_full_level;
KOKKOS_FUNCTION
BlockSizeHeader() :
m_blocks_per_sb(0), m_pages_per_sb(0), m_sb_full_level(0), m_page_full_level(0) {}
};
private:
typedef Kokkos::Impl::SharedAllocationTracker Tracker;
typedef View< uint32_t *, device_type > UInt32View;
typedef View< SuperblockHeader *, device_type > SBHeaderView;
// The letters 'sb' used in any variable name mean superblock.
size_t m_lg_sb_size; // Log2 of superblock size.
size_t m_sb_size; // Superblock size.
size_t m_lg_max_sb_blocks; // Log2 of the number of blocks of the
// minimum block size in a superblock.
size_t m_num_sb; // Number of superblocks.
size_t m_ceil_num_sb; // Number of superblocks rounded up to the smallest
// multiple of the bitset word size. Used by
// bitsets representing superblock categories to
// ensure different block sizes never share a word
// in the bitset.
size_t m_num_block_size; // Number of block sizes supported.
size_t m_data_size; // Amount of memory available to the allocator.
size_t m_sb_blocks_size; // Amount of memory for free / empty blocks bitset.
size_t m_empty_sb_size; // Amount of memory for empty superblocks bitset.
size_t m_partfull_sb_size; // Amount of memory for partfull superblocks bitset.
size_t m_total_size; // Total amount of memory allocated.
char * m_data; // Beginning device memory location used for
// superblocks.
UInt32View m_active; // Active superblocks IDs.
SBHeaderView m_sb_header; // Header info for superblocks.
MempoolBitset m_sb_blocks; // Bitsets representing free / allocated status
// of blocks in superblocks.
MempoolBitset m_empty_sb; // Bitset representing empty superblocks.
MempoolBitset m_partfull_sb; // Bitsets representing partially full superblocks.
Tracker m_track; // Tracker for superblock memory.
BlockSizeHeader m_blocksize_info[MAX_BLOCK_SIZES]; // Header info for block sizes.
// There were several methods tried for storing the block size header info: in a View,
// in a View of const data, and in a RandomAccess View. All of these were slower than
// storing it in a static array that is a member variable of the class. In the latter
// case, the block size info gets copied into the constant memory on the GPU along with
// the class when it is copied there for executing a parallel loop. Instead of storing
// the values, computing the values every time they were needed was also tried. This
// method was slightly slower than storing them in the static array.
public:
//! Tag this class as a kokkos memory space
typedef MemoryPool memory_space;
~MemoryPool() = default;
MemoryPool() = default;
MemoryPool( MemoryPool && ) = default;
MemoryPool( const MemoryPool & ) = default;
MemoryPool & operator = ( MemoryPool && ) = default;
MemoryPool & operator = ( const MemoryPool & ) = default;
/// \brief Initializes the memory pool.
/// \param memspace The memory space from which the memory pool will allocate memory.
/// \param total_size The requested memory amount controlled by the allocator. The
/// actual amount is rounded up to the smallest multiple of the
/// superblock size >= the requested size.
/// \param log2_superblock_size Log2 of the size of superblocks used by the allocator.
/// In most use cases, the default value should work.
inline
MemoryPool( const backend_memory_space & memspace,
size_t total_size, size_t log2_superblock_size = 20 )
: m_lg_sb_size( log2_superblock_size ),
m_sb_size( size_t(1) << m_lg_sb_size ),
m_lg_max_sb_blocks( m_lg_sb_size - LG_MIN_BLOCK_SIZE ),
m_num_sb( ( total_size + m_sb_size - 1 ) >> m_lg_sb_size ),
m_ceil_num_sb( ( ( m_num_sb + BLOCKS_PER_PAGE - 1 ) >> LG_BLOCKS_PER_PAGE ) <<
LG_BLOCKS_PER_PAGE ),
m_num_block_size( m_lg_sb_size - LG_MIN_BLOCK_SIZE + 1 ),
m_data_size( m_num_sb * m_sb_size ),
m_sb_blocks_size( ( m_num_sb << m_lg_max_sb_blocks ) / CHAR_BIT ),
m_empty_sb_size( m_ceil_num_sb / CHAR_BIT ),
m_partfull_sb_size( m_ceil_num_sb * m_num_block_size / CHAR_BIT ),
m_total_size( m_data_size + m_sb_blocks_size + m_empty_sb_size + m_partfull_sb_size ),
m_data(0),
m_active( "Active superblocks" ),
m_sb_header( "Superblock headers" ),
m_track()
{
// Assumption. The minimum block size must be a power of 2.
static_assert( Kokkos::Impl::is_integral_power_of_two( MIN_BLOCK_SIZE ), "" );
// Assumption. Require a superblock be large enough so it takes at least 1
// whole bitset word to represent it using the minimum blocksize.
if ( m_sb_size < MIN_BLOCK_SIZE * BLOCKS_PER_PAGE ) {
printf( "\n** MemoryPool::MemoryPool() Superblock size must be >= %u **\n",
MIN_BLOCK_SIZE * BLOCKS_PER_PAGE );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Assumption. A superblock's size can be at most 2^31. Verify this.
if ( m_lg_sb_size > 31 ) {
printf( "\n** MemoryPool::MemoryPool() Superblock size must be < %u **\n",
( uint32_t(1) << 31 ) );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Assumption. The Bitset only uses unsigned for size types which limits
// the amount of memory the allocator can manage. Verify the memory size
// is below this limit.
if ( m_data_size > size_t(MIN_BLOCK_SIZE) * std::numeric_limits<unsigned>::max() ) {
printf( "\n** MemoryPool::MemoryPool() Allocator can only manage %lu bytes of memory; requested %lu **\n",
size_t(MIN_BLOCK_SIZE) * std::numeric_limits<unsigned>::max(), total_size );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
// Allocate memory for Views. This is done here instead of at construction
// so that the runtime checks can be performed before allocating memory.
resize( m_active, m_num_block_size );
resize( m_sb_header, m_num_sb );
// Allocate superblock memory.
typedef Kokkos::Impl::SharedAllocationRecord< backend_memory_space, void > SharedRecord;
SharedRecord * rec =
SharedRecord::allocate( memspace, "mempool", m_total_size );
m_track.assign_allocated_record_to_uninitialized( rec );
m_data = reinterpret_cast<char *>( rec->data() );
// Set and initialize the free / empty block bitset memory.
m_sb_blocks.init( m_data + m_data_size, m_num_sb << m_lg_max_sb_blocks );
// Set and initialize the empty superblock block bitset memory.
m_empty_sb.init( m_data + m_data_size + m_sb_blocks_size, m_num_sb );
// Start with all superblocks in the empty category.
m_empty_sb.set();
// Set and initialize the partfull superblock block bitset memory.
m_partfull_sb.init( m_data + m_data_size + m_sb_blocks_size + m_empty_sb_size,
m_ceil_num_sb * m_num_block_size );
// Initialize all active superblocks to be invalid.
typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
for ( size_t i = 0; i < m_num_block_size; ++i ) host_active(i) = INVALID_SUPERBLOCK;
deep_copy( m_active, host_active );
// A superblock is considered full when this percentage of its pages are full.
const double superblock_full_fraction = .8;
// A page is considered full when this percentage of its blocks are full.
const double page_full_fraction = .875;
// Initialize the blocksize info.
for ( size_t i = 0; i < m_num_block_size; ++i ) {
uint32_t lg_block_size = i + LG_MIN_BLOCK_SIZE;
uint32_t blocks_per_sb = m_sb_size >> lg_block_size;
uint32_t pages_per_sb = ( blocks_per_sb + BLOCKS_PER_PAGE - 1 ) >> LG_BLOCKS_PER_PAGE;
m_blocksize_info[i].m_blocks_per_sb = blocks_per_sb;
m_blocksize_info[i].m_pages_per_sb = pages_per_sb;
// Set the full level for the superblock.
m_blocksize_info[i].m_sb_full_level =
static_cast<uint32_t>( pages_per_sb * superblock_full_fraction );
if ( m_blocksize_info[i].m_sb_full_level == 0 ) {
m_blocksize_info[i].m_sb_full_level = 1;
}
// Set the full level for the page.
uint32_t blocks_per_page =
blocks_per_sb < BLOCKS_PER_PAGE ? blocks_per_sb : BLOCKS_PER_PAGE;
m_blocksize_info[i].m_page_full_level =
static_cast<uint32_t>( blocks_per_page * page_full_fraction );
if ( m_blocksize_info[i].m_page_full_level == 0 ) {
m_blocksize_info[i].m_page_full_level = 1;
}
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO
printf( "\n" );
printf( " m_lg_sb_size: %12lu\n", m_lg_sb_size );
printf( " m_sb_size: %12lu\n", m_sb_size );
printf( " m_max_sb_blocks: %12lu\n", size_t(1) << m_lg_max_sb_blocks );
printf( "m_lg_max_sb_blocks: %12lu\n", m_lg_max_sb_blocks );
printf( " m_num_sb: %12lu\n", m_num_sb );
printf( " m_ceil_num_sb: %12lu\n", m_ceil_num_sb );
printf( " m_num_block_size: %12lu\n", m_num_block_size );
printf( " data bytes: %12lu\n", m_data_size );
printf( " sb_blocks bytes: %12lu\n", m_sb_blocks_size );
printf( " empty_sb bytes: %12lu\n", m_empty_sb_size );
printf( " partfull_sb bytes: %12lu\n", m_partfull_sb_size );
printf( " total bytes: %12lu\n", m_total_size );
printf( " m_empty_sb size: %12u\n", m_empty_sb.size() );
printf( "m_partfull_sb size: %12u\n", m_partfull_sb.size() );
printf( "\n" );
fflush( stdout );
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
// Print the blocksize info for all the block sizes.
printf( "SIZE BLOCKS_PER_SB PAGES_PER_SB SB_FULL_LEVEL PAGE_FULL_LEVEL\n" );
for ( size_t i = 0; i < m_num_block_size; ++i ) {
printf( "%4zu %13u %12u %13u %15u\n", i + LG_MIN_BLOCK_SIZE,
m_blocksize_info[i].m_blocks_per_sb, m_blocksize_info[i].m_pages_per_sb,
m_blocksize_info[i].m_sb_full_level, m_blocksize_info[i].m_page_full_level );
}
printf( "\n" );
#endif
}
/// \brief The actual block size allocated given alloc_size.
KOKKOS_INLINE_FUNCTION
size_t allocate_block_size( const size_t alloc_size ) const
{ return size_t(1) << ( get_block_size_index( alloc_size ) + LG_MIN_BLOCK_SIZE ); }
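For instance, with block sizes restricted to powers of two no smaller than MIN_BLOCK_SIZE (64 bytes), a 100-byte request is served from 128-byte blocks and a 20-byte request still consumes a full 64-byte block. A hypothetical stand-in for that rounding (get_block_size_index() itself is defined elsewhere in this class):

    #include <cstddef>

    // Illustrative helper, not the pool's implementation: round the request up
    // to the next power of two, never below the 64-byte minimum block size.
    inline std::size_t example_block_size( std::size_t alloc_size ) {
      std::size_t s = 64;                  // MIN_BLOCK_SIZE
      while ( s < alloc_size ) s <<= 1;
      return s;                            // example_block_size(100) == 128
    }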
/// \brief Allocate a chunk of memory.
/// \param alloc_size Size of the requested allocation in number of bytes.
///
/// The function returns a void pointer to a memory location on success and
/// NULL on failure.
KOKKOS_FUNCTION
void * allocate( size_t alloc_size ) const
{
void * p = 0;
// Only support allocations up to the superblock size. Just return 0
// (failed allocation) for any size above this.
if ( alloc_size <= m_sb_size )
{
int block_size_id = get_block_size_index( alloc_size );
uint32_t blocks_per_sb = m_blocksize_info[block_size_id].m_blocks_per_sb;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
#ifdef KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
// Without this test it looks like pages_per_sb might come back wrong.
if ( pages_per_sb == 0 ) return NULL;
#endif
unsigned word_size = blocks_per_sb > 32 ? 32 : blocks_per_sb;
unsigned word_mask = ( uint64_t(1) << word_size ) - 1;
// Instead of forcing an atomic read to guarantee the updated value,
// reading the old value is actually beneficial because more threads will
// attempt allocations on the old active superblock instead of waiting on
// the new active superblock. This will help hide the latency of
// switching the active superblock.
uint32_t sb_id = volatile_load( &m_active(block_size_id) );
// If the active is locked, keep reading it atomically until the lock is
// released.
while ( sb_id == SUPERBLOCK_LOCK ) {
sb_id = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
}
load_fence();
bool allocation_done = false;
while ( !allocation_done ) {
bool need_new_sb = false;
if ( sb_id != INVALID_SUPERBLOCK ) {
// Use the value from the clock register as the hash value.
uint64_t hash_val = get_clock_register();
// Get the starting position for this superblock's bits in the bitset.
uint32_t pos_base = sb_id << m_lg_max_sb_blocks;
// Mod the hash value to choose a page in the superblock. The
// initial block searched is the first block of that page.
uint32_t pos_rel = uint32_t( hash_val & ( pages_per_sb - 1 ) ) << LG_BLOCKS_PER_PAGE;
// Get the absolute starting position for this superblock's bits in the bitset.
uint32_t pos = pos_base + pos_rel;
// Keep track of the number of pages searched. Pages in the superblock are
// searched linearly from the starting page. All pages in the superblock are
// searched until either a location is found, or it is proven empty.
uint32_t pages_searched = 0;
bool search_done = false;
while ( !search_done ) {
bool success = false;
unsigned prev_val = 0;
Kokkos::tie( success, prev_val ) = m_sb_blocks.set_any_in_word( pos, word_mask );
if ( !success ) {
if ( ++pages_searched >= pages_per_sb ) {
// Searched all the pages in this superblock. Look for a new superblock.
//
// The previous method tried limiting the number of pages searched, but
// that caused a huge performance issue in CUDA where the outer loop
// executed massive numbers of times. Threads weren't able to find a
// free location when the superblock wasn't full and were able to execute
// the outer loop many times before the superblock was switched for a new
// one. Switching to an exhaustive search eliminated this possibility and
// didn't slow anything down for the tests.
need_new_sb = true;
search_done = true;
}
else {
// Move to the next page making sure the new search position
// doesn't go past this superblock's bits.
pos += BLOCKS_PER_PAGE;
pos = ( pos < pos_base + blocks_per_sb ) ? pos : pos_base;
}
}
else {
// Reserved a memory location to allocate.
memory_fence();
search_done = true;
allocation_done = true;
uint32_t lg_block_size = block_size_id + LG_MIN_BLOCK_SIZE;
p = m_data + ( size_t(sb_id) << m_lg_sb_size ) +
( ( pos - pos_base ) << lg_block_size );
uint32_t used_bits = Kokkos::Impl::bit_count( prev_val );
if ( used_bits == 0 ) {
// This page was empty. Decrement the number of empty pages for
// the superblock.
atomic_decrement( &m_sb_header(sb_id).m_empty_pages );
}
else if ( used_bits == m_blocksize_info[block_size_id].m_page_full_level - 1 )
{
// This page is full. Increment the number of full pages for
// the superblock.
uint32_t full_pages = atomic_fetch_add( &m_sb_header(sb_id).m_full_pages, 1 );
// This allocation made the superblock full, so a new one needs to be found.
if ( full_pages == m_blocksize_info[block_size_id].m_sb_full_level - 1 ) {
need_new_sb = true;
}
}
}
}
}
else {
// This is the first allocation for this block size. A superblock needs
// to be set as the active one. If this point is reached any other time,
// it is an error.
need_new_sb = true;
}
if ( need_new_sb ) {
uint32_t new_sb_id = find_superblock( block_size_id, sb_id );
if ( new_sb_id == sb_id ) {
allocation_done = true;
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
printf( "** No superblocks available. **\n" );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
#endif
}
else {
sb_id = new_sb_id;
}
}
}
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
else {
printf( "** Requested allocation size (%zu) larger than superblock size (%lu). **\n",
alloc_size, m_sb_size );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
#endif
return p;
}
/// \brief Release allocated memory back to the pool.
/// \param alloc_ptr Pointer to chunk of memory previously allocated by
/// the allocator.
/// \param alloc_size Size of the allocated memory in number of bytes.
KOKKOS_FUNCTION
void deallocate( void * alloc_ptr, size_t alloc_size ) const
{
char * ap = static_cast<char *>( alloc_ptr );
// Only deallocate memory controlled by this pool.
if ( ap >= m_data && ap + alloc_size <= m_data + m_data_size ) {
// Get the superblock for the address. This can be calculated by math on
// the address since the superblocks are stored contiguously in one memory
// chunk.
uint32_t sb_id = ( ap - m_data ) >> m_lg_sb_size;
// Get the starting position for this superblock's bits in the bitset.
uint32_t pos_base = sb_id << m_lg_max_sb_blocks;
// Get the relative position for this memory location's bit in the bitset.
uint32_t offset = ( ap - m_data ) - ( size_t(sb_id) << m_lg_sb_size );
uint32_t lg_block_size = m_sb_header(sb_id).m_lg_block_size;
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t pos_rel = offset >> lg_block_size;
bool success = false;
unsigned prev_val = 0;
memory_fence();
Kokkos::tie( success, prev_val ) = m_sb_blocks.fetch_word_reset( pos_base + pos_rel );
// If the memory location was previously deallocated, do nothing.
if ( success ) {
uint32_t page_fill_level = Kokkos::Impl::bit_count( prev_val );
if ( page_fill_level == 1 ) {
// This page is now empty. Increment the number of empty pages for the
// superblock.
uint32_t empty_pages = atomic_fetch_add( &m_sb_header(sb_id).m_empty_pages, 1 );
if ( !volatile_load( &m_sb_header(sb_id).m_is_active ) &&
empty_pages == m_blocksize_info[block_size_id].m_pages_per_sb - 1 )
{
// This deallocation caused the superblock to be empty. Change the
// superblock category from partially full to empty.
unsigned pos = block_size_id * m_ceil_num_sb + sb_id;
if ( m_partfull_sb.reset( pos ) ) {
// Reset the empty pages and block size for the superblock.
volatile_store( &m_sb_header(sb_id).m_empty_pages, uint32_t(0) );
volatile_store( &m_sb_header(sb_id).m_lg_block_size, uint32_t(0) );
store_fence();
m_empty_sb.set( sb_id );
}
}
}
else if ( page_fill_level == m_blocksize_info[block_size_id].m_page_full_level ) {
// This page is no longer full. Decrement the number of full pages for
// the superblock.
uint32_t full_pages = atomic_fetch_sub( &m_sb_header(sb_id).m_full_pages, 1 );
if ( !volatile_load( &m_sb_header(sb_id).m_is_active ) &&
full_pages == m_blocksize_info[block_size_id].m_sb_full_level )
{
// This deallocation caused the number of full pages to decrease below
// the full threshold. Change the superblock category from full to
// partially full.
unsigned pos = block_size_id * m_ceil_num_sb + sb_id;
m_partfull_sb.set( pos );
}
}
}
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINTERR
else {
printf( "\n** MemoryPool::deallocate() ADDRESS_OUT_OF_RANGE(0x%llx) **\n",
reinterpret_cast<uint64_t>( alloc_ptr ) );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
#endif
}
/// \brief Tests if the memory pool has no more memory available to allocate.
KOKKOS_INLINE_FUNCTION
bool is_empty() const
{
// The allocator is empty if all superblocks are full. A superblock is
// full if it has >= 80% of its pages allocated.
// Look at all the superblocks. If one is not full, then the allocator
// isn't empty.
for ( size_t i = 0; i < m_num_sb; ++i ) {
uint32_t lg_block_size = m_sb_header(i).m_lg_block_size;
// A superblock only has a block size of 0 when it is empty.
if ( lg_block_size == 0 ) return false;
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t full_pages = volatile_load( &m_sb_header(i).m_full_pages );
if ( full_pages < m_blocksize_info[block_size_id].m_sb_full_level ) return false;
}
// All the superblocks were full. The allocator is empty.
return true;
}
// The following functions are used for debugging.
void print_status() const
{
printf( "\n" );
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
typename SBHeaderView::HostMirror host_sb_header = create_mirror_view( m_sb_header );
deep_copy( host_sb_header, m_sb_header );
UInt32View num_allocated_blocks( "Allocated Blocks", m_num_sb );
// Count the number of allocated blocks per superblock.
{
MempoolImpl::count_allocated_blocks< UInt32View, SBHeaderView, MempoolBitset >
mch( m_num_sb, num_allocated_blocks, m_sb_header,
m_sb_blocks, m_sb_size, m_lg_max_sb_blocks );
}
typename UInt32View::HostMirror host_num_allocated_blocks =
create_mirror_view( num_allocated_blocks );
deep_copy( host_num_allocated_blocks, num_allocated_blocks );
// Print header info of all superblocks.
printf( "SB_ID SIZE ACTIVE EMPTY_PAGES FULL_PAGES USED_BLOCKS\n" );
for ( size_t i = 0; i < m_num_sb; ++i ) {
printf( "%5zu %4u %6d %11u %10u %10u\n", i,
host_sb_header(i).m_lg_block_size, host_sb_header(i).m_is_active,
host_sb_header(i).m_empty_pages, host_sb_header(i).m_full_pages,
host_num_allocated_blocks(i) );
}
printf( "\n" );
#endif
UInt32View page_histogram( "Page Histogram", 33 );
// Get a View version of the blocksize info.
typedef View< BlockSizeHeader *, device_type > BSHeaderView;
BSHeaderView blocksize_info( "BlockSize Headers", MAX_BLOCK_SIZES );
Kokkos::Impl::DeepCopy< backend_memory_space, Kokkos::HostSpace >
dc( blocksize_info.ptr_on_device(), m_blocksize_info,
sizeof(BlockSizeHeader) * m_num_block_size );
Kokkos::pair< double, uint32_t > result = Kokkos::pair< double, uint32_t >( 0.0, 0 );
// Create the page histogram.
{
MempoolImpl::create_histogram< UInt32View, BSHeaderView, SBHeaderView, MempoolBitset >
mch( 0, m_num_sb, page_histogram, blocksize_info, m_sb_header, m_sb_blocks,
m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
}
typename UInt32View::HostMirror host_page_histogram = create_mirror_view( page_histogram );
deep_copy( host_page_histogram, page_histogram );
// Find the used and total pages and blocks.
uint32_t used_pages = 0;
uint32_t used_blocks = 0;
for ( uint32_t i = 1; i < 33; ++i ) {
used_pages += host_page_histogram(i);
used_blocks += i * host_page_histogram(i);
}
uint32_t total_pages = used_pages + host_page_histogram(0);
unsigned num_empty_sb = m_empty_sb.count();
unsigned num_non_empty_sb = m_num_sb - num_empty_sb;
unsigned num_partfull_sb = m_partfull_sb.count();
uint32_t total_blocks = result.second;
double ave_sb_full = num_non_empty_sb == 0 ? 0.0 : result.first / num_non_empty_sb;
double percent_used_sb = double( m_num_sb - num_empty_sb ) / m_num_sb;
double percent_used_pages = total_pages == 0 ? 0.0 : double(used_pages) / total_pages;
double percent_used_blocks = total_blocks == 0 ? 0.0 : double(used_blocks) / total_blocks;
// Count active superblocks.
typename UInt32View::HostMirror host_active = create_mirror_view( m_active );
deep_copy( host_active, m_active );
unsigned num_active_sb = 0;
for ( size_t i = 0; i < m_num_block_size; ++i ) {
num_active_sb += host_active(i) != INVALID_SUPERBLOCK;
}
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
// Print active superblocks.
printf( "BS_ID SB_ID\n" );
for ( size_t i = 0; i < m_num_block_size; ++i ) {
uint32_t sb_id = host_active(i);
if ( sb_id == INVALID_SUPERBLOCK ) {
printf( "%5zu I\n", i );
}
else if ( sb_id == SUPERBLOCK_LOCK ) {
printf( "%5zu L\n", i );
}
else {
printf( "%5zu %7u\n", i, sb_id );
}
}
printf( "\n" );
fflush( stdout );
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
// Print the summary page histogram.
printf( "USED_BLOCKS PAGE_COUNT\n" );
for ( uint32_t i = 0; i < 33; ++i ) {
printf( "%10u %10u\n", i, host_page_histogram[i] );
}
printf( "\n" );
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
// Print the page histogram for a few individual superblocks.
// const uint32_t num_sb_id = 2;
// uint32_t sb_id[num_sb_id] = { 0, 10 };
const uint32_t num_sb_id = 1;
uint32_t sb_id[num_sb_id] = { 0 };
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
deep_copy( page_histogram, 0 );
{
MempoolImpl::create_histogram< UInt32View, BSHeaderView, SBHeaderView, MempoolBitset >
mch( sb_id[i], sb_id[i] + 1, page_histogram, blocksize_info, m_sb_header,
m_sb_blocks, m_lg_max_sb_blocks, LG_MIN_BLOCK_SIZE, BLOCKS_PER_PAGE, result );
}
deep_copy( host_page_histogram, page_histogram );
printf( "SB_ID USED_BLOCKS PAGE_COUNT\n" );
for ( uint32_t j = 0; j < 33; ++j ) {
printf( "%5u %10u %10u\n", sb_id[i], j, host_page_histogram[j] );
}
printf( "\n" );
}
/*
// Print the blocks used for each page of a few individual superblocks.
for ( uint32_t i = 0; i < num_sb_id; ++i ) {
uint32_t lg_block_size = host_sb_header(sb_id[i]).m_lg_block_size;
if ( lg_block_size != 0 ) {
printf( "SB_ID BLOCK ID USED_BLOCKS\n" );
uint32_t block_size_id = lg_block_size - LG_MIN_BLOCK_SIZE;
uint32_t pages_per_sb = m_blocksize_info[block_size_id].m_pages_per_sb;
for ( uint32_t j = 0; j < pages_per_sb; ++j ) {
unsigned start_pos = ( sb_id[i] << m_lg_max_sb_blocks ) + j * BLOCKS_PER_PAGE;
unsigned end_pos = start_pos + BLOCKS_PER_PAGE;
uint32_t num_allocated_blocks = 0;
for ( unsigned k = start_pos; k < end_pos; ++k ) {
num_allocated_blocks += m_sb_blocks.test( k );
}
printf( "%5u %8u %11u\n", sb_id[i], j, num_allocated_blocks );
}
printf( "\n" );
}
}
*/
#endif
printf( " Used blocks: %10u / %10u = %10.6lf\n", used_blocks, total_blocks,
percent_used_blocks );
printf( " Used pages: %10u / %10u = %10.6lf\n", used_pages, total_pages,
percent_used_pages );
printf( " Used SB: %10zu / %10zu = %10.6lf\n", m_num_sb - num_empty_sb, m_num_sb,
percent_used_sb );
printf( " Active SB: %10u\n", num_active_sb );
printf( " Empty SB: %10u\n", num_empty_sb );
printf( " Partfull SB: %10u\n", num_partfull_sb );
printf( " Full SB: %10lu\n",
m_num_sb - num_active_sb - num_empty_sb - num_partfull_sb );
printf( "Ave. SB Full %%: %10.6lf\n", ave_sb_full );
printf( "\n" );
fflush( stdout );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
}
KOKKOS_INLINE_FUNCTION
size_t get_min_block_size() const { return MIN_BLOCK_SIZE; }
+ KOKKOS_INLINE_FUNCTION
size_t get_mem_size() const { return m_data_size; }
private:
/// \brief Returns the index into the active array for the given size.
///
/// Computes log2 of the smallest power of two >= the given size
/// ( i.e. ceil( log2(size) ) ), minus LG_MIN_BLOCK_SIZE so that
/// MIN_BLOCK_SIZE maps to index 0.
KOKKOS_FORCEINLINE_FUNCTION
int get_block_size_index( const size_t size ) const
{
// We know the size fits in a 32 bit unsigned because the size of a
// superblock is limited to 2^31, so casting to an unsigned is safe.
// Find the most significant nonzero bit.
uint32_t first_nonzero_bit =
Kokkos::Impl::bit_scan_reverse( static_cast<unsigned>( size ) );
// If size is an integral power of 2, ceil( log2(size) ) is equal to the
// most significant nonzero bit. Otherwise, you need to add 1. Since the
// minimum block size is MIN_BLOCK_SIZE, make sure ceil( log2(size) ) is at
// least LG_MIN_BLOCK_SIZE.
uint32_t lg2_size = first_nonzero_bit + !Kokkos::Impl::is_integral_power_of_two( size );
lg2_size = lg2_size > LG_MIN_BLOCK_SIZE ? lg2_size : LG_MIN_BLOCK_SIZE;
// Return ceil( log2(size) ) shifted so that the value for MIN_BLOCK_SIZE
// is 0.
return lg2_size - LG_MIN_BLOCK_SIZE;
}
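// Worked example (illustrative, assuming LG_MIN_BLOCK_SIZE == 6, i.e. a
// 64-byte minimum block): size = 100 gives bit_scan_reverse(100) == 6;
// 100 is not a power of two, so lg2_size becomes 7 and the returned
// index is 7 - 6 = 1, selecting the 128-byte block size.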
/// \brief Finds a superblock with free space to become a new active superblock.
///
/// If this function is called, the current active superblock needs to be replaced
/// because it is full. Initially, only the thread that sets the active superblock
/// to full calls this function. Other threads can still allocate from the "full"
/// active superblock because a full superblock still has locations available. If
/// a thread tries to allocate from the active superblock when it has no free
/// locations, then that thread will call this function, too, and spin on a lock
/// waiting until the active superblock has been replaced.
KOKKOS_FUNCTION
uint32_t find_superblock( int block_size_id, uint32_t old_sb ) const
{
// Try to grab the lock on the head.
uint32_t lock_sb =
Kokkos::atomic_compare_exchange( &m_active(block_size_id), old_sb, SUPERBLOCK_LOCK );
load_fence();
// Initialize the new superblock to be the previous one so the previous
// superblock is returned if a new superblock can't be found.
uint32_t new_sb = lock_sb;
if ( lock_sb == old_sb ) {
// This thread has the lock.
// 1. Look for a partially filled superblock that is of the right block
// size.
size_t max_tries = m_ceil_num_sb >> LG_BLOCKS_PER_PAGE;
size_t tries = 0;
bool search_done = false;
// Set the starting search position to the beginning of this block
// size's bitset.
unsigned pos = block_size_id * m_ceil_num_sb;
while ( !search_done ) {
bool success = false;
unsigned prev_val = 0;
Kokkos::tie( success, prev_val ) = m_partfull_sb.reset_any_in_word( pos );
if ( !success ) {
if ( ++tries >= max_tries ) {
// Exceeded number of words for this block size's bitset.
search_done = true;
}
else {
pos += BLOCKS_PER_PAGE;
}
}
else {
// Found a superblock.
// It is possible that the newly found superblock is the same as the
// old superblock. In this case putting the old value back in yields
// correct behavior. This could happen as follows. This thread
// grabs the lock and transitions the superblock to the full state.
// Before it searches for a new superblock, other threads perform
// enough deallocations to transition the superblock to the partially
// full state. This thread then searches for a partially full
// superblock and finds the one it removed. There's potential for
// this to cause a performance issue if the same superblock keeps
// being removed and added due to the right mix and ordering of
// allocations and deallocations.
search_done = true;
new_sb = pos - block_size_id * m_ceil_num_sb;
// Set the head status for the superblock.
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
// If there was a previous active superblock, mark it as not active.
// It is now in the full category and as such isn't tracked.
if ( lock_sb != INVALID_SUPERBLOCK ) {
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
}
store_fence();
}
}
// 2. Look for an empty superblock.
if ( new_sb == lock_sb ) {
tries = 0;
search_done = false;
// Set the starting search position to the beginning of the empty
// superblock bitset.
pos = 0;
while ( !search_done ) {
bool success = false;
unsigned prev_val = 0;
Kokkos::tie( success, prev_val ) = m_empty_sb.reset_any_in_word( pos );
if ( !success ) {
if ( ++tries >= max_tries ) {
// Exceeded number of words for this block size's bitset.
search_done = true;
}
else {
pos += BLOCKS_PER_PAGE;
}
}
else {
// Found a superblock.
// It is possible that the newly found superblock is the same as
// the old superblock. In this case putting the old value back in
// yields correct behavior. This could happen as follows. This
// thread grabs the lock and transitions the superblock to the full
// state. Before it searches for a new superblock, other threads
// perform enough deallocations to transition the superblock to the
// partially full state and then the empty state. This thread then
// searches for a partially full superblock and none exist. This
// thread then searches for an empty superblock and finds the one
// it removed. The likelihood of this happening is so remote that
// the potential for this to cause a performance issue is
// infinitesimal.
search_done = true;
new_sb = pos;
// Set the empty pages, block size, and head status for the
// superblock.
volatile_store( &m_sb_header(new_sb).m_empty_pages,
m_blocksize_info[block_size_id].m_pages_per_sb );
volatile_store( &m_sb_header(new_sb).m_lg_block_size,
block_size_id + LG_MIN_BLOCK_SIZE );
volatile_store( &m_sb_header(new_sb).m_is_active, uint32_t(true) );
// If there was a previous active superblock, mark it as not active.
// It is now in the full category and as such isn't tracked.
if ( lock_sb != INVALID_SUPERBLOCK ) {
volatile_store( &m_sb_header(lock_sb).m_is_active, uint32_t(false) );
}
store_fence();
}
}
}
// Write the new active superblock to release the lock.
atomic_exchange( &m_active(block_size_id), new_sb );
}
else {
// Either another thread has the lock and is switching the active
// superblock for this block size or another thread has already changed
// the active superblock since this thread read its value. Keep
// atomically reading the active superblock until it isn't locked to get
// the new active superblock.
do {
new_sb = atomic_fetch_or( &m_active(block_size_id), uint32_t(0) );
} while ( new_sb == SUPERBLOCK_LOCK );
load_fence();
// Assertions:
// 1. An invalid superblock should never be found here.
// 2. If the new superblock is the same as the previous superblock, the
// allocator is empty.
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINTERR
if ( new_sb == INVALID_SUPERBLOCK ) {
printf( "\n** MemoryPool::find_superblock() FOUND_INACTIVE_SUPERBLOCK **\n" );
#ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
fflush( stdout );
#endif
Kokkos::abort( "" );
}
#endif
}
return new_sb;
}
/// Returns 64 bits from a clock register.
KOKKOS_FORCEINLINE_FUNCTION
uint64_t get_clock_register(void) const
{
#if defined( __CUDA_ARCH__ )
// Return value of 64-bit hi-res clock register.
return clock64();
#elif defined( __i386__ ) || defined( __x86_64 )
// Return value of 64-bit hi-res clock register.
unsigned a = 0, d = 0;
__asm__ volatile( "rdtsc" : "=a" (a), "=d" (d) );
return ( (uint64_t) a ) | ( ( (uint64_t) d ) << 32 );
#elif defined( __powerpc ) || defined( __powerpc__ ) || defined( __powerpc64__ ) || \
defined( __POWERPC__ ) || defined( __ppc__ ) || defined( __ppc64__ )
unsigned int cycles = 0;
asm volatile( "mftb %0" : "=r" (cycles) );
return (uint64_t) cycles;
#else
const uint64_t ticks =
std::chrono::high_resolution_clock::now().time_since_epoch().count();
return ticks;
#endif
}
};
} // namespace Experimental
} // namespace Kokkos
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINTERR
#undef KOKKOS_ENABLE_MEMPOOL_PRINTERR
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
#endif
#ifdef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#undef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#endif
#endif // KOKKOS_MEMORYPOOL_HPP
diff --git a/lib/kokkos/core/src/Kokkos_OpenMP.hpp b/lib/kokkos/core/src/Kokkos_OpenMP.hpp
index a337d1a9d..c0c43b92f 100644
--- a/lib/kokkos/core/src/Kokkos_OpenMP.hpp
+++ b/lib/kokkos/core/src/Kokkos_OpenMP.hpp
@@ -1,204 +1,204 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMP_HPP
#define KOKKOS_OPENMP_HPP
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_OPENMP) && !defined(_OPENMP)
#error "You enabled Kokkos OpenMP support without enabling OpenMP in the compiler!"
#endif
#if defined( KOKKOS_ENABLE_OPENMP ) && defined( _OPENMP )
#include <omp.h>
#include <cstddef>
#include <iosfwd>
#include <Kokkos_HostSpace.hpp>
#ifdef KOKKOS_ENABLE_HBWSPACE
#include <Kokkos_HBWSpace.hpp>
#endif
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <impl/Kokkos_Tags.hpp>
-#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/// \class OpenMP
/// \brief Kokkos device for multicore processors in the host memory space.
class OpenMP {
public:
//------------------------------------
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as a kokkos execution space
typedef OpenMP execution_space ;
#ifdef KOKKOS_ENABLE_HBWSPACE
typedef Experimental::HBWSpace memory_space ;
#else
typedef HostSpace memory_space ;
#endif
//! This execution space's preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef LayoutRight array_layout ;
typedef memory_space::size_type size_type ;
typedef ScratchMemorySpace< OpenMP > scratch_memory_space ;
//@}
//------------------------------------
//! \name Functions that all Kokkos execution spaces must implement.
//@{
inline static bool in_parallel() { return omp_in_parallel(); }
/** \brief Set the device in a "sleep" state. A noop for OpenMP. */
static bool sleep();
/** \brief Wake the device from the 'sleep' state. A noop for OpenMP. */
static bool wake();
/** \brief Wait until all dispatched functors complete. A noop for OpenMP. */
static void fence() {}
/// \brief Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
/// \brief Free any resources being consumed by the device.
static void finalize();
/** \brief Initialize the device.
*
* 1) If the hardware locality library is enabled and OpenMP has not
* already bound threads then bind OpenMP threads to maximize
* core utilization and group for memory hierarchy locality.
*
* 2) Allocate a HostThread for each OpenMP thread to hold its
* topology and fan in/out data.
*/
static void initialize( unsigned thread_count = 0 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 );
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
//@}
//------------------------------------
/** \brief This execution space has a topological thread pool which can be queried.
*
* All threads within a pool have a common memory space for which they are cache coherent.
* depth = 0 gives the number of threads in the whole pool.
* depth = 1 gives the number of threads in a NUMA region, typically sharing L3 cache.
* depth = 2 gives the number of threads at the finest granularity, typically sharing L1 cache.
*/
inline static int thread_pool_size( int depth = 0 );
/** \brief The rank of the executing thread in this thread pool */
KOKKOS_INLINE_FUNCTION static int thread_pool_rank();
//------------------------------------
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
KOKKOS_INLINE_FUNCTION static
unsigned hardware_thread_id() { return thread_pool_rank(); }
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
struct MemorySpaceAccess
< Kokkos::OpenMP::memory_space
, Kokkos::OpenMP::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::OpenMP::memory_space
, Kokkos::OpenMP::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
#include <OpenMP/Kokkos_OpenMPexec.hpp>
#include <OpenMP/Kokkos_OpenMP_Parallel.hpp>
#include <OpenMP/Kokkos_OpenMP_Task.hpp>
+#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
#endif /* #if defined( KOKKOS_ENABLE_OPENMP ) && defined( _OPENMP ) */
#endif /* #ifndef KOKKOS_OPENMP_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_Pair.hpp b/lib/kokkos/core/src/Kokkos_Pair.hpp
index 83436826f..067767f2f 100644
--- a/lib/kokkos/core/src/Kokkos_Pair.hpp
+++ b/lib/kokkos/core/src/Kokkos_Pair.hpp
@@ -1,530 +1,527 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
/// \file Kokkos_Pair.hpp
/// \brief Declaration and definition of Kokkos::pair.
///
/// This header file declares and defines Kokkos::pair and its related
/// nonmember functions.
#ifndef KOKKOS_PAIR_HPP
#define KOKKOS_PAIR_HPP
#include <Kokkos_Macros.hpp>
#include <utility>
namespace Kokkos {
/// \struct pair
/// \brief Replacement for std::pair that works on CUDA devices.
///
/// The instance methods of std::pair, including its constructors, are
/// not marked as <tt>__device__</tt> functions. Thus, they cannot be
/// called on a CUDA device, such as an NVIDIA GPU. This struct
/// implements the same interface as std::pair, but can be used on a
/// CUDA device as well as on the host.
template <class T1, class T2>
struct pair
{
//! The first template parameter of this class.
typedef T1 first_type;
//! The second template parameter of this class.
typedef T2 second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Default constructor.
///
/// This calls the default constructors of T1 and T2. It won't
/// compile if those default constructors are not defined and
/// public.
- KOKKOS_FORCEINLINE_FUNCTION
- pair()
- : first(), second()
- {}
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
+ pair() = default ;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type const& f, second_type const& s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const volatile pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<T1, T2> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Assignment operator, for volatile <tt>*this</tt>.
///
/// \param p [in] Input; right-hand side of the assignment.
///
/// This calls the assignment operators of T1 and T2. It will not
/// compile if the assignment operators are not defined and public.
///
/// This operator returns \c void instead of <tt>volatile pair<T1,
/// T2>& </tt>. See Kokkos Issue #177 for the explanation. In
/// practice, this means that you should not chain assignments with
/// volatile lvalues.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
void operator=(const volatile pair<U,V> &p) volatile
{
first = p.first;
second = p.second;
// We deliberately do not return anything here. See explanation
// in public documentation above.
}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
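// Illustrative usage sketch:
//
//   Kokkos::pair<int,double> p( 1, 2.5 );         // usable in device code
//   std::pair<int,double>    q = p.to_std_pair(); // host only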
template <class T1, class T2>
struct pair<T1&, T2&>
{
//! The first template parameter of this class.
typedef T1& first_type;
//! The second template parameter of this class.
typedef T2& second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type f, second_type s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<first_type, second_type> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
template <class T1, class T2>
struct pair<T1, T2&>
{
//! The first template parameter of this class.
typedef T1 first_type;
//! The second template parameter of this class.
typedef T2& second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type const& f, second_type s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<first_type, second_type> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
template <class T1, class T2>
struct pair<T1&, T2>
{
//! The first template parameter of this class.
typedef T1& first_type;
//! The second template parameter of this class.
typedef T2 second_type;
//! The first element of the pair.
first_type first;
//! The second element of the pair.
second_type second;
/// \brief Constructor that takes both elements of the pair.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(first_type f, second_type const& s)
: first(f), second(s)
{}
/// \brief Copy constructor.
///
/// This calls the copy constructors of T1 and T2. It won't compile
/// if those copy constructors are not defined and public.
template <class U, class V>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,V> &p)
: first(p.first), second(p.second)
{}
// from std::pair<U,V>
template <class U, class V>
pair( const std::pair<U,V> &p)
: first(p.first), second(p.second)
{}
/// \brief Assignment operator.
///
/// This calls the assignment operators of T1 and T2. It won't
/// compile if the assignment operators are not defined and public.
template <class U, class V>
KOKKOS_FORCEINLINE_FUNCTION
pair<first_type, second_type> & operator=(const pair<U,V> &p)
{
first = p.first;
second = p.second;
return *this;
}
/// \brief Return the std::pair version of this object.
///
/// This is <i>not</i> a device function; you may not call it on a
/// CUDA device. It is meant to be called on the host, if the user
/// wants an std::pair instead of a Kokkos::pair.
///
/// \note This is not a conversion operator, since defining a
/// conversion operator made the relational operators have
/// ambiguous definitions.
std::pair<T1,T2> to_std_pair() const
{ return std::make_pair(first,second); }
};
//! Equality operator for Kokkos::pair.
template <class T1, class T2>
KOKKOS_FORCEINLINE_FUNCTION
bool operator== (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return lhs.first==rhs.first && lhs.second==rhs.second; }
//! Inequality operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator!= (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return !(lhs==rhs); }
//! Less-than operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator< (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return lhs.first<rhs.first || (!(rhs.first<lhs.first) && lhs.second<rhs.second); }
//! Less-than-or-equal-to operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator<= (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return !(rhs<lhs); }
//! Greater-than operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator> (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return rhs<lhs; }
//! Greater-than-or-equal-to operator for Kokkos::pair.
template <class T1, class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator>= (const pair<T1,T2>& lhs, const pair<T1,T2>& rhs)
{ return !(lhs<rhs); }
/// \brief Return a new pair.
///
/// This is a "nonmember constructor" for Kokkos::pair. It works just
/// like std::make_pair.
template <class T1,class T2>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
pair<T1,T2> make_pair (T1 x, T2 y)
{ return ( pair<T1,T2>(x,y) ); }
/// \brief Return a pair of references to the input arguments.
///
/// This is analogous to std::tie (new in C++11). You can use it to
/// assign to two variables at once, from the result of a function
/// that returns a pair. For example (<tt>__device__</tt> and
/// <tt>__host__</tt> attributes omitted for brevity):
/// \code
/// // Declaration of the function to call.
/// // First return value: operation count.
/// // Second return value: whether all operations succeeded.
/// Kokkos::pair<int, bool> someFunction ();
///
/// // Code that uses Kokkos::tie.
/// int myFunction () {
/// int count = 0;
/// bool success = false;
///
/// // This assigns to both count and success.
/// Kokkos::tie (count, success) = someFunction ();
///
/// if (! success) {
/// // ... Some operation failed;
/// // take corrective action ...
/// }
/// return count;
/// }
/// \endcode
///
/// The line that uses tie() could have been written like this:
/// \code
/// Kokkos::pair<int, bool> result = someFunction ();
/// count = result.first;
/// success = result.second;
/// \endcode
///
/// Using tie() saves two lines of code and avoids a copy of each
/// element of the pair. The latter could be significant if one or
/// both elements of the pair are more substantial objects than \c int
/// or \c bool.
template <class T1,class T2>
KOKKOS_FORCEINLINE_FUNCTION
pair<T1 &,T2 &> tie (T1 & x, T2 & y)
{ return ( pair<T1 &,T2 &>(x,y) ); }
//
// Specialization of Kokkos::pair for a \c void second argument. This
// is not actually a "pair"; it only contains one element, the first.
//
template <class T1>
struct pair<T1,void>
{
typedef T1 first_type;
typedef void second_type;
first_type first;
enum { second = 0 };
- KOKKOS_FORCEINLINE_FUNCTION
- pair()
- : first()
- {}
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
+ pair() = default ;
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(const first_type & f)
: first(f)
{}
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair(const first_type & f, int)
: first(f)
{}
template <class U>
- KOKKOS_FORCEINLINE_FUNCTION
+ KOKKOS_FORCEINLINE_FUNCTION constexpr
pair( const pair<U,void> &p)
: first(p.first)
{}
template <class U>
KOKKOS_FORCEINLINE_FUNCTION
pair<T1, void> & operator=(const pair<U,void> &p)
{
first = p.first;
return *this;
}
};
//
// Specialization of relational operators for Kokkos::pair<T1,void>.
//
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator== (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return lhs.first==rhs.first; }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator!= (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return !(lhs==rhs); }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator< (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return lhs.first<rhs.first; }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator<= (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return !(rhs<lhs); }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator> (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return rhs<lhs; }
template <class T1>
-KOKKOS_FORCEINLINE_FUNCTION
+KOKKOS_FORCEINLINE_FUNCTION constexpr
bool operator>= (const pair<T1,void>& lhs, const pair<T1,void>& rhs)
{ return !(lhs<rhs); }
} // namespace Kokkos
#endif //KOKKOS_PAIR_HPP
+
diff --git a/lib/kokkos/core/src/Kokkos_Parallel.hpp b/lib/kokkos/core/src/Kokkos_Parallel.hpp
index 64b1502bc..e412e608b 100644
--- a/lib/kokkos/core/src/Kokkos_Parallel.hpp
+++ b/lib/kokkos/core/src/Kokkos_Parallel.hpp
@@ -1,527 +1,528 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_Parallel.hpp
/// \brief Declaration of parallel operators
#ifndef KOKKOS_PARALLEL_HPP
#define KOKKOS_PARALLEL_HPP
#include <cstddef>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_View.hpp>
#include <Kokkos_ExecPolicy.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <typeinfo>
#endif
#include <impl/Kokkos_Tags.hpp>
#include <impl/Kokkos_Traits.hpp>
+#include <impl/Kokkos_FunctorAnalysis.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#ifdef KOKKOS_DEBUG
#include<iostream>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
/** \brief Given a Functor and an Execution Policy, query an execution space.
*
* if the Policy has an execution space use that
* else if the Functor has an execution_space use that
* else if the Functor has a device_type use that for backward compatibility
* else use the default
*/
template< class Functor
, class Policy
, class EnableFunctor
, class EnablePolicy
>
struct FunctorPolicyExecutionSpace {
typedef Kokkos::DefaultExecutionSpace execution_space ;
};
template< class Functor , class Policy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::device_type >::type
, typename enable_if_type< typename Policy ::execution_space >::type
>
{
typedef typename Policy ::execution_space execution_space ;
};
template< class Functor , class Policy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::execution_space >::type
, typename enable_if_type< typename Policy ::execution_space >::type
>
{
typedef typename Policy ::execution_space execution_space ;
};
template< class Functor , class Policy , class EnableFunctor >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, EnableFunctor
, typename enable_if_type< typename Policy::execution_space >::type
>
{
typedef typename Policy ::execution_space execution_space ;
};
template< class Functor , class Policy , class EnablePolicy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::device_type >::type
, EnablePolicy
>
{
typedef typename Functor::device_type execution_space ;
};
template< class Functor , class Policy , class EnablePolicy >
struct FunctorPolicyExecutionSpace
< Functor , Policy
, typename enable_if_type< typename Functor::execution_space >::type
, EnablePolicy
>
{
typedef typename Functor::execution_space execution_space ;
};
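// Illustrative deduction sketch: a functor declaring
//   typedef Kokkos::Serial execution_space ;
// dispatched with Kokkos::RangePolicy< Kokkos::OpenMP > resolves to OpenMP
// (the policy's execution space wins); the same functor dispatched with a
// plain work count resolves to Kokkos::Serial; a functor providing neither
// execution_space nor device_type falls back to Kokkos::DefaultExecutionSpace.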
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/** \brief Execute \c functor in parallel according to the execution \c policy.
*
* A "functor" is a class containing the function to execute in parallel,
* data needed for that execution, and an optional \c execution_space
* typedef. Here is an example functor for parallel_for:
*
* \code
* class FunctorType {
* public:
* typedef ... execution_space ;
* void operator() ( WorkType iwork ) const ;
* };
* \endcode
*
* In the above example, \c WorkType is any integer type for which a
* valid conversion from \c size_t to \c WorkType exists. Its
* <tt>operator()</tt> method defines the operation to parallelize,
* over the range of integer indices <tt>iwork=[0,work_count-1]</tt>.
* This corresponds to a single iteration \c iwork of a \c for loop.
* If \c execution_space is not defined DefaultExecutionSpace will be used.
*/
template< class ExecPolicy , class FunctorType >
inline
void parallel_for( const ExecPolicy & policy
, const FunctorType & functor
, const std::string& str = ""
, typename Impl::enable_if< ! Impl::is_integral< ExecPolicy >::value >::type * = 0
)
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelFor("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelFor< FunctorType , ExecPolicy > closure( functor , policy );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
-
+
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelFor(kpID);
}
#endif
}
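// Illustrative call sketch ('x' and 'y' are assumed rank-1 Views of length N):
//
//   Kokkos::parallel_for( Kokkos::RangePolicy<>( 0, N ),
//     KOKKOS_LAMBDA( const int i ) { y(i) = 2.0 * x(i); }, "scale_x" );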
template< class FunctorType >
inline
void parallel_for( const size_t work_count
, const FunctorType & functor
, const std::string& str = ""
)
{
typedef typename
Impl::FunctorPolicyExecutionSpace< FunctorType , void >::execution_space
execution_space ;
typedef RangePolicy< execution_space > policy ;
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelFor("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
-
+
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelFor< FunctorType , policy > closure( functor , policy(0,work_count) );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelFor(kpID);
}
#endif
}
template< class ExecPolicy , class FunctorType >
inline
void parallel_for( const std::string & str
, const ExecPolicy & policy
, const FunctorType & functor )
{
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG Start parallel_for kernel: " << str << std::endl;
#endif
parallel_for(policy,functor,str);
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG End parallel_for kernel: " << str << std::endl;
#endif
(void) str;
}
}
#include <Kokkos_Parallel_Reduce.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
/// \fn parallel_scan
/// \tparam ExecutionPolicy The execution policy type.
/// \tparam FunctorType The scan functor type.
///
/// \param policy [in] The execution policy.
/// \param functor [in] The scan functor.
///
/// This function implements a parallel scan pattern. The scan can
/// be either inclusive or exclusive, depending on how you implement
/// the scan functor.
///
/// A scan functor looks almost exactly like a reduce functor, except
/// that its operator() takes a third \c bool argument, \c final_pass,
/// which indicates whether this is the last pass of the scan
/// operation. We will show below how to use the \c final_pass
/// argument to control whether the scan is inclusive or exclusive.
///
/// Here is the minimum required interface of a scan functor for a POD
/// (plain old data) value type \c PodType. That is, the result is a
/// View of zero or more PodType. It is also possible for the result
/// to be an array of (same-sized) arrays of PodType, but we do not
/// show the required interface for that here.
/// \code
/// template< class ExecPolicy , class FunctorType >
/// class ScanFunctor {
/// public:
/// // The Kokkos device type
/// typedef ... execution_space;
/// // Type of an entry of the array containing the result;
/// // also the type of each of the entries combined using
/// // operator() or join().
/// typedef PodType value_type;
///
/// void operator () (const ExecPolicy::member_type & i, value_type& update, const bool final_pass) const;
/// void init (value_type& update) const;
/// void join (volatile value_type& update, volatile const value_type& input) const;
/// };
/// \endcode
///
/// Here is an example of a functor which computes an inclusive plus-scan
/// of an array of \c int, in place. If given an array [1, 2, 3, 4], this
/// scan will overwrite that array with [1, 3, 6, 10].
///
/// \code
/// template<class SpaceType>
/// class InclScanFunctor {
/// public:
/// typedef SpaceType execution_space;
/// typedef int value_type;
/// typedef typename SpaceType::size_type size_type;
///
/// InclScanFunctor( Kokkos::View<value_type*, execution_space> x
/// , Kokkos::View<value_type*, execution_space> y ) : m_x(x), m_y(y) {}
///
/// void operator () (const size_type i, value_type& update, const bool final_pass) const {
/// update += m_x(i);
/// if (final_pass) {
/// m_y(i) = update;
/// }
/// }
/// void init (value_type& update) const {
/// update = 0;
/// }
/// void join (volatile value_type& update, volatile const value_type& input) const {
/// update += input;
/// }
///
/// private:
/// Kokkos::View<value_type*, execution_space> m_x;
/// Kokkos::View<value_type*, execution_space> m_y;
/// };
/// \endcode
///
/// Here is an example of a functor which computes an <i>exclusive</i>
/// scan of an array of \c int, in place. In operator(), note that the
/// final_pass test and the update have switched places, and that a
/// temporary is used. If given an array [1, 2, 3, 4], this scan
/// will overwrite that array with [0, 1, 3, 6].
///
/// \code
/// template<class SpaceType>
/// class ExclScanFunctor {
/// public:
/// typedef SpaceType execution_space;
/// typedef int value_type;
/// typedef typename SpaceType::size_type size_type;
///
/// ExclScanFunctor (Kokkos::View<value_type*, execution_space> x) : x_ (x) {}
///
/// void operator () (const size_type i, value_type& update, const bool final_pass) const {
/// const value_type x_i = x_(i);
/// if (final_pass) {
/// x_(i) = update;
/// }
/// update += x_i;
/// }
/// void init (value_type& update) const {
/// update = 0;
/// }
/// void join (volatile value_type& update, volatile const value_type& input) const {
/// update += input;
/// }
///
/// private:
/// Kokkos::View<value_type*, execution_space> x_;
/// };
/// \endcode
///
/// Here is an example of a functor which builds on the above
/// exclusive scan example, to compute an offsets array from a
/// population count array, in place. We assume that the pop count
/// array has an extra entry at the end to store the final count. If
/// given an array [1, 2, 3, 4, 0], this scan will overwrite that
/// array with [0, 1, 3, 6, 10].
///
/// \code
/// template<class SpaceType>
/// class OffsetScanFunctor {
/// public:
/// typedef SpaceType execution_space;
/// typedef int value_type;
/// typedef typename SpaceType::size_type size_type;
///
/// // last_index_ is the last valid index (zero-based) of x.
/// // If x has length zero, then last_index_ won't be used anyway.
/// OffsetScanFunctor( Kokkos::View<value_type*, execution_space> x
/// , Kokkos::View<value_type*, execution_space> y )
/// : m_x(x), m_y(y), last_index_ (x.dimension_0 () == 0 ? 0 : x.dimension_0 () - 1)
/// {}
///
/// void operator () (const size_type i, int& update, const bool final_pass) const {
/// if (final_pass) {
/// m_y(i) = update;
/// }
/// update += m_x(i);
/// // The last entry of m_y gets the final sum.
/// if (final_pass && i == last_index_) {
/// m_y(i+1) = update;
/// }
/// }
/// void init (value_type& update) const {
/// update = 0;
/// }
/// void join (volatile value_type& update, volatile const value_type& input) const {
/// update += input;
/// }
///
/// private:
/// Kokkos::View<value_type*, execution_space> m_x;
/// Kokkos::View<value_type*, execution_space> m_y;
/// const size_type last_index_;
/// };
/// \endcode
///
template< class ExecutionPolicy , class FunctorType >
inline
void parallel_scan( const ExecutionPolicy & policy
, const FunctorType & functor
, const std::string& str = ""
, typename Impl::enable_if< ! Impl::is_integral< ExecutionPolicy >::value >::type * = 0
)
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelScan("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelScan< FunctorType , ExecutionPolicy > closure( functor , policy );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelScan(kpID);
}
#endif
}
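// Illustrative call sketch, reusing the InclScanFunctor from the
// documentation above ('x' and 'y' are assumed Views of equal length):
//
//   Kokkos::parallel_scan( x.dimension_0(), InclScanFunctor<Space>( x, y ) );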
template< class FunctorType >
inline
void parallel_scan( const size_t work_count
, const FunctorType & functor
, const std::string& str = "" )
{
typedef typename
Kokkos::Impl::FunctorPolicyExecutionSpace< FunctorType , void >::execution_space
execution_space ;
typedef Kokkos::RangePolicy< execution_space > policy ;
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelScan("" == str ? typeid(FunctorType).name() : str, 0, &kpID);
}
#endif
-
+
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
Impl::ParallelScan< FunctorType , policy > closure( functor , policy(0,work_count) );
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelScan(kpID);
}
#endif
}
template< class ExecutionPolicy , class FunctorType >
inline
void parallel_scan( const std::string& str
, const ExecutionPolicy & policy
, const FunctorType & functor)
{
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG Start parallel_scan kernel: " << str << std::endl;
#endif
parallel_scan(policy,functor,str);
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
std::cout << "KOKKOS_DEBUG End parallel_scan kernel: " << str << std::endl;
#endif
(void) str;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class Enable = void >
struct FunctorTeamShmemSize
{
KOKKOS_INLINE_FUNCTION static size_t value( const FunctorType & , int ) { return 0 ; }
};
template< class FunctorType >
struct FunctorTeamShmemSize< FunctorType , typename Impl::enable_if< 0 < sizeof( & FunctorType::team_shmem_size ) >::type >
{
static inline size_t value( const FunctorType & f , int team_size ) { return f.team_shmem_size( team_size ) ; }
};
template< class FunctorType >
struct FunctorTeamShmemSize< FunctorType , typename Impl::enable_if< 0 < sizeof( & FunctorType::shmem_size ) >::type >
{
static inline size_t value( const FunctorType & f , int team_size ) { return f.shmem_size( team_size ) ; }
};
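// Illustrative functor sketch: a functor that declares team_shmem_size()
// (or shmem_size()) is picked up by the matching specialization above;
// otherwise the primary template reports 0 bytes of per-team scratch.
//
//   struct MyTeamFunctor {
//     size_t team_shmem_size( int team_size ) const
//       { return team_size * sizeof(double); }
//     KOKKOS_INLINE_FUNCTION
//     void operator()( const Kokkos::TeamPolicy<>::member_type & ) const {}
//   };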
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* KOKKOS_PARALLEL_HPP */
diff --git a/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp b/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
index a3649b442..900dce19f 100644
--- a/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
+++ b/lib/kokkos/core/src/Kokkos_Parallel_Reduce.hpp
@@ -1,1356 +1,1356 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
namespace Kokkos {
template<class T, class Enable = void>
struct is_reducer_type {
enum { value = 0 };
};
template<class T>
struct is_reducer_type<T,typename std::enable_if<
std::is_same<typename std::remove_cv<T>::type,
typename std::remove_cv<typename T::reducer_type>::type>::value
>::type> {
enum { value = 1 };
};
namespace Experimental {
template<class Scalar,class Space = HostSpace>
struct Sum {
public:
//Required
typedef Sum reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return static_cast<value_type>(0);
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Sum(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Sum(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Sum(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Sum(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest += src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest += src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
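/* A minimal usage sketch for this reducer (the view "x" and extent "n" are
 * illustrative assumptions, not part of this header):
 *
 * \code
 * double total = 0.0;
 * Kokkos::parallel_reduce( n,
 *   KOKKOS_LAMBDA( const int i, double & lsum ) { lsum += x(i); },
 *   Kokkos::Experimental::Sum<double>( total ) );
 * // After the call, "total" holds the sum of x(0..n-1).
 * \endcode
 */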
template<class Scalar,class Space = HostSpace>
struct Prod {
public:
//Required
typedef Prod reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return static_cast<value_type>(1);
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Prod(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Prod(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Prod(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Prod(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest *= src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest *= src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct Min {
public:
//Required
typedef Min reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<value_type>::max();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Min(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Min(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Min(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Min(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src < dest )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src < dest )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct Max {
public:
//Required
typedef Max reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<value_type>::min();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return value_type();
}
};
public:
Max(value_type& result_):
init_value(InitWrapper<value_type>::value()),result(&result_) {}
Max(const result_view_type& result_):
init_value(InitWrapper<value_type>::value()),result(result_) {}
Max(value_type& result_, const value_type& init_value_):
init_value(init_value_),result(&result_) {}
Max(const result_view_type& result_, const value_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src > dest )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src > dest )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LAnd {
public:
//Required
typedef LAnd reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LAnd(value_type& result_):result(&result_) {}
LAnd(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest && src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest && src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 1;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LOr {
public:
//Required
typedef LOr reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LOr(value_type& result_):result(&result_) {}
LOr(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest || src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest || src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 0;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct LXor {
public:
//Required
typedef LXor reducer_type;
typedef Scalar value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
LXor(value_type& result_):result(&result_) {}
LXor(const result_view_type& result_):result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest? (!src) : src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest? (!src) : src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = 0;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BAnd {
public:
//Required
typedef BAnd reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BAnd(value_type& result_):
init_value(value_type() | (~value_type())),result(&result_) {}
BAnd(const result_view_type& result_):
init_value(value_type() | (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest & src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest & src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BOr {
public:
//Required
typedef BOr reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BOr(value_type& result_):
init_value(value_type() & (~value_type())),result(&result_) {}
BOr(const result_view_type& result_):
init_value(value_type() & (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest | src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest | src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Space = HostSpace>
struct BXor {
public:
//Required
typedef BXor reducer_type;
typedef typename std::remove_cv<Scalar>::type value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
value_type init_value;
private:
result_view_type result;
public:
BXor(value_type& result_):
init_value(value_type() & (~value_type())),result(&result_) {}
BXor(const result_view_type& result_):
init_value(value_type() & (~value_type())),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
dest = dest ^ src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
dest = dest ^ src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Index>
struct ValLocScalar {
Scalar val;
Index loc;
KOKKOS_INLINE_FUNCTION
void operator = (const ValLocScalar& rhs) {
val = rhs.val;
loc = rhs.loc;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile ValLocScalar& rhs) volatile {
val = rhs.val;
loc = rhs.loc;
}
};
template<class Scalar, class Index, class Space = HostSpace>
struct MinLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MinLoc reducer_type;
typedef ValLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinLoc(value_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(&result_) {}
MinLoc(const result_view_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(result_) {}
MinLoc(value_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(&result_) {}
MinLoc(const result_view_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.val < dest.val )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.val < dest.val )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.val = init_value;
}
result_view_type result_view() const {
return result;
}
};
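/* A minimal usage sketch for MinLoc, whose value_type is ValLocScalar<Scalar,Index>
 * (the view "x" and extent "n" are illustrative assumptions):
 *
 * \code
 * typedef Kokkos::Experimental::MinLoc<double,int> reducer_t;
 * reducer_t::value_type result;
 * Kokkos::parallel_reduce( n,
 *   KOKKOS_LAMBDA( const int i, reducer_t::value_type & lmin ) {
 *     if ( x(i) < lmin.val ) { lmin.val = x(i); lmin.loc = i; }
 *   },
 *   reducer_t( result ) );
 * // result.val is the minimum value, result.loc the index where it occurs.
 * \endcode
 */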
template<class Scalar, class Index, class Space = HostSpace>
struct MaxLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MaxLoc reducer_type;
typedef ValLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct InitWrapper;
template<class ValueType >
struct InitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct InitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MaxLoc(value_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(&result_) {}
MaxLoc(const result_view_type& result_):
init_value(InitWrapper<scalar_type>::value()),result(result_) {}
MaxLoc(value_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(&result_) {}
MaxLoc(const result_view_type& result_, const scalar_type& init_value_):
init_value(init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.val > dest.val )
dest = src;
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.val > dest.val )
dest = src;
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.val = init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar>
struct MinMaxScalar {
Scalar min_val,max_val;
KOKKOS_INLINE_FUNCTION
void operator = (const MinMaxScalar& rhs) {
min_val = rhs.min_val;
max_val = rhs.max_val;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile MinMaxScalar& rhs) volatile {
min_val = rhs.min_val;
max_val = rhs.max_val;
}
};
template<class Scalar, class Space = HostSpace>
struct MinMax {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
public:
//Required
typedef MinMax reducer_type;
typedef MinMaxScalar<scalar_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type min_init_value;
scalar_type max_init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MinInitWrapper;
template<class ValueType >
struct MinInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct MinInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MaxInitWrapper;
template<class ValueType >
struct MaxInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct MaxInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinMax(value_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(&result_) {}
MinMax(const result_view_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(result_) {}
MinMax(value_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(&result_) {}
MinMax(const result_view_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
}
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
}
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.min_val = min_init_value;
val.max_val = max_init_value;
}
result_view_type result_view() const {
return result;
}
};
template<class Scalar, class Index>
struct MinMaxLocScalar {
Scalar min_val,max_val;
Index min_loc,max_loc;
KOKKOS_INLINE_FUNCTION
void operator = (const MinMaxLocScalar& rhs) {
min_val = rhs.min_val;
min_loc = rhs.min_loc;
max_val = rhs.max_val;
max_loc = rhs.max_loc;
}
KOKKOS_INLINE_FUNCTION
void operator = (const volatile MinMaxLocScalar& rhs) volatile {
min_val = rhs.min_val;
min_loc = rhs.min_loc;
max_val = rhs.max_val;
max_loc = rhs.max_loc;
}
};
template<class Scalar, class Index, class Space = HostSpace>
struct MinMaxLoc {
private:
typedef typename std::remove_cv<Scalar>::type scalar_type;
typedef typename std::remove_cv<Index>::type index_type;
public:
//Required
typedef MinMaxLoc reducer_type;
typedef MinMaxLocScalar<scalar_type,index_type> value_type;
typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
scalar_type min_init_value;
scalar_type max_init_value;
private:
result_view_type result;
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MinInitWrapper;
template<class ValueType >
struct MinInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::max();
}
};
template<class ValueType >
struct MinInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
template<class ValueType, bool is_arithmetic = std::is_arithmetic<ValueType>::value >
struct MaxInitWrapper;
template<class ValueType >
struct MaxInitWrapper<ValueType,true> {
static ValueType value() {
return std::numeric_limits<scalar_type>::min();
}
};
template<class ValueType >
struct MaxInitWrapper<ValueType,false> {
static ValueType value() {
return scalar_type();
}
};
public:
MinMaxLoc(value_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(&result_) {}
MinMaxLoc(const result_view_type& result_):
min_init_value(MinInitWrapper<scalar_type>::value()),max_init_value(MaxInitWrapper<scalar_type>::value()),result(result_) {}
MinMaxLoc(value_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(&result_) {}
MinMaxLoc(const result_view_type& result_, const scalar_type& min_init_value_, const scalar_type& max_init_value_):
min_init_value(min_init_value_),max_init_value(max_init_value_),result(result_) {}
//Required
KOKKOS_INLINE_FUNCTION
void join(value_type& dest, const value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
dest.min_loc = src.min_loc;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
dest.max_loc = src.max_loc;
}
}
KOKKOS_INLINE_FUNCTION
void join(volatile value_type& dest, const volatile value_type& src) const {
if ( src.min_val < dest.min_val ) {
dest.min_val = src.min_val;
dest.min_loc = src.min_loc;
}
if ( src.max_val > dest.max_val ) {
dest.max_val = src.max_val;
dest.max_loc = src.max_loc;
}
}
//Optional
KOKKOS_INLINE_FUNCTION
void init( value_type& val) const {
val.min_val = min_init_value;
val.max_val = max_init_value;
}
result_view_type result_view() const {
return result;
}
};
}
}
namespace Kokkos {
namespace Impl {
template< class T, class ReturnType , class ValueTraits>
struct ParallelReduceReturnValue;
template< class ReturnType , class FunctorType >
struct ParallelReduceReturnValue<typename std::enable_if<Kokkos::is_view<ReturnType>::value>::type, ReturnType, FunctorType> {
typedef ReturnType return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type_scalar;
typedef typename return_type::value_type* const value_type_array;
typedef typename if_c<return_type::rank==0,value_type_scalar,value_type_array>::type value_type;
static return_type& return_value(ReturnType& return_val, const FunctorType&) {
return return_val;
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
!Kokkos::is_view<ReturnType>::value &&
(!std::is_array<ReturnType>::value && !std::is_pointer<ReturnType>::value) &&
!Kokkos::is_reducer_type<ReturnType>::value
>::type, ReturnType, FunctorType> {
typedef Kokkos::View< ReturnType
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type;
static return_type return_value(ReturnType& return_val, const FunctorType&) {
return return_type(&return_val);
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
(is_array<ReturnType>::value || std::is_pointer<ReturnType>::value)
>::type, ReturnType, FunctorType> {
typedef Kokkos::View< typename std::remove_const<ReturnType>::type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> return_type;
typedef InvalidType reducer_type;
typedef typename return_type::value_type value_type[];
static return_type return_value(ReturnType& return_val,
const FunctorType& functor) {
return return_type(return_val,functor.value_count);
}
};
template< class ReturnType , class FunctorType>
struct ParallelReduceReturnValue<typename std::enable_if<
Kokkos::is_reducer_type<ReturnType>::value
>::type, ReturnType, FunctorType> {
typedef ReturnType return_type;
typedef ReturnType reducer_type;
typedef typename return_type::value_type value_type;
static return_type return_value(ReturnType& return_val,
const FunctorType& functor) {
return return_val;
}
};
}
namespace Impl {
template< class T, class ReturnType , class FunctorType>
struct ParallelReducePolicyType;
template< class PolicyType , class FunctorType >
struct ParallelReducePolicyType<typename std::enable_if<Kokkos::Impl::is_execution_policy<PolicyType>::value>::type, PolicyType,FunctorType> {
typedef PolicyType policy_type;
static PolicyType policy(const PolicyType& policy_) {
return policy_;
}
};
template< class PolicyType , class FunctorType >
struct ParallelReducePolicyType<typename std::enable_if<std::is_integral<PolicyType>::value>::type, PolicyType,FunctorType> {
typedef typename
Impl::FunctorPolicyExecutionSpace< FunctorType , void >::execution_space
execution_space ;
typedef Kokkos::RangePolicy<execution_space> policy_type;
static policy_type policy(const PolicyType& policy_) {
return policy_type(0,policy_);
}
};
}
namespace Impl {
template< class FunctorType, class ExecPolicy, class ValueType, class ExecutionSpace>
struct ParallelReduceFunctorType {
typedef FunctorType functor_type;
static const functor_type& functor(const functor_type& functor) {
return functor;
}
};
}
namespace Impl {
template< class PolicyType, class FunctorType, class ReturnType >
struct ParallelReduceAdaptor {
typedef Impl::ParallelReduceReturnValue<void,ReturnType,FunctorType> return_value_adapter;
#ifdef KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
typedef Impl::ParallelReduceFunctorType<FunctorType,PolicyType,
typename return_value_adapter::value_type,
typename PolicyType::execution_space> functor_adaptor;
#endif
static inline
void execute(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value) {
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelReduce("" == label ? typeid(FunctorType).name() : label, 0, &kpID);
}
#endif
Kokkos::Impl::shared_allocation_tracking_claim_and_disable();
#ifdef KOKKOS_IMPL_NEED_FUNCTOR_WRAPPER
Impl::ParallelReduce<typename functor_adaptor::functor_type, PolicyType, typename return_value_adapter::reducer_type >
closure(functor_adaptor::functor(functor),
policy,
return_value_adapter::return_value(return_value,functor));
#else
Impl::ParallelReduce<FunctorType, PolicyType, typename return_value_adapter::reducer_type >
closure(functor,
policy,
return_value_adapter::return_value(return_value,functor));
#endif
Kokkos::Impl::shared_allocation_tracking_release_and_enable();
closure.execute();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelReduce(kpID);
}
#endif
}
};
}
/*! \fn void parallel_reduce(label,policy,functor,return_argument)
\brief Perform a parallel reduction.
 \param label An optional label naming the call. The argument must be usable to construct a std::string.
\param policy A Kokkos Execution Policy, such as an integer, a RangePolicy or a TeamPolicy.
\param functor A functor with a reduction operator, and optional init, join and final functions.
\param return_argument A return argument which can be a scalar, a View, or a ReducerStruct. This argument can be left out if the functor has a final function.
*/
/** \brief Parallel reduction
*
 * parallel_reduce performs parallel reductions with arbitrary functions, i.e.
 * it is not solely data-based. The call expects up to 4 arguments: an optional
 * label, an execution policy, the functor, and an optional return argument
 * (see the \fn documentation above).
 *
* Example of a parallel_reduce functor for a POD (plain old data) value type:
* \code
* class FunctorType { // For POD value type
* public:
* typedef ... execution_space ;
* typedef <podType> value_type ;
* void operator()( <intType> iwork , <podType> & update ) const ;
* void init( <podType> & update ) const ;
* void join( volatile <podType> & update ,
* volatile const <podType> & input ) const ;
*
* typedef true_type has_final ;
* void final( <podType> & update ) const ;
* };
* \endcode
*
* Example of a parallel_reduce functor for an array of POD (plain old data) values:
* \code
* class FunctorType { // For array of POD value
* public:
* typedef ... execution_space ;
* typedef <podType> value_type[] ;
* void operator()( <intType> , <podType> update[] ) const ;
* void init( <podType> update[] ) const ;
* void join( volatile <podType> update[] ,
* volatile const <podType> input[] ) const ;
*
* typedef true_type has_final ;
* void final( <podType> update[] ) const ;
* };
* \endcode
*/
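/* A call-site sketch complementing the functor examples above (the label,
 * the extent "n", and the view "x" are illustrative assumptions):
 *
 * \code
 * double sum = 0.0;
 * Kokkos::parallel_reduce( "MySum",
 *   Kokkos::RangePolicy<>( 0, n ),
 *   KOKKOS_LAMBDA( const int i, double & lsum ) { lsum += x(i); },
 *   sum );  // scalar result, taken by reference by the overloads below
 * \endcode
 */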
// ReturnValue is scalar or array: take by reference
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute(label,policy,functor,return_value);
}
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute("",policy,functor,return_value);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor,
ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute("",policy_type(0,policy),functor,return_value);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor,
ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute(label,policy_type(0,policy),functor,return_value);
}
// ReturnValue as View or Reducer: take by copy to allow for inline construction
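/* A sketch of the inline construction these by-copy overloads enable: a
 * temporary reducer (or result View) cannot bind to the non-const reference
 * overloads above. The names "n", "x", and "max_val" are illustrative
 * assumptions:
 *
 * \code
 * double max_val = 0.0;
 * Kokkos::parallel_reduce( n,
 *   KOKKOS_LAMBDA( const int i, double & lmax ) { if ( x(i) > lmax ) lmax = x(i); },
 *   Kokkos::Experimental::Max<double>( max_val ) );  // reducer constructed inline
 * \endcode
 */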
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
const ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,const ReturnType>::execute(label,policy,functor,return_value);
}
template< class PolicyType, class FunctorType, class ReturnType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
const ReturnType& return_value,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
ReturnType return_value_impl = return_value;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,ReturnType>::execute("",policy,functor,return_value_impl);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor,
const ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
ReturnType return_value_impl = return_value;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute("",policy_type(0,policy),functor,return_value_impl);
}
template< class FunctorType, class ReturnType >
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor,
const ReturnType& return_value) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
ReturnType return_value_impl = return_value;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,ReturnType>::execute(label,policy_type(0,policy),functor,return_value_impl);
}
// No Return Argument
template< class PolicyType, class FunctorType>
inline
void parallel_reduce(const std::string& label,
const PolicyType& policy,
const FunctorType& functor,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,result_view_type>::execute(label,policy,functor,result_view);
}
template< class PolicyType, class FunctorType >
inline
void parallel_reduce(const PolicyType& policy,
const FunctorType& functor,
typename Impl::enable_if<
Kokkos::Impl::is_execution_policy<PolicyType>::value
>::type * = 0) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<PolicyType,FunctorType,result_view_type>::execute("",policy,functor,result_view);
}
template< class FunctorType >
inline
void parallel_reduce(const size_t& policy,
const FunctorType& functor) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,result_view_type>::execute("",policy_type(0,policy),functor,result_view);
}
template< class FunctorType>
inline
void parallel_reduce(const std::string& label,
const size_t& policy,
const FunctorType& functor) {
typedef typename Impl::ParallelReducePolicyType<void,size_t,FunctorType>::policy_type policy_type;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename Kokkos::Impl::if_c< (ValueTraits::StaticValueSize != 0)
, typename ValueTraits::value_type
, typename ValueTraits::pointer_type
>::type value_type ;
typedef Kokkos::View< value_type
, Kokkos::HostSpace
, Kokkos::MemoryUnmanaged
> result_view_type;
result_view_type result_view ;
Impl::ParallelReduceAdaptor<policy_type,FunctorType,result_view_type>::execute(label,policy_type(0,policy),functor,result_view);
}
} //namespace Kokkos
diff --git a/lib/kokkos/core/src/Kokkos_Qthread.hpp b/lib/kokkos/core/src/Kokkos_Qthreads.hpp
similarity index 72%
rename from lib/kokkos/core/src/Kokkos_Qthread.hpp
rename to lib/kokkos/core/src/Kokkos_Qthreads.hpp
index c58518b06..0507552c3 100644
--- a/lib/kokkos/core/src/Kokkos_Qthread.hpp
+++ b/lib/kokkos/core/src/Kokkos_Qthreads.hpp
@@ -1,183 +1,198 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_QTHREAD_HPP
-#define KOKKOS_QTHREAD_HPP
+#ifndef KOKKOS_QTHREADS_HPP
+#define KOKKOS_QTHREADS_HPP
+
+#include <Kokkos_Core_fwd.hpp>
+
+#ifdef KOKKOS_ENABLE_QTHREADS
+
+// Defines to enable experimental Qthreads functionality.
+#define QTHREAD_LOCAL_PRIORITY
+#define CLONED_TASKS
+
+#include <qthread.h>
#include <cstddef>
#include <iosfwd>
-#include <Kokkos_Core.hpp>
-#include <Kokkos_Layout.hpp>
-#include <Kokkos_MemoryTraits.hpp>
+
#include <Kokkos_HostSpace.hpp>
-#include <Kokkos_ExecPolicy.hpp>
+#include <Kokkos_ScratchSpace.hpp>
+#include <Kokkos_Parallel.hpp>
+//#include <Kokkos_MemoryTraits.hpp>
+//#include <Kokkos_ExecPolicy.hpp>
+//#include <Kokkos_TaskScheduler.hpp> // Uncomment when Tasking working.
+#include <Kokkos_Layout.hpp>
#include <impl/Kokkos_Tags.hpp>
+#include <KokkosExp_MDRangePolicy.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
+
namespace Impl {
-class QthreadExec ;
+
+class QthreadsExec;
+
} // namespace Impl
+
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
-/** \brief Execution space supported by Qthread */
-class Qthread {
+/** \brief Execution space supported by Qthreads */
+class Qthreads {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as an execution space
- typedef Qthread execution_space ;
- typedef Kokkos::HostSpace memory_space ;
+ typedef Qthreads execution_space;
+ typedef Kokkos::HostSpace memory_space;
//! This execution space preferred device_type
- typedef Kokkos::Device<execution_space,memory_space> device_type;
+ typedef Kokkos::Device< execution_space, memory_space > device_type;
- typedef Kokkos::LayoutRight array_layout ;
- typedef memory_space::size_type size_type ;
+ typedef Kokkos::LayoutRight array_layout;
+ typedef memory_space::size_type size_type;
- typedef ScratchMemorySpace< Qthread > scratch_memory_space ;
+ typedef ScratchMemorySpace< Qthreads > scratch_memory_space;
//@}
/*------------------------------------------------------------------------*/
/** \brief Initialization will construct one or more instances */
- static Qthread & instance( int = 0 );
+ static Qthreads & instance( int = 0 );
/** \brief Set the execution space to a "sleep" state.
*
 * This function puts the execution space in a "sleep" state in which it is
 * not ready for work. This may consume fewer resources than a "ready" state,
 * but it may also take time to transition back to the "ready" state.
 *
 * \return True if it enters or is in the "sleep" state.
* False if functions are currently executing.
*/
bool sleep();
/** \brief Wake from the sleep state.
- *
+ *
 * \return True if it enters or is in the "ready" state.
* False if functions are currently executing.
*/
static bool wake();
 /** \brief Wait until all dispatched functions complete.
- *
+ *
* The parallel_for or parallel_reduce dispatch of a functor may
* return asynchronously, before the functor completes. This
* method does not return until all dispatched functors on this
* device have completed.
*/
static void fence();
/*------------------------------------------------------------------------*/
static int in_parallel();
static int is_initialized();
/** \brief Return maximum amount of concurrency */
static int concurrency();
static void initialize( int thread_count );
static void finalize();
/** \brief Print configuration information to the given output stream. */
- static void print_configuration( std::ostream & , const bool detail = false );
+ static void print_configuration( std::ostream &, const bool detail = false );
- int shepherd_size() const ;
- int shepherd_worker_size() const ;
+ int shepherd_size() const;
+ int shepherd_worker_size() const;
};
-/*--------------------------------------------------------------------------*/
-
} // namespace Kokkos
-/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
+
namespace Impl {
template<>
-struct MemorySpaceAccess
- < Kokkos::Qthread::memory_space
- , Kokkos::Qthread::scratch_memory_space
+struct MemorySpaceAccess
+ < Kokkos::Qthreads::memory_space
+ , Kokkos::Qthreads::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
- < Kokkos::Qthread::memory_space
- , Kokkos::Qthread::scratch_memory_space
+ < Kokkos::Qthreads::memory_space
+ , Kokkos::Qthreads::scratch_memory_space
>
{
enum { value = true };
- inline static void verify( void ) { }
- inline static void verify( const void * ) { }
+ inline static void verify( void ) {}
+ inline static void verify( const void * ) {}
};
} // namespace Impl
+
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
-
-#include <Kokkos_Parallel.hpp>
-#include <Qthread/Kokkos_QthreadExec.hpp>
-#include <Qthread/Kokkos_Qthread_Parallel.hpp>
-#endif /* #define KOKKOS_QTHREAD_HPP */
+#include <Qthreads/Kokkos_QthreadsExec.hpp>
+#include <Qthreads/Kokkos_Qthreads_Parallel.hpp>
+//#include <Qthreads/Kokkos_Qthreads_Task.hpp> // Uncomment when Tasking working.
+//#include <Qthreads/Kokkos_Qthreads_TaskQueue.hpp> // Uncomment when Tasking working.
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
+#endif // #define KOKKOS_ENABLE_QTHREADS
+#endif // #define KOKKOS_QTHREADS_HPP
diff --git a/lib/kokkos/core/src/Kokkos_Serial.hpp b/lib/kokkos/core/src/Kokkos_Serial.hpp
index f26253591..72710e816 100644
--- a/lib/kokkos/core/src/Kokkos_Serial.hpp
+++ b/lib/kokkos/core/src/Kokkos_Serial.hpp
@@ -1,1123 +1,825 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_Serial.hpp
/// \brief Declaration and definition of Kokkos::Serial device.
#ifndef KOKKOS_SERIAL_HPP
#define KOKKOS_SERIAL_HPP
#include <cstddef>
#include <iosfwd>
#include <Kokkos_Parallel.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
+#include <impl/Kokkos_FunctorAnalysis.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
#if defined( KOKKOS_ENABLE_SERIAL )
namespace Kokkos {
/// \class Serial
/// \brief Kokkos device for non-parallel execution
///
/// A "device" represents a parallel execution model. It tells Kokkos
/// how to parallelize the execution of kernels in a parallel_for or
/// parallel_reduce. For example, the Threads device uses Pthreads or
/// C++11 threads on a CPU, the OpenMP device uses the OpenMP language
/// extensions, and the Cuda device uses NVIDIA's CUDA programming
/// model. The Serial device executes "parallel" kernels
/// sequentially. This is useful if you really do not want to use
/// threads, or if you want to explore different combinations of MPI
/// and shared-memory parallel programming models.
class Serial {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as an execution space:
typedef Serial execution_space ;
//! The size_type typedef best suited for this device.
typedef HostSpace::size_type size_type ;
//! This device's preferred memory space.
typedef HostSpace memory_space ;
//! This execution space preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
//! This device's preferred array layout.
typedef LayoutRight array_layout ;
/// \brief Scratch memory space
typedef ScratchMemorySpace< Kokkos::Serial > scratch_memory_space ;
//@}
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
///
/// For the Serial device, this method <i>always</i> returns false,
/// because parallel_for or parallel_reduce with the Serial device
/// always execute sequentially.
inline static int in_parallel() { return false ; }
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
 * not ready for work. This may consume fewer resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence() {}
static void initialize( unsigned threads_count = 1 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 ,
- bool allow_asynchronous_threadpool = false) {
- (void) threads_count;
- (void) use_numa_count;
- (void) use_cores_per_numa;
- (void) allow_asynchronous_threadpool;
-
- // Init the array of locks used for arbitrarily sized atomics
- Impl::init_lock_array_host_space();
- #if (KOKKOS_ENABLE_PROFILING)
- Kokkos::Profiling::initialize();
- #endif
- }
+ bool allow_asynchronous_threadpool = false);
- static int is_initialized() { return 1 ; }
+ static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency() {return 1;};
//! Free any resources being consumed by the device.
- static void finalize() {
- #if (KOKKOS_ENABLE_PROFILING)
- Kokkos::Profiling::finalize();
- #endif
- }
+ static void finalize();
//! Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool /* detail */ = false ) {}
//--------------------------------------------------------------------------
inline static int thread_pool_size( int = 0 ) { return 1 ; }
KOKKOS_INLINE_FUNCTION static int thread_pool_rank() { return 0 ; }
//--------------------------------------------------------------------------
KOKKOS_INLINE_FUNCTION static unsigned hardware_thread_id() { return thread_pool_rank(); }
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
//--------------------------------------------------------------------------
-
- static void * scratch_memory_resize( unsigned reduce_size , unsigned shared_size );
-
- //--------------------------------------------------------------------------
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
-struct MemorySpaceAccess
+struct MemorySpaceAccess
< Kokkos::Serial::memory_space
, Kokkos::Serial::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::Serial::memory_space
, Kokkos::Serial::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
-namespace SerialImpl {
-
-struct Sentinel {
-
- void * m_scratch ;
- unsigned m_reduce_end ;
- unsigned m_shared_end ;
-
- Sentinel();
- ~Sentinel();
- static Sentinel & singleton();
-};
-
-inline
-unsigned align( unsigned n );
-}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
-class SerialTeamMember {
-private:
- typedef Kokkos::ScratchMemorySpace< Kokkos::Serial > scratch_memory_space ;
- const scratch_memory_space m_space ;
- const int m_league_rank ;
- const int m_league_size ;
-
- SerialTeamMember & operator = ( const SerialTeamMember & );
-
-public:
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & team_shmem() const { return m_space ; }
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & team_scratch(int) const
- { return m_space ; }
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & thread_scratch(int) const
- { return m_space ; }
-
- KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
- KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
- KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return 1 ; }
+// Resize thread team data scratch memory
+void serial_resize_thread_team_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes );
- KOKKOS_INLINE_FUNCTION void team_barrier() const {}
+HostThreadTeamData * serial_get_thread_team_data();
- template<class ValueType>
- KOKKOS_INLINE_FUNCTION
- void team_broadcast(const ValueType& , const int& ) const {}
-
- template< class ValueType, class JoinOp >
- KOKKOS_INLINE_FUNCTION
- ValueType team_reduce( const ValueType & value , const JoinOp & ) const
- {
- return value ;
- }
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering
- * with intra-team non-deterministic ordering accumulation.
- *
- * The global inter-team accumulation value will, at the end of the
- * league's parallel execution, be the scan's total.
- * Parallel execution ordering of the league's teams is non-deterministic.
- * As such the base value for each team's scan operation is similarly
- * non-deterministic.
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const
- {
- const Type tmp = global_accum ? *global_accum : Type(0) ;
- if ( global_accum ) { *global_accum += value ; }
- return tmp ;
- }
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
- *
- * The highest rank thread can compute the reduction total as
- * reduction_total = dev.team_scan( value ) + value ;
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & ) const
- { return Type(0); }
-
- //----------------------------------------
- // Execution space specific:
+} /* namespace Impl */
+} /* namespace Kokkos */
- SerialTeamMember( int arg_league_rank
- , int arg_league_size
- , int arg_shared_size
- );
-};
-} // namespace Impl
+namespace Kokkos {
+namespace Impl {
/*
* < Kokkos::Serial , WorkArgTag >
* < WorkArgTag , Impl::enable_if< std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value >::type >
*
*/
-namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Serial , Properties ... >:public PolicyTraits<Properties...>
{
private:
size_t m_team_scratch_size[2] ;
size_t m_thread_scratch_size[2] ;
int m_league_size ;
int m_chunk_size;
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
//! Execution space of this execution policy:
typedef Kokkos::Serial execution_space ;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
static
int team_size_max( const FunctorType & ) { return 1 ; }
template< class FunctorType >
static
int team_size_recommended( const FunctorType & ) { return 1 ; }
template< class FunctorType >
static
int team_size_recommended( const FunctorType & , const int& ) { return 1 ; }
//----------------------------------------
inline int team_size() const { return 1 ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int = 0) const { return m_team_scratch_size[level] + m_thread_scratch_size[level]; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( execution_space &
, int league_size_request
, int /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( int league_size_request
, int /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_league_size( league_size_request )
, m_chunk_size ( 32 )
{}
inline int chunk_size() const { return m_chunk_size ; }
 /** \brief Set chunk_size to a discrete value. */
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
- typedef Impl::SerialTeamMember member_type ;
+ typedef Impl::HostThreadTeamMember< Kokkos::Serial > member_type ;
};
} /* namespace Impl */
} /* namespace Kokkos */
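/* Illustrative sketch of how the set_scratch_size() / set_chunk_size()
 * members above are reached through the public Kokkos::TeamPolicy
 * interface; the functor, league size, and byte counts are hypothetical,
 * not taken from these sources.
 *
 *   Kokkos::TeamPolicy< Kokkos::Serial > policy( 8 , 1 );
 *   Kokkos::parallel_for(
 *     policy.set_scratch_size( 0 , Kokkos::PerTeam( 1024 )
 *                                , Kokkos::PerThread( 128 ) ) ,
 *     SomeTeamFunctor() );
 */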
-/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
-
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Parallel patterns for Kokkos::Serial with RangePolicy */
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType ,
Kokkos::RangePolicy< Traits ... > ,
Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec() const
{
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i );
}
}
template< class TagType >
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec() const
{
const TagType t{} ;
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i );
}
}
public:
inline
void execute() const
{ this-> template exec< typename Policy::work_tag >(); }
inline
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
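/* Illustrative sketch of the dispatch that reaches the ParallelFor
 * specialization above; the functor and range are hypothetical.
 *
 *   struct AxpyFunctor {
 *     double a ; double * x ; double * y ;
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i ) const { y[i] += a * x[i] ; }
 *   };
 *
 *   Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n )
 *                       , AxpyFunctor{ a , x , y } );
 */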
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ReducerType , class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
, Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
-
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
const TagType t{} ;
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
public:
inline
void execute() const
{
- pointer_type ptr = (pointer_type) Kokkos::Serial::scratch_memory_resize
- ( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
+ const size_t pool_reduce_size =
+ Analysis::value_size( ReducerConditional::select(m_functor , m_reducer) );
+ const size_t team_reduce_size = 0 ; // Never shrinks
+ const size_t team_shared_size = 0 ; // Never shrinks
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
+ HostThreadTeamData & data = *serial_get_thread_team_data();
- this-> template exec< WorkTag >( m_result_ptr ? m_result_ptr : ptr );
+ pointer_type ptr =
+ m_result_ptr ? m_result_ptr : pointer_type(data.pool_reduce_local());
+
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
+
+ this-> template exec< WorkTag >( update );
+
+ Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::
+ final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class HostViewType >
ParallelReduce( const FunctorType & arg_functor ,
const Policy & arg_policy ,
const HostViewType & arg_result_view ,
typename std::enable_if<
Kokkos::is_view< HostViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
- , m_result_ptr( arg_result_view.ptr_on_device() )
+ , m_result_ptr( arg_result_view.data() )
{
static_assert( Kokkos::is_view< HostViewType >::value
, "Kokkos::Serial reduce result must be a View" );
static_assert( std::is_same< typename HostViewType::memory_space , HostSpace >::value
, "Kokkos::Serial reduce result must be a View in HostSpace" );
}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
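/* Illustrative sketch of a reduction handled by the ParallelReduce
 * specialization above; the result may be a host scalar or a View in
 * HostSpace (per the static_asserts in the constructor).  The functor
 * and range are hypothetical.
 *
 *   struct SumFunctor {
 *     typedef double value_type ;
 *     const double * x ;
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i , double & update ) const
 *       { update += x[i] ; }
 *   };
 *
 *   double sum = 0 ;
 *   Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n )
 *                          , SumFunctor{ x } , sum );
 */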
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::Serial
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType , WorkTag > ValueTraits ;
+
+ typedef FunctorAnalysis< FunctorPatternInterface::SCAN , Policy , FunctorType > Analysis ;
+
typedef Kokkos::Impl::FunctorValueInit< FunctorType , WorkTag > ValueInit ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
- reference_type update = ValueInit::init( m_functor , ptr );
-
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( i , update , true );
}
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( reference_type update ) const
{
const TagType t{} ;
- reference_type update = ValueInit::init( m_functor , ptr );
-
const typename Policy::member_type e = m_policy.end();
for ( typename Policy::member_type i = m_policy.begin() ; i < e ; ++i ) {
m_functor( t , i , update , true );
}
}
public:
inline
void execute() const
{
- pointer_type ptr = (pointer_type)
- Kokkos::Serial::scratch_memory_resize( ValueTraits::value_size( m_functor ) , 0 );
- this-> template exec< WorkTag >( ptr );
+ const size_t pool_reduce_size = Analysis::value_size( m_functor );
+ const size_t team_reduce_size = 0 ; // Never shrinks
+ const size_t team_shared_size = 0 ; // Never shrinks
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
+ HostThreadTeamData & data = *serial_get_thread_team_data();
+
+ reference_type update =
+ ValueInit::init( m_functor , pointer_type(data.pool_reduce_local()) );
+
+ this-> template exec< WorkTag >( update );
}
inline
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
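/* Illustrative sketch of a prefix-sum functor handled by the ParallelScan
 * specialization above, matching the (index, update, final) call made in
 * exec(); names and types are hypothetical.
 *
 *   struct ExclusiveScan {
 *     typedef long value_type ;
 *     const int * in ; int * out ;
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const int i , long & update , const bool final ) const
 *     {
 *       if ( final ) { out[i] = update ; }
 *       update += in[i] ;
 *     }
 *   };
 *
 *   Kokkos::parallel_scan( Kokkos::RangePolicy< Kokkos::Serial >( 0 , n )
 *                        , ExclusiveScan{ in , out } );
 */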
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
/* Parallel patterns for Kokkos::Serial with TeamPolicy */
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::Serial
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef TeamPolicyInternal< Kokkos::Serial , Properties ...> Policy ;
typedef typename Policy::member_type Member ;
const FunctorType m_functor ;
const int m_league ;
const int m_shared ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec() const
+ exec( HostThreadTeamData & data ) const
{
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( Member(ileague,m_league,m_shared) );
+ m_functor( Member(data,ileague,m_league) );
}
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec() const
+ exec( HostThreadTeamData & data ) const
{
const TagType t{} ;
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( t , Member(ileague,m_league,m_shared) );
+ m_functor( t , Member(data,ileague,m_league) );
}
}
public:
inline
void execute() const
{
- Kokkos::Serial::scratch_memory_resize( 0 , m_shared );
- this-> template exec< typename Policy::work_tag >();
+ const size_t pool_reduce_size = 0 ; // Never shrinks
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE ;
+ const size_t team_shared_size = m_shared ;
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
+ HostThreadTeamData & data = *serial_get_thread_team_data();
+
+ this->template exec< typename Policy::work_tag >( data );
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_league( arg_policy.league_size() )
- , m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , 1 ) )
+ , m_shared( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >::value( arg_functor , 1 ) )
{ }
};
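/* Illustrative sketch of a team functor executed by the specialization
 * above; the per-league work is hypothetical.  In Serial the team size
 * is always one, so team_rank() is always zero.
 *
 *   struct TeamWork {
 *     KOKKOS_INLINE_FUNCTION
 *     void operator()( const Kokkos::TeamPolicy< Kokkos::Serial >::member_type & member ) const
 *     {
 *       const int ileague = member.league_rank();
 *       // ... work for this league member ...
 *     }
 *   };
 *
 *   Kokkos::parallel_for( Kokkos::TeamPolicy< Kokkos::Serial >( n_leagues , 1 )
 *                       , TeamWork() );
 */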
/*--------------------------------------------------------------------------*/
template< class FunctorType , class ReducerType , class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::Serial
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef TeamPolicyInternal< Kokkos::Serial, Properties ... > Policy ;
+
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const int m_league ;
const ReducerType m_reducer ;
pointer_type m_result_ptr ;
const int m_shared ;
template< class TagType >
inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( HostThreadTeamData & data , reference_type update ) const
{
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
-
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( Member(ileague,m_league,m_shared) , update );
+ m_functor( Member(data,ileague,m_league) , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class TagType >
inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec( pointer_type ptr ) const
+ exec( HostThreadTeamData & data , reference_type update ) const
{
const TagType t{} ;
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
-
for ( int ileague = 0 ; ileague < m_league ; ++ileague ) {
- m_functor( t , Member(ileague,m_league,m_shared) , update );
+ m_functor( t , Member(data,ileague,m_league) , update );
}
-
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagType >::
- final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
public:
inline
void execute() const
{
- pointer_type ptr = (pointer_type) Kokkos::Serial::scratch_memory_resize
- ( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , m_shared );
+ const size_t pool_reduce_size =
+ Analysis::value_size( ReducerConditional::select(m_functor, m_reducer));
+
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE ;
+ const size_t team_shared_size = m_shared ;
+ const size_t thread_local_size = 0 ; // Never shrinks
+
+ serial_resize_thread_team_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
+
- this-> template exec< WorkTag >( m_result_ptr ? m_result_ptr : ptr );
+ HostThreadTeamData & data = *serial_get_thread_team_data();
+
+ pointer_type ptr =
+ m_result_ptr ? m_result_ptr : pointer_type(data.pool_reduce_local());
+
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , ptr );
+
+ this-> template exec< WorkTag >( data , update );
+
+ Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::
+ final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result ,
typename std::enable_if<
Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_league( arg_policy.league_size() )
, m_reducer( InvalidType() )
- , m_result_ptr( arg_result.ptr_on_device() )
- , m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( m_functor , 1 ) )
+ , m_result_ptr( arg_result.data() )
+ , m_shared( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >::value( m_functor , 1 ) )
{
static_assert( Kokkos::is_view< ViewType >::value
, "Reduction result on Kokkos::Serial must be a Kokkos::View" );
static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::Serial must be a Kokkos::View in HostSpace" );
}
inline
ParallelReduce( const FunctorType & arg_functor
- , Policy arg_policy
- , const ReducerType& reducer )
- : m_functor( arg_functor )
- , m_league( arg_policy.league_size() )
- , m_reducer( reducer )
- , m_result_ptr( reducer.result_view().data() )
- , m_shared( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , Policy arg_policy
+ , const ReducerType& reducer )
+ : m_functor( arg_functor )
+ , m_league( arg_policy.league_size() )
+ , m_reducer( reducer )
+ , m_result_ptr( reducer.result_view().data() )
+ , m_shared( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >::value( arg_functor , 1 ) )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
-/* Nested parallel patterns for Kokkos::Serial with TeamPolicy */
-
-namespace Kokkos {
-namespace Impl {
-
-template<typename iType>
-struct TeamThreadRangeBoundariesStruct<iType,SerialTeamMember> {
- typedef iType index_type;
- const iType begin ;
- const iType end ;
- enum {increment = 1};
- const SerialTeamMember& thread;
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct (const SerialTeamMember& arg_thread, const iType& arg_count)
- : begin(0)
- , end(arg_count)
- , thread(arg_thread)
- {}
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct (const SerialTeamMember& arg_thread, const iType& arg_begin, const iType & arg_end )
- : begin( arg_begin )
- , end( arg_end)
- , thread( arg_thread )
- {}
-};
-
- template<typename iType>
- struct ThreadVectorRangeBoundariesStruct<iType,SerialTeamMember> {
- typedef iType index_type;
- enum {start = 0};
- const iType end;
- enum {increment = 1};
-
- KOKKOS_INLINE_FUNCTION
- ThreadVectorRangeBoundariesStruct (const SerialTeamMember& thread, const iType& count):
- end( count )
- {}
- };
-
-} // namespace Impl
-
-template< typename iType >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>
-TeamThreadRange( const Impl::SerialTeamMember& thread, const iType & count )
-{
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, count );
-}
-
-template< typename iType1, typename iType2 >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::SerialTeamMember >
-TeamThreadRange( const Impl::SerialTeamMember& thread, const iType1 & begin, const iType2 & end )
-{
- typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::SerialTeamMember >( thread, iType(begin), iType(end) );
-}
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >
- ThreadVectorRange(const Impl::SerialTeamMember& thread, const iType& count) {
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >(thread,count);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadSingleStruct<Impl::SerialTeamMember> PerTeam(const Impl::SerialTeamMember& thread) {
- return Impl::ThreadSingleStruct<Impl::SerialTeamMember>(thread);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::VectorSingleStruct<Impl::SerialTeamMember> PerThread(const Impl::SerialTeamMember& thread) {
- return Impl::VectorSingleStruct<Impl::SerialTeamMember>(thread);
-}
-
-} // namespace Kokkos
-
-namespace Kokkos {
-
- /** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the the calling thread team.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries, const Lambda& lambda) {
- for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Inter-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the the calling thread team and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries,
- const Lambda & lambda, ValueType& result) {
-
- result = ValueType();
-
- for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-
- result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::SerialTeamMember>& loop_boundaries,
- const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-
- for( iType i = loop_boundaries.begin; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
-
- init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
-}
-
-} //namespace Kokkos
-
-namespace Kokkos {
-/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const Lambda& lambda) {
- #ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
- #pragma ivdep
- #endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const Lambda & lambda, ValueType& result) {
- result = ValueType();
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
- init_result = result;
-}
-
-/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
- * for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
- * Depending on the target execution space the operator might be called twice: once with final=false
- * and once with final=true. When final==true val contains the prefix sum value. The contribution of this
- * "i" needs to be added to val no matter whether final==true or not. In a serial execution
- * (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
- * to the final sum value over all vector lanes.
- * This functionality requires C++11 support.*/
-template< typename iType, class FunctorType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::SerialTeamMember >&
- loop_boundaries, const FunctorType & lambda) {
-
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
- typedef typename ValueTraits::value_type value_type ;
-
- value_type scan_val = value_type();
-
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i,scan_val,true);
- }
-}
-
-} // namespace Kokkos
-
-namespace Kokkos {
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda) {
- lambda();
-}
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda) {
- lambda();
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda, ValueType& val) {
- lambda(val);
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::SerialTeamMember>& , const FunctorType& lambda, ValueType& val) {
- lambda(val);
-}
-}
-
-//----------------------------------------------------------------------------
#include <impl/Kokkos_Serial_Task.hpp>
#endif // defined( KOKKOS_ENABLE_SERIAL )
#endif /* #define KOKKOS_SERIAL_HPP */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
diff --git a/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp b/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
index e4271aa18..e25039d23 100644
--- a/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
+++ b/lib/kokkos/core/src/Kokkos_TaskScheduler.hpp
@@ -1,692 +1,851 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TASKSCHEDULER_HPP
#define KOKKOS_TASKSCHEDULER_HPP
//----------------------------------------------------------------------------
#include <Kokkos_Core_fwd.hpp>
// If compiling with CUDA, then CUDA 8 or better must be used, together
// with relocatable device code, to enable the task policy.
// nvcc relocatable device code option: --relocatable-device-code=true
#if ( defined( KOKKOS_ENABLE_CUDA ) )
#if ( 8000 <= CUDA_VERSION ) && \
defined( KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE )
#define KOKKOS_ENABLE_TASKDAG
#endif
#else
#define KOKKOS_ENABLE_TASKDAG
#endif
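// Illustrative compile line only; the architecture flag is an assumption,
// the relocatable-device-code flag is the one noted above:
//   nvcc -std=c++11 --relocatable-device-code=true -arch=sm_60 ...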
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
#include <Kokkos_MemoryPool.hpp>
#include <impl/Kokkos_Tags.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
// Forward declarations used in Impl::TaskQueue
template< typename Arg1 = void , typename Arg2 = void >
class Future ;
template< typename Space >
class TaskScheduler ;
+template< typename Space >
+void wait( TaskScheduler< Space > const & );
+
+template< typename Space >
+struct is_scheduler : public std::false_type {};
+
+template< typename Space >
+struct is_scheduler< TaskScheduler< Space > > : public std::true_type {};
+
} // namespace Kokkos
#include <impl/Kokkos_TaskQueue.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/**\brief Implementation data for task data management, access, and execution.
*
* CRTP Inheritance structure to allow static_cast from the
* task root type and a task's FunctorType.
*
* TaskBase< Space , ResultType , FunctorType >
* : TaskBase< Space , ResultType , void >
* , FunctorType
* { ... };
*
* TaskBase< Space , ResultType , void >
* : TaskBase< Space , void , void >
* { ... };
*/
template< typename Space , typename ResultType , typename FunctorType >
class TaskBase ;
-template< typename Space >
-class TaskExec ;
-
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
/**
*
* Future< space > // value_type == void
* Future< value > // space == Default
* Future< value , space >
*
*/
template< typename Arg1 , typename Arg2 >
class Future {
private:
template< typename > friend class TaskScheduler ;
template< typename , typename > friend class Future ;
template< typename , typename , typename > friend class Impl::TaskBase ;
enum { Arg1_is_space = Kokkos::is_space< Arg1 >::value };
enum { Arg2_is_space = Kokkos::is_space< Arg2 >::value };
enum { Arg1_is_value = ! Arg1_is_space &&
! std::is_same< Arg1 , void >::value };
enum { Arg2_is_value = ! Arg2_is_space &&
! std::is_same< Arg2 , void >::value };
static_assert( ! ( Arg1_is_space && Arg2_is_space )
, "Future cannot be given two spaces" );
static_assert( ! ( Arg1_is_value && Arg2_is_value )
, "Future cannot be given two value types" );
using ValueType =
typename std::conditional< Arg1_is_value , Arg1 ,
typename std::conditional< Arg2_is_value , Arg2 , void
>::type >::type ;
using Space =
typename std::conditional< Arg1_is_space , Arg1 ,
typename std::conditional< Arg2_is_space , Arg2 , void
>::type >::type ;
using task_base = Impl::TaskBase< Space , ValueType , void > ;
using queue_type = Impl::TaskQueue< Space > ;
task_base * m_task ;
KOKKOS_INLINE_FUNCTION explicit
Future( task_base * task ) : m_task(0)
{ if ( task ) queue_type::assign( & m_task , task ); }
//----------------------------------------
public:
using execution_space = typename Space::execution_space ;
using value_type = ValueType ;
//----------------------------------------
KOKKOS_INLINE_FUNCTION
bool is_null() const { return 0 == m_task ; }
KOKKOS_INLINE_FUNCTION
int reference_count() const
{ return 0 != m_task ? m_task->reference_count() : 0 ; }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
void clear()
{ if ( m_task ) queue_type::assign( & m_task , (task_base*)0 ); }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
~Future() { clear(); }
//----------------------------------------
KOKKOS_INLINE_FUNCTION
constexpr Future() noexcept : m_task(0) {}
KOKKOS_INLINE_FUNCTION
Future( Future && rhs )
: m_task( rhs.m_task ) { rhs.m_task = 0 ; }
KOKKOS_INLINE_FUNCTION
Future( const Future & rhs )
: m_task(0)
{ if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task ); }
KOKKOS_INLINE_FUNCTION
Future & operator = ( Future && rhs )
{
clear();
m_task = rhs.m_task ;
rhs.m_task = 0 ;
return *this ;
}
KOKKOS_INLINE_FUNCTION
Future & operator = ( const Future & rhs )
{
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
return *this ;
}
//----------------------------------------
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future( Future<A1,A2> && rhs )
: m_task( rhs.m_task )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
rhs.m_task = 0 ;
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future( const Future<A1,A2> & rhs )
: m_task(0)
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
if ( rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future & operator = ( const Future<A1,A2> & rhs )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
if ( m_task || rhs.m_task ) queue_type::assign( & m_task , rhs.m_task );
return *this ;
}
template< class A1 , class A2 >
KOKKOS_INLINE_FUNCTION
Future & operator = ( Future<A1,A2> && rhs )
{
static_assert
( std::is_same< Space , void >::value ||
std::is_same< Space , typename Future<A1,A2>::Space >::value
, "Assigned Futures must have the same space" );
static_assert
( std::is_same< value_type , void >::value ||
std::is_same< value_type , typename Future<A1,A2>::value_type >::value
, "Assigned Futures must have the same value_type" );
clear();
m_task = rhs.m_task ;
rhs.m_task = 0 ;
return *this ;
}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
typename task_base::get_return_type
get() const
{
if ( 0 == m_task ) {
Kokkos::abort( "Kokkos::Future::get ERROR: is_null()");
}
return m_task->get();
}
};
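/* Illustrative declarations following the argument conventions documented
 * above; the value type and execution space chosen here are arbitrary.
 *
 *   Kokkos::Future<>                          fv ; // value_type == void
 *   Kokkos::Future< double >                  fd ; // space defaulted (per the comment above)
 *   Kokkos::Future< double , Kokkos::Serial > fs ; // explicit space
 */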
+// Is a Future with the given execution space
+template< typename , typename ExecSpace = void >
+struct is_future : public std::false_type {};
+
+template< typename Arg1 , typename Arg2 , typename ExecSpace >
+struct is_future< Future<Arg1,Arg2> , ExecSpace >
+ : public std::integral_constant
+ < bool ,
+ ( std::is_same< ExecSpace , void >::value ||
+ std::is_same< ExecSpace
+ , typename Future<Arg1,Arg2>::execution_space >::value )
+ > {};
+
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
-enum TaskType { TaskTeam = Impl::TaskBase<void,void,void>::TaskTeam
- , TaskSingle = Impl::TaskBase<void,void,void>::TaskSingle };
+enum class TaskPriority : int { High = 0
+ , Regular = 1
+ , Low = 2 };
-enum TaskPriority { TaskHighPriority = 0
- , TaskRegularPriority = 1
- , TaskLowPriority = 2 };
+} // namespace Kokkos
-template< typename Space >
-void wait( TaskScheduler< Space > const & );
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+//----------------------------------------------------------------------------
+
+template< int TaskEnum , typename DepFutureType >
+struct TaskPolicyData
+{
+ using execution_space = typename DepFutureType::execution_space ;
+ using scheduler_type = TaskScheduler< execution_space > ;
+
+ enum : int { m_task_type = TaskEnum };
+
+ scheduler_type const * m_scheduler ;
+ DepFutureType const m_dependence ;
+ int m_priority ;
+
+ TaskPolicyData() = delete ;
+ TaskPolicyData( TaskPolicyData && ) = default ;
+ TaskPolicyData( TaskPolicyData const & ) = default ;
+ TaskPolicyData & operator = ( TaskPolicyData && ) = default ;
+ TaskPolicyData & operator = ( TaskPolicyData const & ) = default ;
+
+ KOKKOS_INLINE_FUNCTION
+ TaskPolicyData( DepFutureType && arg_future
+ , Kokkos::TaskPriority const & arg_priority )
+ : m_scheduler( 0 )
+ , m_dependence( arg_future )
+ , m_priority( static_cast<int>( arg_priority ) )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ TaskPolicyData( scheduler_type const & arg_scheduler
+ , Kokkos::TaskPriority const & arg_priority )
+ : m_scheduler( & arg_scheduler )
+ , m_dependence()
+ , m_priority( static_cast<int>( arg_priority ) )
+ {}
+};
+} // namespace Impl
} // namespace Kokkos
+//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
template< typename ExecSpace >
class TaskScheduler
{
private:
using track_type = Kokkos::Impl::SharedAllocationTracker ;
using queue_type = Kokkos::Impl::TaskQueue< ExecSpace > ;
using task_base = Impl::TaskBase< ExecSpace , void , void > ;
track_type m_track ;
queue_type * m_queue ;
//----------------------------------------
- // Process optional arguments to spawn and respawn functions
-
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const ) {}
-
- // TaskTeam or TaskSingle
- template< typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , TaskType const & arg
- , Options const & ... opts )
- {
- task->m_task_type = arg ;
- assign( task , opts ... );
- }
-
- // TaskHighPriority or TaskRegularPriority or TaskLowPriority
- template< typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , TaskPriority const & arg
- , Options const & ... opts )
- {
- task->m_priority = arg ;
- assign( task , opts ... );
- }
-
- // Future for a dependence
- template< typename A1 , typename A2 , typename ... Options >
- KOKKOS_INLINE_FUNCTION static
- void assign( task_base * const task
- , Future< A1 , A2 > const & arg
- , Options const & ... opts )
- {
- task->add_dependence( arg.m_task );
- assign( task , opts ... );
- }
-
- //----------------------------------------
public:
- using execution_policy = TaskScheduler ;
using execution_space = ExecSpace ;
using memory_space = typename queue_type::memory_space ;
- using member_type = Kokkos::Impl::TaskExec< ExecSpace > ;
+ using member_type =
+ typename Kokkos::Impl::TaskQueueSpecialization< ExecSpace >::member_type ;
KOKKOS_INLINE_FUNCTION
TaskScheduler() : m_track(), m_queue(0) {}
KOKKOS_INLINE_FUNCTION
TaskScheduler( TaskScheduler && rhs ) = default ;
KOKKOS_INLINE_FUNCTION
TaskScheduler( TaskScheduler const & rhs ) = default ;
KOKKOS_INLINE_FUNCTION
TaskScheduler & operator = ( TaskScheduler && rhs ) = default ;
KOKKOS_INLINE_FUNCTION
TaskScheduler & operator = ( TaskScheduler const & rhs ) = default ;
TaskScheduler( memory_space const & arg_memory_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_log2_superblock = 12 )
: m_track()
, m_queue(0)
{
typedef Kokkos::Impl::SharedAllocationRecord
< memory_space , typename queue_type::Destroy >
record_type ;
record_type * record =
record_type::allocate( arg_memory_space
, "TaskQueue"
, sizeof(queue_type)
);
m_queue = new( record->data() )
queue_type( arg_memory_space
, arg_memory_pool_capacity
, arg_memory_pool_log2_superblock );
record->m_destroy.m_queue = m_queue ;
m_track.assign_allocated_record_to_uninitialized( record );
}
//----------------------------------------
/**\brief Allocation size for a spawned task */
template< typename FunctorType >
KOKKOS_FUNCTION
size_t spawn_allocation_size() const
{
using task_type = Impl::TaskBase< execution_space
, typename FunctorType::value_type
, FunctorType > ;
return m_queue->allocate_block_size( sizeof(task_type) );
}
/**\brief Allocation size for a when_all aggregate */
KOKKOS_FUNCTION
size_t when_all_allocation_size( int narg ) const
{
using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
return m_queue->allocate_block_size( sizeof(task_base) + narg * sizeof(task_base*) );
}
//----------------------------------------
- /**\brief A task spawns a task with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- * 3) Team or Serial
- */
- template< typename FunctorType , typename ... Options >
- KOKKOS_FUNCTION
- Future< typename FunctorType::value_type , ExecSpace >
- task_spawn( FunctorType const & arg_functor
- , Options const & ... arg_options
- ) const
+ template< int TaskEnum , typename DepFutureType , typename FunctorType >
+ KOKKOS_FUNCTION static
+ Kokkos::Future< typename FunctorType::value_type , execution_space >
+ spawn( Impl::TaskPolicyData<TaskEnum,DepFutureType> const & arg_policy
+ , typename task_base::function_type arg_function
+ , FunctorType && arg_functor
+ )
{
using value_type = typename FunctorType::value_type ;
using future_type = Future< value_type , execution_space > ;
using task_type = Impl::TaskBase< execution_space
, value_type
, FunctorType > ;
+ queue_type * const queue =
+ arg_policy.m_scheduler ? arg_policy.m_scheduler->m_queue : (
+ arg_policy.m_dependence.m_task
+ ? arg_policy.m_dependence.m_task->m_queue
+ : (queue_type*) 0 );
+
+ if ( 0 == queue ) {
+ Kokkos::abort("Kokkos spawn given null Future" );
+ }
+
//----------------------------------------
// Give single-thread back-ends an opportunity to clear
// the queue of ready tasks before allocating a new task
- m_queue->iff_single_thread_recursive_execute();
+ queue->iff_single_thread_recursive_execute();
//----------------------------------------
future_type f ;
// Allocate task from memory pool
f.m_task =
- reinterpret_cast< task_type * >(m_queue->allocate(sizeof(task_type)));
+ reinterpret_cast< task_type * >(queue->allocate(sizeof(task_type)));
if ( f.m_task ) {
// Placement new construction
- new ( f.m_task ) task_type( arg_functor );
-
- // Reference count starts at two
- // +1 for matching decrement when task is complete
- // +1 for future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = sizeof(task_type);
-
- assign( f.m_task , arg_options... );
-
- // Spawning from within the execution space so the
- // apply function pointer is guaranteed to be valid
- f.m_task->m_apply = task_type::apply ;
-
- m_queue->schedule( f.m_task );
- // this task may be updated or executed at any moment
+ // Reference count starts at two:
+ // +1 for the matching decrement when task is complete
+ // +1 for the future
+ new ( f.m_task )
+ task_type( arg_function
+ , queue
+ , arg_policy.m_dependence.m_task /* dependence */
+ , 2 /* reference count */
+ , int(sizeof(task_type)) /* allocation size */
+ , int(arg_policy.m_task_type)
+ , int(arg_policy.m_priority)
+ , std::move(arg_functor) );
+
+ // The dependence (if any) is processed immediately
+ // within the schedule function, so the dependence's
+ // reference count does not need to be incremented for
+ // the assignment.
+
+ queue->schedule_runnable( f.m_task );
+ // This task may be updated or executed at any moment,
+ // even during the call to 'schedule'.
}
return f ;
}
- /**\brief The host process spawns a task with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- * 3) Team or Serial
- */
- template< typename FunctorType , typename ... Options >
- inline
- Future< typename FunctorType::value_type , ExecSpace >
- host_spawn( FunctorType const & arg_functor
- , Options const & ... arg_options
- ) const
+ template< typename FunctorType , typename A1 , typename A2 >
+ KOKKOS_FUNCTION static
+ void
+ respawn( FunctorType * arg_self
+ , Future<A1,A2> const & arg_dependence
+ , TaskPriority const & arg_priority
+ )
{
+ // Precondition: task is in Executing state
+
using value_type = typename FunctorType::value_type ;
- using future_type = Future< value_type , execution_space > ;
using task_type = Impl::TaskBase< execution_space
, value_type
, FunctorType > ;
- if ( m_queue == 0 ) {
- Kokkos::abort("Kokkos::TaskScheduler not initialized");
- }
+ task_type * const task = static_cast< task_type * >( arg_self );
- future_type f ;
+ task->m_priority = static_cast<int>(arg_priority);
- // Allocate task from memory pool
- f.m_task =
- reinterpret_cast<task_type*>( m_queue->allocate(sizeof(task_type)) );
-
- if ( f.m_task ) {
-
- // Placement new construction
- new( f.m_task ) task_type( arg_functor );
-
- // Reference count starts at two:
- // +1 to match decrement when task completes
- // +1 for the future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = sizeof(task_type);
-
- assign( f.m_task , arg_options... );
-
- // Potentially spawning outside execution space so the
- // apply function pointer must be obtained from execution space.
- // Required for Cuda execution space function pointer.
- m_queue->template proc_set_apply< FunctorType >( & f.m_task->m_apply );
+ task->add_dependence( arg_dependence.m_task );
- m_queue->schedule( f.m_task );
- }
- return f ;
+ // Postcondition: task is in Executing-Respawn state
}
+ //----------------------------------------
/**\brief Return a future that is complete
* when all input futures are complete.
*/
template< typename A1 , typename A2 >
- KOKKOS_FUNCTION
- Future< ExecSpace >
- when_all( int narg , Future< A1 , A2 > const * const arg ) const
+ KOKKOS_FUNCTION static
+ Future< execution_space >
+ when_all( Future< A1 , A2 > const arg[] , int narg )
{
- static_assert
- ( std::is_same< execution_space
- , typename Future< A1 , A2 >::execution_space
- >::value
- , "Future must have same execution space" );
-
- using future_type = Future< ExecSpace > ;
- using task_base = Kokkos::Impl::TaskBase< ExecSpace , void , void > ;
+ using future_type = Future< execution_space > ;
+ using task_base = Kokkos::Impl::TaskBase< execution_space , void , void > ;
future_type f ;
- size_t const size = sizeof(task_base) + narg * sizeof(task_base*);
-
- f.m_task =
- reinterpret_cast< task_base * >( m_queue->allocate( size ) );
+ if ( narg ) {
- if ( f.m_task ) {
-
- new( f.m_task ) task_base();
-
- // Reference count starts at two:
- // +1 to match decrement when task completes
- // +1 for the future
- f.m_task->m_queue = m_queue ;
- f.m_task->m_ref_count = 2 ;
- f.m_task->m_alloc_size = size ;
- f.m_task->m_dep_count = narg ;
- f.m_task->m_task_type = task_base::Aggregate ;
-
- task_base ** const dep = f.m_task->aggregate_dependences();
-
- // Assign dependences to increment their reference count
- // The futures may be destroyed upon returning from this call
- // so increment reference count to track this assignment.
+ queue_type * queue = 0 ;
for ( int i = 0 ; i < narg ; ++i ) {
- task_base * const t = dep[i] = arg[i].m_task ;
+ task_base * const t = arg[i].m_task ;
if ( 0 != t ) {
+ // Increment reference count to track subsequent assignment.
Kokkos::atomic_increment( &(t->m_ref_count) );
+ if ( queue == 0 ) {
+ queue = t->m_queue ;
+ }
+ else if ( queue != t->m_queue ) {
+ Kokkos::abort("Kokkos when_all Futures must be in the same scheduler" );
+ }
}
}
- m_queue->schedule( f.m_task );
- // this when_all may be processed at any moment
- }
+ if ( queue != 0 ) {
- return f ;
- }
+ size_t const size = sizeof(task_base) + narg * sizeof(task_base*);
- /**\brief An executing task respawns itself with options
- *
- * 1) High, Normal, or Low priority
- * 2) With or without dependence
- */
- template< class FunctorType , typename ... Options >
- KOKKOS_FUNCTION
- void respawn( FunctorType * task_self
- , Options const & ... arg_options ) const
- {
- using value_type = typename FunctorType::value_type ;
- using task_type = Impl::TaskBase< execution_space
- , value_type
- , FunctorType > ;
+ f.m_task =
+ reinterpret_cast< task_base * >( queue->allocate( size ) );
- task_type * const task = static_cast< task_type * >( task_self );
+ if ( f.m_task ) {
- // Reschedule task with no dependences.
- m_queue->reschedule( task );
+ // Reference count starts at two:
+ // +1 to match decrement when task completes
+ // +1 for the future
+ new( f.m_task ) task_base( queue
+ , 2 /* reference count */
+ , size /* allocation size */
+ , narg /* dependence count */
+ );
- // Dependences, if requested, are added here through parsing the arguments.
- assign( task , arg_options... );
- }
+ // Assign dependences, reference counts were already incremented
- //----------------------------------------
+ task_base ** const dep = f.m_task->aggregate_dependences();
- template< typename S >
- friend
- void Kokkos::wait( Kokkos::TaskScheduler< S > const & );
+ for ( int i = 0 ; i < narg ; ++i ) { dep[i] = arg[i].m_task ; }
+
+ queue->schedule_aggregate( f.m_task );
+ // this when_all may be processed at any moment
+ }
+ }
+ }
+
+ return f ;
+ }
//----------------------------------------
- inline
+ KOKKOS_INLINE_FUNCTION
int allocation_capacity() const noexcept
{ return m_queue->m_memory.get_mem_size(); }
KOKKOS_INLINE_FUNCTION
int allocated_task_count() const noexcept
{ return m_queue->m_count_alloc ; }
KOKKOS_INLINE_FUNCTION
int allocated_task_count_max() const noexcept
{ return m_queue->m_max_alloc ; }
KOKKOS_INLINE_FUNCTION
long allocated_task_count_accum() const noexcept
{ return m_queue->m_accum_alloc ; }
+ //----------------------------------------
+
+ template< typename S >
+ friend
+ void Kokkos::wait( Kokkos::TaskScheduler< S > const & );
+
};
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+//----------------------------------------------------------------------------
+// Construct a TaskTeam execution policy
+
+template< typename T >
+Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskTeam
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >
+KOKKOS_INLINE_FUNCTION
+TaskTeam( T const & arg
+ , TaskPriority const & arg_priority = TaskPriority::Regular
+ )
+{
+ static_assert( Kokkos::is_future<T>::value ||
+ Kokkos::is_scheduler<T>::value
+ , "Kokkos TaskTeam argument must be Future or TaskScheduler" );
+
+ return
+ Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskTeam
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >( arg , arg_priority );
+}
+
+// Construct a TaskSingle execution policy
+
+template< typename T >
+Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskSingle
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >
+KOKKOS_INLINE_FUNCTION
+TaskSingle( T const & arg
+ , TaskPriority const & arg_priority = TaskPriority::Regular
+ )
+{
+ static_assert( Kokkos::is_future<T>::value ||
+ Kokkos::is_scheduler<T>::value
+ , "Kokkos TaskSingle argument must be Future or TaskScheduler" );
+
+ return
+ Kokkos::Impl::TaskPolicyData
+ < Kokkos::Impl::TaskBase<void,void,void>::TaskSingle
+ , typename std::conditional< Kokkos::is_future< T >::value , T ,
+ typename Kokkos::Future< typename T::execution_space > >::type
+ >( arg , arg_priority );
+}
+
+//----------------------------------------------------------------------------
+
+/**\brief A host control thread spawns a task with options
+ *
+ * 1) Team or Serial
+ * 2) With scheduler or dependence
+ * 3) High, Normal, or Low priority
+ */
+template< int TaskEnum
+ , typename DepFutureType
+ , typename FunctorType >
+Future< typename FunctorType::value_type
+ , typename DepFutureType::execution_space >
+host_spawn( Impl::TaskPolicyData<TaskEnum,DepFutureType> const & arg_policy
+ , FunctorType && arg_functor
+ )
+{
+ using exec_space = typename DepFutureType::execution_space ;
+ using scheduler = TaskScheduler< exec_space > ;
+
+ typedef Impl::TaskBase< exec_space
+ , typename FunctorType::value_type
+ , FunctorType
+ > task_type ;
+
+ static_assert( TaskEnum == task_type::TaskTeam ||
+ TaskEnum == task_type::TaskSingle
+ , "Kokkos host_spawn requires TaskTeam or TaskSingle" );
+
+ // May be spawning a Cuda task, must use the specialization
+ // to query on-device function pointer.
+ typename task_type::function_type const ptr =
+ Kokkos::Impl::TaskQueueSpecialization< exec_space >::
+ template get_function_pointer< task_type >();
+
+ return scheduler::spawn( arg_policy , ptr , std::move(arg_functor) );
+}
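+/* Illustrative end-to-end sketch of the host_spawn() path above; the
+ * functor, memory space argument, pool size, and value type are
+ * hypothetical, not taken from these sources.
+ *
+ *   struct HelloTask {
+ *     typedef long value_type ;
+ *     KOKKOS_INLINE_FUNCTION
+ *     void operator()( Kokkos::TaskScheduler< Kokkos::Serial >::member_type &
+ *                    , long & result ) { result = 42 ; }
+ *   };
+ *
+ *   Kokkos::TaskScheduler< Kokkos::Serial > sched( Kokkos::HostSpace() , 1 << 20 );
+ *   auto f = Kokkos::host_spawn( Kokkos::TaskSingle( sched ) , HelloTask() );
+ *   Kokkos::wait( sched );
+ *   long value = f.get();
+ */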
+
+/**\brief A task spawns a task with options
+ *
+ * 1) Team or Serial
+ * 2) With scheduler or dependence
+ * 3) High, Normal, or Low priority
+ */
+template< int TaskEnum
+ , typename DepFutureType
+ , typename FunctorType >
+Future< typename FunctorType::value_type
+ , typename DepFutureType::execution_space >
+KOKKOS_INLINE_FUNCTION
+task_spawn( Impl::TaskPolicyData<TaskEnum,DepFutureType> const & arg_policy
+ , FunctorType && arg_functor
+ )
+{
+ using exec_space = typename DepFutureType::execution_space ;
+ using scheduler = TaskScheduler< exec_space > ;
+
+ typedef Impl::TaskBase< exec_space
+ , typename FunctorType::value_type
+ , FunctorType
+ > task_type ;
+
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST ) && \
+ defined( KOKKOS_ENABLE_CUDA )
+
+ static_assert( ! std::is_same< Kokkos::Cuda , exec_space >::value
+ , "Error calling Kokkos::task_spawn for Cuda space within Host code" );
+
+#endif
+
+ static_assert( TaskEnum == task_type::TaskTeam ||
+ TaskEnum == task_type::TaskSingle
+ , "Kokkos task_spawn requires TaskTeam or TaskSingle" );
+
+ typename task_type::function_type const ptr = task_type::apply ;
+
+ return scheduler::spawn( arg_policy , ptr , std::move(arg_functor) );
+}
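+/* Illustrative sketch of spawning a child from inside a running task via
+ * task_spawn(); the captured scheduler member and child functor are
+ * hypothetical.
+ *
+ *   // inside ParentTask::operator()( member_type & , value_type & ):
+ *   auto child = Kokkos::task_spawn( Kokkos::TaskSingle( m_sched
+ *                                                      , Kokkos::TaskPriority::High )
+ *                                  , ChildTask() );
+ */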
+
+/**\brief A task respawns itself with options
+ *
+ * 1) With scheduler or dependence
+ * 2) High, Normal, or Low priority
+ */
+template< typename FunctorType , typename T >
+void
+KOKKOS_INLINE_FUNCTION
+respawn( FunctorType * arg_self
+ , T const & arg
+ , TaskPriority const & arg_priority = TaskPriority::Regular
+ )
+{
+ static_assert( Kokkos::is_future<T>::value ||
+ Kokkos::is_scheduler<T>::value
+ , "Kokkos respawn argument must be Future or TaskScheduler" );
+
+ TaskScheduler< typename T::execution_space >::
+ respawn( arg_self , arg , arg_priority );
+}
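+/* Illustrative sketch: a running task re-queues itself to execute again
+ * once a hypothetical child future completes.
+ *
+ *   Kokkos::respawn( this , child_future , Kokkos::TaskPriority::Regular );
+ *   return ; // re-executed after child_future is complete
+ */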
+
+//----------------------------------------------------------------------------
+
+template< typename A1 , typename A2 >
+KOKKOS_INLINE_FUNCTION
+Future< typename Future< A1 , A2 >::execution_space >
+when_all( Future< A1 , A2 > const arg[]
+ , int narg
+ )
+{
+ return TaskScheduler< typename Future<A1,A2>::execution_space >::
+ when_all( arg , narg );
+}
+
+//----------------------------------------------------------------------------
+// Wait for all runnable tasks to complete
+
template< typename ExecSpace >
inline
-void wait( TaskScheduler< ExecSpace > const & policy )
-{ policy.m_queue->execute(); }
+void wait( TaskScheduler< ExecSpace > const & scheduler )
+{ scheduler.m_queue->execute(); }
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_TASKSCHEDULER_HPP */
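The task-DAG interface above (the TaskTeam/TaskSingle policy helpers, host_spawn, task_spawn, respawn, when_all, and wait) is driven from a host control thread roughly as in the minimal sketch below. This is a sketch only, assuming a build with KOKKOS_ENABLE_TASKDAG and a host execution space; the functor name, the scheduler's memory-pool capacity, and the exact scheduler constructor arguments are illustrative assumptions rather than anything taken from this patch.

#include <Kokkos_Core.hpp>

struct HelloTask {
  using value_type = long ;                  // result type carried by the Future

  template< typename MemberType >
  KOKKOS_INLINE_FUNCTION
  void operator()( MemberType & /*member*/ , long & result ) const
  { result = 42 ; }                          // single-shot task: just produce a value
};

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    using exec_space     = Kokkos::DefaultHostExecutionSpace ;
    using scheduler_type = Kokkos::TaskScheduler< exec_space > ;

    // Assumed constructor: memory space plus a memory-pool capacity in bytes.
    scheduler_type scheduler( typename scheduler_type::memory_space() , 1 << 20 );

    // Host control thread spawns a single (non-team) task at default priority.
    Kokkos::Future< long , exec_space > f =
      Kokkos::host_spawn( Kokkos::TaskSingle( scheduler ) , HelloTask() );

    Kokkos::wait( scheduler );               // block until all runnable tasks complete

    long const answer = f.get() ;            // read the completed task's result
    (void) answer ;
  }
  Kokkos::finalize();
  return 0 ;
}

Inside a running task, task_spawn and respawn play the same role as host_spawn, taking either the scheduler or a dependence Future as the argument of the policy helper.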
diff --git a/lib/kokkos/core/src/Kokkos_Threads.hpp b/lib/kokkos/core/src/Kokkos_Threads.hpp
index aca482b42..8aa968d05 100644
--- a/lib/kokkos/core/src/Kokkos_Threads.hpp
+++ b/lib/kokkos/core/src/Kokkos_Threads.hpp
@@ -1,233 +1,232 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADS_HPP
#define KOKKOS_THREADS_HPP
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_PTHREAD )
#include <cstddef>
#include <iosfwd>
#include <Kokkos_HostSpace.hpp>
#include <Kokkos_ScratchSpace.hpp>
#include <Kokkos_Layout.hpp>
#include <Kokkos_MemoryTraits.hpp>
#include <impl/Kokkos_Tags.hpp>
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
class ThreadsExec ;
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/** \brief Execution space for a pool of Pthreads or C11 threads on a CPU. */
class Threads {
public:
//! \name Type declarations that all Kokkos devices must provide.
//@{
//! Tag this class as a kokkos execution space
typedef Threads execution_space ;
typedef Kokkos::HostSpace memory_space ;
//! This execution space's preferred device_type
typedef Kokkos::Device<execution_space,memory_space> device_type;
typedef Kokkos::LayoutRight array_layout ;
typedef memory_space::size_type size_type ;
typedef ScratchMemorySpace< Threads > scratch_memory_space ;
//@}
/*------------------------------------------------------------------------*/
//! \name Static functions that all Kokkos devices must implement.
//@{
/// \brief True if and only if this method is being called in a
/// thread-parallel function.
static int in_parallel();
/** \brief Set the device in a "sleep" state.
*
* This function sets the device in a "sleep" state in which it is
* not ready for work. This may consume fewer resources than if the
* device were in an "awake" state, but it may also take time to
* bring the device from a sleep state to be ready for work.
*
* \return True if the device is in the "sleep" state, else false if
* the device is actively working and could not enter the "sleep"
* state.
*/
static bool sleep();
/// \brief Wake the device from the 'sleep' state so it is ready for work.
///
/// \return True if the device is in the "ready" state, else "false"
/// if the device is actively working (which also means that it's
/// awake).
static bool wake();
/// \brief Wait until all dispatched functors complete.
///
/// The parallel_for or parallel_reduce dispatch of a functor may
/// return asynchronously, before the functor completes. This
/// method does not return until all dispatched functors on this
/// device have completed.
static void fence();
/// \brief Free any resources being consumed by the device.
///
/// For the Threads device, this terminates spawned worker threads.
static void finalize();
/// \brief Print configuration information to the given output stream.
static void print_configuration( std::ostream & , const bool detail = false );
//@}
/*------------------------------------------------------------------------*/
/*------------------------------------------------------------------------*/
//! \name Space-specific functions
//@{
/** \brief Initialize the device in the "ready to work" state.
*
* The device is initialized in a "ready to work" or "awake" state.
* This state reduces latency and thus improves performance when
* dispatching work. However, the "awake" state consumes resources
* even when no work is being done. You may call sleep() to put
* the device in a "sleeping" state that does not consume as many
* resources, but it will take time (latency) to awaken the device
* again (via the wake() method) so that it is ready for work.
*
* Teams of threads are distributed as evenly as possible across
* the requested number of numa regions and cores per numa region.
* A team will not be split across a numa region.
*
* If the 'use_' arguments are not supplied, hwloc is queried
* to use all available cores.
*/
static void initialize( unsigned threads_count = 0 ,
unsigned use_numa_count = 0 ,
unsigned use_cores_per_numa = 0 ,
bool allow_asynchronous_threadpool = false );
static int is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency();
static Threads & instance( int = 0 );
//----------------------------------------
static int thread_pool_size( int depth = 0 );
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
static int thread_pool_rank();
#else
KOKKOS_INLINE_FUNCTION static int thread_pool_rank() { return 0 ; }
#endif
inline static unsigned max_hardware_threads() { return thread_pool_size(0); }
KOKKOS_INLINE_FUNCTION static unsigned hardware_thread_id() { return thread_pool_rank(); }
//@}
//----------------------------------------
};
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
template<>
struct MemorySpaceAccess
< Kokkos::Threads::memory_space
, Kokkos::Threads::scratch_memory_space
>
{
enum { assignable = false };
enum { accessible = true };
enum { deepcopy = false };
};
template<>
struct VerifyExecutionCanAccessMemorySpace
< Kokkos::Threads::memory_space
, Kokkos::Threads::scratch_memory_space
>
{
enum { value = true };
inline static void verify( void ) { }
inline static void verify( const void * ) { }
};
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
#include <Kokkos_ExecPolicy.hpp>
#include <Kokkos_Parallel.hpp>
#include <Threads/Kokkos_ThreadsExec.hpp>
#include <Threads/Kokkos_ThreadsTeam.hpp>
#include <Threads/Kokkos_Threads_Parallel.hpp>
#include <KokkosExp_MDRangePolicy.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_PTHREAD ) */
#endif /* #define KOKKOS_THREADS_HPP */
-
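The static interface declared above (initialize, fence, finalize, and the pool queries) can be exercised directly, as in the following minimal sketch. It assumes a build with KOKKOS_ENABLE_PTHREAD; the thread count and the reduction body are purely illustrative.

#include <Kokkos_Core.hpp>

int main()
{
  // Spawn a pool of 4 worker threads; passing 0 lets hwloc pick the counts.
  Kokkos::Threads::initialize( 4 /* threads_count */ );

  long sum = 0 ;
  Kokkos::parallel_reduce(
    Kokkos::RangePolicy< Kokkos::Threads >( 0 , 1000 ) ,
    KOKKOS_LAMBDA ( const int i , long & update ) { update += i ; } ,
    sum );

  Kokkos::Threads::fence();      // wait for any asynchronously dispatched work
  Kokkos::Threads::finalize();   // terminate the spawned worker threads
  return 0 ;
}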
diff --git a/lib/kokkos/core/src/Makefile b/lib/kokkos/core/src/Makefile
index 316f61fd4..0668f89c8 100644
--- a/lib/kokkos/core/src/Makefile
+++ b/lib/kokkos/core/src/Makefile
@@ -1,144 +1,200 @@
ifndef KOKKOS_PATH
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))../..
endif
PREFIX ?= /usr/local/lib/kokkos
default: messages build-lib
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
CXX = g++
endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?=
include $(KOKKOS_PATH)/Makefile.kokkos
PWD = $(shell pwd)
KOKKOS_HEADERS_INCLUDE = $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL = $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
CONDITIONAL_COPIES =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
- KOKKOS_HEADERS_CUDA += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
- CONDITIONAL_COPIES += copy-cuda
+ KOKKOS_HEADERS_CUDA += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
+ CONDITIONAL_COPIES += copy-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- KOKKOS_HEADERS_THREADS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
- CONDITIONAL_COPIES += copy-threads
+ KOKKOS_HEADERS_THREADS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
+ CONDITIONAL_COPIES += copy-threads
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- KOKKOS_HEADERS_QTHREAD += $(wildcard $(KOKKOS_PATH)/core/src/Qthread/*.hpp)
- CONDITIONAL_COPIES += copy-qthread
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ KOKKOS_HEADERS_QTHREADS += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.hpp)
+ CONDITIONAL_COPIES += copy-qthreads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
- KOKKOS_HEADERS_OPENMP += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
- CONDITIONAL_COPIES += copy-openmp
+ KOKKOS_HEADERS_OPENMP += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
+ CONDITIONAL_COPIES += copy-openmp
endif
ifeq ($(KOKKOS_OS),CYGWIN)
COPY_FLAG = -u
endif
ifeq ($(KOKKOS_OS),Linux)
COPY_FLAG = -u
endif
ifeq ($(KOKKOS_OS),Darwin)
COPY_FLAG =
endif
+ifeq ($(KOKKOS_DEBUG),"no")
+ KOKKOS_DEBUG_CMAKE = OFF
+else
+ KOKKOS_DEBUG_CMAKE = ON
+endif
+
messages:
echo "Start Build"
build-makefile-kokkos:
rm -f Makefile.kokkos
echo "#Global Settings used to generate this library" >> Makefile.kokkos
echo "KOKKOS_PATH = $(PREFIX)" >> Makefile.kokkos
echo "KOKKOS_DEVICES = $(KOKKOS_DEVICES)" >> Makefile.kokkos
echo "KOKKOS_ARCH = $(KOKKOS_ARCH)" >> Makefile.kokkos
echo "KOKKOS_DEBUG = $(KOKKOS_DEBUG)" >> Makefile.kokkos
echo "KOKKOS_USE_TPLS = $(KOKKOS_USE_TPLS)" >> Makefile.kokkos
echo "KOKKOS_CXX_STANDARD = $(KOKKOS_CXX_STANDARD)" >> Makefile.kokkos
echo "KOKKOS_OPTIONS = $(KOKKOS_OPTIONS)" >> Makefile.kokkos
echo "KOKKOS_CUDA_OPTIONS = $(KOKKOS_CUDA_OPTIONS)" >> Makefile.kokkos
echo "CXX ?= $(CXX)" >> Makefile.kokkos
echo "NVCC_WRAPPER ?= $(PREFIX)/bin/nvcc_wrapper" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Source and Header files of Kokkos relative to KOKKOS_PATH" >> Makefile.kokkos
echo "KOKKOS_HEADERS = $(KOKKOS_HEADERS)" >> Makefile.kokkos
echo "KOKKOS_SRC = $(KOKKOS_SRC)" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Variables used in application Makefiles" >> Makefile.kokkos
echo "KOKKOS_CPP_DEPENDS = $(KOKKOS_CPP_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_CXXFLAGS = $(KOKKOS_CXXFLAGS)" >> Makefile.kokkos
echo "KOKKOS_CPPFLAGS = $(KOKKOS_CPPFLAGS)" >> Makefile.kokkos
echo "KOKKOS_LINK_DEPENDS = $(KOKKOS_LINK_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_LIBS = $(KOKKOS_LIBS)" >> Makefile.kokkos
echo "KOKKOS_LDFLAGS = $(KOKKOS_LDFLAGS)" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Internal settings which need to propagated for Kokkos examples" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_CUDA = ${KOKKOS_INTERNAL_USE_CUDA}" >> Makefile.kokkos
+ echo "KOKKOS_INTERNAL_USE_QTHREADS = ${KOKKOS_INTERNAL_USE_QTHREADS}" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_OPENMP = ${KOKKOS_INTERNAL_USE_OPENMP}" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_PTHREADS = ${KOKKOS_INTERNAL_USE_PTHREADS}" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Fake kokkos-clean target" >> Makefile.kokkos
echo "kokkos-clean:" >> Makefile.kokkos
echo "" >> Makefile.kokkos
sed \
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
-e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
-e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
-e 's|= KokkosCore_config.h|= $(PREFIX)/include/KokkosCore_config.h|g' Makefile.kokkos \
> Makefile.kokkos.tmp
mv -f Makefile.kokkos.tmp Makefile.kokkos
-build-lib: build-makefile-kokkos $(KOKKOS_LINK_DEPENDS)
+build-cmake-kokkos:
+ rm -f kokkos.cmake
+ echo "#Global Settings used to generate this library" >> kokkos.cmake
+ echo "set(KOKKOS_PATH $(PREFIX) CACHE PATH \"Kokkos installation path\")" >> kokkos.cmake
+ echo "set(KOKKOS_DEVICES $(KOKKOS_DEVICES) CACHE STRING \"Kokkos devices list\")" >> kokkos.cmake
+ echo "set(KOKKOS_ARCH $(KOKKOS_ARCH) CACHE STRING \"Kokkos architecture flags\")" >> kokkos.cmake
+ echo "set(KOKKOS_DEBUG $(KOKKOS_DEBUG_CMAKE) CACHE BOOL \"Kokkos debug enabled ?)\")" >> kokkos.cmake
+ echo "set(KOKKOS_USE_TPLS $(KOKKOS_USE_TPLS) CACHE STRING \"Kokkos templates list\")" >> kokkos.cmake
+ echo "set(KOKKOS_CXX_STANDARD $(KOKKOS_CXX_STANDARD) CACHE STRING \"Kokkos C++ standard\")" >> kokkos.cmake
+ echo "set(KOKKOS_OPTIONS $(KOKKOS_OPTIONS) CACHE STRING \"Kokkos options\")" >> kokkos.cmake
+ echo "set(KOKKOS_CUDA_OPTIONS $(KOKKOS_CUDA_OPTIONS) CACHE STRING \"Kokkos Cuda options\")" >> kokkos.cmake
+ echo "if(NOT $ENV{CXX})" >> kokkos.cmake
+ echo ' message(WARNING "You are currently using compiler $${CMAKE_CXX_COMPILER} while Kokkos was built with $(CXX) ; make sure this is the behavior you intend.")' >> kokkos.cmake
+ echo "endif()" >> kokkos.cmake
+ echo "if(NOT DEFINED ENV{NVCC_WRAPPER})" >> kokkos.cmake
+ echo " set(NVCC_WRAPPER \"$(NVCC_WRAPPER)\" CACHE FILEPATH \"Path to command nvcc_wrapper\")" >> kokkos.cmake
+ echo "else()" >> kokkos.cmake
+ echo ' set(NVCC_WRAPPER $$ENV{NVCC_WRAPPER} CACHE FILEPATH "Path to command nvcc_wrapper")' >> kokkos.cmake
+ echo "endif()" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ echo "#Source and Header files of Kokkos relative to KOKKOS_PATH" >> kokkos.cmake
+ echo "set(KOKKOS_HEADERS \"$(KOKKOS_HEADERS)\" CACHE STRING \"Kokkos headers list\")" >> kokkos.cmake
+ echo "set(KOKKOS_SRC \"$(KOKKOS_SRC)\" CACHE STRING \"Kokkos source list\")" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ echo "#Variables used in application Makefiles" >> kokkos.cmake
+ echo "set(KOKKOS_CPP_DEPENDS \"$(KOKKOS_CPP_DEPENDS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_CXXFLAGS \"$(KOKKOS_CXXFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_CPPFLAGS \"$(KOKKOS_CPPFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_LINK_DEPENDS \"$(KOKKOS_LINK_DEPENDS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_LIBS \"$(KOKKOS_LIBS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_LDFLAGS \"$(KOKKOS_LDFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ echo "#Internal settings which need to propagated for Kokkos examples" >> kokkos.cmake
+ echo "set(KOKKOS_INTERNAL_USE_CUDA \"${KOKKOS_INTERNAL_USE_CUDA}\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_INTERNAL_USE_OPENMP \"${KOKKOS_INTERNAL_USE_OPENMP}\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "set(KOKKOS_INTERNAL_USE_PTHREADS \"${KOKKOS_INTERNAL_USE_PTHREADS}\" CACHE STRING \"\")" >> kokkos.cmake
+ echo "mark_as_advanced(KOKKOS_HEADERS KOKKOS_SRC KOKKOS_INTERNAL_USE_CUDA KOKKOS_INTERNAL_USE_OPENMP KOKKOS_INTERNAL_USE_PTHREADS)" >> kokkos.cmake
+ echo "" >> kokkos.cmake
+ sed \
+ -e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
+ -e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
+ -e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
+ -e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
+ -e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
+ -e 's|= KokkosCore_config.h|= $(PREFIX)/include/KokkosCore_config.h|g' kokkos.cmake \
+ > kokkos.cmake.tmp
+ mv -f kokkos.cmake.tmp kokkos.cmake
+
+build-lib: build-makefile-kokkos build-cmake-kokkos $(KOKKOS_LINK_DEPENDS)
mkdir:
mkdir -p $(PREFIX)
mkdir -p $(PREFIX)/bin
mkdir -p $(PREFIX)/include
mkdir -p $(PREFIX)/lib
mkdir -p $(PREFIX)/include/impl
copy-cuda: mkdir
mkdir -p $(PREFIX)/include/Cuda
cp $(COPY_FLAG) $(KOKKOS_HEADERS_CUDA) $(PREFIX)/include/Cuda
copy-threads: mkdir
mkdir -p $(PREFIX)/include/Threads
cp $(COPY_FLAG) $(KOKKOS_HEADERS_THREADS) $(PREFIX)/include/Threads
-copy-qthread: mkdir
- mkdir -p $(PREFIX)/include/Qthread
- cp $(COPY_FLAG) $(KOKKOS_HEADERS_QTHREAD) $(PREFIX)/include/Qthread
+copy-qthreads: mkdir
+ mkdir -p $(PREFIX)/include/Qthreads
+ cp $(COPY_FLAG) $(KOKKOS_HEADERS_QTHREADS) $(PREFIX)/include/Qthreads
copy-openmp: mkdir
mkdir -p $(PREFIX)/include/OpenMP
cp $(COPY_FLAG) $(KOKKOS_HEADERS_OPENMP) $(PREFIX)/include/OpenMP
install: mkdir $(CONDITIONAL_COPIES) build-lib
cp $(COPY_FLAG) $(NVCC_WRAPPER) $(PREFIX)/bin
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE) $(PREFIX)/include
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE_IMPL) $(PREFIX)/include/impl
cp $(COPY_FLAG) Makefile.kokkos $(PREFIX)
+ cp $(COPY_FLAG) kokkos.cmake $(PREFIX)
cp $(COPY_FLAG) libkokkos.a $(PREFIX)/lib
cp $(COPY_FLAG) KokkosCore_config.h $(PREFIX)/include
clean: kokkos-clean
rm -f Makefile.kokkos
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp
index a61791ca9..ecacffb77 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Parallel.hpp
@@ -1,750 +1,853 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMP_PARALLEL_HPP
#define KOKKOS_OPENMP_PARALLEL_HPP
#include <omp.h>
#include <iostream>
-#include <Kokkos_Parallel.hpp>
#include <OpenMP/Kokkos_OpenMPexec.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend )
{
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( iwork );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend )
{
const TagType t{} ;
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( t , iwork );
}
}
public:
- inline void execute() const {
- this->template execute_schedule<typename Policy::schedule_type::type>();
- }
-
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Static>::value >::type
- execute_schedule() const
+ inline void execute() const
{
+ enum { is_dynamic = std::is_same< typename Policy::schedule_type::type
+ , Kokkos::Dynamic >::value };
+
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_for");
OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_for");
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
-
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
- ParallelFor::template exec_range< WorkTag >( m_functor , range.begin() , range.end() );
- }
-/* END #pragma omp parallel */
- }
+ data.set_work_partition( m_policy.end() - m_policy.begin()
+ , m_policy.chunk_size() );
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Dynamic>::value >::type
- execute_schedule() const
- {
- OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_for");
- OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_for");
+ if ( is_dynamic ) {
+ // Make sure work partition is set before stealing
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
+ }
-#pragma omp parallel
- {
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
+ std::pair<int64_t,int64_t> range(0,0);
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
+ do {
- exec.set_work_range(range.begin(),range.end(),m_policy.chunk_size());
- exec.reset_steal_target();
- #pragma omp barrier
-
- long work_index = exec.get_work_index();
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
- while(work_index != -1) {
- const Member begin = static_cast<Member>(work_index) * m_policy.chunk_size();
- const Member end = begin + m_policy.chunk_size() < m_policy.end()?begin+m_policy.chunk_size():m_policy.end();
- ParallelFor::template exec_range< WorkTag >( m_functor , begin, end );
- work_index = exec.get_work_index();
- }
+ ParallelFor::template
+ exec_range< WorkTag >( m_functor
+ , range.first + m_policy.begin()
+ , range.second + m_policy.begin() );
+ } while ( is_dynamic && 0 <= range.first );
}
-/* END #pragma omp parallel */
+ // END #pragma omp parallel
}
inline
ParallelFor( const FunctorType & arg_functor
, Policy arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ...>
, ReducerType
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( iwork , update );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
const TagType t{} ;
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( t , iwork , update );
}
}
public:
- inline void execute() const {
- this->template execute_schedule<typename Policy::schedule_type::type>();
- }
-
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Static>::value >::type
- execute_schedule() const
+ inline void execute() const
{
- OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
- OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+ enum { is_dynamic = std::is_same< typename Policy::schedule_type::type
+ , Kokkos::Dynamic >::value };
- OpenMPexec::resize_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
+ OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
+ OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+
+ const size_t pool_reduce_bytes =
+ Analysis::value_size( ReducerConditional::select(m_functor, m_reducer));
+
+ OpenMPexec::resize_thread_data( pool_reduce_bytes
+ , 0 // team_reduce_bytes
+ , 0 // team_shared_bytes
+ , 0 // thread_local_bytes
+ );
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
- ParallelReduce::template exec_range< WorkTag >
- ( m_functor , range.begin() , range.end()
- , ValueInit::init( ReducerConditional::select(m_functor , m_reducer), exec.scratch_reduce() ) );
- }
-/* END #pragma omp parallel */
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
- // Reduction:
+ data.set_work_partition( m_policy.end() - m_policy.begin()
+ , m_policy.chunk_size() );
- const pointer_type ptr = pointer_type( OpenMPexec::pool_rev(0)->scratch_reduce() );
+ if ( is_dynamic ) {
+ // Make sure work partition is set before stealing
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
+ }
- for ( int i = 1 ; i < OpenMPexec::pool_size() ; ++i ) {
- ValueJoin::join( ReducerConditional::select(m_functor , m_reducer) , ptr , OpenMPexec::pool_rev(i)->scratch_reduce() );
- }
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer)
+ , data.pool_reduce_local() );
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
+ std::pair<int64_t,int64_t> range(0,0);
- if ( m_result_ptr ) {
- const int n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
+ do {
- for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
- }
- }
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
- template<class Schedule>
- inline
- typename std::enable_if< std::is_same<Schedule,Kokkos::Dynamic>::value >::type
- execute_schedule() const
- {
- OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
- OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+ ParallelReduce::template
+ exec_range< WorkTag >( m_functor
+ , range.first + m_policy.begin()
+ , range.second + m_policy.begin()
+ , update );
- OpenMPexec::resize_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
-
-#pragma omp parallel
- {
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
-
- exec.set_work_range(range.begin(),range.end(),m_policy.chunk_size());
- exec.reset_steal_target();
- #pragma omp barrier
-
- long work_index = exec.get_work_index();
-
- reference_type update = ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , exec.scratch_reduce() );
- while(work_index != -1) {
- const Member begin = static_cast<Member>(work_index) * m_policy.chunk_size();
- const Member end = begin + m_policy.chunk_size() < m_policy.end()?begin+m_policy.chunk_size():m_policy.end();
- ParallelReduce::template exec_range< WorkTag >
- ( m_functor , begin,end
- , update );
- work_index = exec.get_work_index();
- }
+ } while ( is_dynamic && 0 <= range.first );
}
-/* END #pragma omp parallel */
+// END #pragma omp parallel
// Reduction:
- const pointer_type ptr = pointer_type( OpenMPexec::pool_rev(0)->scratch_reduce() );
+ const pointer_type ptr = pointer_type( OpenMPexec::get_thread_data(0)->pool_reduce_local() );
for ( int i = 1 ; i < OpenMPexec::pool_size() ; ++i ) {
- ValueJoin::join( ReducerConditional::select(m_functor , m_reducer) , ptr , OpenMPexec::pool_rev(i)->scratch_reduce() );
+ ValueJoin::join( ReducerConditional::select(m_functor , m_reducer)
+ , ptr
+ , OpenMPexec::get_thread_data(i)->pool_reduce_local() );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
if ( m_result_ptr ) {
- const int n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
+ const int n = Analysis::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
}
}
//----------------------------------------
template< class ViewType >
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ViewType & arg_result_view
, typename std::enable_if<
Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result_view.data() )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
+ typedef FunctorAnalysis< FunctorPatternInterface::SCAN , Policy , FunctorType > Analysis ;
+
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< FunctorType, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueOps< FunctorType, WorkTag > ValueOps ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( iwork , update , final );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
const TagType t{} ;
#ifdef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#endif
for ( Member iwork = ibeg ; iwork < iend ; ++iwork ) {
functor( t , iwork , update , final );
}
}
public:
inline
void execute() const
{
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_scan");
OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_scan");
- OpenMPexec::resize_scratch( 2 * ValueTraits::value_size( m_functor ) , 0 );
+ const int value_count = Analysis::value_count( m_functor );
+ const size_t pool_reduce_bytes = 2 * Analysis::value_size( m_functor );
+
+ OpenMPexec::resize_thread_data( pool_reduce_bytes
+ , 0 // team_reduce_bytes
+ , 0 // team_shared_bytes
+ , 0 // thread_local_bytes
+ );
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
- const pointer_type ptr =
- pointer_type( exec.scratch_reduce() ) +
- ValueTraits::value_count( m_functor );
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
+
+ const WorkRange range( m_policy, data.pool_rank(), data.pool_size() );
+
+ reference_type update_sum =
+ ValueInit::init( m_functor , data.pool_reduce_local() );
+
ParallelScan::template exec_range< WorkTag >
- ( m_functor , range.begin() , range.end()
- , ValueInit::init( m_functor , ptr ) , false );
- }
-/* END #pragma omp parallel */
+ ( m_functor , range.begin() , range.end() , update_sum , false );
- {
- const unsigned thread_count = OpenMPexec::pool_size();
- const unsigned value_count = ValueTraits::value_count( m_functor );
+ if ( data.pool_rendezvous() ) {
- pointer_type ptr_prev = 0 ;
+ pointer_type ptr_prev = 0 ;
- for ( unsigned rank_rev = thread_count ; rank_rev-- ; ) {
+ const int n = data.pool_size();
- pointer_type ptr = pointer_type( OpenMPexec::pool_rev(rank_rev)->scratch_reduce() );
+ for ( int i = 0 ; i < n ; ++i ) {
- if ( ptr_prev ) {
- for ( unsigned i = 0 ; i < value_count ; ++i ) { ptr[i] = ptr_prev[ i + value_count ] ; }
- ValueJoin::join( m_functor , ptr + value_count , ptr );
- }
- else {
- ValueInit::init( m_functor , ptr );
+ pointer_type ptr = (pointer_type)
+ data.pool_member(i)->pool_reduce_local();
+
+ if ( i ) {
+ for ( int j = 0 ; j < value_count ; ++j ) {
+ ptr[j+value_count] = ptr_prev[j+value_count] ;
+ }
+ ValueJoin::join( m_functor , ptr + value_count , ptr_prev );
+ }
+ else {
+ ValueInit::init( m_functor , ptr + value_count );
+ }
+
+ ptr_prev = ptr ;
}
- ptr_prev = ptr ;
+ data.pool_rendezvous_release();
}
- }
-#pragma omp parallel
- {
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
- const WorkRange range( m_policy, exec.pool_rank(), exec.pool_size() );
- const pointer_type ptr = pointer_type( exec.scratch_reduce() );
+ reference_type update_base =
+ ValueOps::reference
+ ( ((pointer_type)data.pool_reduce_local()) + value_count );
+
ParallelScan::template exec_range< WorkTag >
- ( m_functor , range.begin() , range.end()
- , ValueOps::reference( ptr ) , true );
+ ( m_functor , range.begin() , range.end() , update_base , true );
}
/* END #pragma omp parallel */
+
}
//----------------------------------------
inline
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{}
//----------------------------------------
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, Kokkos::OpenMP
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::OpenMP, Properties ... > Policy ;
- typedef typename Policy::work_tag WorkTag ;
- typedef typename Policy::member_type Member ;
+ typedef typename Policy::work_tag WorkTag ;
+ typedef typename Policy::schedule_type::type SchedTag ;
+ typedef typename Policy::member_type Member ;
const FunctorType m_functor ;
const Policy m_policy ;
const int m_shmem_size ;
- template< class TagType, class Schedule >
+ template< class TagType >
inline static
- typename std::enable_if< std::is_same< TagType , void >::value && std::is_same<Schedule,Kokkos::Static>::value>::type
- exec_team( const FunctorType & functor , Member member )
+ typename std::enable_if< ( std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( member );
- }
- }
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
- template< class TagType, class Schedule >
- inline static
- typename std::enable_if< (! std::is_same< TagType , void >::value) && std::is_same<Schedule,Kokkos::Static>::value >::type
- exec_team( const FunctorType & functor , Member member )
- {
- const TagType t{} ;
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( t , member );
- }
- }
+ functor( Member( data, r , league_size ) );
- template< class TagType, class Schedule >
- inline static
- typename std::enable_if< std::is_same< TagType , void >::value && std::is_same<Schedule,Kokkos::Dynamic>::value>::type
- exec_team( const FunctorType & functor , Member member )
- {
- #pragma omp barrier
- for ( ; member.valid_dynamic() ; member.next_dynamic() ) {
- functor( member );
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
- template< class TagType, class Schedule >
+
+ template< class TagType >
inline static
- typename std::enable_if< (! std::is_same< TagType , void >::value) && std::is_same<Schedule,Kokkos::Dynamic>::value >::type
- exec_team( const FunctorType & functor , Member member )
+ typename std::enable_if< ( ! std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- #pragma omp barrier
- const TagType t{} ;
- for ( ; member.valid_dynamic() ; member.next_dynamic() ) {
- functor( t , member );
+ const TagType t{};
+
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
+
+ functor( t , Member( data, r , league_size ) );
+
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
public:
inline
void execute() const
{
+ enum { is_dynamic = std::is_same< SchedTag , Kokkos::Dynamic >::value };
+
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_for");
OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_for");
- const size_t team_reduce_size = Policy::member_type::team_reduce_size();
+ const size_t pool_reduce_size = 0 ; // Never shrinks
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE * m_policy.team_size();
+ const size_t team_shared_size = m_shmem_size + m_policy.scratch_size(1);
+ const size_t thread_local_size = 0 ; // Never shrinks
- OpenMPexec::resize_scratch( 0 , team_reduce_size + m_shmem_size + m_policy.scratch_size(1));
+ OpenMPexec::resize_thread_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
#pragma omp parallel
{
- ParallelFor::template exec_team< WorkTag, typename Policy::schedule_type::type>
- ( m_functor
- , Member( * OpenMPexec::get_thread_omp(), m_policy, m_shmem_size, 0) );
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
+
+ const int active = data.organize_team( m_policy.team_size() );
+
+ if ( active ) {
+ data.set_work_partition( m_policy.league_size()
+ , ( 0 < m_policy.chunk_size()
+ ? m_policy.chunk_size()
+ : m_policy.team_iter() ) );
+ }
+
+ if ( is_dynamic ) {
+ // Must synchronize to make sure each team has set its
+ // partition before beginning the work stealing loop.
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
+ }
+
+ if ( active ) {
+
+ std::pair<int64_t,int64_t> range(0,0);
+
+ do {
+
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
+
+ ParallelFor::template exec_team< WorkTag >
+ ( m_functor , data
+ , range.first , range.second , m_policy.league_size() );
+
+ } while ( is_dynamic && 0 <= range.first );
+ }
+
+ data.disband_team();
}
-/* END #pragma omp parallel */
+// END #pragma omp parallel
}
+
inline
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
- , m_shmem_size( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , m_shmem_size( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >
+ ::value( arg_functor , arg_policy.team_size() ) )
{}
};
+//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType, class ... Properties >
class ParallelReduce< FunctorType
, Kokkos::TeamPolicy< Properties ... >
, ReducerType
, Kokkos::OpenMP
>
{
private:
+ enum { TEAM_REDUCE_SIZE = 512 };
+
typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::OpenMP, Properties ... > Policy ;
- typedef typename Policy::work_tag WorkTag ;
- typedef typename Policy::member_type Member ;
+ typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
+
+ typedef typename Policy::work_tag WorkTag ;
+ typedef typename Policy::schedule_type::type SchedTag ;
+ typedef typename Policy::member_type Member ;
+
+ typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value
+ , FunctorType, ReducerType> ReducerConditional;
- typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , WorkTag > ValueJoin ;
- typedef typename ValueTraits::pointer_type pointer_type ;
- typedef typename ValueTraits::reference_type reference_type ;
+ typedef typename Analysis::pointer_type pointer_type ;
+ typedef typename Analysis::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
const int m_shmem_size ;
template< class TagType >
inline static
- typename std::enable_if< std::is_same< TagType , void >::value >::type
- exec_team( const FunctorType & functor , Member member , reference_type update )
+ typename std::enable_if< ( std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , reference_type & update
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( member , update );
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
+
+ functor( Member( data, r , league_size ) , update );
+
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
+
template< class TagType >
inline static
- typename std::enable_if< ! std::is_same< TagType , void >::value >::type
- exec_team( const FunctorType & functor , Member member , reference_type update )
+ typename std::enable_if< ( ! std::is_same< TagType , void >::value ) >::type
+ exec_team( const FunctorType & functor
+ , HostThreadTeamData & data
+ , reference_type & update
+ , const int league_rank_begin
+ , const int league_rank_end
+ , const int league_size )
{
- const TagType t{} ;
- for ( ; member.valid_static() ; member.next_static() ) {
- functor( t , member , update );
+ const TagType t{};
+
+ for ( int r = league_rank_begin ; r < league_rank_end ; ) {
+
+ functor( t , Member( data, r , league_size ) , update );
+
+ if ( ++r < league_rank_end ) {
+ // Don't allow team members to lap one another
+ // so that they don't overwrite shared memory.
+ if ( data.team_rendezvous() ) { data.team_rendezvous_release(); }
+ }
}
}
public:
inline
void execute() const
{
+ enum { is_dynamic = std::is_same< SchedTag , Kokkos::Dynamic >::value };
+
OpenMPexec::verify_is_process("Kokkos::OpenMP parallel_reduce");
+ OpenMPexec::verify_initialized("Kokkos::OpenMP parallel_reduce");
+
+ const size_t pool_reduce_size =
+ Analysis::value_size( ReducerConditional::select(m_functor, m_reducer));
- const size_t team_reduce_size = Policy::member_type::team_reduce_size();
+ const size_t team_reduce_size = TEAM_REDUCE_SIZE * m_policy.team_size();
+ const size_t team_shared_size = m_shmem_size + m_policy.scratch_size(1);
+ const size_t thread_local_size = 0 ; // Never shrinks
- OpenMPexec::resize_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , team_reduce_size + m_shmem_size );
+ OpenMPexec::resize_thread_data( pool_reduce_size
+ , team_reduce_size
+ , team_shared_size
+ , thread_local_size );
#pragma omp parallel
{
- OpenMPexec & exec = * OpenMPexec::get_thread_omp();
+ HostThreadTeamData & data = *OpenMPexec::get_thread_data();
- ParallelReduce::template exec_team< WorkTag >
- ( m_functor
- , Member( exec , m_policy , m_shmem_size, 0 )
- , ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , exec.scratch_reduce() ) );
- }
-/* END #pragma omp parallel */
+ const int active = data.organize_team( m_policy.team_size() );
- {
- const pointer_type ptr = pointer_type( OpenMPexec::pool_rev(0)->scratch_reduce() );
-
- int max_active_threads = OpenMPexec::pool_size();
- if( max_active_threads > m_policy.league_size()* m_policy.team_size() )
- max_active_threads = m_policy.league_size()* m_policy.team_size();
+ if ( active ) {
+ data.set_work_partition( m_policy.league_size()
+ , ( 0 < m_policy.chunk_size()
+ ? m_policy.chunk_size()
+ : m_policy.team_iter() ) );
+ }
- for ( int i = 1 ; i < max_active_threads ; ++i ) {
- ValueJoin::join( ReducerConditional::select(m_functor , m_reducer) , ptr , OpenMPexec::pool_rev(i)->scratch_reduce() );
+ if ( is_dynamic ) {
+ // Must synchronize to make sure each team has set its
+ // partition before beginning the work stealing loop.
+ if ( data.pool_rendezvous() ) data.pool_rendezvous_release();
}
- Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
+ if ( active ) {
+ reference_type update =
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer)
+ , data.pool_reduce_local() );
+
+ std::pair<int64_t,int64_t> range(0,0);
- if ( m_result_ptr ) {
- const int n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
+ do {
- for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
+ range = is_dynamic ? data.get_work_stealing_chunk()
+ : data.get_work_partition();
+
+ ParallelReduce::template exec_team< WorkTag >
+ ( m_functor , data , update
+ , range.first , range.second , m_policy.league_size() );
+
+ } while ( is_dynamic && 0 <= range.first );
+ } else {
+ ValueInit::init( ReducerConditional::select(m_functor , m_reducer)
+ , data.pool_reduce_local() );
}
+
+ data.disband_team();
+ }
+// END #pragma omp parallel
+
+ // Reduction:
+
+ const pointer_type ptr = pointer_type( OpenMPexec::get_thread_data(0)->pool_reduce_local() );
+
+ for ( int i = 1 ; i < OpenMPexec::pool_size() ; ++i ) {
+ ValueJoin::join( ReducerConditional::select(m_functor , m_reducer)
+ , ptr
+ , OpenMPexec::get_thread_data(i)->pool_reduce_local() );
+ }
+
+ Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
+
+ if ( m_result_ptr ) {
+ const int n = Analysis::value_count( ReducerConditional::select(m_functor , m_reducer) );
+
+ for ( int j = 0 ; j < n ; ++j ) { m_result_ptr[j] = ptr[j] ; }
}
}
+ //----------------------------------------
+
template< class ViewType >
inline
ParallelReduce( const FunctorType & arg_functor ,
const Policy & arg_policy ,
const ViewType & arg_result ,
typename std::enable_if<
Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type<ReducerType>::value
,void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
- , m_shmem_size( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , m_shmem_size( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >
+ ::value( arg_functor , arg_policy.team_size() ) )
{}
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
- , m_shmem_size( arg_policy.scratch_size(0) + arg_policy.scratch_size(1) + FunctorTeamShmemSize< FunctorType >::value( arg_functor , arg_policy.team_size() ) )
+ , m_shmem_size( arg_policy.scratch_size(0) +
+ arg_policy.scratch_size(1) +
+ FunctorTeamShmemSize< FunctorType >
+ ::value( arg_functor , arg_policy.team_size() ) )
{
/*static_assert( std::is_same< typename ViewType::memory_space
, Kokkos::HostSpace >::value
, "Reduction result on Kokkos::OpenMP must be a Kokkos::View in HostSpace" );*/
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* KOKKOS_OPENMP_PARALLEL_HPP */
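The backend specializations above implement the ordinary user-facing dispatches; a minimal sketch of the two reworked patterns (a dynamically scheduled range parallel_for, and a team-policy parallel_reduce) is shown below. It assumes a build with KOKKOS_ENABLE_OPENMP; the problem sizes and the lambda bodies are illustrative only.

#include <Kokkos_Core.hpp>

int main( int argc , char * argv[] )
{
  Kokkos::initialize( argc , argv );
  {
    using exec_space  = Kokkos::OpenMP ;
    using team_policy = Kokkos::TeamPolicy< exec_space > ;
    using member_type = team_policy::member_type ;

    // Range parallel_for with a Dynamic schedule: served by the
    // get_work_stealing_chunk() loop in the ParallelFor specialization above.
    Kokkos::View< double * , exec_space::memory_space > x( "x" , 1000 );
    Kokkos::parallel_for(
      Kokkos::RangePolicy< exec_space , Kokkos::Schedule< Kokkos::Dynamic > >( 0 , 1000 ) ,
      KOKKOS_LAMBDA ( const int i ) { x( i ) = 2.0 * i ; } );

    // Team-policy reduction: per-thread partial results are joined by
    // ValueJoin in the ParallelReduce specialization above.
    double sum = 0 ;
    Kokkos::parallel_reduce(
      team_policy( 10 /* league size */ , Kokkos::AUTO ) ,
      KOKKOS_LAMBDA ( const member_type & member , double & update ) {
        update += member.league_rank();
      } ,
      sum );

    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0 ;
}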
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
index 5b3e9873e..9144d8c27 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
@@ -1,329 +1,316 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue_impl.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::OpenMP > ;
-//----------------------------------------------------------------------------
-
-TaskExec< Kokkos::OpenMP >::
-TaskExec()
- : m_self_exec( 0 )
- , m_team_exec( 0 )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( 0 )
- , m_team_rank( 0 )
- , m_team_size( 1 )
-{
-}
-
-TaskExec< Kokkos::OpenMP >::
-TaskExec( Kokkos::Impl::OpenMPexec & arg_exec , int const arg_team_size )
- : m_self_exec( & arg_exec )
- , m_team_exec( arg_exec.pool_rev(arg_exec.pool_rank_rev() / arg_team_size) )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( arg_exec.pool_rank_rev() / arg_team_size )
- , m_team_rank( arg_exec.pool_rank_rev() % arg_team_size )
- , m_team_size( arg_team_size )
-{
- // This team spans
- // m_self_exec->pool_rev( team_size * group_rank )
- // m_self_exec->pool_rev( team_size * ( group_rank + 1 ) - 1 )
-
- int64_t volatile * const sync = (int64_t *) m_self_exec->scratch_reduce();
-
- sync[0] = int64_t(0) ;
- sync[1] = int64_t(0) ;
-
- for ( int i = 0 ; i < m_team_size ; ++i ) {
- m_sync_value |= int64_t(1) << (8*i);
- m_sync_mask |= int64_t(3) << (8*i);
- }
+class HostThreadTeamDataSingleton : private HostThreadTeamData {
+private:
+
+ HostThreadTeamDataSingleton() : HostThreadTeamData()
+ {
+ Kokkos::OpenMP::memory_space space ;
+ const size_t num_pool_reduce_bytes = 32 ;
+ const size_t num_team_reduce_bytes = 32 ;
+ const size_t num_team_shared_bytes = 1024 ;
+ const size_t num_thread_local_bytes = 1024 ;
+ const size_t alloc_bytes =
+ HostThreadTeamData::scratch_size( num_pool_reduce_bytes
+ , num_team_reduce_bytes
+ , num_team_shared_bytes
+ , num_thread_local_bytes );
+
+ HostThreadTeamData::scratch_assign
+ ( space.allocate( alloc_bytes )
+ , alloc_bytes
+ , num_pool_reduce_bytes
+ , num_team_reduce_bytes
+ , num_team_shared_bytes
+ , num_thread_local_bytes );
+ }
+
+ ~HostThreadTeamDataSingleton()
+ {
+ Kokkos::OpenMP::memory_space space ;
+ space.deallocate( HostThreadTeamData::scratch_buffer()
+ , HostThreadTeamData::scratch_bytes() );
+ }
+
+public:
+
+ static HostThreadTeamData & singleton()
+ {
+ static HostThreadTeamDataSingleton s ;
+ return s ;
+ }
+};
- Kokkos::memory_fence();
-}
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+//----------------------------------------------------------------------------
-void TaskExec< Kokkos::OpenMP >::team_barrier_impl() const
+void TaskQueueSpecialization< Kokkos::OpenMP >::execute
+ ( TaskQueue< Kokkos::OpenMP > * const queue )
{
- if ( m_team_exec->scratch_reduce_size() < int(2 * sizeof(int64_t)) ) {
- Kokkos::abort("TaskQueue<OpenMP> scratch_reduce memory too small");
- }
+ using execution_space = Kokkos::OpenMP ;
+ using queue_type = TaskQueue< execution_space > ;
+ using task_root_type = TaskBase< execution_space , void , void > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
- // Use team shared memory to synchronize.
- // Alternate memory locations between barriers to avoid a sequence
- // of barriers overtaking one another.
+ static task_root_type * const end =
+ (task_root_type *) task_root_type::EndTag ;
- int64_t volatile * const sync =
- ((int64_t *) m_team_exec->scratch_reduce()) + ( m_sync_step & 0x01 );
+ HostThreadTeamData & team_data_single =
+ HostThreadTeamDataSingleton::singleton();
- // This team member sets one byte within the sync variable
- int8_t volatile * const sync_self =
- ((int8_t *) sync) + m_team_rank ;
+ const int team_size = Impl::OpenMPexec::pool_size(2); // Threads per core
+ // const int team_size = Impl::OpenMPexec::pool_size(1); // Threads per NUMA
#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : before(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
- );
+fprintf(stdout,"TaskQueue<OpenMP> execute %d\n", team_size );
fflush(stdout);
#endif
- *sync_self = int8_t( m_sync_value & 0x03 ); // signal arrival
- while ( m_sync_value != *sync ); // wait for team to arrive
+#pragma omp parallel
+ {
+ Impl::HostThreadTeamData & self = *Impl::OpenMPexec::get_thread_data();
-#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : after(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
- );
-fflush(stdout);
-#endif
+ // Organizing threads into a team performs a barrier across the
+ // entire pool to insure proper initialization of the team
+ // rendezvous mechanism before a team rendezvous can be performed.
- ++m_sync_step ;
+ if ( self.organize_team( team_size ) ) {
- if ( 0 == ( 0x01 & m_sync_step ) ) { // Every other step
- m_sync_value ^= m_sync_mask ;
- if ( 1000 < m_sync_step ) m_sync_step = 0 ;
- }
-}
+ Member single_exec( team_data_single );
+ Member team_exec( self );
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) team(%d of %d) league(%d of %d) running\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , team_exec.team_rank()
+ , team_exec.team_size()
+ , team_exec.league_rank()
+ , team_exec.league_size()
+ );
+fflush(stdout);
#endif
-//----------------------------------------------------------------------------
-
-void TaskQueueSpecialization< Kokkos::OpenMP >::execute
- ( TaskQueue< Kokkos::OpenMP > * const queue )
-{
- using execution_space = Kokkos::OpenMP ;
- using queue_type = TaskQueue< execution_space > ;
- using task_root_type = TaskBase< execution_space , void , void > ;
- using PoolExec = Kokkos::Impl::OpenMPexec ;
- using Member = TaskExec< execution_space > ;
+ // Loop until all queues are empty and no tasks in flight
- task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
+ task_root_type * task = 0 ;
- // Required: team_size <= 8
+ do {
+ // Each team lead attempts to acquire either a thread team task
+ // or a single thread task for the team.
- const int team_size = PoolExec::pool_size(2); // Threads per core
- // const int team_size = PoolExec::pool_size(1); // Threads per NUMA
+ if ( 0 == team_exec.team_rank() ) {
- if ( 8 < team_size ) {
- Kokkos::abort("TaskQueue<OpenMP> unsupported team size");
- }
+ bool leader_loop = false ;
-#pragma omp parallel
- {
- PoolExec & self = *PoolExec::get_thread_omp();
+ do {
- Member single_exec ;
- Member team_exec( self , team_size );
+ if ( 0 != task && end != task ) {
+ // team member #0 completes the previously executed task,
+ // completion may delete the task
+ queue->complete( task );
+ }
- // Team shared memory
- task_root_type * volatile * const task_shared =
- (task_root_type **) team_exec.m_team_exec->scratch_thread();
+ // If 0 == m_ready_count then set task = 0
-// Barrier across entire OpenMP thread pool to insure initialization
-#pragma omp barrier
+ task = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
- // Loop until all queues are empty and no tasks in flight
+ // Attempt to acquire a task
+ // Loop by priority and then type
+ for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
+ for ( int j = 0 ; j < 2 && end == task ; ++j ) {
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
+ }
+ }
- do {
+ // If tasks are still executing
+ // and no task could be acquired
+ // then continue this leader loop
+ leader_loop = end == task ;
- task_root_type * task = 0 ;
+ if ( ( ! leader_loop ) &&
+ ( 0 != task ) &&
+ ( task_root_type::TaskSingle == task->m_task_type ) ) {
- // Each team lead attempts to acquire either a thread team task
- // or a single thread task for the team.
+ // if a single thread task then execute now
- if ( 0 == team_exec.team_rank() ) {
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) executing single task 0x%lx\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , int64_t(task)
+ );
+fflush(stdout);
+#endif
- task = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
+ (*task->m_apply)( task , & single_exec );
- // Loop by priority and then type
- for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
- for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
- }
+ leader_loop = true ;
+ }
+ } while ( leader_loop );
}
- }
-
- // Team lead broadcast acquired task to team members:
-
- if ( 1 < team_exec.team_size() ) {
-
- if ( 0 == team_exec.team_rank() ) *task_shared = task ;
-
- // Fence to be sure task_shared is stored before the barrier
- Kokkos::memory_fence();
- // Whole team waits for every team member to reach this statement
- team_exec.team_barrier();
+ // Team lead either found 0 == m_ready_count or a team task
+ // Team lead broadcasts the acquired task:
- // Fence to be sure task_shared is stored
- Kokkos::memory_fence();
+ team_exec.team_broadcast( task , 0);
- task = *task_shared ;
- }
+ if ( 0 != task ) { // Thread Team Task
#if 0
-fprintf( stdout
- , "\nexecute group(%d) member(%d) task_shared(0x%lx) task(0x%lx)\n"
- , team_exec.m_group_rank
- , team_exec.m_team_rank
- , uintptr_t(task_shared)
- , uintptr_t(task)
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) team(%d of %d) league(%d of %d) executing team task 0x%lx\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , team_exec.team_rank()
+ , team_exec.team_size()
+ , team_exec.league_rank()
+ , team_exec.league_size()
+ , int64_t(task)
);
fflush(stdout);
#endif
- if ( 0 == task ) break ; // 0 == m_ready_count
-
- if ( end == task ) {
- // All team members wait for whole team to reach this statement.
- // Is necessary to prevent task_shared from being updated
- // before it is read by all threads.
- team_exec.team_barrier();
- }
- else if ( task_root_type::TaskTeam == task->m_task_type ) {
- // Thread Team Task
- (*task->m_apply)( task , & team_exec );
+ (*task->m_apply)( task , & team_exec );
- // The m_apply function performs a barrier
-
- if ( 0 == team_exec.team_rank() ) {
- // team member #0 completes the task, which may delete the task
- queue->complete( task );
+ // The m_apply function performs a barrier
}
- }
- else {
- // Single Thread Task
+ } while( 0 != task );
- if ( 0 == team_exec.team_rank() ) {
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) team(%d of %d) league(%d of %d) ending\n"
+ , self.pool_rank()
+ , self.pool_size()
+ , team_exec.team_rank()
+ , team_exec.team_size()
+ , team_exec.league_rank()
+ , team_exec.league_size()
+ );
+fflush(stdout);
+#endif
- (*task->m_apply)( task , & single_exec );
+ }
- queue->complete( task );
- }
+ self.disband_team();
+
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> pool(%d of %d) disbanded\n"
+ , self.pool_rank()
+ , self.pool_size()
+ );
+fflush(stdout);
+#endif
- // All team members wait for whole team to reach this statement.
- // Not necessary to complete the task.
- // Is necessary to prevent task_shared from being updated
- // before it is read by all threads.
- team_exec.team_barrier();
- }
- } while(1);
}
// END #pragma omp parallel
+#if 0
+fprintf(stdout,"TaskQueue<OpenMP> execute %d end\n", team_size );
+fflush(stdout);
+#endif
+
}
void TaskQueueSpecialization< Kokkos::OpenMP >::
iff_single_thread_recursive_execute
( TaskQueue< Kokkos::OpenMP > * const queue )
{
using execution_space = Kokkos::OpenMP ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
- using Member = TaskExec< execution_space > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
if ( 1 == omp_get_num_threads() ) {
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- Member single_exec ;
+ HostThreadTeamData & team_data_single =
+ HostThreadTeamDataSingleton::singleton();
+
+ Member single_exec( team_data_single );
task_root_type * task = end ;
do {
task = end ;
// Loop by priority and then type
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
(*task->m_apply)( task , & single_exec );
queue->complete( task );
} while(1);
}
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG ) */
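The hunk above replaces the hand-rolled TaskExec sync-word barrier with the unified HostThreadTeamData machinery and introduces HostThreadTeamDataSingleton, a function-local static ("Meyers") singleton that owns the scratch block used when a task runs on a single thread. Below is a minimal sketch of just that construction/teardown pattern; the ScratchOwner name, the byte counts, and the use of std::malloc/std::free are illustrative stand-ins, since the real code sizes the block with HostThreadTeamData::scratch_size() and allocates from Kokkos::OpenMP::memory_space.

#include <cstdlib>
#include <cstddef>

// Sketch of the function-local-static singleton pattern used by
// HostThreadTeamDataSingleton above. Names and sizes are illustrative only.
class ScratchOwner {
private:

  void *      m_buffer ;
  std::size_t m_bytes ;

  ScratchOwner() : m_buffer( 0 ) , m_bytes( 32 + 32 + 1024 + 1024 )
    { m_buffer = std::malloc( m_bytes ); }   // first call to singleton() allocates

  ~ScratchOwner()
    { std::free( m_buffer ); }               // released automatically at program exit

public:

  static ScratchOwner & singleton()
    {
      static ScratchOwner s ;                // constructed once; thread-safe since C++11
      return s ;
    }

  void *      buffer() const { return m_buffer ; }
  std::size_t bytes()  const { return m_bytes ; }
};

Every caller of ScratchOwner::singleton() sees the same object, which mirrors how all pool threads in the executor above share one single-thread scratch area through HostThreadTeamDataSingleton::singleton().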
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
index 15dbb77c2..3cfdf790b 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.hpp
@@ -1,365 +1,89 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_OPENMP_TASK_HPP
#define KOKKOS_IMPL_OPENMP_TASK_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<>
class TaskQueueSpecialization< Kokkos::OpenMP >
{
public:
using execution_space = Kokkos::OpenMP ;
using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
using task_base_type = Kokkos::Impl::TaskBase< execution_space , void , void > ;
+ using member_type = Kokkos::Impl::HostThreadTeamMember< execution_space > ;
// Must specify memory space
using memory_space = Kokkos::HostSpace ;
static
void iff_single_thread_recursive_execute( queue_type * const );
// Must provide task queue execution function
static void execute( queue_type * const );
- // Must provide mechanism to set function pointer in
- // execution space from the host process.
- template< typename FunctorType >
+ template< typename TaskType >
static
- void proc_set_apply( task_base_type::function_type * ptr )
- {
- using TaskType = TaskBase< Kokkos::OpenMP
- , typename FunctorType::value_type
- , FunctorType
- > ;
- *ptr = TaskType::apply ;
- }
+ typename TaskType::function_type
+ get_function_pointer() { return TaskType::apply ; }
};
extern template class TaskQueue< Kokkos::OpenMP > ;
-//----------------------------------------------------------------------------
-
-template<>
-class TaskExec< Kokkos::OpenMP >
-{
-private:
-
- TaskExec( TaskExec && ) = delete ;
- TaskExec( TaskExec const & ) = delete ;
- TaskExec & operator = ( TaskExec && ) = delete ;
- TaskExec & operator = ( TaskExec const & ) = delete ;
-
-
- using PoolExec = Kokkos::Impl::OpenMPexec ;
-
- friend class Kokkos::Impl::TaskQueue< Kokkos::OpenMP > ;
- friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::OpenMP > ;
-
- PoolExec * const m_self_exec ; ///< This thread's thread pool data structure
- PoolExec * const m_team_exec ; ///< Team thread's thread pool data structure
- int64_t m_sync_mask ;
- int64_t mutable m_sync_value ;
- int mutable m_sync_step ;
- int m_group_rank ; ///< Which "team" subset of thread pool
- int m_team_rank ; ///< Which thread within a team
- int m_team_size ;
-
- TaskExec();
- TaskExec( PoolExec & arg_exec , int arg_team_size );
-
- void team_barrier_impl() const ;
-
-public:
-
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- void * team_shared() const
- { return m_team_exec ? m_team_exec->scratch_thread() : (void*) 0 ; }
-
- int team_shared_size() const
- { return m_team_exec ? m_team_exec->scratch_thread_size() : 0 ; }
-
- /**\brief Whole team enters this function call
- * before any teeam member returns from
- * this function call.
- */
- void team_barrier() const { if ( 1 < m_team_size ) team_barrier_impl(); }
-#else
- KOKKOS_INLINE_FUNCTION void team_barrier() const {}
- KOKKOS_INLINE_FUNCTION void * team_shared() const { return 0 ; }
- KOKKOS_INLINE_FUNCTION int team_shared_size() const { return 0 ; }
-#endif
-
- KOKKOS_INLINE_FUNCTION
- int team_rank() const { return m_team_rank ; }
-
- KOKKOS_INLINE_FUNCTION
- int team_size() const { return m_team_size ; }
-};
-
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-namespace Kokkos {
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
-TeamThreadRange
- ( Impl::TaskExec< Kokkos::OpenMP > & thread, const iType & count )
-{
- return Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
-}
-
-template<typename iType1, typename iType2>
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::TaskExec< Kokkos::OpenMP > >
-TeamThreadRange
- ( Impl:: TaskExec< Kokkos::OpenMP > & thread, const iType1 & begin, const iType2 & end )
-{
- typedef typename std::common_type<iType1, iType2>::type iType;
- return Impl::TeamThreadRangeBoundariesStruct<iType, Impl::TaskExec< Kokkos::OpenMP > >(thread, begin, end);
-}
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >
-ThreadVectorRange
- ( Impl::TaskExec< Kokkos::OpenMP > & thread
- , const iType & count )
-{
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >(thread,count);
-}
-
-/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the the calling thread team.
- * This functionality requires C++11 support.
-*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for
- ( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >& loop_boundaries
- , const Lambda& lambda
- )
-{
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i);
- }
-}
-
-template<typename iType, class Lambda, typename ValueType>
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- ( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl:: TaskExec< Kokkos::OpenMP > >& loop_boundaries
- , const Lambda& lambda
- , ValueType& initialized_result)
-{
- int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i, result);
- }
-
- if ( 1 < loop_boundaries.thread.team_size() ) {
-
- ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
-
- loop_boundaries.thread.team_barrier();
- shared[team_rank] = result;
-
- loop_boundaries.thread.team_barrier();
-
- // reduce across threads to thread 0
- if (team_rank == 0) {
- for (int i = 1; i < loop_boundaries.thread.team_size(); i++) {
- shared[0] += shared[i];
- }
- }
-
- loop_boundaries.thread.team_barrier();
-
- // broadcast result
- initialized_result = shared[0];
- }
- else {
- initialized_result = result ;
- }
-}
-
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
- int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i, result);
- }
-
- if ( 1 < loop_boundaries.thread.team_size() ) {
- ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
-
- loop_boundaries.thread.team_barrier();
- shared[team_rank] = result;
-
- loop_boundaries.thread.team_barrier();
-
- // reduce across threads to thread 0
- if (team_rank == 0) {
- for (int i = 1; i < loop_boundaries.thread.team_size(); i++) {
- join(shared[0], shared[i]);
- }
- }
-
- loop_boundaries.thread.team_barrier();
-
- // broadcast result
- initialized_result = shared[0];
- }
- else {
- initialized_result = result ;
- }
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda,
- ValueType& initialized_result)
-{
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
-}
-
-template< typename ValueType, typename iType, class Lambda >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda)
-{
- ValueType accum = 0 ;
- ValueType val, local_total;
- ValueType *shared = (ValueType*) loop_boundaries.thread.team_shared();
- int team_size = loop_boundaries.thread.team_size();
- int team_rank = loop_boundaries.thread.team_rank(); // member num within the team
-
- // Intra-member scan
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- local_total = 0;
- lambda(i,local_total,false);
- val = accum;
- lambda(i,val,true);
- accum += local_total;
- }
-
- shared[team_rank] = accum;
- loop_boundaries.thread.team_barrier();
-
- // Member 0 do scan on accumulated totals
- if (team_rank == 0) {
- for( iType i = 1; i < team_size; i+=1) {
- shared[i] += shared[i-1];
- }
- accum = 0; // Member 0 set accum to 0 in preparation for inter-member scan
- }
-
- loop_boundaries.thread.team_barrier();
-
- // Inter-member scan adding in accumulated totals
- if (team_rank != 0) { accum = shared[team_rank-1]; }
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- local_total = 0;
- lambda(i,local_total,false);
- val = accum;
- lambda(i,val,true);
- accum += local_total;
- }
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::OpenMP > >& loop_boundaries,
- const Lambda & lambda)
-{
-}
-
-
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_OPENMP_TASK_HPP */
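The TeamThreadRange parallel_reduce removed above staged each team member's partial result in team-shared memory, had member 0 combine the partials, and then made the total visible to the whole team. As a standalone illustration of that staging idea, here is a small sketch written against plain OpenMP rather than the Kokkos team API; the shared vector, barriers, and loop bounds are assumptions made only for this example.

#include <omp.h>
#include <vector>
#include <cstdio>

int main()
{
  const int n = 1000 ;

  std::vector<double> shared( omp_get_max_threads() , 0.0 );  // stands in for team_shared()
  double total = 0.0 ;

  #pragma omp parallel
  {
    const int rank = omp_get_thread_num();
    const int size = omp_get_num_threads();

    double partial = 0.0 ;                        // per-member partial result
    for ( int i = rank ; i < n ; i += size ) partial += 1.0 ;

    shared[ rank ] = partial ;                    // stage this member's result
    #pragma omp barrier                           // wait until all partials are staged

    #pragma omp master
    {
      for ( int i = 0 ; i < size ; ++i ) total += shared[i] ;  // member 0 combines
    }
  }

  std::printf( "reduced total = %g (expected %d)\n" , total , n );
  return 0 ;
}

Compiled with -fopenmp this should print 1000; in the removed Kokkos version the same steps (stage, barrier, combine on member 0) were expressed with team_shared(), team_barrier(), and the team_rank() checks visible in the hunk, with the result broadcast back through the shared buffer.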
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
index 34cf581a4..2d50c6e54 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.cpp
@@ -1,408 +1,462 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <limits>
#include <iostream>
#include <vector>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <iostream>
#include <impl/Kokkos_CPUDiscovery.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
#ifdef KOKKOS_ENABLE_OPENMP
namespace Kokkos {
namespace Impl {
namespace {
KOKKOS_INLINE_FUNCTION
int kokkos_omp_in_parallel();
int kokkos_omp_in_critical_region = ( Kokkos::HostSpace::register_in_parallel( kokkos_omp_in_parallel ) , 0 );
KOKKOS_INLINE_FUNCTION
int kokkos_omp_in_parallel()
{
#ifndef __CUDA_ARCH__
return omp_in_parallel() && ! kokkos_omp_in_critical_region ;
#else
return 0;
#endif
}
bool s_using_hwloc = false;
} // namespace
} // namespace Impl
} // namespace Kokkos
namespace Kokkos {
namespace Impl {
int OpenMPexec::m_map_rank[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
int OpenMPexec::m_pool_topo[ 4 ] = { 0 };
-OpenMPexec * OpenMPexec::m_pool[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
+HostThreadTeamData * OpenMPexec::m_pool[ OpenMPexec::MAX_THREAD_COUNT ] = { 0 };
void OpenMPexec::verify_is_process( const char * const label )
{
if ( omp_in_parallel() ) {
std::string msg( label );
msg.append( " ERROR: in parallel" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
void OpenMPexec::verify_initialized( const char * const label )
{
if ( 0 == m_pool[0] ) {
std::string msg( label );
msg.append( " ERROR: not initialized" );
Kokkos::Impl::throw_runtime_exception( msg );
}
if ( omp_get_max_threads() != Kokkos::OpenMP::thread_pool_size(0) ) {
std::string msg( label );
msg.append( " ERROR: Initialized but threads modified inappropriately" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
-void OpenMPexec::clear_scratch()
+} // namespace Impl
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+void OpenMPexec::clear_thread_data()
{
+ const size_t member_bytes =
+ sizeof(int64_t) *
+ HostThreadTeamData::align_to_int64( sizeof(HostThreadTeamData) );
+
+ const int old_alloc_bytes =
+ m_pool[0] ? ( member_bytes + m_pool[0]->scratch_bytes() ) : 0 ;
+
+ Kokkos::HostSpace space ;
+
#pragma omp parallel
{
- const int rank_rev = m_map_rank[ omp_get_thread_num() ];
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
- if ( m_pool[ rank_rev ] ) {
- Record * const r = Record::get_record( m_pool[ rank_rev ] );
- m_pool[ rank_rev ] = 0 ;
- Record::decrement( r );
+ const int rank = m_map_rank[ omp_get_thread_num() ];
+
+ if ( 0 != m_pool[rank] ) {
+
+ m_pool[rank]->disband_pool();
+
+ space.deallocate( m_pool[rank] , old_alloc_bytes );
+
+ m_pool[rank] = 0 ;
}
}
/* END #pragma omp parallel */
}
-void OpenMPexec::resize_scratch( size_t reduce_size , size_t thread_size )
+void OpenMPexec::resize_thread_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes )
{
- enum { ALIGN_MASK = Kokkos::Impl::MEMORY_ALIGNMENT - 1 };
- enum { ALLOC_EXEC = ( sizeof(OpenMPexec) + ALIGN_MASK ) & ~ALIGN_MASK };
+ const size_t member_bytes =
+ sizeof(int64_t) *
+ HostThreadTeamData::align_to_int64( sizeof(HostThreadTeamData) );
- const size_t old_reduce_size = m_pool[0] ? m_pool[0]->m_scratch_reduce_end : 0 ;
- const size_t old_thread_size = m_pool[0] ? m_pool[0]->m_scratch_thread_end - m_pool[0]->m_scratch_reduce_end : 0 ;
+ HostThreadTeamData * root = m_pool[0] ;
- reduce_size = ( reduce_size + ALIGN_MASK ) & ~ALIGN_MASK ;
- thread_size = ( thread_size + ALIGN_MASK ) & ~ALIGN_MASK ;
+ const size_t old_pool_reduce = root ? root->pool_reduce_bytes() : 0 ;
+ const size_t old_team_reduce = root ? root->team_reduce_bytes() : 0 ;
+ const size_t old_team_shared = root ? root->team_shared_bytes() : 0 ;
+ const size_t old_thread_local = root ? root->thread_local_bytes() : 0 ;
+ const size_t old_alloc_bytes = root ? ( member_bytes + root->scratch_bytes() ) : 0 ;
- // Requesting allocation and old allocation is too small:
+ // Allocate if any of the old allocations are too small:
- const bool allocate = ( old_reduce_size < reduce_size ) ||
- ( old_thread_size < thread_size );
+ const bool allocate = ( old_pool_reduce < pool_reduce_bytes ) ||
+ ( old_team_reduce < team_reduce_bytes ) ||
+ ( old_team_shared < team_shared_bytes ) ||
+ ( old_thread_local < thread_local_bytes );
if ( allocate ) {
- if ( reduce_size < old_reduce_size ) { reduce_size = old_reduce_size ; }
- if ( thread_size < old_thread_size ) { thread_size = old_thread_size ; }
- }
- const size_t alloc_size = allocate ? ALLOC_EXEC + reduce_size + thread_size : 0 ;
- const int pool_size = m_pool_topo[0] ;
+ if ( pool_reduce_bytes < old_pool_reduce ) { pool_reduce_bytes = old_pool_reduce ; }
+ if ( team_reduce_bytes < old_team_reduce ) { team_reduce_bytes = old_team_reduce ; }
+ if ( team_shared_bytes < old_team_shared ) { team_shared_bytes = old_team_shared ; }
+ if ( thread_local_bytes < old_thread_local ) { thread_local_bytes = old_thread_local ; }
- if ( allocate ) {
+ const size_t alloc_bytes =
+ member_bytes +
+ HostThreadTeamData::scratch_size( pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
+
+ const int pool_size = omp_get_max_threads();
- clear_scratch();
+ Kokkos::HostSpace space ;
#pragma omp parallel
{
- const int rank_rev = m_map_rank[ omp_get_thread_num() ];
- const int rank = pool_size - ( rank_rev + 1 );
+ const int rank = m_map_rank[ omp_get_thread_num() ];
- typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
+ if ( 0 != m_pool[rank] ) {
- Record * const r = Record::allocate( Kokkos::HostSpace()
- , "openmp_scratch"
- , alloc_size );
+ m_pool[rank]->disband_pool();
- Record::increment( r );
+ space.deallocate( m_pool[rank] , old_alloc_bytes );
+ }
+
+ void * const ptr = space.allocate( alloc_bytes );
- m_pool[ rank_rev ] = reinterpret_cast<OpenMPexec*>( r->data() );
+ m_pool[ rank ] = new( ptr ) HostThreadTeamData();
- new ( m_pool[ rank_rev ] ) OpenMPexec( rank , ALLOC_EXEC , reduce_size , thread_size );
+ m_pool[ rank ]->
+ scratch_assign( ((char *)ptr) + member_bytes
+ , alloc_bytes
+ , pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
}
/* END #pragma omp parallel */
+
+ HostThreadTeamData::organize_pool( m_pool , pool_size );
}
}
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
//----------------------------------------------------------------------------
int OpenMP::is_initialized()
{ return 0 != Impl::OpenMPexec::m_pool[0]; }
void OpenMP::initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa )
{
// Before any other call to OMP query the maximum number of threads
// and save the value for re-initialization unit testing.
- //Using omp_get_max_threads(); is problematic in conjunction with
- //Hwloc on Intel (essentially an initial call to the OpenMP runtime
- //without a parallel region before will set a process mask for a single core
- //The runtime will than bind threads for a parallel region to other cores on the
- //entering the first parallel region and make the process mask the aggregate of
- //the thread masks. The intend seems to be to make serial code run fast, if you
- //compile with OpenMP enabled but don't actually use parallel regions or so
- //static int omp_max_threads = omp_get_max_threads();
+ // Using omp_get_max_threads() is problematic in conjunction with
+ // hwloc on Intel: an initial call to the OpenMP runtime without a
+ // prior parallel region sets a process mask for a single core. The
+ // runtime then binds threads for a parallel region to other cores when
+ // entering the first parallel region and makes the process mask the
+ // aggregate of the thread masks. The intent seems to be to make serial
+ // code run fast if you compile with OpenMP enabled but don't actually
+ // use parallel regions.
+ // static int omp_max_threads = omp_get_max_threads();
int nthreads = 0;
#pragma omp parallel
{
#pragma omp atomic
nthreads++;
}
static int omp_max_threads = nthreads;
const bool is_initialized = 0 != Impl::OpenMPexec::m_pool[0] ;
bool thread_spawn_failed = false ;
if ( ! is_initialized ) {
// Use hwloc thread pinning if concerned with locality.
// If spreading threads across multiple NUMA regions.
// If hyperthreading is enabled.
Impl::s_using_hwloc = hwloc::available() && (
( 1 < Kokkos::hwloc::get_available_numa_count() ) ||
( 1 < Kokkos::hwloc::get_available_threads_per_core() ) );
std::pair<unsigned,unsigned> threads_coord[ Impl::OpenMPexec::MAX_THREAD_COUNT ];
// If hwloc available then use its maximum value.
if ( thread_count == 0 ) {
thread_count = Impl::s_using_hwloc
? Kokkos::hwloc::get_available_numa_count() *
Kokkos::hwloc::get_available_cores_per_numa() *
Kokkos::hwloc::get_available_threads_per_core()
: omp_max_threads ;
}
if(Impl::s_using_hwloc)
hwloc::thread_mapping( "Kokkos::OpenMP::initialize" ,
false /* do not allow asynchronous */ ,
thread_count ,
use_numa_count ,
use_cores_per_numa ,
threads_coord );
// Spawn threads:
omp_set_num_threads( thread_count );
// Verify OMP interaction:
if ( int(thread_count) != omp_get_max_threads() ) {
thread_spawn_failed = true ;
}
// Verify spawning and bind threads:
#pragma omp parallel
{
#pragma omp critical
{
if ( int(thread_count) != omp_get_num_threads() ) {
thread_spawn_failed = true ;
}
// Call to 'bind_this_thread' is not thread safe so place this whole block in a critical region.
// Call to 'new' may not be thread safe as well.
- // Reverse the rank for threads so that the scan operation reduces to the highest rank thread.
-
const unsigned omp_rank = omp_get_thread_num();
const unsigned thread_r = Impl::s_using_hwloc && Kokkos::hwloc::can_bind_threads()
? Kokkos::hwloc::bind_this_thread( thread_count , threads_coord )
: omp_rank ;
Impl::OpenMPexec::m_map_rank[ omp_rank ] = thread_r ;
}
/* END #pragma omp critical */
}
/* END #pragma omp parallel */
if ( ! thread_spawn_failed ) {
Impl::OpenMPexec::m_pool_topo[0] = thread_count ;
Impl::OpenMPexec::m_pool_topo[1] = Impl::s_using_hwloc ? thread_count / use_numa_count : thread_count;
Impl::OpenMPexec::m_pool_topo[2] = Impl::s_using_hwloc ? thread_count / ( use_numa_count * use_cores_per_numa ) : 1;
- Impl::OpenMPexec::resize_scratch( 1024 , 1024 );
+ // New, unified host thread team data:
+ {
+ size_t pool_reduce_bytes = 32 * thread_count ;
+ size_t team_reduce_bytes = 32 * thread_count ;
+ size_t team_shared_bytes = 1024 * thread_count ;
+ size_t thread_local_bytes = 1024 ;
+
+ Impl::OpenMPexec::resize_thread_data( pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes
+ );
+ }
}
}
if ( is_initialized || thread_spawn_failed ) {
std::string msg("Kokkos::OpenMP::initialize ERROR");
if ( is_initialized ) { msg.append(" : already initialized"); }
if ( thread_spawn_failed ) { msg.append(" : failed spawning threads"); }
Kokkos::Impl::throw_runtime_exception(msg);
}
// Check for over-subscription
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
// std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
//}
// Init the array for used for arbitrarily sized atomics
Impl::init_lock_array_host_space();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
//----------------------------------------------------------------------------
void OpenMP::finalize()
{
Impl::OpenMPexec::verify_initialized( "OpenMP::finalize" );
Impl::OpenMPexec::verify_is_process( "OpenMP::finalize" );
- Impl::OpenMPexec::clear_scratch();
+ // New, unified host thread team data:
+ Impl::OpenMPexec::clear_thread_data();
Impl::OpenMPexec::m_pool_topo[0] = 0 ;
Impl::OpenMPexec::m_pool_topo[1] = 0 ;
Impl::OpenMPexec::m_pool_topo[2] = 0 ;
omp_set_num_threads(1);
if ( Impl::s_using_hwloc && Kokkos::hwloc::can_bind_threads() ) {
hwloc::unbind_this_thread();
}
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
//----------------------------------------------------------------------------
void OpenMP::print_configuration( std::ostream & s , const bool detail )
{
Impl::OpenMPexec::verify_is_process( "OpenMP::print_configuration" );
s << "Kokkos::OpenMP" ;
#if defined( KOKKOS_ENABLE_OPENMP )
s << " KOKKOS_ENABLE_OPENMP" ;
#endif
#if defined( KOKKOS_ENABLE_HWLOC )
const unsigned numa_count_ = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
s << " hwloc[" << numa_count_ << "x" << cores_per_numa << "x" << threads_per_core << "]"
<< " hwloc_binding_" << ( Impl::s_using_hwloc ? "enabled" : "disabled" )
;
#endif
const bool is_initialized = 0 != Impl::OpenMPexec::m_pool[0] ;
if ( is_initialized ) {
const int numa_count = Kokkos::Impl::OpenMPexec::m_pool_topo[0] / Kokkos::Impl::OpenMPexec::m_pool_topo[1] ;
const int core_per_numa = Kokkos::Impl::OpenMPexec::m_pool_topo[1] / Kokkos::Impl::OpenMPexec::m_pool_topo[2] ;
const int thread_per_core = Kokkos::Impl::OpenMPexec::m_pool_topo[2] ;
s << " thread_pool_topology[ " << numa_count
<< " x " << core_per_numa
<< " x " << thread_per_core
<< " ]"
<< std::endl ;
if ( detail ) {
std::vector< std::pair<unsigned,unsigned> > coord( Kokkos::Impl::OpenMPexec::m_pool_topo[0] );
#pragma omp parallel
{
#pragma omp critical
{
coord[ omp_get_thread_num() ] = hwloc::get_this_thread_coordinate();
}
/* END #pragma omp critical */
}
/* END #pragma omp parallel */
for ( unsigned i = 0 ; i < coord.size() ; ++i ) {
s << " thread omp_rank[" << i << "]"
<< " kokkos_rank[" << Impl::OpenMPexec::m_map_rank[ i ] << "]"
<< " hwloc_coord[" << coord[i].first << "." << coord[i].second << "]"
<< std::endl ;
}
}
}
else {
s << " not initialized" << std::endl ;
}
}
int OpenMP::concurrency() {
return thread_pool_size(0);
}
} // namespace Kokkos
#endif //KOKKOS_ENABLE_OPENMP
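resize_thread_data() above makes one allocation per thread, constructs the HostThreadTeamData header at the front of the block with placement new, and hands the remaining bytes to scratch_assign(). A minimal sketch of that header-plus-scratch layout follows; the Header type, the byte counts, and the use of global operator new are hypothetical simplifications, since the real code rounds the header size up to a multiple of int64_t and allocates from Kokkos::HostSpace.

#include <cstddef>
#include <cstdio>
#include <new>

// Hypothetical stand-in for HostThreadTeamData: a small header that
// remembers where its scratch region begins and how large it is.
struct Header {
  char *      m_scratch ;
  std::size_t m_scratch_bytes ;

  void scratch_assign( char * ptr , std::size_t bytes )
    { m_scratch = ptr ; m_scratch_bytes = bytes ; }
};

int main()
{
  const std::size_t header_bytes  = sizeof(Header);   // real code rounds this up to int64_t
  const std::size_t scratch_bytes = 1024 ;             // pool/team/thread scratch, illustrative
  const std::size_t alloc_bytes   = header_bytes + scratch_bytes ;

  void * const block = ::operator new( alloc_bytes );  // one allocation per thread

  Header * const h = new( block ) Header();             // header lives at the front of the block
  h->scratch_assign( static_cast<char*>(block) + header_bytes , scratch_bytes );

  std::printf( "header at %p, scratch at %p (%zu bytes)\n"
             , static_cast<void*>(h)
             , static_cast<void*>(h->m_scratch)
             , h->m_scratch_bytes );

  h->~Header();
  ::operator delete( block );
  return 0 ;
}

The grow-only comparison earlier in the hunk (reallocate only when a requested size category exceeds its old value) keeps repeated calls cheap once the largest request has been seen.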
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
index 63f7234da..39ace3131 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
+++ b/lib/kokkos/core/src/OpenMP/Kokkos_OpenMPexec.hpp
@@ -1,1065 +1,345 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_OPENMPEXEC_HPP
#define KOKKOS_OPENMPEXEC_HPP
+#include <Kokkos_OpenMP.hpp>
+
#include <impl/Kokkos_Traits.hpp>
-#include <impl/Kokkos_spinwait.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
#include <Kokkos_Atomic.hpp>
+
#include <iostream>
#include <sstream>
#include <fstream>
+
+#include <omp.h>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
/** \brief Data for OpenMP thread execution */
class OpenMPexec {
public:
+ friend class Kokkos::OpenMP ;
+
enum { MAX_THREAD_COUNT = 4096 };
private:
- static OpenMPexec * m_pool[ MAX_THREAD_COUNT ]; // Indexed by: m_pool_rank_rev
-
static int m_pool_topo[ 4 ];
static int m_map_rank[ MAX_THREAD_COUNT ];
- friend class Kokkos::OpenMP ;
-
- int const m_pool_rank ;
- int const m_pool_rank_rev ;
- int const m_scratch_exec_end ;
- int const m_scratch_reduce_end ;
- int const m_scratch_thread_end ;
-
- int volatile m_barrier_state ;
-
- // Members for dynamic scheduling
- // Which thread am I stealing from currently
- int m_current_steal_target;
- // This thread's owned work_range
- Kokkos::pair<long,long> m_work_range KOKKOS_ALIGN(16);
- // Team Offset if one thread determines work_range for others
- long m_team_work_index;
+ static HostThreadTeamData * m_pool[ MAX_THREAD_COUNT ];
- // Is this thread stealing (i.e. its owned work_range is exhausted
- bool m_stealing;
-
- OpenMPexec();
- OpenMPexec( const OpenMPexec & );
- OpenMPexec & operator = ( const OpenMPexec & );
-
- static void clear_scratch();
+ static
+ void clear_thread_data();
public:
// Topology of a cache coherent thread pool:
// TOTAL = NUMA x GRAIN
// pool_size( depth = 0 )
// pool_size(0) = total number of threads
// pool_size(1) = number of threads per NUMA
// pool_size(2) = number of threads sharing finest grain memory hierarchy
inline static
int pool_size( int depth = 0 ) { return m_pool_topo[ depth ]; }
- inline static
- OpenMPexec * pool_rev( int pool_rank_rev ) { return m_pool[ pool_rank_rev ]; }
-
- inline int pool_rank() const { return m_pool_rank ; }
- inline int pool_rank_rev() const { return m_pool_rank_rev ; }
-
- inline long team_work_index() const { return m_team_work_index ; }
-
- inline int scratch_reduce_size() const
- { return m_scratch_reduce_end - m_scratch_exec_end ; }
-
- inline int scratch_thread_size() const
- { return m_scratch_thread_end - m_scratch_reduce_end ; }
-
- inline void * scratch_reduce() const { return ((char *) this) + m_scratch_exec_end ; }
- inline void * scratch_thread() const { return ((char *) this) + m_scratch_reduce_end ; }
-
- inline
- void state_wait( int state )
- { Impl::spinwait( m_barrier_state , state ); }
-
- inline
- void state_set( int state ) { m_barrier_state = state ; }
-
- ~OpenMPexec() {}
-
- OpenMPexec( const int arg_poolRank
- , const int arg_scratch_exec_size
- , const int arg_scratch_reduce_size
- , const int arg_scratch_thread_size )
- : m_pool_rank( arg_poolRank )
- , m_pool_rank_rev( pool_size() - ( arg_poolRank + 1 ) )
- , m_scratch_exec_end( arg_scratch_exec_size )
- , m_scratch_reduce_end( m_scratch_exec_end + arg_scratch_reduce_size )
- , m_scratch_thread_end( m_scratch_reduce_end + arg_scratch_thread_size )
- , m_barrier_state(0)
- {}
-
static void finalize();
- static void initialize( const unsigned team_count ,
+ static void initialize( const unsigned team_count ,
const unsigned threads_per_team ,
const unsigned numa_count ,
const unsigned cores_per_numa );
static void verify_is_process( const char * const );
static void verify_initialized( const char * const );
- static void resize_scratch( size_t reduce_size , size_t thread_size );
- inline static
- OpenMPexec * get_thread_omp() { return m_pool[ m_map_rank[ omp_get_thread_num() ] ]; }
+ static
+ void resize_thread_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes );
- /* Dynamic Scheduling related functionality */
- // Initialize the work range for this thread
- inline void set_work_range(const long& begin, const long& end, const long& chunk_size) {
- m_work_range.first = (begin+chunk_size-1)/chunk_size;
- m_work_range.second = end>0?(end+chunk_size-1)/chunk_size:m_work_range.first;
- }
-
- // Claim and index from this thread's range from the beginning
- inline long get_work_index_begin () {
- Kokkos::pair<long,long> work_range_new = m_work_range;
- Kokkos::pair<long,long> work_range_old = work_range_new;
- if(work_range_old.first>=work_range_old.second)
- return -1;
-
- work_range_new.first+=1;
-
- bool success = false;
- while(!success) {
- work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
- success = ( (work_range_new == work_range_old) ||
- (work_range_new.first>=work_range_new.second));
- work_range_old = work_range_new;
- work_range_new.first+=1;
- }
- if(work_range_old.first<work_range_old.second)
- return work_range_old.first;
- else
- return -1;
- }
-
- // Claim and index from this thread's range from the end
- inline long get_work_index_end () {
- Kokkos::pair<long,long> work_range_new = m_work_range;
- Kokkos::pair<long,long> work_range_old = work_range_new;
- if(work_range_old.first>=work_range_old.second)
- return -1;
- work_range_new.second-=1;
- bool success = false;
- while(!success) {
- work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
- success = ( (work_range_new == work_range_old) ||
- (work_range_new.first>=work_range_new.second) );
- work_range_old = work_range_new;
- work_range_new.second-=1;
- }
- if(work_range_old.first<work_range_old.second)
- return work_range_old.second-1;
- else
- return -1;
- }
-
- // Reset the steal target
- inline void reset_steal_target() {
- m_current_steal_target = (m_pool_rank+1)%m_pool_topo[0];
- m_stealing = false;
- }
-
- // Reset the steal target
- inline void reset_steal_target(int team_size) {
- m_current_steal_target = (m_pool_rank_rev+team_size);
- if(m_current_steal_target>=m_pool_topo[0])
- m_current_steal_target = 0;//m_pool_topo[0]-1;
- m_stealing = false;
- }
-
- // Get a steal target; start with my-rank + 1 and go round robin, until arriving at this threads rank
- // Returns -1 fi no active steal target available
- inline int get_steal_target() {
- while(( m_pool[m_current_steal_target]->m_work_range.second <=
- m_pool[m_current_steal_target]->m_work_range.first ) &&
- (m_current_steal_target!=m_pool_rank) ) {
- m_current_steal_target = (m_current_steal_target+1)%m_pool_topo[0];
- }
- if(m_current_steal_target == m_pool_rank)
- return -1;
- else
- return m_current_steal_target;
- }
-
- inline int get_steal_target(int team_size) {
-
- while(( m_pool[m_current_steal_target]->m_work_range.second <=
- m_pool[m_current_steal_target]->m_work_range.first ) &&
- (m_current_steal_target!=m_pool_rank_rev) ) {
- if(m_current_steal_target + team_size < m_pool_topo[0])
- m_current_steal_target = (m_current_steal_target+team_size);
- else
- m_current_steal_target = 0;
- }
-
- if(m_current_steal_target == m_pool_rank_rev)
- return -1;
- else
- return m_current_steal_target;
- }
-
- inline long steal_work_index (int team_size = 0) {
- long index = -1;
- int steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
- while ( (steal_target != -1) && (index == -1)) {
- index = m_pool[steal_target]->get_work_index_end();
- if(index == -1)
- steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
- }
- return index;
- }
-
- // Get a work index. Claim from owned range until its exhausted, then steal from other thread
- inline long get_work_index (int team_size = 0) {
- long work_index = -1;
- if(!m_stealing) work_index = get_work_index_begin();
-
- if( work_index == -1) {
- memory_fence();
- m_stealing = true;
- work_index = steal_work_index(team_size);
- }
- m_team_work_index = work_index;
- memory_fence();
- return work_index;
- }
+ inline static
+ HostThreadTeamData * get_thread_data() noexcept
+ { return m_pool[ m_map_rank[ omp_get_thread_num() ] ]; }
+ inline static
+ HostThreadTeamData * get_thread_data( int i ) noexcept
+ { return m_pool[i]; }
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
-class OpenMPexecTeamMember {
-public:
-
- enum { TEAM_REDUCE_SIZE = 512 };
-
- /** \brief Thread states for team synchronization */
- enum { Active = 0 , Rendezvous = 1 };
-
- typedef Kokkos::OpenMP execution_space ;
- typedef execution_space::scratch_memory_space scratch_memory_space ;
-
- Impl::OpenMPexec & m_exec ;
- scratch_memory_space m_team_shared ;
- int m_team_scratch_size[2] ;
- int m_team_base_rev ;
- int m_team_rank_rev ;
- int m_team_rank ;
- int m_team_size ;
- int m_league_rank ;
- int m_league_end ;
- int m_league_size ;
-
- int m_chunk_size;
- int m_league_chunk_end;
- Impl::OpenMPexec & m_team_lead_exec ;
- int m_invalid_thread;
- int m_team_alloc;
-
- // Fan-in team threads, root of the fan-in which does not block returns true
- inline
- bool team_fan_in() const
- {
- memory_fence();
- for ( int n = 1 , j ; ( ( j = m_team_rank_rev + n ) < m_team_size ) && ! ( m_team_rank_rev & n ) ; n <<= 1 ) {
-
- m_exec.pool_rev( m_team_base_rev + j )->state_wait( Active );
- }
-
- if ( m_team_rank_rev ) {
- m_exec.state_set( Rendezvous );
- memory_fence();
- m_exec.state_wait( Rendezvous );
- }
-
- return 0 == m_team_rank_rev ;
- }
-
- inline
- void team_fan_out() const
- {
- memory_fence();
- for ( int n = 1 , j ; ( ( j = m_team_rank_rev + n ) < m_team_size ) && ! ( m_team_rank_rev & n ) ; n <<= 1 ) {
- m_exec.pool_rev( m_team_base_rev + j )->state_set( Active );
- memory_fence();
- }
- }
-
-public:
-
- KOKKOS_INLINE_FUNCTION
- const execution_space::scratch_memory_space& team_shmem() const
- { return m_team_shared.set_team_thread_mode(0,1,0) ; }
-
- KOKKOS_INLINE_FUNCTION
- const execution_space::scratch_memory_space& team_scratch(int) const
- { return m_team_shared.set_team_thread_mode(0,1,0) ; }
-
- KOKKOS_INLINE_FUNCTION
- const execution_space::scratch_memory_space& thread_scratch(int) const
- { return m_team_shared.set_team_thread_mode(0,team_size(),team_rank()) ; }
-
- KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
- KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
- KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
-
- KOKKOS_INLINE_FUNCTION void team_barrier() const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- {}
-#else
- {
- if ( 1 < m_team_size && !m_invalid_thread) {
- team_fan_in();
- team_fan_out();
- }
- }
-#endif
-
- template<class ValueType>
- KOKKOS_INLINE_FUNCTION
- void team_broadcast(ValueType& value, const int& thread_id) const
- {
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { }
-#else
- // Make sure there is enough scratch space:
- typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
- , ValueType , void >::type type ;
-
- type volatile * const shared_value =
- ((type*) m_exec.pool_rev( m_team_base_rev )->scratch_thread());
-
- if ( team_rank() == thread_id ) *shared_value = value;
- memory_fence();
- team_barrier(); // Wait for 'thread_id' to write
- value = *shared_value ;
- team_barrier(); // Wait for team members to read
-#endif
- }
-
- template< class ValueType, class JoinOp >
- KOKKOS_INLINE_FUNCTION ValueType
- team_reduce( const ValueType & value
- , const JoinOp & op_in ) const
- #if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return ValueType(); }
- #else
- {
- memory_fence();
- typedef ValueType value_type;
- const JoinLambdaAdapter<value_type,JoinOp> op(op_in);
- #endif
-#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- // Make sure there is enough scratch space:
- typedef typename if_c< sizeof(value_type) < TEAM_REDUCE_SIZE
- , value_type , void >::type type ;
-
- type * const local_value = ((type*) m_exec.scratch_thread());
-
- // Set this thread's contribution
- *local_value = value ;
-
- // Fence to make sure the base team member has access:
- memory_fence();
-
- if ( team_fan_in() ) {
- // The last thread to synchronize returns true, all other threads wait for team_fan_out()
- type * const team_value = ((type*) m_exec.pool_rev( m_team_base_rev )->scratch_thread());
-
- // Join to the team value:
- for ( int i = 1 ; i < m_team_size ; ++i ) {
- op.join( *team_value , *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread()) );
- }
- memory_fence();
-
- // The base team member may "lap" the other team members,
- // copy to their local value before proceeding.
- for ( int i = 1 ; i < m_team_size ; ++i ) {
- *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread()) = *team_value ;
- }
-
- // Fence to make sure all team members have access
- memory_fence();
- }
-
- team_fan_out();
-
- return *((type volatile const *)local_value);
- }
-#endif
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering
- * with intra-team non-deterministic ordering accumulation.
- *
- * The global inter-team accumulation value will, at the end of the
- * league's parallel execution, be the scan's total.
- * Parallel execution ordering of the league's teams is non-deterministic.
- * As such the base value for each team's scan operation is similarly
- * non-deterministic.
- */
- template< typename ArgType >
- KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value , ArgType * const global_accum ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return ArgType(); }
-#else
- {
- // Make sure there is enough scratch space:
- typedef typename if_c< sizeof(ArgType) < TEAM_REDUCE_SIZE , ArgType , void >::type type ;
-
- volatile type * const work_value = ((type*) m_exec.scratch_thread());
-
- *work_value = value ;
-
- memory_fence();
-
- if ( team_fan_in() ) {
- // The last thread to synchronize returns true, all other threads wait for team_fan_out()
- // m_team_base[0] == highest ranking team member
- // m_team_base[ m_team_size - 1 ] == lowest ranking team member
- //
- // 1) copy from lower to higher rank, initialize lowest rank to zero
- // 2) prefix sum from lowest to highest rank, skipping lowest rank
-
- type accum = 0 ;
-
- if ( global_accum ) {
- for ( int i = m_team_size ; i-- ; ) {
- type & val = *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread());
- accum += val ;
- }
- accum = atomic_fetch_add( global_accum , accum );
- }
-
- for ( int i = m_team_size ; i-- ; ) {
- type & val = *((type*) m_exec.pool_rev( m_team_base_rev + i )->scratch_thread());
- const type offset = accum ;
- accum += val ;
- val = offset ;
- }
-
- memory_fence();
- }
-
- team_fan_out();
-
- return *work_value ;
- }
-#endif
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
- *
- * The highest rank thread can compute the reduction total as
- * reduction_total = dev.team_scan( value ) + value ;
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const
- { return this-> template team_scan<Type>( value , 0 ); }
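  // Illustrative reading of the team_scan contract documented above, with
  // values invented for this example only: a team of three threads
  // contributing 3, 1 and 4 in team_rank() order receives 0, 3 and 4 from the
  // exclusive scan; the highest-rank thread then recovers the reduction total
  // as team_scan(value) + value = 4 + 4 = 8 = 3 + 1 + 4. When a global_accum
  // pointer is passed, that total is also atomic-added to the inter-team
  // accumulator, whose previous value becomes this team's scan base.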
-
- //----------------------------------------
- // Private for the driver
-
-private:
-
- typedef execution_space::scratch_memory_space space ;
-
-public:
-
- template< class ... Properties >
- inline
- OpenMPexecTeamMember( Impl::OpenMPexec & exec
- , const TeamPolicyInternal< OpenMP, Properties ...> & team
- , const int shmem_size_L1
- , const int shmem_size_L2
- )
- : m_exec( exec )
- , m_team_shared(0,0)
- , m_team_scratch_size{ shmem_size_L1 , shmem_size_L2 }
- , m_team_base_rev(0)
- , m_team_rank_rev(0)
- , m_team_rank(0)
- , m_team_size( team.team_size() )
- , m_league_rank(0)
- , m_league_end(0)
- , m_league_size( team.league_size() )
- , m_chunk_size( team.chunk_size()>0?team.chunk_size():team.team_iter() )
- , m_league_chunk_end(0)
- , m_team_lead_exec( *exec.pool_rev( team.team_alloc() * (m_exec.pool_rank_rev()/team.team_alloc()) ))
- , m_team_alloc( team.team_alloc())
- {
- const int pool_rank_rev = m_exec.pool_rank_rev();
- const int pool_team_rank_rev = pool_rank_rev % team.team_alloc();
- const int pool_league_rank_rev = pool_rank_rev / team.team_alloc();
- const int pool_num_teams = OpenMP::thread_pool_size(0)/team.team_alloc();
- const int chunks_per_team = ( team.league_size() + m_chunk_size*pool_num_teams-1 ) / (m_chunk_size*pool_num_teams);
- int league_iter_end = team.league_size() - pool_league_rank_rev * chunks_per_team * m_chunk_size;
- int league_iter_begin = league_iter_end - chunks_per_team * m_chunk_size;
- if (league_iter_begin < 0) league_iter_begin = 0;
- if (league_iter_end>team.league_size()) league_iter_end = team.league_size();
-
- if ((team.team_alloc()>m_team_size)?
- (pool_team_rank_rev >= m_team_size):
- (m_exec.pool_size() - pool_num_teams*m_team_size > m_exec.pool_rank())
- )
- m_invalid_thread = 1;
- else
- m_invalid_thread = 0;
-
- m_team_rank_rev = pool_team_rank_rev ;
- if ( pool_team_rank_rev < m_team_size && !m_invalid_thread ) {
- m_team_base_rev = team.team_alloc() * pool_league_rank_rev ;
- m_team_rank_rev = pool_team_rank_rev ;
- m_team_rank = m_team_size - ( m_team_rank_rev + 1 );
- m_league_end = league_iter_end ;
- m_league_rank = league_iter_begin ;
- new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
- ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
- 0 );
- }
-
- if ( (m_team_rank_rev == 0) && (m_invalid_thread == 0) ) {
- m_exec.set_work_range(m_league_rank,m_league_end,m_chunk_size);
- m_exec.reset_steal_target(m_team_size);
- }
- }
-
- bool valid_static() const
- {
- return m_league_rank < m_league_end ;
- }
-
- void next_static()
- {
- if ( m_league_rank < m_league_end ) {
- team_barrier();
- new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
- ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
- 0);
- }
- m_league_rank++;
- }
-
- bool valid_dynamic() {
- if(m_invalid_thread)
- return false;
- if ((m_league_rank < m_league_chunk_end) && (m_league_rank < m_league_size)) {
- return true;
- }
-
- if ( m_team_rank_rev == 0 ) {
- m_team_lead_exec.get_work_index(m_team_alloc);
- }
- team_barrier();
-
- long work_index = m_team_lead_exec.team_work_index();
-
- m_league_rank = work_index * m_chunk_size;
- m_league_chunk_end = (work_index +1 ) * m_chunk_size;
-
- if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
-
- if(m_league_rank>=0)
- return true;
- return false;
- }
-
- void next_dynamic() {
- if(m_invalid_thread)
- return;
-
- if ( m_league_rank < m_league_chunk_end ) {
- team_barrier();
- new( (void*) &m_team_shared ) space( ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE , m_team_scratch_size[0] ,
- ( (char*) m_exec.pool_rev(m_team_base_rev)->scratch_thread() ) + TEAM_REDUCE_SIZE + m_team_scratch_size[0],
- 0);
- }
- m_league_rank++;
- }
-
- static inline int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
-};
-
template< class ... Properties >
class TeamPolicyInternal< Kokkos::OpenMP, Properties ... >: public PolicyTraits<Properties ...>
{
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_team_alloc = p.m_team_alloc;
m_team_iter = p.m_team_iter;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
- int team_size_max( const FunctorType & )
- { return traits::execution_space::thread_pool_size(1); }
+ int team_size_max( const FunctorType & ) {
+ int pool_size = traits::execution_space::thread_pool_size(1);
+ int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ return pool_size<max_host_team_size?pool_size:max_host_team_size;
+ }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType & )
{ return traits::execution_space::thread_pool_size(2); }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType &, const int& )
{ return traits::execution_space::thread_pool_size(2); }
//----------------------------------------
private:
int m_league_size ;
int m_team_size ;
int m_team_alloc ;
int m_team_iter ;
size_t m_team_scratch_size[2];
size_t m_thread_scratch_size[2];
int m_chunk_size;
inline void init( const int league_size_request
, const int team_size_request )
{
const int pool_size = traits::execution_space::thread_pool_size(0);
- const int team_max = traits::execution_space::thread_pool_size(1);
+ const int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ const int team_max = pool_size<max_host_team_size?pool_size:max_host_team_size;
const int team_grain = traits::execution_space::thread_pool_size(2);
m_league_size = league_size_request ;
m_team_size = team_size_request < team_max ?
team_size_request : team_max ;
// Round team size up to a multiple of 'team_grain'
const int team_size_grain = team_grain * ( ( m_team_size + team_grain - 1 ) / team_grain );
const int team_count = pool_size / team_size_grain ;
// Constraint : pool_size = m_team_alloc * team_count
m_team_alloc = pool_size / team_count ;
// Maximum number of iterations each team will take:
m_team_iter = ( m_league_size + team_count - 1 ) / team_count ;
set_auto_chunk_size();
}
public:
inline int team_size() const { return m_team_size ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int team_size_ = -1) const {
if(team_size_ < 0) team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
}
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , team_size_request ); }
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1)
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , traits::execution_space::thread_pool_size(2) ); }
TeamPolicyInternal( int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , team_size_request ); }
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init( league_size_request , traits::execution_space::thread_pool_size(2) ); }
inline int team_alloc() const { return m_team_alloc ; }
inline int team_iter() const { return m_team_iter ; }
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
private:
/** \brief finalize chunk_size if it was set to AUTO*/
inline void set_auto_chunk_size() {
int concurrency = traits::execution_space::thread_pool_size(0)/m_team_alloc;
if( concurrency==0 ) concurrency=1;
if(m_chunk_size > 0) {
if(!Impl::is_integral_power_of_two( m_chunk_size ))
Kokkos::abort("TeamPolicy blocking granularity must be power of two" );
}
int new_chunk_size = 1;
while(new_chunk_size*100*concurrency < m_league_size)
new_chunk_size *= 2;
if(new_chunk_size < 128) {
new_chunk_size = 1;
while( (new_chunk_size*40*concurrency < m_league_size ) && (new_chunk_size<128) )
new_chunk_size*=2;
}
m_chunk_size = new_chunk_size;
}
public:
- typedef Impl::OpenMPexecTeamMember member_type ;
+ typedef Impl::HostThreadTeamMember< Kokkos::OpenMP > member_type ;
};
} // namespace Impl
} // namespace Kokkos
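As a point of reference, a minimal usage sketch of the TeamPolicy machinery defined above; the execution space, league size, and scratch request below are illustrative assumptions, not values taken from LAMMPS or the Kokkos tests.

#include <Kokkos_Core.hpp>

// Sketch only: the policy clamps the requested team size via team_size_max()
// and picks a blocking granularity via set_auto_chunk_size() as in the class
// above; 128 teams and 1 KiB of per-team scratch are invented numbers.
void example_team_policy()
{
  using policy_type = Kokkos::TeamPolicy< Kokkos::OpenMP >;
  using member_type = policy_type::member_type;

  policy_type policy =
    policy_type( /*league_size=*/ 128, Kokkos::AUTO )
      .set_scratch_size( 0, Kokkos::PerTeam( 1024 ) );

  Kokkos::parallel_for( policy, KOKKOS_LAMBDA( const member_type & team ) {
    const int league_rank = team.league_rank();   // which team
    const int team_rank   = team.team_rank();     // which thread in the team
    (void) league_rank; (void) team_rank;
  });
}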
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
inline
int OpenMP::thread_pool_size( int depth )
{
return Impl::OpenMPexec::pool_size(depth);
}
KOKKOS_INLINE_FUNCTION
int OpenMP::thread_pool_rank()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
return Impl::OpenMPexec::m_map_rank[ omp_get_thread_num() ];
#else
return -1 ;
#endif
}
-template< typename iType >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >
-TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType& count ) {
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, count );
-}
-
-template< typename iType1, typename iType2 >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::OpenMPexecTeamMember >
-TeamThreadRange( const Impl::OpenMPexecTeamMember& thread, const iType1& begin, const iType2& end ) {
- typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::OpenMPexecTeamMember >( thread, iType(begin), iType(end) );
-}
-
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >
-ThreadVectorRange(const Impl::OpenMPexecTeamMember& thread, const iType& count) {
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >(thread,count);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember> PerTeam(const Impl::OpenMPexecTeamMember& thread) {
- return Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>(thread);
-}
-
-KOKKOS_INLINE_FUNCTION
-Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember> PerThread(const Impl::OpenMPexecTeamMember& thread) {
- return Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>(thread);
-}
-
} // namespace Kokkos
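The TeamThreadRange / ThreadVectorRange / PerTeam / PerThread constructors removed above feed the hierarchical-parallelism pattern sketched below; the extents are invented and the nested reduction simply sums loop indices.

#include <Kokkos_Core.hpp>

// Sketch only: 64 teams and a range of 100 are arbitrary; each team reduces
// over TeamThreadRange and its leader consumes the result via single(PerTeam).
void example_nested_parallelism()
{
  using policy_type = Kokkos::TeamPolicy< Kokkos::OpenMP >;
  using member_type = policy_type::member_type;

  Kokkos::parallel_for( policy_type( 64, Kokkos::AUTO ),
    KOKKOS_LAMBDA( const member_type & team ) {
      double team_sum = 0.0;
      Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 100 ),
        [&]( const int i, double & partial ) { partial += double( i ); },
        team_sum );
      Kokkos::single( Kokkos::PerTeam( team ), [&]() {
        (void) team_sum;   // only the team leader acts on the reduced value
      });
    });
}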
-namespace Kokkos {
-
- /** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries, const Lambda& lambda) {
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries,
- const Lambda & lambda, ValueType& result) {
-
- result = ValueType();
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-
- result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
-}
-
-/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember>& loop_boundaries,
- const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
-
- init_result = loop_boundaries.thread.team_reduce(result,join);
-}
-
-} //namespace Kokkos
-
-namespace Kokkos {
-/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the calling thread.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const Lambda& lambda) {
- #ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
- #pragma ivdep
- #endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
- * val is performed and put into result. This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const Lambda & lambda, ValueType& result) {
- result = ValueType();
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
- }
-}
-
-/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
- * val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
- * The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
- * the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
- * '1 for *'). This functionality requires C++11 support.*/
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
-
- ValueType result = init_result;
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
- init_result = result;
-}
-
-/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
- * for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
- * Depending on the target execution space the operator might be called twice: once with final=false
- * and once with final=true. When final==true val contains the prefix sum value. The contribution of this
- * "i" needs to be added to val no matter whether final==true or not. In a serial execution
- * (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
- * to the final sum value over all vector lanes.
- * This functionality requires C++11 support.*/
-template< typename iType, class FunctorType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::OpenMPexecTeamMember >&
- loop_boundaries, const FunctorType & lambda) {
-
- typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
- typedef typename ValueTraits::value_type value_type ;
-
- value_type scan_val = value_type();
-
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- lambda(i,scan_val,true);
- }
-}
-
-} // namespace Kokkos
-
-namespace Kokkos {
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda) {
- lambda();
-}
-
-template<class FunctorType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda) {
- if(single_struct.team_member.team_rank()==0) lambda();
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
- lambda(val);
-}
-
-template<class FunctorType, class ValueType>
-KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::OpenMPexecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
- if(single_struct.team_member.team_rank()==0) {
- lambda(val);
- }
- single_struct.team_member.team_broadcast(val,0);
-}
-}
-
#endif /* #ifndef KOKKOS_OPENMPEXEC_HPP */
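The value-taking single() overloads just removed implement a "leader computes, team consumes" idiom: one thread per team runs the lambda and the result is broadcast to the rest. A hedged, self-contained sketch follows; the function name and the value 42.0 are invented for illustration.

#include <Kokkos_Core.hpp>

void example_single_broadcast()
{
  using policy_type = Kokkos::TeamPolicy< Kokkos::OpenMP >;
  using member_type = policy_type::member_type;

  Kokkos::parallel_for( policy_type( 8, Kokkos::AUTO ),
    KOKKOS_LAMBDA( const member_type & team ) {
      double seed = 0.0;
      // Executed by exactly one thread of the team; 'seed' is then broadcast.
      Kokkos::single( Kokkos::PerTeam( team ), [&]( double & val ) {
        val = 42.0;
      }, seed );
      // Every thread of the team now observes seed == 42.0.
      (void) seed;
    });
}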
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.cpp b/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.cpp
deleted file mode 100644
index b4df5e35b..000000000
--- a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.cpp
+++ /dev/null
@@ -1,511 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <Kokkos_Core_fwd.hpp>
-
-#if defined( KOKKOS_ENABLE_QTHREAD )
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <iostream>
-#include <sstream>
-#include <utility>
-#include <Kokkos_Qthread.hpp>
-#include <Kokkos_Atomic.hpp>
-#include <impl/Kokkos_Error.hpp>
-
-// Defines to enable experimental Qthread functionality
-
-#define QTHREAD_LOCAL_PRIORITY
-#define CLONED_TASKS
-
-#include <qthread/qthread.h>
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-namespace {
-
-enum { MAXIMUM_QTHREAD_WORKERS = 1024 };
-
-/** s_exec is indexed by the reverse rank of the workers
- * for faster fan-in / fan-out lookups
- * [ n - 1 , n - 2 , ... , 0 ]
- */
-QthreadExec * s_exec[ MAXIMUM_QTHREAD_WORKERS ];
-
-int s_number_shepherds = 0 ;
-int s_number_workers_per_shepherd = 0 ;
-int s_number_workers = 0 ;
-
-inline
-QthreadExec ** worker_exec()
-{
- return s_exec + s_number_workers - ( qthread_shep() * s_number_workers_per_shepherd + qthread_worker_local(NULL) + 1 );
-}
-
-const int s_base_size = QthreadExec::align_alloc( sizeof(QthreadExec) );
-
-int s_worker_reduce_end = 0 ; /* End of worker reduction memory */
-int s_worker_shared_end = 0 ; /* Total of worker scratch memory */
-int s_worker_shared_begin = 0 ; /* Beginning of worker shared memory */
-
-QthreadExecFunctionPointer volatile s_active_function = 0 ;
-const void * volatile s_active_function_arg = 0 ;
-
-} /* namespace */
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-
-int Qthread::is_initialized()
-{
- return Impl::s_number_workers != 0 ;
-}
-
-int Qthread::concurrency()
-{
- return Impl::s_number_workers_per_shepherd ;
-}
-
-int Qthread::in_parallel()
-{
- return Impl::s_active_function != 0 ;
-}
-
-void Qthread::initialize( int thread_count )
-{
- // Environment variable: QTHREAD_NUM_SHEPHERDS
- // Environment variable: QTHREAD_NUM_WORKERS_PER_SHEP
- // Environment variable: QTHREAD_HWPAR
-
- {
- char buffer[256];
- snprintf(buffer,sizeof(buffer),"QTHREAD_HWPAR=%d",thread_count);
- putenv(buffer);
- }
-
- const bool ok_init = ( QTHREAD_SUCCESS == qthread_initialize() ) &&
- ( thread_count == qthread_num_shepherds() * qthread_num_workers_local(NO_SHEPHERD) ) &&
- ( thread_count == qthread_num_workers() );
-
- bool ok_symmetry = true ;
-
- if ( ok_init ) {
- Impl::s_number_shepherds = qthread_num_shepherds();
- Impl::s_number_workers_per_shepherd = qthread_num_workers_local(NO_SHEPHERD);
- Impl::s_number_workers = Impl::s_number_shepherds * Impl::s_number_workers_per_shepherd ;
-
- for ( int i = 0 ; ok_symmetry && i < Impl::s_number_shepherds ; ++i ) {
- ok_symmetry = ( Impl::s_number_workers_per_shepherd == qthread_num_workers_local(i) );
- }
- }
-
- if ( ! ok_init || ! ok_symmetry ) {
- std::ostringstream msg ;
-
- msg << "Kokkos::Qthread::initialize(" << thread_count << ") FAILED" ;
- msg << " : qthread_num_shepherds = " << qthread_num_shepherds();
- msg << " : qthread_num_workers_per_shepherd = " << qthread_num_workers_local(NO_SHEPHERD);
- msg << " : qthread_num_workers = " << qthread_num_workers();
-
- if ( ! ok_symmetry ) {
- msg << " : qthread_num_workers_local = {" ;
- for ( int i = 0 ; i < Impl::s_number_shepherds ; ++i ) {
- msg << " " << qthread_num_workers_local(i) ;
- }
- msg << " }" ;
- }
-
- Impl::s_number_workers = 0 ;
- Impl::s_number_shepherds = 0 ;
- Impl::s_number_workers_per_shepherd = 0 ;
-
- if ( ok_init ) { qthread_finalize(); }
-
- Kokkos::Impl::throw_runtime_exception( msg.str() );
- }
-
- Impl::QthreadExec::resize_worker_scratch( 256 , 256 );
-
- // Init the array for used for arbitrarily sized atomics
- Impl::init_lock_array_host_space();
-
-}
-
-void Qthread::finalize()
-{
- Impl::QthreadExec::clear_workers();
-
- if ( Impl::s_number_workers ) {
- qthread_finalize();
- }
-
- Impl::s_number_workers = 0 ;
- Impl::s_number_shepherds = 0 ;
- Impl::s_number_workers_per_shepherd = 0 ;
-}
-
-void Qthread::print_configuration( std::ostream & s , const bool detail )
-{
- s << "Kokkos::Qthread {"
- << " num_shepherds(" << Impl::s_number_shepherds << ")"
- << " num_workers_per_shepherd(" << Impl::s_number_workers_per_shepherd << ")"
- << " }" << std::endl ;
-}
-
-Qthread & Qthread::instance( int )
-{
- static Qthread q ;
- return q ;
-}
-
-void Qthread::fence()
-{
-}
-
-int Qthread::shepherd_size() const { return Impl::s_number_shepherds ; }
-int Qthread::shepherd_worker_size() const { return Impl::s_number_workers_per_shepherd ; }
-
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-namespace {
-
-aligned_t driver_exec_all( void * arg )
-{
- QthreadExec & exec = **worker_exec();
-
- (*s_active_function)( exec , s_active_function_arg );
-
-/*
- fprintf( stdout
- , "QthreadExec driver worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
- , exec.worker_rank()
- , exec.worker_size()
- , exec.shepherd_rank()
- , exec.shepherd_size()
- , exec.shepherd_worker_rank()
- , exec.shepherd_worker_size()
- );
- fflush(stdout);
-*/
-
- return 0 ;
-}
-
-aligned_t driver_resize_worker_scratch( void * arg )
-{
- static volatile int lock_begin = 0 ;
- static volatile int lock_end = 0 ;
-
- QthreadExec ** const exec = worker_exec();
-
- //----------------------------------------
- // Serialize allocation for thread safety
-
- while ( ! atomic_compare_exchange_strong( & lock_begin , 0 , 1 ) ); // Spin wait to claim lock
-
- const bool ok = 0 == *exec ;
-
- if ( ok ) { *exec = (QthreadExec *) malloc( s_base_size + s_worker_shared_end ); }
-
- lock_begin = 0 ; // release lock
-
- if ( ok ) { new( *exec ) QthreadExec(); }
-
- //----------------------------------------
- // Wait for all calls to complete to ensure that each worker has executed.
-
- if ( s_number_workers == 1 + atomic_fetch_add( & lock_end , 1 ) ) { lock_end = 0 ; }
-
- while ( lock_end );
-
-/*
- fprintf( stdout
- , "QthreadExec resize worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
- , (**exec).worker_rank()
- , (**exec).worker_size()
- , (**exec).shepherd_rank()
- , (**exec).shepherd_size()
- , (**exec).shepherd_worker_rank()
- , (**exec).shepherd_worker_size()
- );
- fflush(stdout);
-*/
-
- //----------------------------------------
-
- if ( ! ok ) {
- fprintf( stderr , "Kokkos::QthreadExec resize failed\n" );
- fflush( stderr );
- }
-
- return 0 ;
-}
-
-void verify_is_process( const char * const label , bool not_active = false )
-{
- const bool not_process = 0 != qthread_shep() || 0 != qthread_worker_local(NULL);
- const bool is_active = not_active && ( s_active_function || s_active_function_arg );
-
- if ( not_process || is_active ) {
- std::string msg( label );
- msg.append( " : FAILED" );
- if ( not_process ) msg.append(" : not called by main process");
- if ( is_active ) msg.append(" : parallel execution in progress");
- Kokkos::Impl::throw_runtime_exception( msg );
- }
-}
-
-}
-
-int QthreadExec::worker_per_shepherd()
-{
- return s_number_workers_per_shepherd ;
-}
-
-QthreadExec::QthreadExec()
-{
- const int shepherd_rank = qthread_shep();
- const int shepherd_worker_rank = qthread_worker_local(NULL);
- const int worker_rank = shepherd_rank * s_number_workers_per_shepherd + shepherd_worker_rank ;
-
- m_worker_base = s_exec ;
- m_shepherd_base = s_exec + s_number_workers_per_shepherd * ( ( s_number_shepherds - ( shepherd_rank + 1 ) ) );
- m_scratch_alloc = ( (unsigned char *) this ) + s_base_size ;
- m_reduce_end = s_worker_reduce_end ;
- m_shepherd_rank = shepherd_rank ;
- m_shepherd_size = s_number_shepherds ;
- m_shepherd_worker_rank = shepherd_worker_rank ;
- m_shepherd_worker_size = s_number_workers_per_shepherd ;
- m_worker_rank = worker_rank ;
- m_worker_size = s_number_workers ;
- m_worker_state = QthreadExec::Active ;
-}
-
-void QthreadExec::clear_workers()
-{
- for ( int iwork = 0 ; iwork < s_number_workers ; ++iwork ) {
- QthreadExec * const exec = s_exec[iwork] ;
- s_exec[iwork] = 0 ;
- free( exec );
- }
-}
-
-void QthreadExec::shared_reset( Qthread::scratch_memory_space & space )
-{
- new( & space )
- Qthread::scratch_memory_space(
- ((unsigned char *) (**m_shepherd_base).m_scratch_alloc ) + s_worker_shared_begin ,
- s_worker_shared_end - s_worker_shared_begin
- );
-}
-
-void QthreadExec::resize_worker_scratch( const int reduce_size , const int shared_size )
-{
- const int exec_all_reduce_alloc = align_alloc( reduce_size );
- const int shepherd_scan_alloc = align_alloc( 8 );
- const int shepherd_shared_end = exec_all_reduce_alloc + shepherd_scan_alloc + align_alloc( shared_size );
-
- if ( s_worker_reduce_end < exec_all_reduce_alloc ||
- s_worker_shared_end < shepherd_shared_end ) {
-
-/*
- fprintf( stdout , "QthreadExec::resize\n");
- fflush(stdout);
-*/
-
- // Clear current worker memory before allocating new worker memory
- clear_workers();
-
- // Increase the buffers to an aligned allocation
- s_worker_reduce_end = exec_all_reduce_alloc ;
- s_worker_shared_begin = exec_all_reduce_alloc + shepherd_scan_alloc ;
- s_worker_shared_end = shepherd_shared_end ;
-
- // Need to query which shepherd this main 'process' is running...
-
- const int main_shep = qthread_shep();
-
- // Have each worker resize its memory for proper first-touch
-#if 0
- for ( int jshep = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- for ( int i = jshep != main_shep ? 0 : 1 ; i < s_number_workers_per_shepherd ; ++i ) {
- qthread_fork_to( driver_resize_worker_scratch , NULL , NULL , jshep );
- }}
-#else
- // If this function is used before the 'qthread.task_policy' unit test
- // the 'qthread.task_policy' unit test fails with a seg-fault within libqthread.so.
- for ( int jshep = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1 ;
-
- if ( num_clone ) {
- const int ret = qthread_fork_clones_to_local_priority
- ( driver_resize_worker_scratch /* function */
- , NULL /* function data block */
- , NULL /* pointer to return value feb */
- , jshep /* shepherd number */
- , num_clone - 1 /* number of instances - 1 */
- );
-
- assert(ret == QTHREAD_SUCCESS);
- }
- }
-#endif
-
- driver_resize_worker_scratch( NULL );
-
- // Verify all workers allocated
-
- bool ok = true ;
- for ( int iwork = 0 ; ok && iwork < s_number_workers ; ++iwork ) { ok = 0 != s_exec[iwork] ; }
-
- if ( ! ok ) {
- std::ostringstream msg ;
- msg << "Kokkos::Impl::QthreadExec::resize : FAILED for workers {" ;
- for ( int iwork = 0 ; iwork < s_number_workers ; ++iwork ) {
- if ( 0 == s_exec[iwork] ) { msg << " " << ( s_number_workers - ( iwork + 1 ) ); }
- }
- msg << " }" ;
- Kokkos::Impl::throw_runtime_exception( msg.str() );
- }
- }
-}
-
-void QthreadExec::exec_all( Qthread & , QthreadExecFunctionPointer func , const void * arg )
-{
- verify_is_process("QthreadExec::exec_all(...)",true);
-
-/*
- fprintf( stdout , "QthreadExec::exec_all\n");
- fflush(stdout);
-*/
-
- s_active_function = func ;
- s_active_function_arg = arg ;
-
- // Need to query which shepherd this main 'process' is running...
-
- const int main_shep = qthread_shep();
-
-#if 0
- for ( int jshep = 0 , iwork = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- for ( int i = jshep != main_shep ? 0 : 1 ; i < s_number_workers_per_shepherd ; ++i , ++iwork ) {
- qthread_fork_to( driver_exec_all , NULL , NULL , jshep );
- }}
-#else
- // If this function is used before the 'qthread.task_policy' unit test
- // the 'qthread.task_policy' unit test fails with a seg-fault within libqthread.so.
- for ( int jshep = 0 ; jshep < s_number_shepherds ; ++jshep ) {
- const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1 ;
-
- if ( num_clone ) {
- const int ret = qthread_fork_clones_to_local_priority
- ( driver_exec_all /* function */
- , NULL /* function data block */
- , NULL /* pointer to return value feb */
- , jshep /* shepherd number */
- , num_clone - 1 /* number of instances - 1 */
- );
-
- assert(ret == QTHREAD_SUCCESS);
- }
- }
-#endif
-
- driver_exec_all( NULL );
-
- s_active_function = 0 ;
- s_active_function_arg = 0 ;
-}
-
-void * QthreadExec::exec_all_reduce_result()
-{
- return s_exec[0]->m_scratch_alloc ;
-}
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-namespace Kokkos {
-namespace Impl {
-
-QthreadTeamPolicyMember::QthreadTeamPolicyMember()
- : m_exec( **worker_exec() )
- , m_team_shared(0,0)
- , m_team_size( 1 )
- , m_team_rank( 0 )
- , m_league_size(1)
- , m_league_end(1)
- , m_league_rank(0)
-{
- m_exec.shared_reset( m_team_shared );
-}
-
-QthreadTeamPolicyMember::QthreadTeamPolicyMember( const QthreadTeamPolicyMember::TaskTeam & )
- : m_exec( **worker_exec() )
- , m_team_shared(0,0)
- , m_team_size( s_number_workers_per_shepherd )
- , m_team_rank( m_exec.shepherd_worker_rank() )
- , m_league_size(1)
- , m_league_end(1)
- , m_league_rank(0)
-{
- m_exec.shared_reset( m_team_shared );
-}
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-#endif /* #if defined( KOKKOS_ENABLE_QTHREAD ) */
-
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.hpp b/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.hpp
deleted file mode 100644
index f948eb290..000000000
--- a/lib/kokkos/core/src/Qthread/Kokkos_QthreadExec.hpp
+++ /dev/null
@@ -1,620 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#ifndef KOKKOS_QTHREADEXEC_HPP
-#define KOKKOS_QTHREADEXEC_HPP
-
-#include <impl/Kokkos_spinwait.hpp>
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-//----------------------------------------------------------------------------
-
-class QthreadExec ;
-
-typedef void (*QthreadExecFunctionPointer)( QthreadExec & , const void * );
-
-class QthreadExec {
-private:
-
- enum { Inactive = 0 , Active = 1 };
-
- const QthreadExec * const * m_worker_base ;
- const QthreadExec * const * m_shepherd_base ;
-
- void * m_scratch_alloc ; ///< Scratch memory [ reduce , team , shared ]
- int m_reduce_end ; ///< End of scratch reduction memory
-
- int m_shepherd_rank ;
- int m_shepherd_size ;
-
- int m_shepherd_worker_rank ;
- int m_shepherd_worker_size ;
-
- /*
- * m_worker_rank = m_shepherd_rank * m_shepherd_worker_size + m_shepherd_worker_rank
- * m_worker_size = m_shepherd_size * m_shepherd_worker_size
- */
- int m_worker_rank ;
- int m_worker_size ;
-
- int mutable volatile m_worker_state ;
-
-
- friend class Kokkos::Qthread ;
-
- ~QthreadExec();
- QthreadExec( const QthreadExec & );
- QthreadExec & operator = ( const QthreadExec & );
-
-public:
-
- QthreadExec();
-
- /** Execute the input function on all available Qthread workers */
- static void exec_all( Qthread & , QthreadExecFunctionPointer , const void * );
-
- //----------------------------------------
- /** Barrier across all workers participating in the 'exec_all' */
- void exec_all_barrier() const
- {
- const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- Impl::spinwait( m_worker_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- m_worker_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
-
- /** Barrier across workers within the shepherd with rank < team_rank */
- void shepherd_barrier( const int team_size ) const
- {
- if ( m_shepherd_worker_rank < team_size ) {
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
- }
-
- //----------------------------------------
- /** Reduce across all workers participating in the 'exec_all' */
- template< class FunctorType , class ReducerType , class ArgTag >
- inline
- void exec_all_reduce( const FunctorType & func, const ReducerType & reduce ) const
- {
- typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
- typedef typename ReducerConditional::type ReducerTypeFwd;
- typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, ArgTag > ValueJoin ;
-
- const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- const QthreadExec & fan = *m_worker_base[j];
-
- Impl::spinwait( fan.m_worker_state , QthreadExec::Active );
-
- ValueJoin::join( ReducerConditional::select(func , reduce) , m_scratch_alloc , fan.m_scratch_alloc );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- m_worker_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
-
- //----------------------------------------
- /** Scan across all workers participating in the 'exec_all' */
- template< class FunctorType , class ArgTag >
- inline
- void exec_all_scan( const FunctorType & func ) const
- {
- typedef Kokkos::Impl::FunctorValueInit< FunctorType , ArgTag > ValueInit ;
- typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > ValueJoin ;
- typedef Kokkos::Impl::FunctorValueOps< FunctorType , ArgTag > ValueOps ;
-
- const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- Impl::spinwait( m_worker_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- // Root thread scans across values before releasing threads
- // Worker data is in reverse order, so m_worker_base[0] is the
- // highest ranking thread.
-
- // Copy from lower ranking to higher ranking worker.
- for ( int i = 1 ; i < m_worker_size ; ++i ) {
- ValueOps::copy( func
- , m_worker_base[i-1]->m_scratch_alloc
- , m_worker_base[i]->m_scratch_alloc
- );
- }
-
- ValueInit::init( func , m_worker_base[m_worker_size-1]->m_scratch_alloc );
-
- // Join from lower ranking to higher ranking worker.
- // Value at m_worker_base[n-1] is zero so skip adding it to m_worker_base[n-2].
- for ( int i = m_worker_size - 1 ; --i > 0 ; ) {
- ValueJoin::join( func , m_worker_base[i-1]->m_scratch_alloc , m_worker_base[i]->m_scratch_alloc );
- }
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ) ; n <<= 1 ) {
- m_worker_base[j]->m_worker_state = QthreadExec::Active ;
- }
- }
-
- //----------------------------------------
-
- template< class Type>
- inline
- volatile Type * shepherd_team_scratch_value() const
- { return (volatile Type*)(((unsigned char *) m_scratch_alloc) + m_reduce_end); }
-
- template< class Type >
- inline
- void shepherd_broadcast( Type & value , const int team_size , const int team_rank ) const
- {
- if ( m_shepherd_base ) {
- Type * const shared_value = m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- if ( m_shepherd_worker_rank == team_rank ) { *shared_value = value ; }
- memory_fence();
- shepherd_barrier( team_size );
- value = *shared_value ;
- }
- }
-
- template< class Type >
- inline
- Type shepherd_reduce( const int team_size , const Type & value ) const
- {
- *shepherd_team_scratch_value<Type>() = value ;
-
- memory_fence();
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- Type & accum = * m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- for ( int i = 1 ; i < n ; ++i ) {
- accum += * m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
- }
- for ( int i = 1 ; i < n ; ++i ) {
- * m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum ;
- }
-
- memory_fence();
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
-
- return *shepherd_team_scratch_value<Type>();
- }
-
- template< class JoinOp >
- inline
- typename JoinOp::value_type
- shepherd_reduce( const int team_size
- , const typename JoinOp::value_type & value
- , const JoinOp & op ) const
- {
- typedef typename JoinOp::value_type Type ;
-
- *shepherd_team_scratch_value<Type>() = value ;
-
- memory_fence();
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- volatile Type & accum = * m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- for ( int i = 1 ; i < team_size ; ++i ) {
- op.join( accum , * m_shepherd_base[i]->shepherd_team_scratch_value<Type>() );
- }
- for ( int i = 1 ; i < team_size ; ++i ) {
- * m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum ;
- }
-
- memory_fence();
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
-
- return *shepherd_team_scratch_value<Type>();
- }
-
- template< class Type >
- inline
- Type shepherd_scan( const int team_size
- , const Type & value
- , Type * const global_value = 0 ) const
- {
- *shepherd_team_scratch_value<Type>() = value ;
-
- memory_fence();
-
- const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
-
- int n , j ;
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_shepherd_base[j]->m_worker_state , QthreadExec::Active );
- }
-
- if ( rev_rank ) {
- m_worker_state = QthreadExec::Inactive ;
- Impl::spinwait( m_worker_state , QthreadExec::Inactive );
- }
- else {
- // Root thread scans across values before releasing threads
- // Worker data is in reverse order, so m_shepherd_base[0] is the
- // highest ranking thread.
-
- // Copy from lower ranking to higher ranking worker.
-
- Type accum = * m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
- for ( int i = 1 ; i < team_size ; ++i ) {
- const Type tmp = * m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
- accum += tmp ;
- * m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() = tmp ;
- }
-
- * m_shepherd_base[team_size-1]->shepherd_team_scratch_value<Type>() =
- global_value ? atomic_fetch_add( global_value , accum ) : 0 ;
-
- // Join from lower ranking to higher ranking worker.
- for ( int i = team_size ; --i ; ) {
- * m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() += * m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
- }
-
- memory_fence();
- }
-
- for ( n = 1 ; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ) ; n <<= 1 ) {
- m_shepherd_base[j]->m_worker_state = QthreadExec::Active ;
- }
-
- return *shepherd_team_scratch_value<Type>();
- }
-
- //----------------------------------------
-
- static inline
- int align_alloc( int size )
- {
- enum { ALLOC_GRAIN = 1 << 6 /* power of two, 64bytes */};
- enum { ALLOC_GRAIN_MASK = ALLOC_GRAIN - 1 };
- return ( size + ALLOC_GRAIN_MASK ) & ~ALLOC_GRAIN_MASK ;
- }
-
- void shared_reset( Qthread::scratch_memory_space & );
-
- void * exec_all_reduce_value() const { return m_scratch_alloc ; }
-
- static void * exec_all_reduce_result();
-
- static void resize_worker_scratch( const int reduce_size , const int shared_size );
- static void clear_workers();
-
- //----------------------------------------
-
- inline int worker_rank() const { return m_worker_rank ; }
- inline int worker_size() const { return m_worker_size ; }
- inline int shepherd_worker_rank() const { return m_shepherd_worker_rank ; }
- inline int shepherd_worker_size() const { return m_shepherd_worker_size ; }
- inline int shepherd_rank() const { return m_shepherd_rank ; }
- inline int shepherd_size() const { return m_shepherd_size ; }
-
- static int worker_per_shepherd();
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-
-namespace Kokkos {
-namespace Impl {
-
-class QthreadTeamPolicyMember {
-private:
-
- typedef Kokkos::Qthread execution_space ;
- typedef execution_space::scratch_memory_space scratch_memory_space ;
-
-
- Impl::QthreadExec & m_exec ;
- scratch_memory_space m_team_shared ;
- const int m_team_size ;
- const int m_team_rank ;
- const int m_league_size ;
- const int m_league_end ;
- int m_league_rank ;
-
-public:
-
- KOKKOS_INLINE_FUNCTION
- const scratch_memory_space & team_shmem() const { return m_team_shared ; }
-
- KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
- KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
- KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
-
- KOKKOS_INLINE_FUNCTION void team_barrier() const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- {}
-#else
- { m_exec.shepherd_barrier( m_team_size ); }
-#endif
-
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_broadcast( const Type & value , int rank ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_broadcast<Type>( value , m_team_size , rank ); }
-#endif
-
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_reduce( const Type & value ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_reduce<Type>( m_team_size , value ); }
-#endif
-
- template< typename JoinOp >
- KOKKOS_INLINE_FUNCTION typename JoinOp::value_type
- team_reduce( const typename JoinOp::value_type & value
- , const JoinOp & op ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return typename JoinOp::value_type(); }
-#else
- { return m_exec.template shepherd_reduce<JoinOp>( m_team_size , value , op ); }
-#endif
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
- *
- * The highest rank thread can compute the reduction total as
- * reduction_total = dev.team_scan( value ) + value ;
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_scan<Type>( m_team_size , value ); }
-#endif
-
- /** \brief Intra-team exclusive prefix sum with team_rank() ordering
- * with intra-team non-deterministic ordering accumulation.
- *
- * The global inter-team accumulation value will, at the end of the
- * league's parallel execution, be the scan's total.
- * Parallel execution ordering of the league's teams is non-deterministic.
- * As such the base value for each team's scan operation is similarly
- * non-deterministic.
- */
- template< typename Type >
- KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value , Type * const global_accum ) const
-#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- { return Type(); }
-#else
- { return m_exec.template shepherd_scan<Type>( m_team_size , value , global_accum ); }
-#endif
-
- //----------------------------------------
- // Private driver for task-team parallel
-
- struct TaskTeam {};
-
- QthreadTeamPolicyMember();
- explicit QthreadTeamPolicyMember( const TaskTeam & );
-
- //----------------------------------------
- // Private for the driver ( for ( member_type i(exec,team); i ; i.next_team() ) { ... }
-
- // Initialize
- template< class ... Properties >
- QthreadTeamPolicyMember( Impl::QthreadExec & exec
- , const Kokkos::Impl::TeamPolicyInternal<Qthread,Properties...> & team )
- : m_exec( exec )
- , m_team_shared(0,0)
- , m_team_size( team.m_team_size )
- , m_team_rank( exec.shepherd_worker_rank() )
- , m_league_size( team.m_league_size )
- , m_league_end( team.m_league_size - team.m_shepherd_iter * ( exec.shepherd_size() - ( exec.shepherd_rank() + 1 ) ) )
- , m_league_rank( m_league_end > team.m_shepherd_iter ? m_league_end - team.m_shepherd_iter : 0 )
- {
- m_exec.shared_reset( m_team_shared );
- }
-
- // Continue
- operator bool () const { return m_league_rank < m_league_end ; }
-
- // iterate
- void next_team() { ++m_league_rank ; m_exec.shared_reset( m_team_shared ); }
-};
-
-
-template< class ... Properties >
-class TeamPolicyInternal< Kokkos::Qthread , Properties ... >
- : public PolicyTraits< Properties... >
-{
-private:
-
- const int m_league_size ;
- const int m_team_size ;
- const int m_shepherd_iter ;
-
-public:
-
- //! Tag this class as a kokkos execution policy
- typedef TeamPolicyInternal execution_policy ;
- typedef Qthread execution_space ;
- typedef PolicyTraits< Properties ... > traits ;
-
- //----------------------------------------
-
- template< class FunctorType >
- inline static
- int team_size_max( const FunctorType & )
- { return Qthread::instance().shepherd_worker_size(); }
-
- template< class FunctorType >
- static int team_size_recommended( const FunctorType & f )
- { return team_size_max( f ); }
-
- template< class FunctorType >
- inline static
- int team_size_recommended( const FunctorType & f , const int& )
- { return team_size_max( f ); }
-
- //----------------------------------------
-
- inline int team_size() const { return m_team_size ; }
- inline int league_size() const { return m_league_size ; }
-
- // One active team per shepherd
- TeamPolicyInternal( Kokkos::Qthread & q
- , const int league_size
- , const int team_size
- , const int /* vector_length */ = 0
- )
- : m_league_size( league_size )
- , m_team_size( team_size < q.shepherd_worker_size()
- ? team_size : q.shepherd_worker_size() )
- , m_shepherd_iter( ( league_size + q.shepherd_size() - 1 ) / q.shepherd_size() )
- {
- }
-
- // One active team per shepherd
- TeamPolicyInternal( const int league_size
- , const int team_size
- , const int /* vector_length */ = 0
- )
- : m_league_size( league_size )
- , m_team_size( team_size < Qthread::instance().shepherd_worker_size()
- ? team_size : Qthread::instance().shepherd_worker_size() )
- , m_shepherd_iter( ( league_size + Qthread::instance().shepherd_size() - 1 ) / Qthread::instance().shepherd_size() )
- {
- }
-
- typedef Impl::QthreadTeamPolicyMember member_type ;
-
- friend class Impl::QthreadTeamPolicyMember ;
-};
-
-} /* namespace Impl */
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
-#endif /* #define KOKKOS_QTHREADEXEC_HPP */
-
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.cpp b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.cpp
new file mode 100644
index 000000000..1b9249408
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.cpp
@@ -0,0 +1,519 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <Kokkos_Core_fwd.hpp>
+
+#if defined( KOKKOS_ENABLE_QTHREADS )
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <iostream>
+#include <sstream>
+#include <utility>
+
+#include <Kokkos_Qthreads.hpp>
+#include <Kokkos_Atomic.hpp>
+#include <impl/Kokkos_Error.hpp>
+
+// Defines to enable experimental Qthreads functionality.
+//#define QTHREAD_LOCAL_PRIORITY
+//#define CLONED_TASKS
+
+//#include <qthread.h>
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+namespace {
+
+enum { MAXIMUM_QTHREADS_WORKERS = 1024 };
+
+/** s_exec is indexed by the reverse rank of the workers
+ * for faster fan-in / fan-out lookups
+ * [ n - 1, n - 2, ..., 0 ]
+ */
+QthreadsExec * s_exec[ MAXIMUM_QTHREADS_WORKERS ];
+
+int s_number_shepherds = 0;
+int s_number_workers_per_shepherd = 0;
+int s_number_workers = 0;
+
+inline
+QthreadsExec ** worker_exec()
+{
+ return s_exec + s_number_workers - ( qthread_shep() * s_number_workers_per_shepherd + qthread_worker_local( NULL ) + 1 );
+}
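+// Illustrative mapping for the reverse-rank indexing above: with 2 shepherds
+// and 2 workers per shepherd ( s_number_workers == 4 ),
+//   shepherd 0, local worker 0  ->  s_exec + 3
+//   shepherd 0, local worker 1  ->  s_exec + 2
+//   shepherd 1, local worker 0  ->  s_exec + 1
+//   shepherd 1, local worker 1  ->  s_exec + 0
+// i.e. the globally highest-ranked worker owns s_exec[0].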
+
+const int s_base_size = QthreadsExec::align_alloc( sizeof(QthreadsExec) );
+
+int s_worker_reduce_end = 0; // End of worker reduction memory.
+int s_worker_shared_end = 0; // Total of worker scratch memory.
+int s_worker_shared_begin = 0; // Beginning of worker shared memory.
+
+QthreadsExecFunctionPointer volatile s_active_function = 0;
+const void * volatile s_active_function_arg = 0;
+
+} // namespace
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+int Qthreads::is_initialized()
+{
+ return Impl::s_number_workers != 0;
+}
+
+int Qthreads::concurrency()
+{
+ return Impl::s_number_workers_per_shepherd;
+}
+
+int Qthreads::in_parallel()
+{
+ return Impl::s_active_function != 0;
+}
+
+void Qthreads::initialize( int thread_count )
+{
+ // Environment variable: QTHREAD_NUM_SHEPHERDS
+ // Environment variable: QTHREAD_NUM_WORKERS_PER_SHEP
+ // Environment variable: QTHREAD_HWPAR
+
+ {
+    // putenv() retains a pointer to this string, so it must outlive this scope.
+    static char buffer[256];
+    snprintf( buffer, sizeof(buffer), "QTHREAD_HWPAR=%d", thread_count );
+    putenv( buffer );
+ }
+
+ const bool ok_init = ( QTHREAD_SUCCESS == qthread_initialize() ) &&
+ ( thread_count == qthread_num_shepherds() * qthread_num_workers_local( NO_SHEPHERD ) ) &&
+ ( thread_count == qthread_num_workers() );
+
+ bool ok_symmetry = true;
+
+ if ( ok_init ) {
+ Impl::s_number_shepherds = qthread_num_shepherds();
+ Impl::s_number_workers_per_shepherd = qthread_num_workers_local( NO_SHEPHERD );
+ Impl::s_number_workers = Impl::s_number_shepherds * Impl::s_number_workers_per_shepherd;
+
+ for ( int i = 0; ok_symmetry && i < Impl::s_number_shepherds; ++i ) {
+ ok_symmetry = ( Impl::s_number_workers_per_shepherd == qthread_num_workers_local( i ) );
+ }
+ }
+
+ if ( ! ok_init || ! ok_symmetry ) {
+ std::ostringstream msg;
+
+ msg << "Kokkos::Qthreads::initialize(" << thread_count << ") FAILED";
+ msg << " : qthread_num_shepherds = " << qthread_num_shepherds();
+ msg << " : qthread_num_workers_per_shepherd = " << qthread_num_workers_local( NO_SHEPHERD );
+ msg << " : qthread_num_workers = " << qthread_num_workers();
+
+ if ( ! ok_symmetry ) {
+ msg << " : qthread_num_workers_local = {";
+ for ( int i = 0; i < Impl::s_number_shepherds; ++i ) {
+ msg << " " << qthread_num_workers_local( i );
+ }
+ msg << " }";
+ }
+
+ Impl::s_number_workers = 0;
+ Impl::s_number_shepherds = 0;
+ Impl::s_number_workers_per_shepherd = 0;
+
+ if ( ok_init ) { qthread_finalize(); }
+
+ Kokkos::Impl::throw_runtime_exception( msg.str() );
+ }
+
+ Impl::QthreadsExec::resize_worker_scratch( 256, 256 );
+
+  // Initialize the lock array used for arbitrarily sized atomics.
+ Impl::init_lock_array_host_space();
+
+}
+
+void Qthreads::finalize()
+{
+ Impl::QthreadsExec::clear_workers();
+
+ if ( Impl::s_number_workers ) {
+ qthread_finalize();
+ }
+
+ Impl::s_number_workers = 0;
+ Impl::s_number_shepherds = 0;
+ Impl::s_number_workers_per_shepherd = 0;
+}
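+// Minimal usage sketch (illustrative; normally reached indirectly through
+// Kokkos::initialize() / Kokkos::finalize() in a Qthreads-enabled build):
+//
+//   Kokkos::Qthreads::initialize( 16 );  // request 16 workers in total
+//   /* ... run parallel kernels ... */
+//   Kokkos::Qthreads::finalize();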
+
+void Qthreads::print_configuration( std::ostream & s, const bool detail )
+{
+ s << "Kokkos::Qthreads {"
+ << " num_shepherds(" << Impl::s_number_shepherds << ")"
+ << " num_workers_per_shepherd(" << Impl::s_number_workers_per_shepherd << ")"
+ << " }" << std::endl;
+}
+
+Qthreads & Qthreads::instance( int )
+{
+ static Qthreads q;
+ return q;
+}
+
+void Qthreads::fence()
+{
+}
+
+int Qthreads::shepherd_size() const { return Impl::s_number_shepherds; }
+int Qthreads::shepherd_worker_size() const { return Impl::s_number_workers_per_shepherd; }
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+namespace {
+
+aligned_t driver_exec_all( void * arg )
+{
+ QthreadsExec & exec = **worker_exec();
+
+ (*s_active_function)( exec, s_active_function_arg );
+
+/*
+ fprintf( stdout
+ , "QthreadsExec driver worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
+ , exec.worker_rank()
+ , exec.worker_size()
+ , exec.shepherd_rank()
+ , exec.shepherd_size()
+ , exec.shepherd_worker_rank()
+ , exec.shepherd_worker_size()
+ );
+ fflush(stdout);
+*/
+
+ return 0;
+}
+
+aligned_t driver_resize_worker_scratch( void * arg )
+{
+ static volatile int lock_begin = 0;
+ static volatile int lock_end = 0;
+
+ QthreadsExec ** const exec = worker_exec();
+
+ //----------------------------------------
+ // Serialize allocation for thread safety.
+
+ while ( ! atomic_compare_exchange_strong( & lock_begin, 0, 1 ) ); // Spin wait to claim lock.
+
+ const bool ok = 0 == *exec;
+
+ if ( ok ) { *exec = (QthreadsExec *) malloc( s_base_size + s_worker_shared_end ); }
+
+ lock_begin = 0; // Release lock.
+
+ if ( ok ) { new( *exec ) QthreadsExec(); }
+
+ //----------------------------------------
+  // Wait for all calls to complete to ensure that each worker has executed.
+
+ if ( s_number_workers == 1 + atomic_fetch_add( & lock_end, 1 ) ) { lock_end = 0; }
+
+ while ( lock_end );
+
+/*
+ fprintf( stdout
+ , "QthreadsExec resize worker(%d:%d) shepherd(%d:%d) shepherd_worker(%d:%d) done\n"
+ , (**exec).worker_rank()
+ , (**exec).worker_size()
+ , (**exec).shepherd_rank()
+ , (**exec).shepherd_size()
+ , (**exec).shepherd_worker_rank()
+ , (**exec).shepherd_worker_size()
+ );
+ fflush(stdout);
+*/
+
+ //----------------------------------------
+
+ if ( ! ok ) {
+ fprintf( stderr, "Kokkos::QthreadsExec resize failed\n" );
+ fflush( stderr );
+ }
+
+ return 0;
+}
+
+void verify_is_process( const char * const label, bool not_active = false )
+{
+ const bool not_process = 0 != qthread_shep() || 0 != qthread_worker_local( NULL );
+ const bool is_active = not_active && ( s_active_function || s_active_function_arg );
+
+ if ( not_process || is_active ) {
+ std::string msg( label );
+ msg.append( " : FAILED" );
+ if ( not_process ) msg.append(" : not called by main process");
+ if ( is_active ) msg.append(" : parallel execution in progress");
+ Kokkos::Impl::throw_runtime_exception( msg );
+ }
+}
+
+} // namespace
+
+int QthreadsExec::worker_per_shepherd()
+{
+ return s_number_workers_per_shepherd;
+}
+
+QthreadsExec::QthreadsExec()
+{
+ const int shepherd_rank = qthread_shep();
+ const int shepherd_worker_rank = qthread_worker_local( NULL );
+ const int worker_rank = shepherd_rank * s_number_workers_per_shepherd + shepherd_worker_rank;
+
+ m_worker_base = s_exec;
+ m_shepherd_base = s_exec + s_number_workers_per_shepherd * ( ( s_number_shepherds - ( shepherd_rank + 1 ) ) );
+ m_scratch_alloc = ( (unsigned char *) this ) + s_base_size;
+ m_reduce_end = s_worker_reduce_end;
+ m_shepherd_rank = shepherd_rank;
+ m_shepherd_size = s_number_shepherds;
+ m_shepherd_worker_rank = shepherd_worker_rank;
+ m_shepherd_worker_size = s_number_workers_per_shepherd;
+ m_worker_rank = worker_rank;
+ m_worker_size = s_number_workers;
+ m_worker_state = QthreadsExec::Active;
+}
+
+void QthreadsExec::clear_workers()
+{
+ for ( int iwork = 0; iwork < s_number_workers; ++iwork ) {
+ QthreadsExec * const exec = s_exec[iwork];
+ s_exec[iwork] = 0;
+ free( exec );
+ }
+}
+
+void QthreadsExec::shared_reset( Qthreads::scratch_memory_space & space )
+{
+ new( & space )
+ Qthreads::scratch_memory_space(
+ ((unsigned char *) (**m_shepherd_base).m_scratch_alloc ) + s_worker_shared_begin,
+ s_worker_shared_end - s_worker_shared_begin
+ );
+}
+
+void QthreadsExec::resize_worker_scratch( const int reduce_size, const int shared_size )
+{
+ const int exec_all_reduce_alloc = align_alloc( reduce_size );
+ const int shepherd_scan_alloc = align_alloc( 8 );
+ const int shepherd_shared_end = exec_all_reduce_alloc + shepherd_scan_alloc + align_alloc( shared_size );
+
+ if ( s_worker_reduce_end < exec_all_reduce_alloc ||
+ s_worker_shared_end < shepherd_shared_end ) {
+
+/*
+ fprintf( stdout, "QthreadsExec::resize\n");
+ fflush(stdout);
+*/
+
+ // Clear current worker memory before allocating new worker memory.
+ clear_workers();
+
+ // Increase the buffers to an aligned allocation.
+ s_worker_reduce_end = exec_all_reduce_alloc;
+ s_worker_shared_begin = exec_all_reduce_alloc + shepherd_scan_alloc;
+ s_worker_shared_end = shepherd_shared_end;
+
+ // Need to query which shepherd this main 'process' is running.
+
+ const int main_shep = qthread_shep();
+
+ // Have each worker resize its memory for proper first-touch.
+#if 0
+ for ( int jshep = 0; jshep < s_number_shepherds; ++jshep ) {
+ for ( int i = jshep != main_shep ? 0 : 1; i < s_number_workers_per_shepherd; ++i ) {
+ qthread_fork_to( driver_resize_worker_scratch, NULL, NULL, jshep );
+ }
+ }
+#else
+ // If this function is used before the 'qthreads.task_policy' unit test,
+ // the 'qthreads.task_policy' unit test fails with a seg-fault within libqthread.so.
+ for ( int jshep = 0; jshep < s_number_shepherds; ++jshep ) {
+ const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1;
+
+ if ( num_clone ) {
+ const int ret = qthread_fork_clones_to_local_priority
+ ( driver_resize_worker_scratch // Function
+ , NULL // Function data block
+ , NULL // Pointer to return value feb
+ , jshep // Shepherd number
+ , num_clone - 1 // Number of instances - 1
+ );
+
+ assert( ret == QTHREAD_SUCCESS );
+ }
+ }
+#endif
+
+ driver_resize_worker_scratch( NULL );
+
+ // Verify all workers allocated.
+
+ bool ok = true;
+ for ( int iwork = 0; ok && iwork < s_number_workers; ++iwork ) { ok = 0 != s_exec[iwork]; }
+
+ if ( ! ok ) {
+ std::ostringstream msg;
+ msg << "Kokkos::Impl::QthreadsExec::resize : FAILED for workers {";
+ for ( int iwork = 0; iwork < s_number_workers; ++iwork ) {
+ if ( 0 == s_exec[iwork] ) { msg << " " << ( s_number_workers - ( iwork + 1 ) ); }
+ }
+ msg << " }";
+ Kokkos::Impl::throw_runtime_exception( msg.str() );
+ }
+ }
+}
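+// Worked example of the scratch layout (illustrative): for the
+// resize_worker_scratch( 256, 256 ) call made from Qthreads::initialize(),
+// with a 64-byte ALLOC_GRAIN,
+//   s_worker_reduce_end   = align_alloc( 256 )       = 256
+//   s_worker_shared_begin = 256 + align_alloc( 8 )   = 320
+//   s_worker_shared_end   = 320 + align_alloc( 256 ) = 576
+// so each worker allocates s_base_size + 576 bytes laid out as
+// [ reduce | scan | shared ].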
+
+void QthreadsExec::exec_all( Qthreads &, QthreadsExecFunctionPointer func, const void * arg )
+{
+ verify_is_process("QthreadsExec::exec_all(...)",true);
+
+/*
+ fprintf( stdout, "QthreadsExec::exec_all\n");
+ fflush(stdout);
+*/
+
+ s_active_function = func;
+ s_active_function_arg = arg;
+
+ // Need to query which shepherd this main 'process' is running.
+
+ const int main_shep = qthread_shep();
+
+#if 0
+ for ( int jshep = 0, iwork = 0; jshep < s_number_shepherds; ++jshep ) {
+ for ( int i = jshep != main_shep ? 0 : 1; i < s_number_workers_per_shepherd; ++i, ++iwork ) {
+ qthread_fork_to( driver_exec_all, NULL, NULL, jshep );
+ }
+ }
+#else
+ // If this function is used before the 'qthreads.task_policy' unit test,
+ // the 'qthreads.task_policy' unit test fails with a seg-fault within libqthread.so.
+ for ( int jshep = 0; jshep < s_number_shepherds; ++jshep ) {
+ const int num_clone = jshep != main_shep ? s_number_workers_per_shepherd : s_number_workers_per_shepherd - 1;
+
+ if ( num_clone ) {
+ const int ret = qthread_fork_clones_to_local_priority
+ ( driver_exec_all // Function
+ , NULL // Function data block
+ , NULL // Pointer to return value feb
+ , jshep // Shepherd number
+ , num_clone - 1 // Number of instances - 1
+ );
+
+ assert(ret == QTHREAD_SUCCESS);
+ }
+ }
+#endif
+
+ driver_exec_all( NULL );
+
+ s_active_function = 0;
+ s_active_function_arg = 0;
+}
+
+void * QthreadsExec::exec_all_reduce_result()
+{
+ return s_exec[0]->m_scratch_alloc;
+}
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+namespace Kokkos {
+
+namespace Impl {
+
+QthreadsTeamPolicyMember::QthreadsTeamPolicyMember()
+ : m_exec( **worker_exec() )
+ , m_team_shared( 0, 0 )
+ , m_team_size( 1 )
+ , m_team_rank( 0 )
+ , m_league_size( 1 )
+ , m_league_end( 1 )
+ , m_league_rank( 0 )
+{
+ m_exec.shared_reset( m_team_shared );
+}
+
+QthreadsTeamPolicyMember::QthreadsTeamPolicyMember( const QthreadsTeamPolicyMember::TaskTeam & )
+ : m_exec( **worker_exec() )
+ , m_team_shared( 0, 0 )
+ , m_team_size( s_number_workers_per_shepherd )
+ , m_team_rank( m_exec.shepherd_worker_rank() )
+ , m_league_size( 1 )
+ , m_league_end( 1 )
+ , m_league_rank( 0 )
+{
+ m_exec.shared_reset( m_team_shared );
+}
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+#endif // #if defined( KOKKOS_ENABLE_QTHREADS )
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.hpp
new file mode 100644
index 000000000..64856eb99
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_QthreadsExec.hpp
@@ -0,0 +1,640 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_QTHREADSEXEC_HPP
+#define KOKKOS_QTHREADSEXEC_HPP
+
+#include <impl/Kokkos_spinwait.hpp>
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+class QthreadsExec;
+
+typedef void (*QthreadsExecFunctionPointer)( QthreadsExec &, const void * );
+
+class QthreadsExec {
+private:
+ enum { Inactive = 0, Active = 1 };
+
+ const QthreadsExec * const * m_worker_base;
+ const QthreadsExec * const * m_shepherd_base;
+
+ void * m_scratch_alloc; ///< Scratch memory [ reduce, team, shared ]
+ int m_reduce_end; ///< End of scratch reduction memory
+
+ int m_shepherd_rank;
+ int m_shepherd_size;
+
+ int m_shepherd_worker_rank;
+ int m_shepherd_worker_size;
+
+ /*
+ * m_worker_rank = m_shepherd_rank * m_shepherd_worker_size + m_shepherd_worker_rank
+ * m_worker_size = m_shepherd_size * m_shepherd_worker_size
+ */
+ int m_worker_rank;
+ int m_worker_size;
+
+ int mutable volatile m_worker_state;
+
+ friend class Kokkos::Qthreads;
+
+ ~QthreadsExec();
+ QthreadsExec( const QthreadsExec & );
+ QthreadsExec & operator = ( const QthreadsExec & );
+
+public:
+ QthreadsExec();
+
+ /** Execute the input function on all available Qthreads workers. */
+ static void exec_all( Qthreads &, QthreadsExecFunctionPointer, const void * );
+
+ /** Barrier across all workers participating in the 'exec_all'. */
+ void exec_all_barrier() const
+ {
+ const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_worker_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ m_worker_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
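+  // Illustrative walk-through of the fan pattern above (assuming 4 workers):
+  // the reverse ranks are { 3, 2, 1, 0 }.  Fan-in: rev_rank 2 waits on
+  // rev_rank 3, rev_rank 0 waits on rev_ranks 1 and 2; every non-zero
+  // rev_rank then marks itself Inactive.  Fan-out: rev_rank 0 re-activates
+  // 1 and 2, and rev_rank 2 re-activates 3, releasing all workers.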
+
+  /** Barrier across workers within the shepherd with rank < team_size. */
+ void shepherd_barrier( const int team_size ) const
+ {
+ if ( m_shepherd_worker_rank < team_size ) {
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
+ }
+
+ /** Reduce across all workers participating in the 'exec_all'. */
+ template< class FunctorType, class ReducerType, class ArgTag >
+ inline
+ void exec_all_reduce( const FunctorType & func, const ReducerType & reduce ) const
+ {
+ typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
+ typedef typename ReducerConditional::type ReducerTypeFwd;
+ typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, ArgTag > ValueJoin;
+
+ const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ const QthreadsExec & fan = *m_worker_base[j];
+
+ Impl::spinwait_while_equal( fan.m_worker_state, QthreadsExec::Active );
+
+ ValueJoin::join( ReducerConditional::select( func, reduce ), m_scratch_alloc, fan.m_scratch_alloc );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ m_worker_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
+
+ /** Scan across all workers participating in the 'exec_all'. */
+ template< class FunctorType, class ArgTag >
+ inline
+ void exec_all_scan( const FunctorType & func ) const
+ {
+ typedef Kokkos::Impl::FunctorValueInit< FunctorType, ArgTag > ValueInit;
+ typedef Kokkos::Impl::FunctorValueJoin< FunctorType, ArgTag > ValueJoin;
+ typedef Kokkos::Impl::FunctorValueOps< FunctorType, ArgTag > ValueOps;
+
+ const int rev_rank = m_worker_size - ( m_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_worker_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ // Root thread scans across values before releasing threads.
+ // Worker data is in reverse order, so m_worker_base[0] is the
+ // highest ranking thread.
+
+ // Copy from lower ranking to higher ranking worker.
+ for ( int i = 1; i < m_worker_size; ++i ) {
+ ValueOps::copy( func
+ , m_worker_base[i-1]->m_scratch_alloc
+ , m_worker_base[i]->m_scratch_alloc
+ );
+ }
+
+ ValueInit::init( func, m_worker_base[m_worker_size-1]->m_scratch_alloc );
+
+ // Join from lower ranking to higher ranking worker.
+      // Value at m_worker_base[m_worker_size-1] is zero, so skip adding it to m_worker_base[m_worker_size-2].
+ for ( int i = m_worker_size - 1; --i > 0; ) {
+ ValueJoin::join( func, m_worker_base[i-1]->m_scratch_alloc, m_worker_base[i]->m_scratch_alloc );
+ }
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < m_worker_size ); n <<= 1 ) {
+ m_worker_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+ }
+
+ //----------------------------------------
+
+ template< class Type >
+ inline
+ volatile Type * shepherd_team_scratch_value() const
+ { return (volatile Type*)( ( (unsigned char *) m_scratch_alloc ) + m_reduce_end ); }
+
+ template< class Type >
+ inline
+ void shepherd_broadcast( Type & value, const int team_size, const int team_rank ) const
+ {
+ if ( m_shepherd_base ) {
+ Type * const shared_value = m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+ if ( m_shepherd_worker_rank == team_rank ) { *shared_value = value; }
+ memory_fence();
+ shepherd_barrier( team_size );
+ value = *shared_value;
+ }
+ }
+
+ template< class Type >
+ inline
+ Type shepherd_reduce( const int team_size, const Type & value ) const
+ {
+ volatile Type * const shared_value = shepherd_team_scratch_value<Type>();
+ *shared_value = value;
+// *shepherd_team_scratch_value<Type>() = value;
+
+ memory_fence();
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ Type & accum = *m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+      // Accumulate and broadcast across the whole team, as in the JoinOp overload below.
+      for ( int i = 1; i < team_size; ++i ) {
+        accum += *m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
+      }
+      for ( int i = 1; i < team_size; ++i ) {
+        *m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum;
+      }
+
+ memory_fence();
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+
+ return *shepherd_team_scratch_value<Type>();
+ }
+
+ template< class JoinOp >
+ inline
+ typename JoinOp::value_type
+ shepherd_reduce( const int team_size
+ , const typename JoinOp::value_type & value
+ , const JoinOp & op ) const
+ {
+ typedef typename JoinOp::value_type Type;
+
+ volatile Type * const shared_value = shepherd_team_scratch_value<Type>();
+ *shared_value = value;
+// *shepherd_team_scratch_value<Type>() = value;
+
+ memory_fence();
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ volatile Type & accum = *m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+ for ( int i = 1; i < team_size; ++i ) {
+ op.join( accum, *m_shepherd_base[i]->shepherd_team_scratch_value<Type>() );
+ }
+ for ( int i = 1; i < team_size; ++i ) {
+ *m_shepherd_base[i]->shepherd_team_scratch_value<Type>() = accum;
+ }
+
+ memory_fence();
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+
+ return *shepherd_team_scratch_value<Type>();
+ }
+
+ template< class Type >
+ inline
+ Type shepherd_scan( const int team_size
+ , const Type & value
+ , Type * const global_value = 0 ) const
+ {
+ *shepherd_team_scratch_value<Type>() = value;
+
+ memory_fence();
+
+ const int rev_rank = team_size - ( m_shepherd_worker_rank + 1 );
+
+ int n, j;
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ Impl::spinwait_while_equal( m_shepherd_base[j]->m_worker_state, QthreadsExec::Active );
+ }
+
+ if ( rev_rank ) {
+ m_worker_state = QthreadsExec::Inactive;
+ Impl::spinwait_while_equal( m_worker_state, QthreadsExec::Inactive );
+ }
+ else {
+ // Root thread scans across values before releasing threads.
+ // Worker data is in reverse order, so m_shepherd_base[0] is the
+ // highest ranking thread.
+
+ // Copy from lower ranking to higher ranking worker.
+
+ Type accum = *m_shepherd_base[0]->shepherd_team_scratch_value<Type>();
+ for ( int i = 1; i < team_size; ++i ) {
+ const Type tmp = *m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
+ accum += tmp;
+ *m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() = tmp;
+ }
+
+ *m_shepherd_base[team_size-1]->shepherd_team_scratch_value<Type>() =
+ global_value ? atomic_fetch_add( global_value, accum ) : 0;
+
+ // Join from lower ranking to higher ranking worker.
+ for ( int i = team_size; --i; ) {
+ *m_shepherd_base[i-1]->shepherd_team_scratch_value<Type>() += *m_shepherd_base[i]->shepherd_team_scratch_value<Type>();
+ }
+
+ memory_fence();
+ }
+
+ for ( n = 1; ( ! ( rev_rank & n ) ) && ( ( j = rev_rank + n ) < team_size ); n <<= 1 ) {
+ m_shepherd_base[j]->m_worker_state = QthreadsExec::Active;
+ }
+
+ return *shepherd_team_scratch_value<Type>();
+ }
+
+ //----------------------------------------
+
+ static inline
+ int align_alloc( int size )
+ {
+    enum { ALLOC_GRAIN = 1 << 6 /* power of two, 64 bytes */ };
+ enum { ALLOC_GRAIN_MASK = ALLOC_GRAIN - 1 };
+ return ( size + ALLOC_GRAIN_MASK ) & ~ALLOC_GRAIN_MASK;
+ }
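+  // Example values (ALLOC_GRAIN == 64): align_alloc( 1 ) == 64,
+  // align_alloc( 64 ) == 64, align_alloc( 100 ) == 128; sizes are rounded
+  // up to the next 64-byte boundary.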
+
+ void shared_reset( Qthreads::scratch_memory_space & );
+
+ void * exec_all_reduce_value() const { return m_scratch_alloc; }
+
+ static void * exec_all_reduce_result();
+
+ static void resize_worker_scratch( const int reduce_size, const int shared_size );
+ static void clear_workers();
+
+ //----------------------------------------
+
+ inline int worker_rank() const { return m_worker_rank; }
+ inline int worker_size() const { return m_worker_size; }
+ inline int shepherd_worker_rank() const { return m_shepherd_worker_rank; }
+ inline int shepherd_worker_size() const { return m_shepherd_worker_size; }
+ inline int shepherd_rank() const { return m_shepherd_rank; }
+ inline int shepherd_size() const { return m_shepherd_size; }
+
+ static int worker_per_shepherd();
+};
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+namespace Impl {
+
+class QthreadsTeamPolicyMember {
+private:
+ typedef Kokkos::Qthreads execution_space;
+ typedef execution_space::scratch_memory_space scratch_memory_space;
+
+ Impl::QthreadsExec & m_exec;
+ scratch_memory_space m_team_shared;
+ const int m_team_size;
+ const int m_team_rank;
+ const int m_league_size;
+ const int m_league_end;
+ int m_league_rank;
+
+public:
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & team_shmem() const { return m_team_shared; }
+
+ KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank; }
+ KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size; }
+ KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank; }
+ KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size; }
+
+ KOKKOS_INLINE_FUNCTION void team_barrier() const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {}
+#else
+ { m_exec.shepherd_barrier( m_team_size ); }
+#endif
+
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_broadcast( const Type & value, int rank ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_broadcast<Type>( value, m_team_size, rank ); }
+#endif
+
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_reduce( const Type & value ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_reduce<Type>( m_team_size, value ); }
+#endif
+
+ template< typename JoinOp >
+ KOKKOS_INLINE_FUNCTION typename JoinOp::value_type
+ team_reduce( const typename JoinOp::value_type & value
+ , const JoinOp & op ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return typename JoinOp::value_type(); }
+#else
+ { return m_exec.template shepherd_reduce<JoinOp>( m_team_size, value, op ); }
+#endif
+
+ /** \brief Intra-team exclusive prefix sum with team_rank() ordering.
+ *
+ * The highest rank thread can compute the reduction total as
+ * reduction_total = dev.team_scan( value ) + value;
+ */
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_scan<Type>( m_team_size, value ); }
+#endif
+
+ /** \brief Intra-team exclusive prefix sum with team_rank() ordering
+ * with intra-team non-deterministic ordering accumulation.
+ *
+ * The global inter-team accumulation value will, at the end of the league's
+ * parallel execution, be the scan's total. Parallel execution ordering of
+ * the league's teams is non-deterministic. As such the base value for each
+ * team's scan operation is similarly non-deterministic.
+ */
+ template< typename Type >
+ KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value, Type * const global_accum ) const
+#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ { return Type(); }
+#else
+ { return m_exec.template shepherd_scan<Type>( m_team_size, value, global_accum ); }
+#endif
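+  // Worked example (illustrative): for a team of four with per-thread values
+  // { 3, 1, 4, 2 } ordered by team_rank(), team_scan() returns { 0, 3, 4, 8 };
+  // the highest rank recovers the total as 8 + 2 == 10.  With a non-null
+  // global_accum, the returned base is additionally offset by an atomic
+  // fetch-add of the team total into *global_accum.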
+
+ //----------------------------------------
+ // Private driver for task-team parallel.
+
+ struct TaskTeam {};
+
+ QthreadsTeamPolicyMember();
+ explicit QthreadsTeamPolicyMember( const TaskTeam & );
+
+ //----------------------------------------
+  // Private for the driver: for ( member_type i( exec, team ); i; i.next_team() ) { ... }
+
+ // Initialize.
+ template< class ... Properties >
+ QthreadsTeamPolicyMember( Impl::QthreadsExec & exec
+ , const Kokkos::Impl::TeamPolicyInternal< Qthreads, Properties... > & team )
+ : m_exec( exec )
+ , m_team_shared( 0, 0 )
+ , m_team_size( team.m_team_size )
+ , m_team_rank( exec.shepherd_worker_rank() )
+ , m_league_size( team.m_league_size )
+ , m_league_end( team.m_league_size - team.m_shepherd_iter * ( exec.shepherd_size() - ( exec.shepherd_rank() + 1 ) ) )
+ , m_league_rank( m_league_end > team.m_shepherd_iter ? m_league_end - team.m_shepherd_iter : 0 )
+ {
+ m_exec.shared_reset( m_team_shared );
+ }
+
+ // Continue.
+ operator bool () const { return m_league_rank < m_league_end; }
+
+ // Iterate.
+ void next_team() { ++m_league_rank; m_exec.shared_reset( m_team_shared ); }
+};
+
+template< class ... Properties >
+class TeamPolicyInternal< Kokkos::Qthreads, Properties ... >
+ : public PolicyTraits< Properties... >
+{
+private:
+ const int m_league_size;
+ const int m_team_size;
+ const int m_shepherd_iter;
+
+public:
+ //! Tag this class as a kokkos execution policy.
+ typedef TeamPolicyInternal execution_policy;
+ typedef Qthreads execution_space;
+ typedef PolicyTraits< Properties ... > traits;
+
+ //----------------------------------------
+
+ template< class FunctorType >
+ inline static
+ int team_size_max( const FunctorType & )
+ { return Qthreads::instance().shepherd_worker_size(); }
+
+ template< class FunctorType >
+ static int team_size_recommended( const FunctorType & f )
+ { return team_size_max( f ); }
+
+ template< class FunctorType >
+ inline static
+ int team_size_recommended( const FunctorType & f, const int& )
+ { return team_size_max( f ); }
+
+ //----------------------------------------
+
+ inline int team_size() const { return m_team_size; }
+ inline int league_size() const { return m_league_size; }
+
+ // One active team per shepherd.
+ TeamPolicyInternal( Kokkos::Qthreads & q
+ , const int league_size
+ , const int team_size
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( team_size < q.shepherd_worker_size()
+ ? team_size : q.shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + q.shepherd_size() - 1 ) / q.shepherd_size() )
+ {}
+
+ // TODO: Make sure this is correct.
+ // One active team per shepherd.
+ TeamPolicyInternal( Kokkos::Qthreads & q
+ , const int league_size
+ , const Kokkos::AUTO_t & /* team_size_request */
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( q.shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + q.shepherd_size() - 1 ) / q.shepherd_size() )
+ {}
+
+ // One active team per shepherd.
+ TeamPolicyInternal( const int league_size
+ , const int team_size
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( team_size < Qthreads::instance().shepherd_worker_size()
+ ? team_size : Qthreads::instance().shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + Qthreads::instance().shepherd_size() - 1 ) / Qthreads::instance().shepherd_size() )
+ {}
+
+ // TODO: Make sure this is correct.
+ // One active team per shepherd.
+ TeamPolicyInternal( const int league_size
+ , const Kokkos::AUTO_t & /* team_size_request */
+ , const int /* vector_length */ = 0
+ )
+ : m_league_size( league_size )
+ , m_team_size( Qthreads::instance().shepherd_worker_size() )
+ , m_shepherd_iter( ( league_size + Qthreads::instance().shepherd_size() - 1 ) / Qthreads::instance().shepherd_size() )
+ {}
+
+ // TODO: Doesn't do anything yet. Fix this.
+  /** \brief Set chunk_size to a discrete value. */
+ inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
+ TeamPolicyInternal p = *this;
+// p.m_chunk_size = chunk_size_;
+ return p;
+ }
+
+ typedef Impl::QthreadsTeamPolicyMember member_type;
+
+ friend class Impl::QthreadsTeamPolicyMember;
+};
+
+} // namespace Impl
+
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+
+#endif // #define KOKKOS_QTHREADSEXEC_HPP
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Parallel.hpp
similarity index 86%
rename from lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp
rename to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Parallel.hpp
index cb5b18094..9f9960754 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_Parallel.hpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Parallel.hpp
@@ -1,727 +1,727 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_QTHREAD_PARALLEL_HPP
-#define KOKKOS_QTHREAD_PARALLEL_HPP
+#ifndef KOKKOS_QTHREADS_PARALLEL_HPP
+#define KOKKOS_QTHREADS_PARALLEL_HPP
#include <vector>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_StaticAssert.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
-#include <Qthread/Kokkos_QthreadExec.hpp>
+#include <Qthreads/Kokkos_QthreadsExec.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::RangePolicy< Traits ... >
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef typename Policy::WorkRange WorkRange ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor , const Member ibeg , const Member iend )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor , const Member ibeg , const Member iend )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i );
}
}
// Function is called once by every concurrent thread.
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelFor & self = * ((const ParallelFor *) arg );
const WorkRange range( self.m_policy, exec.worker_rank(), exec.worker_size() );
ParallelFor::template exec_range< WorkTag > ( self.m_functor , range.begin() , range.end() );
// All threads wait for completion.
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelFor::exec , this );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelFor::exec , this );
}
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType , class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::RangePolicy< Traits ... >
, ReducerType
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i , update );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i , update );
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelReduce & self = * ((const ParallelReduce *) arg );
const WorkRange range( self.m_policy, exec.worker_rank(), exec.worker_size() );
ParallelReduce::template exec_range< WorkTag >(
self.m_functor, range.begin(), range.end(),
ValueInit::init( ReducerConditional::select(self.m_functor , self.m_reducer)
, exec.exec_all_reduce_value() ) );
exec.template exec_all_reduce< FunctorType, ReducerType, WorkTag >( self.m_functor, self.m_reducer );
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelReduce::exec , this );
+ QthreadsExec::resize_worker_scratch( ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) ) , 0 );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelReduce::exec , this );
- const pointer_type data = (pointer_type) QthreadExec::exec_all_reduce_result();
+ const pointer_type data = (pointer_type) QthreadsExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( unsigned i = 0 ; i < n ; ++i ) { m_result_ptr[i] = data[i]; }
}
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result_view
, typename std::enable_if<Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type< ReducerType >::value
, void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result_view.data() )
{ }
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ... Properties >
class ParallelFor< FunctorType
, TeamPolicy< Properties ... >
- , Kokkos::Qthread >
+ , Kokkos::Qthreads >
{
private:
- typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthread , Properties ... > Policy ;
+ typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthreads , Properties ... > Policy ;
typedef typename Policy::member_type Member ;
typedef typename Policy::work_tag WorkTag ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member )
{
while ( member ) {
functor( member );
member.team_barrier();
member.next_team();
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member )
{
const TagType t{} ;
while ( member ) {
functor( t , member );
member.team_barrier();
member.next_team();
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelFor & self = * ((const ParallelFor *) arg );
ParallelFor::template exec_team< WorkTag >
( self.m_functor , Member( exec , self.m_policy ) );
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch
+ QthreadsExec::resize_worker_scratch
( /* reduction memory */ 0
, /* team shared memory */ FunctorTeamShmemSize< FunctorType >::value( m_functor , m_policy.team_size() ) );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelFor::exec , this );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelFor::exec , this );
}
ParallelFor( const FunctorType & arg_functor ,
const Policy & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{ }
};
//----------------------------------------------------------------------------
template< class FunctorType , class ReducerType , class ... Properties >
class ParallelReduce< FunctorType
, TeamPolicy< Properties... >
, ReducerType
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
- typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthread , Properties ... > Policy ;
+ typedef Kokkos::Impl::TeamPolicyInternal< Kokkos::Qthreads , Properties ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
const ReducerType m_reducer ;
const pointer_type m_result_ptr ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member , reference_type update )
{
while ( member ) {
functor( member , update );
member.team_barrier();
member.next_team();
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_team( const FunctorType & functor , Member member , reference_type update )
{
const TagType t{} ;
while ( member ) {
functor( t , member , update );
member.team_barrier();
member.next_team();
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelReduce & self = * ((const ParallelReduce *) arg );
ParallelReduce::template exec_team< WorkTag >
( self.m_functor
, Member( exec , self.m_policy )
, ValueInit::init( ReducerConditional::select( self.m_functor , self.m_reducer )
, exec.exec_all_reduce_value() ) );
exec.template exec_all_reduce< FunctorType, ReducerType, WorkTag >( self.m_functor, self.m_reducer );
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch
+ QthreadsExec::resize_worker_scratch
( /* reduction memory */ ValueTraits::value_size( ReducerConditional::select(m_functor , m_reducer) )
, /* team shared memory */ FunctorTeamShmemSize< FunctorType >::value( m_functor , m_policy.team_size() ) );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelReduce::exec , this );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelReduce::exec , this );
- const pointer_type data = (pointer_type) QthreadExec::exec_all_reduce_result();
+ const pointer_type data = (pointer_type) QthreadsExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer), data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
for ( unsigned i = 0 ; i < n ; ++i ) { m_result_ptr[i] = data[i]; }
}
}
template< class ViewType >
ParallelReduce( const FunctorType & arg_functor
, const Policy & arg_policy
, const ViewType & arg_result
, typename std::enable_if<Kokkos::is_view< ViewType >::value &&
!Kokkos::is_reducer_type< ReducerType >::value
, void*>::type = NULL)
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( InvalidType() )
, m_result_ptr( arg_result.ptr_on_device() )
{ }
inline
ParallelReduce( const FunctorType & arg_functor
, Policy arg_policy
, const ReducerType& reducer )
: m_functor( arg_functor )
, m_policy( arg_policy )
, m_reducer( reducer )
, m_result_ptr( reducer.result_view().data() )
{ }
};
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
template< class FunctorType , class ... Traits >
class ParallelScan< FunctorType
, Kokkos::RangePolicy< Traits ... >
- , Kokkos::Qthread
+ , Kokkos::Qthreads
>
{
private:
typedef Kokkos::RangePolicy< Traits ... > Policy ;
typedef typename Policy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef Kokkos::Impl::FunctorValueTraits< FunctorType, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType, WorkTag > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
const FunctorType m_functor ;
const Policy m_policy ;
template< class TagType >
inline static
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( i , update , final );
}
}
template< class TagType >
inline static
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_range( const FunctorType & functor
, const Member ibeg , const Member iend
, reference_type update , const bool final )
{
const TagType t{} ;
for ( Member i = ibeg ; i < iend ; ++i ) {
functor( t , i , update , final );
}
}
- static void exec( QthreadExec & exec , const void * arg )
+ static void exec( QthreadsExec & exec , const void * arg )
{
const ParallelScan & self = * ((const ParallelScan *) arg );
const WorkRange range( self.m_policy , exec.worker_rank() , exec.worker_size() );
// Initialize thread-local value
reference_type update = ValueInit::init( self.m_functor , exec.exec_all_reduce_value() );
ParallelScan::template exec_range< WorkTag >( self.m_functor, range.begin() , range.end() , update , false );
exec.template exec_all_scan< FunctorType , typename Policy::work_tag >( self.m_functor );
ParallelScan::template exec_range< WorkTag >( self.m_functor , range.begin() , range.end() , update , true );
exec.exec_all_barrier();
}
public:
inline
void execute() const
{
- QthreadExec::resize_worker_scratch( ValueTraits::value_size( m_functor ) , 0 );
- Impl::QthreadExec::exec_all( Qthread::instance() , & ParallelScan::exec , this );
+ QthreadsExec::resize_worker_scratch( ValueTraits::value_size( m_functor ) , 0 );
+ Impl::QthreadsExec::exec_all( Qthreads::instance() , & ParallelScan::exec , this );
}
ParallelScan( const FunctorType & arg_functor
, const Policy & arg_policy
)
: m_functor( arg_functor )
, m_policy( arg_policy )
{
}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
template< typename iType >
KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >
-TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType& count )
+Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadsTeamPolicyMember >
+TeamThreadRange( const Impl::QthreadsTeamPolicyMember& thread, const iType& count )
{
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, count );
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadsTeamPolicyMember >( thread, count );
}
template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::QthreadTeamPolicyMember >
-TeamThreadRange( const Impl::QthreadTeamPolicyMember& thread, const iType1 & begin, const iType2 & end )
+ Impl::QthreadsTeamPolicyMember >
+TeamThreadRange( const Impl::QthreadsTeamPolicyMember& thread, const iType1 & begin, const iType2 & end )
{
typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadTeamPolicyMember >( thread, iType(begin), iType(end) );
+ return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::QthreadsTeamPolicyMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >
- ThreadVectorRange(const Impl::QthreadTeamPolicyMember& thread, const iType& count) {
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >(thread,count);
+Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >
+ ThreadVectorRange(const Impl::QthreadsTeamPolicyMember& thread, const iType& count) {
+ return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
-Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember> PerTeam(const Impl::QthreadTeamPolicyMember& thread) {
- return Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
+Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember> PerTeam(const Impl::QthreadsTeamPolicyMember& thread) {
+ return Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember>(thread);
}
KOKKOS_INLINE_FUNCTION
-Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember> PerThread(const Impl::QthreadTeamPolicyMember& thread) {
- return Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>(thread);
+Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember> PerThread(const Impl::QthreadsTeamPolicyMember& thread) {
+ return Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember>(thread);
}
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries, const Lambda& lambda) {
+void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Inter-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries,
+void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember>& loop_boundaries,
+void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
}
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
}
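// Illustrative usage sketch (not part of this patch): the ThreadVectorRange
// overloads above are meant to be nested inside a TeamThreadRange loop; here
// each team thread owns one row and its vector lanes reduce that row's dot
// product. The views a, x, y and the extents nrow, ncol are hypothetical.
Kokkos::parallel_for( Kokkos::TeamThreadRange( team, nrow ), [&]( const int row )
{
  double dot = 0 ;
  Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, ncol ),
                           [&]( const int col, double & partial )
                             { partial += a( row, col ) * x( col ); },
                           dot );
  y( row ) = dot ;
});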
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
-void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. Scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
-void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadTeamPolicyMember >&
+void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::QthreadsTeamPolicyMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
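// Illustrative usage sketch (not part of this patch): the vector-level
// parallel_scan above realizes an exclusive prefix sum; the lambda must add
// its own contribution to val on every call and may only treat val as the
// prefix value when final == true. The views counts and offsets and the
// extent n are hypothetical.
Kokkos::parallel_scan( Kokkos::ThreadVectorRange( team, n ),
                       [&]( const int i, int & val, const bool final )
                       {
                         if ( final ) offsets( i ) = val ; // exclusive prefix for entry i
                         val += counts( i );
                       } );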
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
+void single(const Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda) {
+void single(const Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda) {
if(single_struct.team_member.team_rank()==0) lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::VectorSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
+void single(const Impl::VectorSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
-void single(const Impl::ThreadSingleStruct<Impl::QthreadTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
+void single(const Impl::ThreadSingleStruct<Impl::QthreadsTeamPolicyMember>& single_struct, const FunctorType& lambda, ValueType& val) {
if(single_struct.team_member.team_rank()==0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
}
} // namespace Kokkos
-#endif /* #define KOKKOS_QTHREAD_PARALLEL_HPP */
+#endif /* #define KOKKOS_QTHREADS_PARALLEL_HPP */
diff --git a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
similarity index 56%
copy from lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
copy to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
index 5b3e9873e..614a2c03f 100644
--- a/lib/kokkos/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.cpp
@@ -1,329 +1,320 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
-#if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG )
+#if defined( KOKKOS_ENABLE_QTHREADS ) && defined( KOKKOS_ENABLE_TASKPOLICY )
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
-template class TaskQueue< Kokkos::OpenMP > ;
+template class TaskQueue< Kokkos::Qthreads > ;
//----------------------------------------------------------------------------
-TaskExec< Kokkos::OpenMP >::
-TaskExec()
- : m_self_exec( 0 )
- , m_team_exec( 0 )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( 0 )
- , m_team_rank( 0 )
- , m_team_size( 1 )
-{
-}
-
-TaskExec< Kokkos::OpenMP >::
-TaskExec( Kokkos::Impl::OpenMPexec & arg_exec , int const arg_team_size )
- : m_self_exec( & arg_exec )
- , m_team_exec( arg_exec.pool_rev(arg_exec.pool_rank_rev() / arg_team_size) )
- , m_sync_mask( 0 )
- , m_sync_value( 0 )
- , m_sync_step( 0 )
- , m_group_rank( arg_exec.pool_rank_rev() / arg_team_size )
- , m_team_rank( arg_exec.pool_rank_rev() % arg_team_size )
- , m_team_size( arg_team_size )
+TaskExec< Kokkos::Qthreads >::TaskExec()
+ : m_self_exec( 0 ),
+ m_team_exec( 0 ),
+ m_sync_mask( 0 ),
+ m_sync_value( 0 ),
+ m_sync_step( 0 ),
+ m_group_rank( 0 ),
+ m_team_rank( 0 ),
+ m_team_size( 1 )
+{}
+
+TaskExec< Kokkos::Qthreads >::
+TaskExec( Kokkos::Impl::QthreadsExec & arg_exec, int const arg_team_size )
+ : m_self_exec( & arg_exec ),
+ m_team_exec( arg_exec.pool_rev(arg_exec.pool_rank_rev() / arg_team_size) ),
+ m_sync_mask( 0 ),
+ m_sync_value( 0 ),
+ m_sync_step( 0 ),
+ m_group_rank( arg_exec.pool_rank_rev() / arg_team_size ),
+ m_team_rank( arg_exec.pool_rank_rev() % arg_team_size ),
+ m_team_size( arg_team_size )
{
// This team spans
// m_self_exec->pool_rev( team_size * group_rank )
// m_self_exec->pool_rev( team_size * ( group_rank + 1 ) - 1 )
int64_t volatile * const sync = (int64_t *) m_self_exec->scratch_reduce();
sync[0] = int64_t(0) ;
sync[1] = int64_t(0) ;
for ( int i = 0 ; i < m_team_size ; ++i ) {
m_sync_value |= int64_t(1) << (8*i);
m_sync_mask |= int64_t(3) << (8*i);
}
Kokkos::memory_fence();
}
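// Worked example (illustrative, not part of this patch): for m_team_size == 2
// the loop above yields m_sync_value == 0x0101 and m_sync_mask == 0x0303,
// i.e. one arrival byte per team rank. team_barrier() writes the low bits of
// m_sync_value into this rank's byte and, on every other step, XORs the mask
// so the expected arrival byte alternates between 0x01 and 0x02.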
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-void TaskExec< Kokkos::OpenMP >::team_barrier_impl() const
+void TaskExec< Kokkos::Qthreads >::team_barrier() const
{
- if ( m_team_exec->scratch_reduce_size() < int(2 * sizeof(int64_t)) ) {
- Kokkos::abort("TaskQueue<OpenMP> scratch_reduce memory too small");
- }
+ if ( 1 < m_team_size ) {
- // Use team shared memory to synchronize.
- // Alternate memory locations between barriers to avoid a sequence
- // of barriers overtaking one another.
+ if ( m_team_exec->scratch_reduce_size() < int(2 * sizeof(int64_t)) ) {
+ Kokkos::abort("TaskQueue<Qthreads> scratch_reduce memory too small");
+ }
- int64_t volatile * const sync =
- ((int64_t *) m_team_exec->scratch_reduce()) + ( m_sync_step & 0x01 );
+ // Use team shared memory to synchronize.
+ // Alternate memory locations between barriers to avoid a sequence
+ // of barriers overtaking one another.
- // This team member sets one byte within the sync variable
- int8_t volatile * const sync_self =
- ((int8_t *) sync) + m_team_rank ;
+ int64_t volatile * const sync =
+ ((int64_t *) m_team_exec->scratch_reduce()) + ( m_sync_step & 0x01 );
+
+ // This team member sets one byte within the sync variable
+ int8_t volatile * const sync_self =
+ ((int8_t *) sync) + m_team_rank ;
#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : before(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
+fprintf( stdout,
+ "barrier group(%d) member(%d) step(%d) wait(%lx) : before(%lx)\n",
+ m_group_rank,
+ m_team_rank,
+ m_sync_step,
+ m_sync_value,
+ *sync
);
fflush(stdout);
#endif
- *sync_self = int8_t( m_sync_value & 0x03 ); // signal arrival
+ *sync_self = int8_t( m_sync_value & 0x03 ); // signal arrival
- while ( m_sync_value != *sync ); // wait for team to arrive
+ while ( m_sync_value != *sync ); // wait for team to arrive
#if 0
-fprintf( stdout
- , "barrier group(%d) member(%d) step(%d) wait(%lx) : after(%lx)\n"
- , m_group_rank
- , m_team_rank
- , m_sync_step
- , m_sync_value
- , *sync
+fprintf( stdout,
+ "barrier group(%d) member(%d) step(%d) wait(%lx) : after(%lx)\n",
+ m_group_rank,
+ m_team_rank,
+ m_sync_step,
+ m_sync_value,
+ *sync
);
fflush(stdout);
#endif
- ++m_sync_step ;
+ ++m_sync_step ;
- if ( 0 == ( 0x01 & m_sync_step ) ) { // Every other step
- m_sync_value ^= m_sync_mask ;
- if ( 1000 < m_sync_step ) m_sync_step = 0 ;
+ if ( 0 == ( 0x01 & m_sync_step ) ) { // Every other step
+ m_sync_value ^= m_sync_mask ;
+ if ( 1000 < m_sync_step ) m_sync_step = 0 ;
+ }
}
}
#endif
//----------------------------------------------------------------------------
-void TaskQueueSpecialization< Kokkos::OpenMP >::execute
- ( TaskQueue< Kokkos::OpenMP > * const queue )
+void TaskQueueSpecialization< Kokkos::Qthreads >::execute
+ ( TaskQueue< Kokkos::Qthreads > * const queue )
{
- using execution_space = Kokkos::OpenMP ;
+ using execution_space = Kokkos::Qthreads ;
using queue_type = TaskQueue< execution_space > ;
- using task_root_type = TaskBase< execution_space , void , void > ;
- using PoolExec = Kokkos::Impl::OpenMPexec ;
+ using task_root_type = TaskBase< execution_space, void, void > ;
+ using PoolExec = Kokkos::Impl::QthreadsExec ;
using Member = TaskExec< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
// Required: team_size <= 8
const int team_size = PoolExec::pool_size(2); // Threads per core
// const int team_size = PoolExec::pool_size(1); // Threads per NUMA
if ( 8 < team_size ) {
- Kokkos::abort("TaskQueue<OpenMP> unsupported team size");
+ Kokkos::abort("TaskQueue<Qthreads> unsupported team size");
}
#pragma omp parallel
{
PoolExec & self = *PoolExec::get_thread_omp();
Member single_exec ;
- Member team_exec( self , team_size );
+ Member team_exec( self, team_size );
// Team shared memory
task_root_type * volatile * const task_shared =
(task_root_type **) team_exec.m_team_exec->scratch_thread();
-// Barrier across entire OpenMP thread pool to insure initialization
+// Barrier across entire Qthreads thread pool to ensure initialization
#pragma omp barrier
// Loop until all queues are empty and no tasks in flight
do {
- task_root_type * task = 0 ;
-
// Each team lead attempts to acquire either a thread team task
- // or a single thread task for the team.
+ // or a collection of single thread tasks for the team.
if ( 0 == team_exec.team_rank() ) {
- task = 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
+ task_root_type * tmp =
+ 0 < *((volatile int *) & queue->m_ready_count) ? end : 0 ;
// Loop by priority and then type
- for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
- for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ for ( int i = 0 ; i < queue_type::NumQueue && end == tmp ; ++i ) {
+ for ( int j = 0 ; j < 2 && end == tmp ; ++j ) {
+ tmp = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
- }
-
- // Team lead broadcast acquired task to team members:
-
- if ( 1 < team_exec.team_size() ) {
- if ( 0 == team_exec.team_rank() ) *task_shared = task ;
+ *task_shared = tmp ;
- // Fence to be sure task_shared is stored before the barrier
+ // Fence to be sure task_shared is stored
Kokkos::memory_fence();
+ }
- // Whole team waits for every team member to reach this statement
- team_exec.team_barrier();
+ // Whole team waits for every team member to reach this statement
+ team_exec.team_barrier();
- // Fence to be sure task_shared is stored
- Kokkos::memory_fence();
+ Kokkos::memory_fence();
- task = *task_shared ;
- }
+ task_root_type * const task = *task_shared ;
#if 0
-fprintf( stdout
- , "\nexecute group(%d) member(%d) task_shared(0x%lx) task(0x%lx)\n"
- , team_exec.m_group_rank
- , team_exec.m_team_rank
- , uintptr_t(task_shared)
- , uintptr_t(task)
+fprintf( stdout,
+ "\nexecute group(%d) member(%d) task_shared(0x%lx) task(0x%lx)\n",
+ team_exec.m_group_rank,
+ team_exec.m_team_rank,
+ uintptr_t(task_shared),
+ uintptr_t(task)
);
fflush(stdout);
#endif
if ( 0 == task ) break ; // 0 == m_ready_count
if ( end == task ) {
- // All team members wait for whole team to reach this statement.
- // Is necessary to prevent task_shared from being updated
- // before it is read by all threads.
team_exec.team_barrier();
}
else if ( task_root_type::TaskTeam == task->m_task_type ) {
// Thread Team Task
- (*task->m_apply)( task , & team_exec );
+ (*task->m_apply)( task, & team_exec );
// The m_apply function performs a barrier
if ( 0 == team_exec.team_rank() ) {
// team member #0 completes the task, which may delete the task
- queue->complete( task );
+ queue->complete( task );
}
}
else {
// Single Thread Task
if ( 0 == team_exec.team_rank() ) {
- (*task->m_apply)( task , & single_exec );
+ (*task->m_apply)( task, & single_exec );
- queue->complete( task );
+ queue->complete( task );
}
// All team members wait for whole team to reach this statement.
// Not necessary to complete the task.
// Is necessary to prevent task_shared from being updated
// before it is read by all threads.
team_exec.team_barrier();
}
} while(1);
}
// END #pragma omp parallel
}
-void TaskQueueSpecialization< Kokkos::OpenMP >::
+void TaskQueueSpecialization< Kokkos::Qthreads >::
iff_single_thread_recursive_execute
- ( TaskQueue< Kokkos::OpenMP > * const queue )
+ ( TaskQueue< Kokkos::Qthreads > * const queue )
{
- using execution_space = Kokkos::OpenMP ;
+ using execution_space = Kokkos::Qthreads ;
using queue_type = TaskQueue< execution_space > ;
- using task_root_type = TaskBase< execution_space , void , void > ;
+ using task_root_type = TaskBase< execution_space, void, void > ;
using Member = TaskExec< execution_space > ;
if ( 1 == omp_get_num_threads() ) {
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
Member single_exec ;
task_root_type * task = end ;
do {
task = end ;
// Loop by priority and then type
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
task = queue_type::pop_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
- (*task->m_apply)( task , & single_exec );
+ (*task->m_apply)( task, & single_exec );
- queue->complete( task );
+ queue->complete( task );
} while(1);
}
}
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_OPENMP ) && defined( KOKKOS_ENABLE_TASKDAG ) */
+#endif /* #if defined( KOKKOS_ENABLE_QTHREADS ) && defined( KOKKOS_ENABLE_TASKPOLICY ) */
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.hpp
new file mode 100644
index 000000000..836452dde
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_Task.hpp
@@ -0,0 +1,156 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_IMPL_QTHREADS_TASK_HPP
+#define KOKKOS_IMPL_QTHREADS_TASK_HPP
+
+#if defined( KOKKOS_ENABLE_TASKPOLICY )
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template<>
+class TaskQueueSpecialization< Kokkos::Qthreads >
+{
+public:
+
+ using execution_space = Kokkos::Qthreads ;
+ using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
+ using task_base_type = Kokkos::Impl::TaskBase< execution_space, void, void > ;
+
+ // Must specify memory space
+ using memory_space = Kokkos::HostSpace ;
+
+ static
+ void iff_single_thread_recursive_execute( queue_type * const );
+
+ // Must provide task queue execution function
+ static void execute( queue_type * const );
+
+ // Must provide mechanism to set function pointer in
+ // execution space from the host process.
+ template< typename FunctorType >
+ static
+ void proc_set_apply( task_base_type::function_type * ptr )
+ {
+ using TaskType = TaskBase< execution_space,
+ typename FunctorType::value_type,
+ FunctorType
+ > ;
+ *ptr = TaskType::apply ;
+ }
+};
+
+extern template class TaskQueue< Kokkos::Qthreads > ;
+
+//----------------------------------------------------------------------------
+
+template<>
+class TaskExec< Kokkos::Qthreads >
+{
+private:
+
+ TaskExec( TaskExec && ) = delete ;
+ TaskExec( TaskExec const & ) = delete ;
+ TaskExec & operator = ( TaskExec && ) = delete ;
+ TaskExec & operator = ( TaskExec const & ) = delete ;
+
+
+ using PoolExec = Kokkos::Impl::QthreadsExec ;
+
+ friend class Kokkos::Impl::TaskQueue< Kokkos::Qthreads > ;
+ friend class Kokkos::Impl::TaskQueueSpecialization< Kokkos::Qthreads > ;
+
+ PoolExec * const m_self_exec ; ///< This thread's thread pool data structure
+ PoolExec * const m_team_exec ; ///< Team thread's thread pool data structure
+ int64_t m_sync_mask ;
+ int64_t mutable m_sync_value ;
+ int mutable m_sync_step ;
+ int m_group_rank ; ///< Which "team" subset of thread pool
+ int m_team_rank ; ///< Which thread within a team
+ int m_team_size ;
+
+ TaskExec();
+ TaskExec( PoolExec & arg_exec, int arg_team_size );
+
+public:
+
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ void * team_shared() const
+ { return m_team_exec ? m_team_exec->scratch_thread() : (void*) 0 ; }
+
+ int team_shared_size() const
+ { return m_team_exec ? m_team_exec->scratch_thread_size() : 0 ; }
+
+ /**\brief Whole team enters this function call
+ * before any team member returns from
+ * this function call.
+ */
+ void team_barrier() const ;
+#else
+ KOKKOS_INLINE_FUNCTION void team_barrier() const {}
+ KOKKOS_INLINE_FUNCTION void * team_shared() const { return 0 ; }
+ KOKKOS_INLINE_FUNCTION int team_shared_size() const { return 0 ; }
+#endif
+
+ KOKKOS_INLINE_FUNCTION
+ int team_rank() const { return m_team_rank ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int team_size() const { return m_team_size ; }
+};
+
+}} /* namespace Kokkos::Impl */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+#endif /* #ifndef KOKKOS_IMPL_QTHREADS_TASK_HPP */
+
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.cpp.old
similarity index 91%
rename from lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
rename to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.cpp.old
index 50444177c..aa159cff6 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.cpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.cpp.old
@@ -1,491 +1,488 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-// Experimental unified task-data parallel manycore LDRD
+// Experimental unified task-data parallel manycore LDRD.
#include <Kokkos_Core_fwd.hpp>
-#if defined( KOKKOS_ENABLE_QTHREAD )
+#if defined( KOKKOS_ENABLE_QTHREADS )
#include <stdio.h>
#include <stdlib.h>
#include <stdexcept>
#include <iostream>
#include <sstream>
#include <string>
#include <Kokkos_Atomic.hpp>
-#include <Qthread/Kokkos_Qthread_TaskPolicy.hpp>
+#include <Qthreads/Kokkos_Qthreads_TaskPolicy.hpp>
#if defined( KOKKOS_ENABLE_TASKDAG )
-//----------------------------------------------------------------------------
-
namespace Kokkos {
namespace Experimental {
namespace Impl {
-typedef TaskMember< Kokkos::Qthread , void , void > Task ;
+typedef TaskMember< Kokkos::Qthreads , void , void > Task ;
namespace {
inline
unsigned padded_sizeof_derived( unsigned sizeof_derived )
{
return sizeof_derived +
( sizeof_derived % sizeof(Task*) ? sizeof(Task*) - sizeof_derived % sizeof(Task*) : 0 );
}
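// Worked example (illustrative, not part of this patch): with 8-byte Task*
// pointers a derived task of 20 bytes is padded to 24, so the trailing
// dependence array of Task* that allocate() appends starts on a
// pointer-aligned boundary; a size already divisible by 8 (e.g. 32) is
// returned unchanged.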
// int lock_alloc_dealloc = 0 ;
} // namespace
void Task::deallocate( void * ptr )
{
// Counting on 'free' thread safety so lock/unlock not required.
// However, isolate calls here to mitigate future need to introduce lock/unlock.
// lock
// while ( ! Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 0 , 1 ) );
free( ptr );
// unlock
// Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 1 , 0 );
}
void * Task::allocate( const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity )
{
// Counting on 'malloc' thread safety so lock/unlock not required.
// However, isolate calls here to mitigate future need to introduce lock/unlock.
// lock
// while ( ! Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 0 , 1 ) );
void * const ptr = malloc( padded_sizeof_derived( arg_sizeof_derived ) + arg_dependence_capacity * sizeof(Task*) );
// unlock
// Kokkos::atomic_compare_exchange_strong( & lock_alloc_dealloc , 1 , 0 );
return ptr ;
}
Task::~TaskMember()
{
}
Task::TaskMember( const function_verify_type arg_verify
, const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: m_dealloc( arg_dealloc )
, m_verify( arg_verify )
, m_apply_single( arg_apply_single )
, m_apply_team( arg_apply_team )
, m_active_count( & arg_active_count )
, m_qfeb(0)
, m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
, m_dep_capacity( arg_dependence_capacity )
, m_dep_size( 0 )
, m_ref_count( 0 )
, m_state( Kokkos::Experimental::TASK_STATE_CONSTRUCTING )
{
qthread_empty( & m_qfeb ); // Set to full when complete
for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
}
Task::TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: m_dealloc( arg_dealloc )
, m_verify( & Task::verify_type<void> )
, m_apply_single( arg_apply_single )
, m_apply_team( arg_apply_team )
, m_active_count( & arg_active_count )
, m_qfeb(0)
, m_dep( (Task **)( ((unsigned char *) this) + padded_sizeof_derived( arg_sizeof_derived ) ) )
, m_dep_capacity( arg_dependence_capacity )
, m_dep_size( 0 )
, m_ref_count( 0 )
, m_state( Kokkos::Experimental::TASK_STATE_CONSTRUCTING )
{
qthread_empty( & m_qfeb ); // Set to full when complete
for ( unsigned i = 0 ; i < arg_dependence_capacity ; ++i ) m_dep[i] = 0 ;
}
//----------------------------------------------------------------------------
void Task::throw_error_add_dependence() const
{
- std::cerr << "TaskMember< Qthread >::add_dependence ERROR"
+ std::cerr << "TaskMember< Qthreads >::add_dependence ERROR"
<< " state(" << m_state << ")"
<< " dep_size(" << m_dep_size << ")"
<< std::endl ;
- throw std::runtime_error("TaskMember< Qthread >::add_dependence ERROR");
+ throw std::runtime_error("TaskMember< Qthreads >::add_dependence ERROR");
}
void Task::throw_error_verify_type()
{
- throw std::runtime_error("TaskMember< Qthread >::verify_type ERROR");
+ throw std::runtime_error("TaskMember< Qthreads >::verify_type ERROR");
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
void Task::assign( Task ** const lhs , Task * rhs , const bool no_throw )
{
- static const char msg_error_header[] = "Kokkos::Impl::TaskManager<Kokkos::Qthread>::assign ERROR" ;
+ static const char msg_error_header[] = "Kokkos::Impl::TaskManager<Kokkos::Qthreads>::assign ERROR" ;
static const char msg_error_count[] = ": negative reference count" ;
static const char msg_error_complete[] = ": destroy task that is not complete" ;
static const char msg_error_dependences[] = ": destroy task that has dependences" ;
static const char msg_error_exception[] = ": caught internal exception" ;
if ( rhs ) { Kokkos::atomic_increment( &(*rhs).m_ref_count ); }
Task * const lhs_val = Kokkos::atomic_exchange( lhs , rhs );
if ( lhs_val ) {
const int count = Kokkos::atomic_fetch_add( & (*lhs_val).m_ref_count , -1 );
const char * msg_error = 0 ;
try {
if ( 1 == count ) {
// Reference count at zero, delete it
// Should only be deallocating a completed task
if ( (*lhs_val).m_state == Kokkos::Experimental::TASK_STATE_COMPLETE ) {
// A completed task should not have dependences...
for ( int i = 0 ; i < (*lhs_val).m_dep_size && 0 == msg_error ; ++i ) {
if ( (*lhs_val).m_dep[i] ) msg_error = msg_error_dependences ;
}
}
else {
msg_error = msg_error_complete ;
}
if ( 0 == msg_error ) {
// Get deletion function and apply it
const Task::function_dealloc_type d = (*lhs_val).m_dealloc ;
(*d)( lhs_val );
}
}
else if ( count <= 0 ) {
msg_error = msg_error_count ;
}
}
catch( ... ) {
if ( 0 == msg_error ) msg_error = msg_error_exception ;
}
if ( 0 != msg_error ) {
if ( no_throw ) {
std::cerr << msg_error_header << msg_error << std::endl ;
std::cerr.flush();
}
else {
std::string msg(msg_error_header);
msg.append(msg_error);
throw std::runtime_error( msg );
}
}
}
}
#endif
//----------------------------------------------------------------------------
void Task::closeout()
{
enum { RESPAWN = int( Kokkos::Experimental::TASK_STATE_WAITING ) |
int( Kokkos::Experimental::TASK_STATE_EXECUTING ) };
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx %s\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(this)
, ( m_state == RESPAWN ? "respawn" : "complete" )
);
fflush(stdout);
#endif
// When dependent tasks run there would be a race
// condition between destroying this task and
// querying the active count pointer from this task.
int volatile * const active_count = m_active_count ;
if ( m_state == RESPAWN ) {
// Task requests respawn, set state to waiting and reschedule the task
m_state = Kokkos::Experimental::TASK_STATE_WAITING ;
schedule();
}
else {
// Task did not respawn, is complete
m_state = Kokkos::Experimental::TASK_STATE_COMPLETE ;
// Release dependences before allowing dependent tasks to run.
// Otherwise there is a thread race condition for removing dependences.
for ( int i = 0 ; i < m_dep_size ; ++i ) {
assign( & m_dep[i] , 0 );
}
- // Set qthread FEB to full so that dependent tasks are allowed to execute.
+ // Set Qthreads FEB to full so that dependent tasks are allowed to execute.
// This 'task' may be deleted immediately following this function call.
qthread_fill( & m_qfeb );
// The dependent task could now complete and destroy 'this' task
// before the call to 'qthread_fill' returns. Therefore, for
// thread safety assume that 'this' task has now been destroyed.
}
// Decrement active task count before returning.
Kokkos::atomic_decrement( active_count );
}
aligned_t Task::qthread_func( void * arg )
{
Task * const task = reinterpret_cast< Task * >(arg);
// First member of the team change state to executing.
// Use compare-exchange to avoid race condition with a respawn.
Kokkos::atomic_compare_exchange_strong( & task->m_state
, int(Kokkos::Experimental::TASK_STATE_WAITING)
, int(Kokkos::Experimental::TASK_STATE_EXECUTING)
);
if ( task->m_apply_team && ! task->m_apply_single ) {
- Kokkos::Impl::QthreadTeamPolicyMember::TaskTeam task_team_tag ;
+ Kokkos::Impl::QthreadsTeamPolicyMember::TaskTeam task_team_tag ;
// Initialize team size and rank with shepherd info
- Kokkos::Impl::QthreadTeamPolicyMember member( task_team_tag );
+ Kokkos::Impl::QthreadsTeamPolicyMember member( task_team_tag );
(*task->m_apply_team)( task , member );
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx executed by member(%d:%d)\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(task)
, member.team_rank()
, member.team_size()
);
fflush(stdout);
#endif
member.team_barrier();
if ( member.team_rank() == 0 ) task->closeout();
member.team_barrier();
}
else if ( task->m_apply_team && task->m_apply_single == reinterpret_cast<function_single_type>(1) ) {
// Team hard-wired to one, no cloning
- Kokkos::Impl::QthreadTeamPolicyMember member ;
+ Kokkos::Impl::QthreadsTeamPolicyMember member ;
(*task->m_apply_team)( task , member );
task->closeout();
}
else {
(*task->m_apply_single)( task );
task->closeout();
}
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx return\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(task)
);
fflush(stdout);
#endif
return 0 ;
}
void Task::respawn()
{
// Change state from pure executing to ( waiting | executing )
// to avoid confusion with simply waiting.
Kokkos::atomic_compare_exchange_strong( & m_state
, int(Kokkos::Experimental::TASK_STATE_EXECUTING)
, int(Kokkos::Experimental::TASK_STATE_WAITING |
Kokkos::Experimental::TASK_STATE_EXECUTING)
);
}
void Task::schedule()
{
// Is waiting for execution
// Increment active task count before spawning.
Kokkos::atomic_increment( m_active_count );
- // spawn in qthread. must malloc the precondition array and give to qthread.
- // qthread will eventually free this allocation so memory will not be leaked.
+ // spawn in Qthreads. must malloc the precondition array and give to Qthreads.
+ // Qthreads will eventually free this allocation so memory will not be leaked.
// concern with thread safety of malloc, does this need to be guarded?
aligned_t ** qprecon = (aligned_t **) malloc( ( m_dep_size + 1 ) * sizeof(aligned_t *) );
qprecon[0] = reinterpret_cast<aligned_t *>( uintptr_t(m_dep_size) );
for ( int i = 0 ; i < m_dep_size ; ++i ) {
- qprecon[i+1] = & m_dep[i]->m_qfeb ; // Qthread precondition flag
+ qprecon[i+1] = & m_dep[i]->m_qfeb ; // Qthreads precondition flag
}
if ( m_apply_team && ! m_apply_single ) {
// If there is more than one shepherd, spawn on a shepherd other than this shepherd
const int num_shepherd = qthread_num_shepherds();
const int num_worker_per_shepherd = qthread_num_workers_local(NO_SHEPHERD);
const int this_shepherd = qthread_shep();
int spawn_shepherd = ( this_shepherd + 1 ) % num_shepherd ;
#if 0
fprintf( stdout
, "worker(%d.%d) task 0x%.12lx spawning on shepherd(%d) clone(%d)\n"
, qthread_shep()
, qthread_worker_local(NULL)
, reinterpret_cast<unsigned long>(this)
, spawn_shepherd
, num_worker_per_shepherd - 1
);
fflush(stdout);
#endif
qthread_spawn_cloneable
( & Task::qthread_func
, this
, 0
, NULL
, m_dep_size , qprecon /* dependences */
, spawn_shepherd
, unsigned( QTHREAD_SPAWN_SIMPLE | QTHREAD_SPAWN_LOCAL_PRIORITY )
, num_worker_per_shepherd - 1
);
}
else {
qthread_spawn( & Task::qthread_func /* function */
, this /* function argument */
, 0
, NULL
, m_dep_size , qprecon /* dependences */
, NO_SHEPHERD
, QTHREAD_SPAWN_SIMPLE /* allows optimization for non-blocking task */
);
}
}
} // namespace Impl
} // namespace Experimental
} // namespace Kokkos
namespace Kokkos {
namespace Experimental {
-TaskPolicy< Kokkos::Qthread >::
+TaskPolicy< Kokkos::Qthreads >::
TaskPolicy
( const unsigned /* arg_task_max_count */
, const unsigned /* arg_task_max_size */
, const unsigned arg_task_default_dependence_capacity
, const unsigned arg_task_team_size
)
: m_default_dependence_capacity( arg_task_default_dependence_capacity )
, m_team_size( arg_task_team_size != 0 ? arg_task_team_size : unsigned(qthread_num_workers_local(NO_SHEPHERD)) )
, m_active_count_root(0)
, m_active_count( m_active_count_root )
{
const unsigned num_worker_per_shepherd = unsigned( qthread_num_workers_local(NO_SHEPHERD) );
if ( m_team_size != 1 && m_team_size != num_worker_per_shepherd ) {
std::ostringstream msg ;
- msg << "Kokkos::Experimental::TaskPolicy< Kokkos::Qthread >( "
+ msg << "Kokkos::Experimental::TaskPolicy< Kokkos::Qthreads >( "
<< "default_depedence = " << arg_task_default_dependence_capacity
<< " , team_size = " << arg_task_team_size
<< " ) ERROR, valid team_size arguments are { (omitted) , 1 , " << num_worker_per_shepherd << " }" ;
Kokkos::Impl::throw_runtime_exception(msg.str());
}
}
-TaskPolicy< Kokkos::Qthread >::member_type &
-TaskPolicy< Kokkos::Qthread >::member_single()
+TaskPolicy< Kokkos::Qthreads >::member_type &
+TaskPolicy< Kokkos::Qthreads >::member_single()
{
static member_type s ;
return s ;
}
-void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Qthread > & policy )
+void wait( Kokkos::Experimental::TaskPolicy< Kokkos::Qthreads > & policy )
{
volatile int * const active_task_count = & policy.m_active_count ;
while ( *active_task_count ) qthread_yield();
}
} // namespace Experimental
} // namespace Kokkos
-#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
-#endif /* #if defined( KOKKOS_ENABLE_QTHREAD ) */
-
+#endif // #if defined( KOKKOS_ENABLE_TASKDAG )
+#endif // #if defined( KOKKOS_ENABLE_QTHREADS )
diff --git a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.hpp.old
similarity index 90%
rename from lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp
rename to lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.hpp.old
index 565dbf7e6..1e5a4dc59 100644
--- a/lib/kokkos/core/src/Qthread/Kokkos_Qthread_TaskPolicy.hpp
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskPolicy.hpp.old
@@ -1,664 +1,664 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
// Experimental unified task-data parallel manycore LDRD
-#ifndef KOKKOS_QTHREAD_TASKSCHEDULER_HPP
-#define KOKKOS_QTHREAD_TASKSCHEDULER_HPP
+#ifndef KOKKOS_QTHREADS_TASKSCHEDULER_HPP
+#define KOKKOS_QTHREADS_TASKSCHEDULER_HPP
#include <string>
#include <typeinfo>
#include <stdexcept>
//----------------------------------------------------------------------------
-// Defines to enable experimental Qthread functionality
+// Defines to enable experimental Qthreads functionality
#define QTHREAD_LOCAL_PRIORITY
#define CLONED_TASKS
#include <qthread.h>
#undef QTHREAD_LOCAL_PRIORITY
#undef CLONED_TASKS
//----------------------------------------------------------------------------
-#include <Kokkos_Qthread.hpp>
+#include <Kokkos_Qthreads.hpp>
#include <Kokkos_TaskScheduler.hpp>
#include <Kokkos_View.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
namespace Impl {
template<>
-class TaskMember< Kokkos::Qthread , void , void >
+class TaskMember< Kokkos::Qthreads , void , void >
{
public:
typedef TaskMember * (* function_verify_type) ( TaskMember * );
typedef void (* function_single_type) ( TaskMember * );
- typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::QthreadTeamPolicyMember & );
+ typedef void (* function_team_type) ( TaskMember * , Kokkos::Impl::QthreadsTeamPolicyMember & );
typedef void (* function_dealloc_type)( TaskMember * );
private:
const function_dealloc_type m_dealloc ; ///< Deallocation
const function_verify_type m_verify ; ///< Result type verification
const function_single_type m_apply_single ; ///< Apply function
const function_team_type m_apply_team ; ///< Apply function
int volatile * const m_active_count ; ///< Count of active tasks on this policy
- aligned_t m_qfeb ; ///< Qthread full/empty bit
+ aligned_t m_qfeb ; ///< Qthreads full/empty bit
TaskMember ** const m_dep ; ///< Dependences
const int m_dep_capacity ; ///< Capacity of dependences
int m_dep_size ; ///< Actual count of dependences
int m_ref_count ; ///< Reference count
int m_state ; ///< State of the task
TaskMember() /* = delete */ ;
TaskMember( const TaskMember & ) /* = delete */ ;
TaskMember & operator = ( const TaskMember & ) /* = delete */ ;
static aligned_t qthread_func( void * arg );
static void * allocate( const unsigned arg_sizeof_derived , const unsigned arg_dependence_capacity );
static void deallocate( void * );
void throw_error_add_dependence() const ;
static void throw_error_verify_type();
template < class DerivedTaskType >
static
void deallocate( TaskMember * t )
{
DerivedTaskType * ptr = static_cast< DerivedTaskType * >(t);
ptr->~DerivedTaskType();
deallocate( (void *) ptr );
}
void schedule();
void closeout();
protected :
~TaskMember();
- // Used by TaskMember< Qthread , ResultType , void >
+ // Used by TaskMember< Qthreads , ResultType , void >
TaskMember( const function_verify_type arg_verify
, const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
);
- // Used for TaskMember< Qthread , void , void >
+ // Used for TaskMember< Qthreads , void , void >
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
);
public:
template< typename ResultType >
KOKKOS_FUNCTION static
TaskMember * verify_type( TaskMember * t )
{
enum { check_type = ! std::is_same< ResultType , void >::value };
if ( check_type && t != 0 ) {
// Verify that t->m_verify is this function
const function_verify_type self = & TaskMember::template verify_type< ResultType > ;
if ( t->m_verify != self ) {
t = 0 ;
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
throw_error_verify_type();
#endif
}
}
return t ;
}
//----------------------------------------
/* Inheritance Requirements on task types:
* typedef FunctorType::value_type value_type ;
* class DerivedTaskType
- * : public TaskMember< Qthread , value_type , FunctorType >
+ * : public TaskMember< Qthreads , value_type , FunctorType >
* { ... };
- * class TaskMember< Qthread , value_type , FunctorType >
- * : public TaskMember< Qthread , value_type , void >
+ * class TaskMember< Qthreads , value_type , FunctorType >
+ * : public TaskMember< Qthreads , value_type , void >
* , public Functor
* { ... };
* If value_type != void
- * class TaskMember< Qthread , value_type , void >
- * : public TaskMember< Qthread , void , void >
+ * class TaskMember< Qthreads , value_type , void >
+ * : public TaskMember< Qthreads , void , void >
*
* Allocate space for DerivedTaskType followed by TaskMember*[ dependence_capacity ]
*
*/
/** \brief Allocate and construct a single-thread task */
template< class DerivedTaskType >
static
TaskMember * create_single( const typename DerivedTaskType::functor_type & arg_functor
, volatile int & arg_active_count
, const unsigned arg_dependence_capacity )
{
typedef typename DerivedTaskType::functor_type functor_type ;
typedef typename functor_type::value_type value_type ;
DerivedTaskType * const task =
new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
, & TaskMember::template apply_single< functor_type , value_type >
, 0
, arg_active_count
, sizeof(DerivedTaskType)
, arg_dependence_capacity
, arg_functor );
return static_cast< TaskMember * >( task );
}
/** \brief Allocate and construct a team-thread task */
template< class DerivedTaskType >
static
TaskMember * create_team( const typename DerivedTaskType::functor_type & arg_functor
, volatile int & arg_active_count
, const unsigned arg_dependence_capacity
, const bool arg_is_team )
{
typedef typename DerivedTaskType::functor_type functor_type ;
typedef typename functor_type::value_type value_type ;
const function_single_type flag = reinterpret_cast<function_single_type>( arg_is_team ? 0 : 1 );
DerivedTaskType * const task =
new( allocate( sizeof(DerivedTaskType) , arg_dependence_capacity ) )
DerivedTaskType( & TaskMember::template deallocate< DerivedTaskType >
, flag
, & TaskMember::template apply_team< functor_type , value_type >
, arg_active_count
, sizeof(DerivedTaskType)
, arg_dependence_capacity
, arg_functor );
return static_cast< TaskMember * >( task );
}
void respawn();
void spawn()
{
m_state = Kokkos::Experimental::TASK_STATE_WAITING ;
schedule();
}
//----------------------------------------
typedef FutureValueTypeIsVoidError get_result_type ;
KOKKOS_INLINE_FUNCTION
get_result_type get() const { return get_result_type() ; }
KOKKOS_INLINE_FUNCTION
Kokkos::Experimental::TaskState get_state() const { return Kokkos::Experimental::TaskState( m_state ); }
//----------------------------------------
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
static
void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false );
#else
KOKKOS_INLINE_FUNCTION static
void assign( TaskMember ** const lhs , TaskMember * const rhs , const bool no_throw = false ) {}
#endif
KOKKOS_INLINE_FUNCTION
TaskMember * get_dependence( int i ) const
{ return ( Kokkos::Experimental::TASK_STATE_EXECUTING == m_state && 0 <= i && i < m_dep_size ) ? m_dep[i] : (TaskMember*) 0 ; }
KOKKOS_INLINE_FUNCTION
int get_dependence() const
{ return m_dep_size ; }
KOKKOS_INLINE_FUNCTION
void clear_dependence()
{
for ( int i = 0 ; i < m_dep_size ; ++i ) assign( m_dep + i , 0 );
m_dep_size = 0 ;
}
KOKKOS_INLINE_FUNCTION
void add_dependence( TaskMember * before )
{
if ( ( Kokkos::Experimental::TASK_STATE_CONSTRUCTING == m_state ||
Kokkos::Experimental::TASK_STATE_EXECUTING == m_state ) &&
m_dep_size < m_dep_capacity ) {
assign( m_dep + m_dep_size , before );
++m_dep_size ;
}
else {
throw_error_add_dependence();
}
}
//----------------------------------------
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_single( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
- // TaskMember< Kokkos::Qthread , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Qthread , ResultType , void >
+ // TaskMember< Kokkos::Qthreads , ResultType , FunctorType >
+ // : public TaskMember< Kokkos::Qthreads , ResultType , void >
// , public FunctorType
// { ... };
derived_type & m = * static_cast< derived_type * >( t );
Kokkos::Impl::FunctorApply< FunctorType , void , ResultType & >::apply( (FunctorType &) m , & m.m_result );
}
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_single( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
- // TaskMember< Kokkos::Qthread , ResultType , FunctorType >
- // : public TaskMember< Kokkos::Qthread , ResultType , void >
+ // TaskMember< Kokkos::Qthreads , ResultType , FunctorType >
+ // : public TaskMember< Kokkos::Qthreads , ResultType , void >
// , public FunctorType
// { ... };
derived_type & m = * static_cast< derived_type * >( t );
Kokkos::Impl::FunctorApply< FunctorType , void , void >::apply( (FunctorType &) m );
}
//----------------------------------------
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_team( typename std::enable_if< ! std::is_same< ResultType , void >::value , TaskMember * >::type t
- , Kokkos::Impl::QthreadTeamPolicyMember & member )
+ , Kokkos::Impl::QthreadsTeamPolicyMember & member )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
derived_type & m = * static_cast< derived_type * >( t );
m.FunctorType::apply( member , m.m_result );
}
template< class FunctorType , class ResultType >
KOKKOS_INLINE_FUNCTION static
void apply_team( typename std::enable_if< std::is_same< ResultType , void >::value , TaskMember * >::type t
- , Kokkos::Impl::QthreadTeamPolicyMember & member )
+ , Kokkos::Impl::QthreadsTeamPolicyMember & member )
{
- typedef TaskMember< Kokkos::Qthread , ResultType , FunctorType > derived_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , FunctorType > derived_type ;
derived_type & m = * static_cast< derived_type * >( t );
m.FunctorType::apply( member );
}
};
//----------------------------------------------------------------------------
-/** \brief Base class for tasks with a result value in the Qthread execution space.
+/** \brief Base class for tasks with a result value in the Qthreads execution space.
*
* The FunctorType must be void because this class is accessed by the
* Future class for the task and result value.
*
* Must be derived from TaskMember<S,void,void> 'root class' so the Future class
* can correctly static_cast from the 'root class' to this class.
*/
template < class ResultType >
-class TaskMember< Kokkos::Qthread , ResultType , void >
- : public TaskMember< Kokkos::Qthread , void , void >
+class TaskMember< Kokkos::Qthreads , ResultType , void >
+ : public TaskMember< Kokkos::Qthreads , void , void >
{
public:
ResultType m_result ;
typedef const ResultType & get_result_type ;
KOKKOS_INLINE_FUNCTION
get_result_type get() const { return m_result ; }
protected:
- typedef TaskMember< Kokkos::Qthread , void , void > task_root_type ;
+ typedef TaskMember< Kokkos::Qthreads , void , void > task_root_type ;
typedef task_root_type::function_dealloc_type function_dealloc_type ;
typedef task_root_type::function_single_type function_single_type ;
typedef task_root_type::function_team_type function_team_type ;
inline
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
)
: task_root_type( & task_root_type::template verify_type< ResultType >
, arg_dealloc
, arg_apply_single
, arg_apply_team
, arg_active_count
, arg_sizeof_derived
, arg_dependence_capacity )
, m_result()
{}
};
template< class ResultType , class FunctorType >
-class TaskMember< Kokkos::Qthread , ResultType , FunctorType >
- : public TaskMember< Kokkos::Qthread , ResultType , void >
+class TaskMember< Kokkos::Qthreads , ResultType , FunctorType >
+ : public TaskMember< Kokkos::Qthreads , ResultType , void >
, public FunctorType
{
public:
typedef FunctorType functor_type ;
- typedef TaskMember< Kokkos::Qthread , void , void > task_root_type ;
- typedef TaskMember< Kokkos::Qthread , ResultType , void > task_base_type ;
+ typedef TaskMember< Kokkos::Qthreads , void , void > task_root_type ;
+ typedef TaskMember< Kokkos::Qthreads , ResultType , void > task_base_type ;
typedef task_root_type::function_dealloc_type function_dealloc_type ;
typedef task_root_type::function_single_type function_single_type ;
typedef task_root_type::function_team_type function_team_type ;
inline
TaskMember( const function_dealloc_type arg_dealloc
, const function_single_type arg_apply_single
, const function_team_type arg_apply_team
, volatile int & arg_active_count
, const unsigned arg_sizeof_derived
, const unsigned arg_dependence_capacity
, const functor_type & arg_functor
)
: task_base_type( arg_dealloc
, arg_apply_single
, arg_apply_team
, arg_active_count
, arg_sizeof_derived
, arg_dependence_capacity )
, functor_type( arg_functor )
{}
};
} /* namespace Impl */
} /* namespace Experimental */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Experimental {
-void wait( TaskPolicy< Kokkos::Qthread > & );
+void wait( TaskPolicy< Kokkos::Qthreads > & );
template<>
-class TaskPolicy< Kokkos::Qthread >
+class TaskPolicy< Kokkos::Qthreads >
{
public:
- typedef Kokkos::Qthread execution_space ;
+ typedef Kokkos::Qthreads execution_space ;
typedef TaskPolicy execution_policy ;
- typedef Kokkos::Impl::QthreadTeamPolicyMember member_type ;
+ typedef Kokkos::Impl::QthreadsTeamPolicyMember member_type ;
private:
typedef Impl::TaskMember< execution_space , void , void > task_root_type ;
template< class FunctorType >
static inline
const task_root_type * get_task_root( const FunctorType * f )
{
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
return static_cast< const task_root_type * >( static_cast< const task_type * >(f) );
}
template< class FunctorType >
static inline
task_root_type * get_task_root( FunctorType * f )
{
typedef Impl::TaskMember< execution_space , typename FunctorType::value_type , FunctorType > task_type ;
return static_cast< task_root_type * >( static_cast< task_type * >(f) );
}
unsigned m_default_dependence_capacity ;
unsigned m_team_size ;
volatile int m_active_count_root ;
volatile int & m_active_count ;
public:
TaskPolicy
( const unsigned arg_task_max_count
, const unsigned arg_task_max_size
, const unsigned arg_task_default_dependence_capacity = 4
, const unsigned arg_task_team_size = 0 /* choose default */
);
KOKKOS_FUNCTION TaskPolicy() = default ;
KOKKOS_FUNCTION TaskPolicy( TaskPolicy && rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy( const TaskPolicy & rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy & operator = ( TaskPolicy && rhs ) = default ;
KOKKOS_FUNCTION TaskPolicy & operator = ( const TaskPolicy & rhs ) = default ;
//----------------------------------------
KOKKOS_INLINE_FUNCTION
int allocated_task_count() const { return m_active_count ; }
template< class ValueType >
const Future< ValueType , execution_space > &
spawn( const Future< ValueType , execution_space > & f
, const bool priority = false ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
f.m_task->spawn();
#endif
return f ;
}
// Create single-thread task
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
task_create( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{
typedef typename FunctorType::value_type value_type ;
typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
return Future< value_type , execution_space >(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
task_root_type::create_single< task_type >
( functor
, m_active_count
, ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity )
)
#endif
);
}
template< class FunctorType >
Future< typename FunctorType::value_type , execution_space >
proc_create( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{ return task_create( functor , dependence_capacity ); }
// Create thread-team task
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
task_create_team( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{
typedef typename FunctorType::value_type value_type ;
typedef Impl::TaskMember< execution_space , value_type , FunctorType > task_type ;
return Future< value_type , execution_space >(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
task_root_type::create_team< task_type >
( functor
, m_active_count
, ( ~0u == dependence_capacity ? m_default_dependence_capacity : dependence_capacity )
, 1 < m_team_size
)
#endif
);
}
template< class FunctorType >
KOKKOS_INLINE_FUNCTION
Future< typename FunctorType::value_type , execution_space >
proc_create_team( const FunctorType & functor
, const unsigned dependence_capacity = ~0u ) const
{ return task_create_team( functor , dependence_capacity ); }
// Add dependence
template< class A1 , class A2 , class A3 , class A4 >
void add_dependence( const Future<A1,A2> & after
, const Future<A3,A4> & before
, typename std::enable_if
< std::is_same< typename Future<A1,A2>::execution_space , execution_space >::value
&&
std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
>::type * = 0
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
after.m_task->add_dependence( before.m_task );
#endif
}
//----------------------------------------
// Functions for an executing task functor to query dependences,
// set new dependences, and respawn itself.
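  //
  // Illustrative sketch (hypothetical usage, example names only): a task
  // functor that must wait on a child it spawns might use these hooks
  // roughly as follows; 'ChildWork' and the 'spawned' flag are illustrative.
  //
  //   struct ParentTask {
  //     typedef long value_type ;
  //     TaskPolicy< Kokkos::Qthreads >    policy ;
  //     Future< long , Kokkos::Qthreads > child ;
  //     bool                              spawned ;
  //
  //     void apply( value_type & result )
  //     {
  //       if ( ! spawned ) {
  //         spawned = true ;
  //         child = policy.task_create( ChildWork() );
  //         policy.spawn( child );
  //         policy.add_dependence( this , child );  // wait on 'child'
  //         policy.respawn( this );                 // run apply() again later
  //       }
  //       else {
  //         result = child.get();                   // 'child' has completed
  //       }
  //     }
  //   };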
template< class FunctorType >
Future< void , execution_space >
get_dependence( const FunctorType * task_functor , int i ) const
{
return Future<void,execution_space>(
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->get_dependence(i)
#endif
);
}
template< class FunctorType >
int get_dependence( const FunctorType * task_functor ) const
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return get_task_root(task_functor)->get_dependence(); }
#else
{ return 0 ; }
#endif
template< class FunctorType >
void clear_dependence( FunctorType * task_functor ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->clear_dependence();
#endif
}
template< class FunctorType , class A3 , class A4 >
void add_dependence( FunctorType * task_functor
, const Future<A3,A4> & before
, typename std::enable_if
< std::is_same< typename Future<A3,A4>::execution_space , execution_space >::value
>::type * = 0
)
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->add_dependence( before.m_task );
#endif
}
template< class FunctorType >
void respawn( FunctorType * task_functor
, const bool priority = false ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->respawn();
#endif
}
template< class FunctorType >
void respawn_needing_memory( FunctorType * task_functor ) const
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
get_task_root(task_functor)->respawn();
#endif
}
static member_type & member_single();
- friend void wait( TaskPolicy< Kokkos::Qthread > & );
+ friend void wait( TaskPolicy< Kokkos::Qthreads > & );
};
} /* namespace Experimental */
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
-#endif /* #define KOKKOS_QTHREAD_TASK_HPP */
+#endif /* #define KOKKOS_QTHREADS_TASK_HPP */
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue.hpp
new file mode 100644
index 000000000..55235cd6d
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue.hpp
@@ -0,0 +1,319 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#if defined( KOKKOS_ENABLE_TASKPOLICY )
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+/** \brief Manage task allocation, deallocation, and scheduling.
+ *
+ * Task execution is handled here directly for the Qthreads implementation.
+ */
+template<>
+class TaskQueue< Kokkos::Qthreads > {
+private:
+
+  using execution_space = Kokkos::Qthreads ;
+  using memory_space    = Kokkos::HostSpace ;
+ using device_type = Kokkos::Device< execution_space, memory_space > ;
+ using memory_pool = Kokkos::Experimental::MemoryPool< device_type > ;
+ using task_root_type = Kokkos::Impl::TaskBase< execution_space, void, void > ;
+
+ friend class Kokkos::TaskScheduler< execution_space > ;
+
+ struct Destroy {
+ TaskQueue * m_queue ;
+ void destroy_shared_allocation();
+ };
+
+ //----------------------------------------
+
+ enum : int { TASK_STATE_NULL = 0, ///< Does not exist
+ TASK_STATE_CONSTRUCTING = 1, ///< Is under construction
+ TASK_STATE_WAITING = 2, ///< Is waiting for execution
+ TASK_STATE_EXECUTING = 4, ///< Is executing
+ TASK_STATE_RESPAWN = 8, ///< Requested respawn
+ TASK_STATE_COMPLETE = 16 ///< Execution is complete
+ };
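+
+  // As used in these Qthreads task sources, the intended lifecycle appears to
+  // be: CONSTRUCTING -> WAITING (schedule) -> EXECUTING (qthread_func) ->
+  // COMPLETE, with RESPAWN routing an executing task back to WAITING rather
+  // than to COMPLETE.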
+
+ // Queue is organized as [ priority ][ type ]
+
+ memory_pool m_memory ;
+ unsigned m_team_size ; // Number of threads in a team
+ long m_accum_alloc ; // Accumulated number of allocations
+ int m_count_alloc ; // Current number of allocations
+ int m_max_alloc ; // Maximum number of allocations
+ int m_ready_count ; // Number of ready or executing
+
+ //----------------------------------------
+
+ ~TaskQueue();
+ TaskQueue() = delete ;
+ TaskQueue( TaskQueue && ) = delete ;
+ TaskQueue( TaskQueue const & ) = delete ;
+ TaskQueue & operator = ( TaskQueue && ) = delete ;
+ TaskQueue & operator = ( TaskQueue const & ) = delete ;
+
+ TaskQueue
+ ( const memory_space & arg_space,
+ unsigned const arg_memory_pool_capacity,
+ unsigned const arg_memory_pool_superblock_capacity_log2
+ );
+
+ // Schedule a task
+ // Precondition:
+ // task is not executing
+ // task->m_next is the dependence or zero
+ // Postcondition:
+ // task->m_next is linked list membership
+ KOKKOS_FUNCTION
+ void schedule( task_root_type * const );
+
+ // Reschedule a task
+ // Precondition:
+ // task is in Executing state
+ // task->m_next == LockTag
+ // Postcondition:
+ // task is in Executing-Respawn state
+ // task->m_next == 0 (no dependence)
+ KOKKOS_FUNCTION
+ void reschedule( task_root_type * );
+
+ // Complete a task
+ // Precondition:
+ // task is not executing
+ // task->m_next == LockTag => task is complete
+ // task->m_next != LockTag => task is respawn
+ // Postcondition:
+ // task->m_wait == LockTag => task is complete
+ // task->m_wait != LockTag => task is waiting
+ KOKKOS_FUNCTION
+ void complete( task_root_type * );
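+
+  // In this encoding m_next and m_wait double as lock words: LockTag in
+  // m_next marks an executing task that has not requested a respawn, while
+  // storing LockTag in m_wait both marks the task complete and closes its
+  // wait queue to further additions.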
+
+public:
+
+ // If and only if the execution space is a single thread
+ // then execute ready tasks.
+ KOKKOS_INLINE_FUNCTION
+ void iff_single_thread_recursive_execute()
+ {
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ specialization::iff_single_thread_recursive_execute( this );
+#endif
+ }
+
+ void execute() { specialization::execute( this ); }
+
+ template< typename FunctorType >
+ void proc_set_apply( typename task_root_type::function_type * ptr )
+ {
+ specialization::template proc_set_apply< FunctorType >( ptr );
+ }
+
+ // Assign task pointer with reference counting of assigned tasks
+ template< typename LV, typename RV >
+ KOKKOS_FUNCTION static
+ void assign( TaskBase< execution_space, LV, void > ** const lhs,
+ TaskBase< execution_space, RV, void > * const rhs )
+ {
+ using task_lhs = TaskBase< execution_space, LV, void > ;
+#if 0
+ {
+ printf( "assign( 0x%lx { 0x%lx %d %d }, 0x%lx { 0x%lx %d %d } )\n",
+ uintptr_t( lhs ? *lhs : 0 ),
+ uintptr_t( lhs && *lhs ? (*lhs)->m_next : 0 ),
+ int( lhs && *lhs ? (*lhs)->m_task_type : 0 ),
+ int( lhs && *lhs ? (*lhs)->m_ref_count : 0 ),
+ uintptr_t(rhs),
+ uintptr_t( rhs ? rhs->m_next : 0 ),
+ int( rhs ? rhs->m_task_type : 0 ),
+ int( rhs ? rhs->m_ref_count : 0 )
+ );
+ fflush( stdout );
+ }
+#endif
+
+ if ( *lhs )
+ {
+ const int count = Kokkos::atomic_fetch_add( &((*lhs)->m_ref_count), -1 );
+
+ if ( ( 1 == count ) && ( (*lhs)->m_state == TASK_STATE_COMPLETE ) ) {
+ // Reference count is zero and task is complete, deallocate.
+ (*lhs)->m_queue->deallocate( *lhs, (*lhs)->m_alloc_size );
+ }
+ else if ( count <= 1 ) {
+ Kokkos::abort("TaskScheduler task has negative reference count or is incomplete" );
+ }
+
+ // GEM: Should I check that there are no dependences here? Can the state
+ // be set to complete while there are still dependences?
+ }
+
+ if ( rhs ) { Kokkos::atomic_fetch_add( &(rhs->m_ref_count), 1 ); }
+
+ // Force write of *lhs
+
+ *static_cast< task_lhs * volatile * >(lhs) = rhs ;
+
+ Kokkos::memory_fence();
+ }
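+
+  // Illustrative (hypothetical) use of assign(): reference counts follow the
+  // pointer assignments, e.g.
+  //
+  //   task_root_type * ref = 0 ;
+  //   assign( & ref , task );  // task->m_ref_count incremented, ref == task
+  //   assign( & ref , 0 );     // count decremented; a completed task whose
+  //                            // last reference this was is deallocated,
+  //                            // while dropping the last reference to an
+  //                            // incomplete task aborts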
+
+ KOKKOS_FUNCTION
+ size_t allocate_block_size( size_t n ); ///< Actual block size allocated
+
+ KOKKOS_FUNCTION
+ void * allocate( size_t n ); ///< Allocate from the memory pool
+
+ KOKKOS_FUNCTION
+ void deallocate( void * p, size_t n ); ///< Deallocate to the memory pool
+};
+
+} /* namespace Impl */
+} /* namespace Kokkos */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template<>
+class TaskBase< Kokkos::Qthreads, void, void >
+{
+public:
+
+ enum : int16_t { TaskTeam = TaskBase< void, void, void >::TaskTeam,
+ TaskSingle = TaskBase< void, void, void >::TaskSingle,
+ Aggregate = TaskBase< void, void, void >::Aggregate };
+
+ enum : uintptr_t { LockTag = TaskBase< void, void, void >::LockTag,
+ EndTag = TaskBase< void, void, void >::EndTag };
+
+  using execution_space = Kokkos::Qthreads ;
+ using queue_type = TaskQueue< execution_space > ;
+
+ template< typename > friend class Kokkos::TaskScheduler ;
+
+ typedef void (* function_type) ( TaskBase *, void * );
+
+ // sizeof(TaskBase) == 48
+
+ function_type m_apply ; ///< Apply function pointer
+ queue_type * m_queue ; ///< Queue in which this task resides
+ TaskBase * m_dep ; ///< Dependence
+ int32_t m_ref_count ; ///< Reference count
+ int32_t m_alloc_size ; ///< Allocation size
+ int32_t m_dep_count ; ///< Aggregate's number of dependences
+ int16_t m_task_type ; ///< Type of task
+ int16_t m_priority ; ///< Priority of runnable task
+ aligned_t m_qfeb ; ///< Qthread full/empty bit
+ int m_state ; ///< State of the task
+
+ TaskBase( TaskBase && ) = delete ;
+ TaskBase( const TaskBase & ) = delete ;
+ TaskBase & operator = ( TaskBase && ) = delete ;
+ TaskBase & operator = ( const TaskBase & ) = delete ;
+
+ KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
+
+ KOKKOS_INLINE_FUNCTION
+  TaskBase() noexcept
+ : m_apply(0),
+ m_queue(0),
+ m_dep(0),
+ m_ref_count(0),
+ m_alloc_size(0),
+ m_dep_count(0),
+ m_task_type( TaskSingle ),
+ m_priority( 1 /* TaskRegularPriority */ ),
+ m_qfeb(0),
+ m_state( queue_type::TASK_STATE_CONSTRUCTING )
+ {
+ qthread_empty( & m_qfeb ); // Set to full when complete
+ }
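+
+  // The full/empty bit starts empty and (per the note above) is presumably
+  // filled on completion, so tasks spawned with &m_qfeb as a qthread_spawn
+  // precondition are held back until this task has completed.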
+
+ //----------------------------------------
+
+ static aligned_t qthread_func( void * arg );
+
+ KOKKOS_INLINE_FUNCTION
+ TaskBase ** aggregate_dependences()
+ { return reinterpret_cast<TaskBase**>( this + 1 ); }
+
+ KOKKOS_INLINE_FUNCTION
+  bool requested_respawn()
+ { return m_state == queue_type::TASK_STATE_RESPAWN; }
+
+ KOKKOS_INLINE_FUNCTION
+ void add_dependence( TaskBase* dep )
+ {
+ // Assign dependence to m_dep. It will be processed in the subsequent
+ // call to schedule. Error if the dependence is reset.
+ if ( 0 != Kokkos::atomic_exchange( & m_dep, dep ) ) {
+ Kokkos::abort("TaskScheduler ERROR: resetting task dependence");
+ }
+
+ if ( 0 != dep ) {
+ // The future may be destroyed upon returning from this call
+ // so increment reference count to track this assignment.
+ Kokkos::atomic_fetch_add( &(dep->m_ref_count), 1 );
+ }
+ }
+
+ using get_return_type = void ;
+
+ KOKKOS_INLINE_FUNCTION
+ get_return_type get() const {}
+};
+
+} /* namespace Impl */
+} /* namespace Kokkos */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
diff --git a/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue_impl.hpp b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue_impl.hpp
new file mode 100644
index 000000000..4a9190c73
--- /dev/null
+++ b/lib/kokkos/core/src/Qthreads/Kokkos_Qthreads_TaskQueue_impl.hpp
@@ -0,0 +1,436 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#if defined( KOKKOS_ENABLE_TASKPOLICY )
+
+namespace Kokkos {
+namespace Impl {
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+void TaskQueue< ExecSpace >::Destroy::destroy_shared_allocation()
+{
+ m_queue->~TaskQueue();
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+TaskQueue< ExecSpace >::TaskQueue
+ ( const TaskQueue< ExecSpace >::memory_space & arg_space,
+ unsigned const arg_memory_pool_capacity,
+ unsigned const arg_memory_pool_superblock_capacity_log2 )
+ : m_memory( arg_space,
+ arg_memory_pool_capacity,
+              arg_memory_pool_superblock_capacity_log2 ),
+ m_team_size( unsigned( qthread_num_workers_local(NO_SHEPHERD) ) ),
+ m_accum_alloc(0),
+ m_count_alloc(0),
+ m_max_alloc(0),
+ m_ready_count(0)
+{}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+TaskQueue< ExecSpace >::~TaskQueue()
+{
+ // Verify that ready count is zero.
+ if ( 0 != m_ready_count ) {
+ Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready or executing tasks");
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+size_t TaskQueue< ExecSpace >::allocate_block_size( size_t n )
+{
+ return m_memory.allocate_block_size( n );
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void * TaskQueue< ExecSpace >::allocate( size_t n )
+{
+ void * const p = m_memory.allocate(n);
+
+ if ( p ) {
+ Kokkos::atomic_increment( & m_accum_alloc );
+ Kokkos::atomic_increment( & m_count_alloc );
+
+ if ( m_max_alloc < m_count_alloc ) m_max_alloc = m_count_alloc ;
+ }
+
+ return p ;
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::deallocate( void * p, size_t n )
+{
+ m_memory.deallocate( p, n );
+ Kokkos::atomic_decrement( & m_count_alloc );
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::schedule
+ ( TaskQueue< ExecSpace >::task_root_type * const task )
+{
+#if 0
+ printf( "schedule( 0x%lx { %d %d %d }\n",
+ uintptr_t(task),
+ task->m_task_type,
+ task->m_priority,
+ task->m_ref_count );
+#endif
+
+ // The task has been constructed and is waiting to be executed.
+ task->m_state = TASK_STATE_WAITING ;
+
+ if ( task->m_task_type != task_root_type::Aggregate ) {
+ // Scheduling a single or team task.
+
+ // Increment active task count before spawning.
+    Kokkos::atomic_increment( & m_ready_count );
+
+ if ( task->m_dep == 0 ) {
+ // Schedule a task with no dependences.
+
+ if ( task_root_type::TaskTeam == task->m_task_type && m_team_size > 1 ) {
+        // If more than one shepherd, spawn on a shepherd other than this one.
+ const int num_shepherd = qthread_num_shepherds();
+ const int this_shepherd = qthread_shep();
+ int spawn_shepherd = ( this_shepherd + 1 ) % num_shepherd ;
+
+#if 0
+ fprintf( stdout,
+ "worker(%d.%d) task 0x%.12lx spawning on shepherd(%d) clone(%d)\n",
+ qthread_shep(),
+ qthread_worker_local(NULL),
+                 reinterpret_cast<unsigned long>(task),
+ spawn_shepherd,
+ m_team_size - 1
+ );
+ fflush(stdout);
+#endif
+
+ qthread_spawn_cloneable(
+ & task_root_type::qthread_func,
+ task,
+ 0,
+ NULL,
+        0, // no dependences
+ 0, // dependences array
+ spawn_shepherd,
+ unsigned( QTHREAD_SPAWN_SIMPLE | QTHREAD_SPAWN_LOCAL_PRIORITY ),
+ m_team_size - 1
+ );
+ }
+ else {
+ qthread_spawn(
+ & task_root_type::qthread_func,
+ task,
+ 0,
+ NULL,
+        0, // no dependences
+ 0, // dependences array
+ NO_SHEPHERD,
+ QTHREAD_SPAWN_SIMPLE /* allows optimization for non-blocking task */
+ );
+ }
+ }
+    else if ( task->m_dep->m_task_type != task_root_type::Aggregate ) {
+      // Malloc the precondition array to pass to qthread_spawn(). For
+      // non-aggregate tasks it is a single pointer, since such a task carries
+      // at most one dependence (m_dep). Qthreads will eventually free this
+      // allocation so memory will not be leaked. Is malloc thread-safe? Should
+      // this call be guarded? The memory can't be allocated from the pool
+      // allocator because Qthreads frees it using free().
+      const int dep_count = 1 ; // A non-aggregate task has a single dependence.
+
+      aligned_t ** qprecon = (aligned_t **) malloc( sizeof(aligned_t *) );
+
+      *qprecon = reinterpret_cast<aligned_t *>( uintptr_t(dep_count) );
+
+ if ( task->m_task_type == task_root_type::TaskTeam && m_team_size > 1) {
+        // If more than one shepherd, spawn on a shepherd other than this one.
+ const int num_shepherd = qthread_num_shepherds();
+ const int this_shepherd = qthread_shep();
+ int spawn_shepherd = ( this_shepherd + 1 ) % num_shepherd ;
+
+#if 0
+ fprintf( stdout,
+ "worker(%d.%d) task 0x%.12lx spawning on shepherd(%d) clone(%d)\n",
+ qthread_shep(),
+ qthread_worker_local(NULL),
+                 reinterpret_cast<unsigned long>(task),
+ spawn_shepherd,
+ m_team_size - 1
+ );
+ fflush(stdout);
+#endif
+
+ qthread_spawn_cloneable(
+        & task_root_type::qthread_func,
+        task,
+ 0,
+ NULL,
+        dep_count,
+ qprecon, /* dependences */
+ spawn_shepherd,
+ unsigned( QTHREAD_SPAWN_SIMPLE | QTHREAD_SPAWN_LOCAL_PRIORITY ),
+ m_team_size - 1
+ );
+ }
+ else {
+ qthread_spawn(
+        & task_root_type::qthread_func, /* function */
+        task, /* function argument */
+ 0,
+ NULL,
+        dep_count,
+ qprecon, /* dependences */
+ NO_SHEPHERD,
+ QTHREAD_SPAWN_SIMPLE /* allows optimization for non-blocking task */
+ );
+ }
+    }
+  }
+  else {
+    // GEM: How do I handle an aggregate (when_all) task?
+  }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::reschedule( task_root_type * task )
+{
+ // Precondition:
+ // task is in Executing state
+ // task->m_next == LockTag
+ //
+ // Postcondition:
+ // task is in Executing-Respawn state
+ // task->m_next == 0 (no dependence)
+
+ task_root_type * const zero = (task_root_type *) 0 ;
+ task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
+
+ if ( lock != Kokkos::atomic_exchange( & task->m_next, zero ) ) {
+ Kokkos::abort("TaskScheduler::respawn ERROR: already respawned");
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::complete
+ ( TaskQueue< ExecSpace >::task_root_type * task )
+{
+ // Complete a runnable task that has finished executing
+  // or a when_all task when all of its dependences are complete.
+
+ task_root_type * const zero = (task_root_type *) 0 ;
+ task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
+ task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
+
+#if 0
+ printf( "complete( 0x%lx { 0x%lx 0x%lx %d %d %d }\n",
+ uintptr_t(task),
+ uintptr_t(task->m_wait),
+ uintptr_t(task->m_next),
+ task->m_task_type,
+ task->m_priority,
+ task->m_ref_count
+ );
+ fflush( stdout );
+#endif
+
+ const bool runnable = task_root_type::Aggregate != task->m_task_type ;
+
+ //----------------------------------------
+
+ if ( runnable && lock != task->m_next ) {
+    // A runnable task has finished executing and requested a respawn.
+ // Schedule the task for subsequent execution.
+
+ schedule( task );
+ }
+ //----------------------------------------
+ else {
+    // Either an aggregate or a runnable task that executed
+ // and did not respawn. Transition this task to complete.
+
+ // If 'task' is an aggregate then any of the runnable tasks that
+ // it depends upon may be attempting to complete this 'task'.
+ // Must only transition a task once to complete status.
+    // This is controlled by atomically locking the wait queue.
+
+ // Stop other tasks from adding themselves to this task's wait queue
+ // by locking the head of this task's wait queue.
+
+ task_root_type * x = Kokkos::atomic_exchange( & task->m_wait, lock );
+
+ if ( x != (task_root_type *) lock ) {
+
+ // This thread has transitioned this 'task' to complete.
+ // 'task' is no longer in a queue and is not executing
+ // so decrement the reference count from 'task's creation.
+ // If no other references to this 'task' then it will be deleted.
+
+ TaskQueue::assign( & task, zero );
+
+ // This thread has exclusive access to the wait list so
+ // the concurrency-safe pop_task function is not needed.
+ // Schedule the tasks that have been waiting on the input 'task',
+ // which may have been deleted.
+
+ while ( x != end ) {
+
+ // Set x->m_next = zero <= no dependence
+
+ task_root_type * const next =
+ (task_root_type *) Kokkos::atomic_exchange( & x->m_next, zero );
+
+ schedule( x );
+
+ x = next ;
+ }
+ }
+ }
+
+ if ( runnable ) {
+ // A runnable task was popped from a ready queue and executed.
+ // If respawned into a ready queue then the ready count was incremented
+ // so decrement whether respawned or not.
+ Kokkos::atomic_decrement( & m_ready_count );
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template<>
+aligned_t
+TaskBase< Kokkos::Qthreads, void, void >::qthread_func( void * arg )
+{
+ using execution_space = Kokkos::Qthreads ;
+ using task_root_type = TaskBase< execution_space , void , void > ;
+ using Member = Kokkos::Impl::QthreadsTeamPolicyMember;
+
+ task_root_type * const task = reinterpret_cast< task_root_type * >( arg );
+
+  // First member of the team changes state to executing.
+ // Use compare-exchange to avoid race condition with a respawn.
+ Kokkos::atomic_compare_exchange_strong( & task->m_state,
+ queue_type::TASK_STATE_WAITING,
+ queue_type::TASK_STATE_EXECUTING
+ );
+
+ if ( task_root_type::TaskTeam == task->m_task_type )
+ {
+ if ( 1 < task->m_queue->m_team_size ) {
+ // Team task with team size of more than 1.
+ Member::TaskTeam task_team_tag ;
+
+      // Initialize team size and rank with shepherd info
+ Member member( task_team_tag );
+
+ (*task->m_apply)( task , & member );
+
+#if 0
+ fprintf( stdout,
+ "worker(%d.%d) task 0x%.12lx executed by member(%d:%d)\n",
+ qthread_shep(),
+ qthread_worker_local(NULL),
+ reinterpret_cast<unsigned long>(task),
+ member.team_rank(),
+ member.team_size()
+ );
+ fflush(stdout);
+#endif
+
+ member.team_barrier();
+ if ( member.team_rank() == 0 ) task->closeout();
+ member.team_barrier();
+ }
+ else {
+ // Team task with team size of 1.
+ Member member ;
+ (*task->m_apply)( task , & member );
+ task->closeout();
+ }
+ }
+ else {
+ (*task->m_apply)( task );
+ task->closeout();
+ }
+
+#if 0
+fprintf( stdout
+ , "worker(%d.%d) task 0x%.12lx return\n"
+ , qthread_shep()
+ , qthread_worker_local(NULL)
+ , reinterpret_cast<unsigned long>(task)
+ );
+fflush(stdout);
+#endif
+
+ return 0 ;
+}
+
+} /* namespace Impl */
+} /* namespace Kokkos */
+
+
+#endif /* #if defined( KOKKOS_ENABLE_TASKPOLICY ) */
+
diff --git a/lib/kokkos/core/src/Qthread/README b/lib/kokkos/core/src/Qthreads/README
similarity index 99%
rename from lib/kokkos/core/src/Qthread/README
rename to lib/kokkos/core/src/Qthreads/README
index 6e6c86a9e..e35b1f698 100644
--- a/lib/kokkos/core/src/Qthread/README
+++ b/lib/kokkos/core/src/Qthreads/README
@@ -1,25 +1,24 @@
# This Qthreads back-end uses an experimental branch of the Qthreads repository with special #define options.
# Cloning repository and branch:
git clone git@github.com:Qthreads/qthreads.git qthreads
cd qthreads
# checkout branch with "cloned tasks"
git checkout dev-kokkos
# Configure/autogen
sh autogen.sh
# configure with 'hwloc' installation:
./configure CFLAGS="-DCLONED_TASKS -DQTHREAD_LOCAL_PRIORITY" --with-hwloc=${HWLOCDIR} --prefix=${INSTALLDIR}
# install
make install
-
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
index 0f69be9ed..b1f53489f 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.cpp
@@ -1,826 +1,826 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core_fwd.hpp>
#if defined( KOKKOS_ENABLE_PTHREAD ) || defined( KOKKOS_ENABLE_WINTHREAD )
#include <stdint.h>
#include <limits>
#include <utility>
#include <iostream>
#include <sstream>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_CPUDiscovery.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
ThreadsExec s_threads_process ;
ThreadsExec * s_threads_exec[ ThreadsExec::MAX_THREAD_COUNT ] = { 0 };
pthread_t s_threads_pid[ ThreadsExec::MAX_THREAD_COUNT ] = { 0 };
std::pair<unsigned,unsigned> s_threads_coord[ ThreadsExec::MAX_THREAD_COUNT ];
int s_thread_pool_size[3] = { 0 , 0 , 0 };
unsigned s_current_reduce_size = 0 ;
unsigned s_current_shared_size = 0 ;
void (* volatile s_current_function)( ThreadsExec & , const void * );
const void * volatile s_current_function_arg = 0 ;
struct Sentinel {
Sentinel()
{
HostSpace::register_in_parallel( ThreadsExec::in_parallel );
}
~Sentinel()
{
if ( s_thread_pool_size[0] ||
s_thread_pool_size[1] ||
s_thread_pool_size[2] ||
s_current_reduce_size ||
s_current_shared_size ||
s_current_function ||
s_current_function_arg ||
s_threads_exec[0] ) {
std::cerr << "ERROR : Process exiting without calling Kokkos::Threads::terminate()" << std::endl ;
}
}
};
inline
unsigned fan_size( const unsigned rank , const unsigned size )
{
const unsigned rank_rev = size - ( rank + 1 );
unsigned count = 0 ;
for ( unsigned n = 1 ; ( rank_rev + n < size ) && ! ( rank_rev & n ) ; n <<= 1 ) { ++count ; }
return count ;
}
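// For example, with size == 8: rank 7 has rank_rev == 0 and fan_size == 3
// (its fan-in partners are the threads with reversed ranks 1, 2 and 4),
// while rank 0 has rank_rev == 7 and fan_size == 0, i.e. it is a leaf of
// the fan-in tree.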
} // namespace
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
void execute_function_noop( ThreadsExec & , const void * ) {}
void ThreadsExec::driver(void)
{
ThreadsExec this_thread ;
while ( ThreadsExec::Active == this_thread.m_pool_state ) {
(*s_current_function)( this_thread , s_current_function_arg );
// Deactivate thread and wait for reactivation
this_thread.m_pool_state = ThreadsExec::Inactive ;
wait_yield( this_thread.m_pool_state , ThreadsExec::Inactive );
}
}
ThreadsExec::ThreadsExec()
: m_pool_base(0)
, m_scratch(0)
, m_scratch_reduce_end(0)
, m_scratch_thread_end(0)
, m_numa_rank(0)
, m_numa_core_rank(0)
, m_pool_rank(0)
, m_pool_size(0)
, m_pool_fan_size(0)
, m_pool_state( ThreadsExec::Terminating )
{
if ( & s_threads_process != this ) {
// A spawned thread
ThreadsExec * const nil = 0 ;
// Which entry in 's_threads_exec', possibly determined from hwloc binding
const int entry = ((size_t)s_current_function_arg) < size_t(s_thread_pool_size[0])
? ((size_t)s_current_function_arg)
: size_t(Kokkos::hwloc::bind_this_thread( s_thread_pool_size[0] , s_threads_coord ));
// Given a good entry set this thread in the 's_threads_exec' array
if ( entry < s_thread_pool_size[0] &&
nil == atomic_compare_exchange( s_threads_exec + entry , nil , this ) ) {
const std::pair<unsigned,unsigned> coord = Kokkos::hwloc::get_this_thread_coordinate();
m_numa_rank = coord.first ;
m_numa_core_rank = coord.second ;
m_pool_base = s_threads_exec ;
m_pool_rank = s_thread_pool_size[0] - ( entry + 1 );
m_pool_rank_rev = s_thread_pool_size[0] - ( pool_rank() + 1 );
m_pool_size = s_thread_pool_size[0] ;
m_pool_fan_size = fan_size( m_pool_rank , m_pool_size );
m_pool_state = ThreadsExec::Active ;
s_threads_pid[ m_pool_rank ] = pthread_self();
// Inform spawning process that the threads_exec entry has been set.
s_threads_process.m_pool_state = ThreadsExec::Active ;
}
else {
// Inform spawning process that the threads_exec entry could not be set.
s_threads_process.m_pool_state = ThreadsExec::Terminating ;
}
}
else {
    // Enables 'parallel_for' to execute on uninitialized Threads device
m_pool_rank = 0 ;
m_pool_size = 1 ;
m_pool_state = ThreadsExec::Inactive ;
s_threads_pid[ m_pool_rank ] = pthread_self();
}
}
ThreadsExec::~ThreadsExec()
{
const unsigned entry = m_pool_size - ( m_pool_rank + 1 );
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
if ( m_scratch ) {
Record * const r = Record::get_record( m_scratch );
m_scratch = 0 ;
Record::decrement( r );
}
m_pool_base = 0 ;
m_scratch_reduce_end = 0 ;
m_scratch_thread_end = 0 ;
m_numa_rank = 0 ;
m_numa_core_rank = 0 ;
m_pool_rank = 0 ;
m_pool_size = 0 ;
m_pool_fan_size = 0 ;
m_pool_state = ThreadsExec::Terminating ;
if ( & s_threads_process != this && entry < MAX_THREAD_COUNT ) {
ThreadsExec * const nil = 0 ;
atomic_compare_exchange( s_threads_exec + entry , this , nil );
s_threads_process.m_pool_state = ThreadsExec::Terminating ;
}
}
int ThreadsExec::get_thread_count()
{
return s_thread_pool_size[0] ;
}
ThreadsExec * ThreadsExec::get_thread( const int init_thread_rank )
{
ThreadsExec * const th =
init_thread_rank < s_thread_pool_size[0]
? s_threads_exec[ s_thread_pool_size[0] - ( init_thread_rank + 1 ) ] : 0 ;
if ( 0 == th || th->m_pool_rank != init_thread_rank ) {
std::ostringstream msg ;
msg << "Kokkos::Impl::ThreadsExec::get_thread ERROR : "
<< "thread " << init_thread_rank << " of " << s_thread_pool_size[0] ;
if ( 0 == th ) {
msg << " does not exist" ;
}
else {
msg << " has wrong thread_rank " << th->m_pool_rank ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return th ;
}
//----------------------------------------------------------------------------
void ThreadsExec::execute_sleep( ThreadsExec & exec , const void * )
{
ThreadsExec::global_lock();
ThreadsExec::global_unlock();
const int n = exec.m_pool_fan_size ;
const int rank_rev = exec.m_pool_size - ( exec.m_pool_rank + 1 );
for ( int i = 0 ; i < n ; ++i ) {
- Impl::spinwait( exec.m_pool_base[ rank_rev + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( exec.m_pool_base[ rank_rev + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
exec.m_pool_state = ThreadsExec::Inactive ;
}
}
}
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
void ThreadsExec::verify_is_process( const std::string & name , const bool initialized )
{
if ( ! is_process() ) {
std::string msg( name );
msg.append( " FAILED : Called by a worker thread, can only be called by the master process." );
Kokkos::Impl::throw_runtime_exception( msg );
}
if ( initialized && 0 == s_thread_pool_size[0] ) {
std::string msg( name );
msg.append( " FAILED : Threads not initialized." );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
int ThreadsExec::in_parallel()
{
// A thread function is in execution and
// the function argument is not the special threads process argument and
// the master process is a worker or is not the master process.
return s_current_function &&
( & s_threads_process != s_current_function_arg ) &&
( s_threads_process.m_pool_base || ! is_process() );
}
// Wait for root thread to become inactive
void ThreadsExec::fence()
{
if ( s_thread_pool_size[0] ) {
// Wait for the root thread to complete:
- Impl::spinwait( s_threads_exec[0]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( s_threads_exec[0]->m_pool_state , ThreadsExec::Active );
}
s_current_function = 0 ;
s_current_function_arg = 0 ;
// Make sure function and arguments are cleared before
// potentially re-activating threads with a subsequent launch.
memory_fence();
}
/** \brief Begin execution of the asynchronous functor */
void ThreadsExec::start( void (*func)( ThreadsExec & , const void * ) , const void * arg )
{
verify_is_process("ThreadsExec::start" , true );
if ( s_current_function || s_current_function_arg ) {
Kokkos::Impl::throw_runtime_exception( std::string( "ThreadsExec::start() FAILED : already executing" ) );
}
s_current_function = func ;
s_current_function_arg = arg ;
// Make sure function and arguments are written before activating threads.
memory_fence();
// Activate threads:
for ( int i = s_thread_pool_size[0] ; 0 < i-- ; ) {
s_threads_exec[i]->m_pool_state = ThreadsExec::Active ;
}
if ( s_threads_process.m_pool_size ) {
// Master process is the root thread, run it:
(*func)( s_threads_process , arg );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
}
//----------------------------------------------------------------------------
bool ThreadsExec::sleep()
{
verify_is_process("ThreadsExec::sleep", true );
if ( & execute_sleep == s_current_function ) return false ;
fence();
ThreadsExec::global_lock();
s_current_function = & execute_sleep ;
// Activate threads:
for ( unsigned i = s_thread_pool_size[0] ; 0 < i ; ) {
s_threads_exec[--i]->m_pool_state = ThreadsExec::Active ;
}
return true ;
}
bool ThreadsExec::wake()
{
verify_is_process("ThreadsExec::wake", true );
if ( & execute_sleep != s_current_function ) return false ;
ThreadsExec::global_unlock();
if ( s_threads_process.m_pool_base ) {
execute_sleep( s_threads_process , 0 );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
fence();
return true ;
}
//----------------------------------------------------------------------------
void ThreadsExec::execute_serial( void (*func)( ThreadsExec & , const void * ) )
{
s_current_function = func ;
s_current_function_arg = & s_threads_process ;
// Make sure function and arguments are written before activating threads.
memory_fence();
const unsigned begin = s_threads_process.m_pool_base ? 1 : 0 ;
for ( unsigned i = s_thread_pool_size[0] ; begin < i ; ) {
ThreadsExec & th = * s_threads_exec[ --i ];
th.m_pool_state = ThreadsExec::Active ;
wait_yield( th.m_pool_state , ThreadsExec::Active );
}
if ( s_threads_process.m_pool_base ) {
s_threads_process.m_pool_state = ThreadsExec::Active ;
(*func)( s_threads_process , 0 );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
s_current_function_arg = 0 ;
s_current_function = 0 ;
// Make sure function and arguments are cleared before proceeding.
memory_fence();
}
//----------------------------------------------------------------------------
void * ThreadsExec::root_reduce_scratch()
{
return s_threads_process.reduce_memory();
}
void ThreadsExec::execute_resize_scratch( ThreadsExec & exec , const void * )
{
typedef Kokkos::Experimental::Impl::SharedAllocationRecord< Kokkos::HostSpace , void > Record ;
if ( exec.m_scratch ) {
Record * const r = Record::get_record( exec.m_scratch );
exec.m_scratch = 0 ;
Record::decrement( r );
}
exec.m_scratch_reduce_end = s_threads_process.m_scratch_reduce_end ;
exec.m_scratch_thread_end = s_threads_process.m_scratch_thread_end ;
if ( s_threads_process.m_scratch_thread_end ) {
// Allocate tracked memory:
{
Record * const r = Record::allocate( Kokkos::HostSpace() , "thread_scratch" , s_threads_process.m_scratch_thread_end );
Record::increment( r );
exec.m_scratch = r->data();
}
unsigned * ptr = reinterpret_cast<unsigned *>( exec.m_scratch );
unsigned * const end = ptr + s_threads_process.m_scratch_thread_end / sizeof(unsigned);
// touch on this thread
while ( ptr < end ) *ptr++ = 0 ;
}
}
void * ThreadsExec::resize_scratch( size_t reduce_size , size_t thread_size )
{
enum { ALIGN_MASK = Kokkos::Impl::MEMORY_ALIGNMENT - 1 };
fence();
const size_t old_reduce_size = s_threads_process.m_scratch_reduce_end ;
const size_t old_thread_size = s_threads_process.m_scratch_thread_end - s_threads_process.m_scratch_reduce_end ;
reduce_size = ( reduce_size + ALIGN_MASK ) & ~ALIGN_MASK ;
thread_size = ( thread_size + ALIGN_MASK ) & ~ALIGN_MASK ;
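  // For example, if MEMORY_ALIGNMENT were 64, a 100-byte request would be
  // rounded up to 128: ( 100 + 63 ) & ~63 == 128.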
// Increase size or deallocate completely.
if ( ( old_reduce_size < reduce_size ) ||
( old_thread_size < thread_size ) ||
( ( reduce_size == 0 && thread_size == 0 ) &&
( old_reduce_size != 0 || old_thread_size != 0 ) ) ) {
verify_is_process( "ThreadsExec::resize_scratch" , true );
s_threads_process.m_scratch_reduce_end = reduce_size ;
s_threads_process.m_scratch_thread_end = reduce_size + thread_size ;
execute_serial( & execute_resize_scratch );
s_threads_process.m_scratch = s_threads_exec[0]->m_scratch ;
}
return s_threads_process.m_scratch ;
}
//----------------------------------------------------------------------------
void ThreadsExec::print_configuration( std::ostream & s , const bool detail )
{
verify_is_process("ThreadsExec::print_configuration",false);
fence();
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
// Forestall compiler warnings for unused variables.
(void) numa_count;
(void) cores_per_numa;
(void) threads_per_core;
s << "Kokkos::Threads" ;
#if defined( KOKKOS_ENABLE_PTHREAD )
s << " KOKKOS_ENABLE_PTHREAD" ;
#endif
#if defined( KOKKOS_ENABLE_HWLOC )
s << " hwloc[" << numa_count << "x" << cores_per_numa << "x" << threads_per_core << "]" ;
#endif
if ( s_thread_pool_size[0] ) {
s << " threads[" << s_thread_pool_size[0] << "]"
<< " threads_per_numa[" << s_thread_pool_size[1] << "]"
<< " threads_per_core[" << s_thread_pool_size[2] << "]"
;
if ( 0 == s_threads_process.m_pool_base ) { s << " Asynchronous" ; }
s << " ReduceScratch[" << s_current_reduce_size << "]"
<< " SharedScratch[" << s_current_shared_size << "]" ;
s << std::endl ;
if ( detail ) {
for ( int i = 0 ; i < s_thread_pool_size[0] ; ++i ) {
ThreadsExec * const th = s_threads_exec[i] ;
if ( th ) {
const int rank_rev = th->m_pool_size - ( th->m_pool_rank + 1 );
s << " Thread[ " << th->m_pool_rank << " : "
<< th->m_numa_rank << "." << th->m_numa_core_rank << " ]" ;
s << " Fan{" ;
for ( int j = 0 ; j < th->m_pool_fan_size ; ++j ) {
ThreadsExec * const thfan = th->m_pool_base[rank_rev+(1<<j)] ;
s << " [ " << thfan->m_pool_rank << " : "
<< thfan->m_numa_rank << "." << thfan->m_numa_core_rank << " ]" ;
}
s << " }" ;
if ( th == & s_threads_process ) {
s << " is_process" ;
}
}
s << std::endl ;
}
}
}
else {
s << " not initialized" << std::endl ;
}
}
//----------------------------------------------------------------------------
int ThreadsExec::is_initialized()
{ return 0 != s_threads_exec[0] ; }
void ThreadsExec::initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa ,
bool allow_asynchronous_threadpool )
{
static const Sentinel sentinel ;
const bool is_initialized = 0 != s_thread_pool_size[0] ;
unsigned thread_spawn_failed = 0 ;
for ( int i = 0; i < ThreadsExec::MAX_THREAD_COUNT ; i++)
s_threads_exec[i] = NULL;
if ( ! is_initialized ) {
// If thread_count, use_numa_count, or use_cores_per_numa are zero
// then they will be given default values based upon hwloc detection
// and allowed asynchronous execution.
const bool hwloc_avail = Kokkos::hwloc::available();
const bool hwloc_can_bind = hwloc_avail && Kokkos::hwloc::can_bind_threads();
if ( thread_count == 0 ) {
thread_count = hwloc_avail
? Kokkos::hwloc::get_available_numa_count() *
Kokkos::hwloc::get_available_cores_per_numa() *
Kokkos::hwloc::get_available_threads_per_core()
: 1 ;
}
const unsigned thread_spawn_begin =
hwloc::thread_mapping( "Kokkos::Threads::initialize" ,
allow_asynchronous_threadpool ,
thread_count ,
use_numa_count ,
use_cores_per_numa ,
s_threads_coord );
const std::pair<unsigned,unsigned> proc_coord = s_threads_coord[0] ;
if ( thread_spawn_begin ) {
// Synchronous with s_threads_coord[0] as the process core
// Claim entry #0 for binding the process core.
s_threads_coord[0] = std::pair<unsigned,unsigned>(~0u,~0u);
}
s_thread_pool_size[0] = thread_count ;
s_thread_pool_size[1] = s_thread_pool_size[0] / use_numa_count ;
s_thread_pool_size[2] = s_thread_pool_size[1] / use_cores_per_numa ;
s_current_function = & execute_function_noop ; // Initialization work function
for ( unsigned ith = thread_spawn_begin ; ith < thread_count ; ++ith ) {
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
// If hwloc available then spawned thread will
// choose its own entry in 's_threads_coord'
// otherwise specify the entry.
s_current_function_arg = (void*)static_cast<uintptr_t>( hwloc_can_bind ? ~0u : ith );
// Make sure all outstanding memory writes are complete
// before spawning the new thread.
memory_fence();
// Spawn thread executing the 'driver()' function.
// Wait until spawned thread has attempted to initialize.
      // If spawning and initialization are successful then
// an entry in 's_threads_exec' will be assigned.
if ( ThreadsExec::spawn() ) {
wait_yield( s_threads_process.m_pool_state , ThreadsExec::Inactive );
}
if ( s_threads_process.m_pool_state == ThreadsExec::Terminating ) break ;
}
// Wait for all spawned threads to deactivate before zeroing the function.
for ( unsigned ith = thread_spawn_begin ; ith < thread_count ; ++ith ) {
// Try to protect against cache coherency failure by casting to volatile.
ThreadsExec * const th = ((ThreadsExec * volatile *)s_threads_exec)[ith] ;
if ( th ) {
wait_yield( th->m_pool_state , ThreadsExec::Active );
}
else {
++thread_spawn_failed ;
}
}
s_current_function = 0 ;
s_current_function_arg = 0 ;
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
memory_fence();
if ( ! thread_spawn_failed ) {
      // Bind process to the core on which it was located before spawning occurred
if (hwloc_can_bind) {
Kokkos::hwloc::bind_this_thread( proc_coord );
}
if ( thread_spawn_begin ) { // Include process in pool.
const std::pair<unsigned,unsigned> coord = Kokkos::hwloc::get_this_thread_coordinate();
s_threads_exec[0] = & s_threads_process ;
s_threads_process.m_numa_rank = coord.first ;
s_threads_process.m_numa_core_rank = coord.second ;
s_threads_process.m_pool_base = s_threads_exec ;
s_threads_process.m_pool_rank = thread_count - 1 ; // Reversed for scan-compatible reductions
s_threads_process.m_pool_size = thread_count ;
s_threads_process.m_pool_fan_size = fan_size( s_threads_process.m_pool_rank , s_threads_process.m_pool_size );
s_threads_pid[ s_threads_process.m_pool_rank ] = pthread_self();
}
else {
s_threads_process.m_pool_base = 0 ;
s_threads_process.m_pool_rank = 0 ;
s_threads_process.m_pool_size = 0 ;
s_threads_process.m_pool_fan_size = 0 ;
}
// Initial allocations:
ThreadsExec::resize_scratch( 1024 , 1024 );
}
else {
s_thread_pool_size[0] = 0 ;
s_thread_pool_size[1] = 0 ;
s_thread_pool_size[2] = 0 ;
}
}
if ( is_initialized || thread_spawn_failed ) {
std::ostringstream msg ;
msg << "Kokkos::Threads::initialize ERROR" ;
if ( is_initialized ) {
msg << " : already initialized" ;
}
if ( thread_spawn_failed ) {
msg << " : failed to spawn " << thread_spawn_failed << " threads" ;
}
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
// Check for over-subscription
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
// std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
//}
// Init the array for used for arbitrarily sized atomics
Impl::init_lock_array_host_space();
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
//----------------------------------------------------------------------------
void ThreadsExec::finalize()
{
verify_is_process("ThreadsExec::finalize",false);
fence();
resize_scratch(0,0);
const unsigned begin = s_threads_process.m_pool_base ? 1 : 0 ;
for ( unsigned i = s_thread_pool_size[0] ; begin < i-- ; ) {
if ( s_threads_exec[i] ) {
s_threads_exec[i]->m_pool_state = ThreadsExec::Terminating ;
wait_yield( s_threads_process.m_pool_state , ThreadsExec::Inactive );
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
}
s_threads_pid[i] = 0 ;
}
if ( s_threads_process.m_pool_base ) {
( & s_threads_process )->~ThreadsExec();
s_threads_exec[0] = 0 ;
}
if (Kokkos::hwloc::can_bind_threads() ) {
Kokkos::hwloc::unbind_this_thread();
}
s_thread_pool_size[0] = 0 ;
s_thread_pool_size[1] = 0 ;
s_thread_pool_size[2] = 0 ;
// Reset master thread to run solo.
s_threads_process.m_numa_rank = 0 ;
s_threads_process.m_numa_core_rank = 0 ;
s_threads_process.m_pool_base = 0 ;
s_threads_process.m_pool_rank = 0 ;
s_threads_process.m_pool_size = 1 ;
s_threads_process.m_pool_fan_size = 0 ;
s_threads_process.m_pool_state = ThreadsExec::Inactive ;
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
}
//----------------------------------------------------------------------------
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
int Threads::concurrency() {
return thread_pool_size(0);
}
Threads & Threads::instance(int)
{
static Threads t ;
return t ;
}
int Threads::thread_pool_size( int depth )
{
return Impl::s_thread_pool_size[depth];
}
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
int Threads::thread_pool_rank()
{
const pthread_t pid = pthread_self();
int i = 0;
while ( ( i < Impl::s_thread_pool_size[0] ) && ( pid != Impl::s_threads_pid[i] ) ) { ++i ; }
return i ;
}
#endif
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_PTHREAD ) || defined( KOKKOS_ENABLE_WINTHREAD ) */
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp
index 385dd492d..a6db02eba 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsExec.hpp
@@ -1,631 +1,631 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADSEXEC_HPP
#define KOKKOS_THREADSEXEC_HPP
#include <stdio.h>
#include <utility>
#include <impl/Kokkos_spinwait.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
class ThreadsExec {
public:
// Fan array has log_2(NT) reduction threads plus 2 scan threads
// Currently limited to 16k threads.
enum { MAX_FAN_COUNT = 16 };
enum { MAX_THREAD_COUNT = 1 << ( MAX_FAN_COUNT - 2 ) };
enum { VECTOR_LENGTH = 8 };
/** \brief States of a worker thread */
enum { Terminating ///< Termination in progress
, Inactive ///< Exists, waiting for work
, Active ///< Exists, performing work
, Rendezvous ///< Exists, waiting in a barrier or reduce
, ScanCompleted
, ScanAvailable
, ReductionAvailable
};
private:
friend class Kokkos::Threads ;
// Fan-in operations' root is the highest ranking thread
// to place the 'scan' reduction intermediate values on
// the threads that need them.
// For a simple reduction the thread location is arbitrary.
ThreadsExec * const * m_pool_base ; ///< Base for pool fan-in
void * m_scratch ;
int m_scratch_reduce_end ;
int m_scratch_thread_end ;
int m_numa_rank ;
int m_numa_core_rank ;
int m_pool_rank ;
int m_pool_rank_rev ;
int m_pool_size ;
int m_pool_fan_size ;
int volatile m_pool_state ; ///< State for global synchronizations
// Members for dynamic scheduling
// Which thread am I stealing from currently
int m_current_steal_target;
// This thread's owned work_range
Kokkos::pair<long,long> m_work_range KOKKOS_ALIGN(16);
// Team Offset if one thread determines work_range for others
long m_team_work_index;
// Is this thread stealing (i.e. its owned work_range is exhausted)
bool m_stealing;
static void global_lock();
static void global_unlock();
static bool spawn();
static void execute_resize_scratch( ThreadsExec & , const void * );
static void execute_sleep( ThreadsExec & , const void * );
ThreadsExec( const ThreadsExec & );
ThreadsExec & operator = ( const ThreadsExec & );
static void execute_serial( void (*)( ThreadsExec & , const void * ) );
public:
KOKKOS_INLINE_FUNCTION int pool_size() const { return m_pool_size ; }
KOKKOS_INLINE_FUNCTION int pool_rank() const { return m_pool_rank ; }
KOKKOS_INLINE_FUNCTION int numa_rank() const { return m_numa_rank ; }
KOKKOS_INLINE_FUNCTION int numa_core_rank() const { return m_numa_core_rank ; }
inline long team_work_index() const { return m_team_work_index ; }
static int get_thread_count();
static ThreadsExec * get_thread( const int init_thread_rank );
inline void * reduce_memory() const { return m_scratch ; }
KOKKOS_INLINE_FUNCTION void * scratch_memory() const
{ return reinterpret_cast<unsigned char *>(m_scratch) + m_scratch_reduce_end ; }
KOKKOS_INLINE_FUNCTION int volatile & state() { return m_pool_state ; }
KOKKOS_INLINE_FUNCTION ThreadsExec * const * pool_base() const { return m_pool_base ; }
static void driver(void);
~ThreadsExec();
ThreadsExec();
static void * resize_scratch( size_t reduce_size , size_t thread_size );
static void * root_reduce_scratch();
static bool is_process();
static void verify_is_process( const std::string & , const bool initialized );
static int is_initialized();
static void initialize( unsigned thread_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa ,
bool allow_asynchronous_threadpool );
static void finalize();
/* Given a requested team size, return a valid team size */
static unsigned team_size_valid( unsigned );
static void print_configuration( std::ostream & , const bool detail = false );
//------------------------------------
static void wait_yield( volatile int & , const int );
//------------------------------------
// All-thread functions:
inline
int all_reduce( const int value )
{
// Make sure there is enough scratch space:
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
*((volatile int*) reduce_memory()) = value ;
memory_fence();
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the reduction and broadcast
int accum = 0 ;
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
accum += *((volatile int *) get_thread( rank )->reduce_memory());
}
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
*((volatile int *) get_thread( rank )->reduce_memory()) = accum ;
}
memory_fence();
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
get_thread( rank )->m_pool_state = ThreadsExec::Active ;
}
}
return *((volatile int*) reduce_memory());
}
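// Editorial note (illustrative, not part of the original source): in the fan-in loop
// above a thread with reversed rank r waits on partners at reversed ranks r+1, r+2, r+4,
// ... (m_pool_fan_size of them). For example, in an 8-thread pool the root is pool rank 7
// (reversed rank 0); with a fan size of 3 it waits on reversed ranks 1, 2 and 4, i.e. pool
// ranks 6, 5 and 3, before summing every thread's contribution and broadcasting the result.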
inline
void barrier( )
{
// Make sure there is enough scratch space:
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
memory_fence();
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the reduction and broadcast
memory_fence();
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
get_thread( rank )->m_pool_state = ThreadsExec::Active ;
}
}
}
//------------------------------------
// All-thread functions:
template< class FunctorType , class ArgTag >
inline
void fan_in_reduce( const FunctorType & f ) const
{
typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > Join ;
typedef Kokkos::Impl::FunctorFinal< FunctorType , ArgTag > Final ;
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
ThreadsExec & fan = *m_pool_base[ rev_rank + ( 1 << i ) ] ;
- Impl::spinwait( fan.m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::Active );
Join::join( f , reduce_memory() , fan.reduce_memory() );
}
if ( ! rev_rank ) {
Final::final( f , reduce_memory() );
}
}
inline
void fan_in() const
{
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
- Impl::spinwait( m_pool_base[rev_rank+(1<<i)]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[rev_rank+(1<<i)]->m_pool_state , ThreadsExec::Active );
}
}
template< class FunctorType , class ArgTag >
inline
void scan_large( const FunctorType & f )
{
// Sequence of states:
// 0) Active : entry and exit state
// 1) ReductionAvailable : reduction value available
// 2) ScanAvailable : inclusive scan value available
// 3) Rendezvous : All threads inclusive scan value are available
// 4) ScanCompleted : exclusive scan value copied
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , ArgTag > Traits ;
typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > Join ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType , ArgTag > Init ;
typedef typename Traits::value_type scalar_type ;
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
const unsigned count = Traits::value_count( f );
scalar_type * const work_value = (scalar_type *) reduce_memory();
//--------------------------------
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
ThreadsExec & fan = *m_pool_base[ rev_rank + (1<<i) ];
// Wait: Active -> ReductionAvailable (or ScanAvailable)
- Impl::spinwait( fan.m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::Active );
Join::join( f , work_value , fan.reduce_memory() );
}
// Copy reduction value to scan value before releasing from this phase.
for ( unsigned i = 0 ; i < count ; ++i ) { work_value[i+count] = work_value[i] ; }
if ( rev_rank ) {
// Set: Active -> ReductionAvailable
m_pool_state = ThreadsExec::ReductionAvailable ;
// Wait for contributing threads' scan value to be available.
if ( ( 1 << m_pool_fan_size ) < ( m_pool_rank + 1 ) ) {
ThreadsExec & th = *m_pool_base[ rev_rank + ( 1 << m_pool_fan_size ) ] ;
// Wait: Active -> ReductionAvailable
// Wait: ReductionAvailable -> ScanAvailable
- Impl::spinwait( th.m_pool_state , ThreadsExec::Active );
- Impl::spinwait( th.m_pool_state , ThreadsExec::ReductionAvailable );
+ Impl::spinwait_while_equal( th.m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( th.m_pool_state , ThreadsExec::ReductionAvailable );
Join::join( f , work_value + count , ((scalar_type *)th.reduce_memory()) + count );
}
// This thread has completed inclusive scan
// Set: ReductionAvailable -> ScanAvailable
m_pool_state = ThreadsExec::ScanAvailable ;
// Wait for all threads to complete inclusive scan
// Wait: ScanAvailable -> Rendezvous
- Impl::spinwait( m_pool_state , ThreadsExec::ScanAvailable );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::ScanAvailable );
}
//--------------------------------
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
ThreadsExec & fan = *m_pool_base[ rev_rank + (1<<i) ];
// Wait: ReductionAvailable -> ScanAvailable
- Impl::spinwait( fan.m_pool_state , ThreadsExec::ReductionAvailable );
+ Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::ReductionAvailable );
// Set: ScanAvailable -> Rendezvous
fan.m_pool_state = ThreadsExec::Rendezvous ;
}
// All threads have completed the inclusive scan.
// All non-root threads are in the Rendezvous state.
// Threads are free to overwrite their reduction value.
//--------------------------------
if ( ( rev_rank + 1 ) < m_pool_size ) {
// Exclusive scan: copy the previous thread's inclusive scan value
ThreadsExec & th = *m_pool_base[ rev_rank + 1 ] ; // Not the root thread
const scalar_type * const src_value = ((scalar_type *)th.reduce_memory()) + count ;
for ( unsigned j = 0 ; j < count ; ++j ) { work_value[j] = src_value[j]; }
}
else {
(void) Init::init( f , work_value );
}
//--------------------------------
// Wait for all threads to copy previous thread's inclusive scan value
// Wait for all threads: Rendezvous -> ScanCompleted
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Rendezvous );
}
if ( rev_rank ) {
// Set: ScanAvailable -> ScanCompleted
m_pool_state = ThreadsExec::ScanCompleted ;
// Wait: ScanCompleted -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::ScanCompleted );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::ScanCompleted );
}
// Set: ScanCompleted -> Active
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
m_pool_base[ rev_rank + (1<<i) ]->m_pool_state = ThreadsExec::Active ;
}
}
template< class FunctorType , class ArgTag >
inline
void scan_small( const FunctorType & f )
{
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , ArgTag > Traits ;
typedef Kokkos::Impl::FunctorValueJoin< FunctorType , ArgTag > Join ;
typedef Kokkos::Impl::FunctorValueInit< FunctorType , ArgTag > Init ;
typedef typename Traits::value_type scalar_type ;
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
const unsigned count = Traits::value_count( f );
scalar_type * const work_value = (scalar_type *) reduce_memory();
//--------------------------------
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
- Impl::spinwait( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
for ( unsigned i = 0 ; i < count ; ++i ) { work_value[i+count] = work_value[i]; }
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
- Impl::spinwait( m_pool_state , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the thread-scan before releasing threads
scalar_type * ptr_prev = 0 ;
for ( int rank = 0 ; rank < m_pool_size ; ++rank ) {
scalar_type * const ptr = (scalar_type *) get_thread( rank )->reduce_memory();
if ( rank ) {
for ( unsigned i = 0 ; i < count ; ++i ) { ptr[i] = ptr_prev[ i + count ]; }
Join::join( f , ptr + count , ptr );
}
else {
(void) Init::init( f , ptr );
}
ptr_prev = ptr ;
}
}
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
m_pool_base[ rev_rank + (1<<i) ]->m_pool_state = ThreadsExec::Active ;
}
}
//------------------------------------
/** \brief Wait for previous asynchronous functor to
* complete and release the Threads device.
* Acquire the Threads device and start this functor.
*/
static void start( void (*)( ThreadsExec & , const void * ) , const void * );
static int in_parallel();
static void fence();
static bool sleep();
static bool wake();
/* Dynamic Scheduling related functionality */
// Initialize the work range for this thread
inline void set_work_range(const long& begin, const long& end, const long& chunk_size) {
m_work_range.first = (begin+chunk_size-1)/chunk_size;
m_work_range.second = end>0?(end+chunk_size-1)/chunk_size:m_work_range.first;
}
// Claim an index from this thread's range from the beginning
inline long get_work_index_begin () {
Kokkos::pair<long,long> work_range_new = m_work_range;
Kokkos::pair<long,long> work_range_old = work_range_new;
if(work_range_old.first>=work_range_old.second)
return -1;
work_range_new.first+=1;
bool success = false;
while(!success) {
work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
success = ( (work_range_new == work_range_old) ||
(work_range_new.first>=work_range_new.second));
work_range_old = work_range_new;
work_range_new.first+=1;
}
if(work_range_old.first<work_range_old.second)
return work_range_old.first;
else
return -1;
}
// Claim an index from this thread's range from the end
inline long get_work_index_end () {
Kokkos::pair<long,long> work_range_new = m_work_range;
Kokkos::pair<long,long> work_range_old = work_range_new;
if(work_range_old.first>=work_range_old.second)
return -1;
work_range_new.second-=1;
bool success = false;
while(!success) {
work_range_new = Kokkos::atomic_compare_exchange(&m_work_range,work_range_old,work_range_new);
success = ( (work_range_new == work_range_old) ||
(work_range_new.first>=work_range_new.second) );
work_range_old = work_range_new;
work_range_new.second-=1;
}
if(work_range_old.first<work_range_old.second)
return work_range_old.second-1;
else
return -1;
}
// Reset the steal target
inline void reset_steal_target() {
m_current_steal_target = (m_pool_rank+1)%pool_size();
m_stealing = false;
}
// Reset the steal target
inline void reset_steal_target(int team_size) {
m_current_steal_target = (m_pool_rank_rev+team_size);
if(m_current_steal_target>=pool_size())
m_current_steal_target = 0;//pool_size()-1;
m_stealing = false;
}
// Get a steal target; start with my rank + 1 and go round robin until arriving back at this thread's rank
// Returns -1 if no active steal target is available
inline int get_steal_target() {
while(( m_pool_base[m_current_steal_target]->m_work_range.second <=
m_pool_base[m_current_steal_target]->m_work_range.first ) &&
(m_current_steal_target!=m_pool_rank) ) {
m_current_steal_target = (m_current_steal_target+1)%pool_size();
}
if(m_current_steal_target == m_pool_rank)
return -1;
else
return m_current_steal_target;
}
inline int get_steal_target(int team_size) {
while(( m_pool_base[m_current_steal_target]->m_work_range.second <=
m_pool_base[m_current_steal_target]->m_work_range.first ) &&
(m_current_steal_target!=m_pool_rank_rev) ) {
if(m_current_steal_target + team_size < pool_size())
m_current_steal_target = (m_current_steal_target+team_size);
else
m_current_steal_target = 0;
}
if(m_current_steal_target == m_pool_rank_rev)
return -1;
else
return m_current_steal_target;
}
inline long steal_work_index (int team_size = 0) {
long index = -1;
int steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
while ( (steal_target != -1) && (index == -1)) {
index = m_pool_base[steal_target]->get_work_index_end();
if(index == -1)
steal_target = team_size>0?get_steal_target(team_size):get_steal_target();
}
return index;
}
// Get a work index. Claim from the owned range until it is exhausted, then steal from another thread
inline long get_work_index (int team_size = 0) {
long work_index = -1;
if(!m_stealing) work_index = get_work_index_begin();
if( work_index == -1) {
memory_fence();
m_stealing = true;
work_index = steal_work_index(team_size);
}
m_team_work_index = work_index;
memory_fence();
return work_index;
}
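// Editorial usage sketch (hypothetical driver pseudo-code, not part of the original
// source): a worker drains its own chunk range first and, once get_work_index() switches
// to stealing, takes chunks from the tail of other threads' ranges.
//
// exec.set_work_range( league_begin , league_end , chunk_size );
// exec.reset_steal_target();
// for ( long w = exec.get_work_index() ; w != -1 ; w = exec.get_work_index() ) {
//   /* process league chunk 'w', i.e. indices [ w*chunk_size , (w+1)*chunk_size ) */
// }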
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
inline int Threads::in_parallel()
{ return Impl::ThreadsExec::in_parallel(); }
inline int Threads::is_initialized()
{ return Impl::ThreadsExec::is_initialized(); }
inline void Threads::initialize(
unsigned threads_count ,
unsigned use_numa_count ,
unsigned use_cores_per_numa ,
bool allow_asynchronous_threadpool )
{
Impl::ThreadsExec::initialize( threads_count , use_numa_count , use_cores_per_numa , allow_asynchronous_threadpool );
}
inline void Threads::finalize()
{
Impl::ThreadsExec::finalize();
}
inline void Threads::print_configuration( std::ostream & s , const bool detail )
{
Impl::ThreadsExec::print_configuration( s , detail );
}
inline bool Threads::sleep()
{ return Impl::ThreadsExec::sleep() ; }
inline bool Threads::wake()
{ return Impl::ThreadsExec::wake() ; }
inline void Threads::fence()
{ Impl::ThreadsExec::fence() ; }
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #define KOKKOS_THREADSEXEC_HPP */
diff --git a/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp b/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
index b9edb6455..701495428 100644
--- a/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
+++ b/lib/kokkos/core/src/Threads/Kokkos_ThreadsTeam.hpp
@@ -1,916 +1,920 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_THREADSTEAM_HPP
#define KOKKOS_THREADSTEAM_HPP
#include <stdio.h>
#include <utility>
#include <impl/Kokkos_spinwait.hpp>
#include <impl/Kokkos_FunctorAdapter.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< class > struct ThreadsExecAdapter ;
//----------------------------------------------------------------------------
class ThreadsExecTeamMember {
private:
enum { TEAM_REDUCE_SIZE = 512 };
typedef Kokkos::Threads execution_space ;
typedef execution_space::scratch_memory_space space ;
ThreadsExec * const m_exec ;
ThreadsExec * const * m_team_base ; ///< Base for team fan-in
space m_team_shared ;
int m_team_shared_size ;
int m_team_size ;
int m_team_rank ;
int m_team_rank_rev ;
int m_league_size ;
int m_league_end ;
int m_league_rank ;
int m_chunk_size;
int m_league_chunk_end;
int m_invalid_thread;
int m_team_alloc;
inline
void set_team_shared()
{ new( & m_team_shared ) space( ((char *) (*m_team_base)->scratch_memory()) + TEAM_REDUCE_SIZE , m_team_shared_size ); }
public:
// Fan-in and wait until the matching fan-out is called.
// The root thread which does not wait will return true.
// All other threads will return false during the fan-out.
KOKKOS_INLINE_FUNCTION bool team_fan_in() const
{
int n , j ;
// Wait for fan-in threads
for ( n = 1 ; ( ! ( m_team_rank_rev & n ) ) && ( ( j = m_team_rank_rev + n ) < m_team_size ) ; n <<= 1 ) {
- Impl::spinwait( m_team_base[j]->state() , ThreadsExec::Active );
+ Impl::spinwait_while_equal( m_team_base[j]->state() , ThreadsExec::Active );
}
// If not root then wait for release
if ( m_team_rank_rev ) {
m_exec->state() = ThreadsExec::Rendezvous ;
- Impl::spinwait( m_exec->state() , ThreadsExec::Rendezvous );
+ Impl::spinwait_while_equal( m_exec->state() , ThreadsExec::Rendezvous );
}
return ! m_team_rank_rev ;
}
KOKKOS_INLINE_FUNCTION void team_fan_out() const
{
int n , j ;
for ( n = 1 ; ( ! ( m_team_rank_rev & n ) ) && ( ( j = m_team_rank_rev + n ) < m_team_size ) ; n <<= 1 ) {
m_team_base[j]->state() = ThreadsExec::Active ;
}
}
public:
KOKKOS_INLINE_FUNCTION static int team_reduce_size() { return TEAM_REDUCE_SIZE ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & thread_scratch(int) const
{ return m_team_shared.set_team_thread_mode(0,team_size(),team_rank()) ; }
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const { return m_team_rank ; }
KOKKOS_INLINE_FUNCTION int team_size() const { return m_team_size ; }
KOKKOS_INLINE_FUNCTION void team_barrier() const
{
team_fan_in();
team_fan_out();
}
template<class ValueType>
KOKKOS_INLINE_FUNCTION
void team_broadcast(ValueType& value, const int& thread_id) const
{
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ }
#else
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ValueType) < TEAM_REDUCE_SIZE
, ValueType , void >::type type ;
if ( m_team_base ) {
type * const local_value = ((type*) m_team_base[0]->scratch_memory());
if(team_rank() == thread_id) *local_value = value;
memory_fence();
team_barrier();
value = *local_value;
}
#endif
}
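// Editorial example (illustrative only): team_broadcast copies 'value' from the member
// whose team_rank() equals 'thread_id' into every member's copy, e.g.
// int seed = ( team_rank() == 0 ) ? compute_seed() : 0; // compute_seed() is hypothetical
// team_broadcast( seed , 0 );                           // afterwards every member holds rank 0's seed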
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_reduce( const Type & value ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return Type(); }
#else
{
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(Type) < TEAM_REDUCE_SIZE , Type , void >::type type ;
if ( 0 == m_exec ) return value ;
*((volatile type*) m_exec->scratch_memory() ) = value ;
memory_fence();
type & accum = *((type *) m_team_base[0]->scratch_memory() );
if ( team_fan_in() ) {
for ( int i = 1 ; i < m_team_size ; ++i ) {
accum += *((type *) m_team_base[i]->scratch_memory() );
}
memory_fence();
}
team_fan_out();
return accum ;
}
#endif
template< class ValueType, class JoinOp >
KOKKOS_INLINE_FUNCTION ValueType
team_reduce( const ValueType & value
, const JoinOp & op_in ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ValueType(); }
#else
{
typedef ValueType value_type;
const JoinLambdaAdapter<value_type,JoinOp> op(op_in);
#endif
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(value_type) < TEAM_REDUCE_SIZE
, value_type , void >::type type ;
if ( 0 == m_exec ) return value ;
type * const local_value = ((type*) m_exec->scratch_memory());
// Set this thread's contribution
*local_value = value ;
// Fence to make sure the base team member has access:
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true, all other threads wait for team_fan_out()
type * const team_value = ((type*) m_team_base[0]->scratch_memory());
// Join to the team value:
for ( int i = 1 ; i < m_team_size ; ++i ) {
op.join( *team_value , *((type*) m_team_base[i]->scratch_memory()) );
}
// Team base thread may "lap" member threads so copy out to their local value.
for ( int i = 1 ; i < m_team_size ; ++i ) {
*((type*) m_team_base[i]->scratch_memory()) = *team_value ;
}
// Fence to make sure all team members have access
memory_fence();
}
team_fan_out();
// Value was changed by the team base
return *((type volatile const *) local_value);
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename ArgType >
KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value , ArgType * const global_accum ) const
#if ! defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
{ return ArgType(); }
#else
{
// Make sure there is enough scratch space:
typedef typename if_c< sizeof(ArgType) < TEAM_REDUCE_SIZE , ArgType , void >::type type ;
if ( 0 == m_exec ) return type(0);
volatile type * const work_value = ((type*) m_exec->scratch_memory());
*work_value = value ;
memory_fence();
if ( team_fan_in() ) {
// The last thread to synchronize returns true, all other threads wait for team_fan_out()
// m_team_base[0] == highest ranking team member
// m_team_base[ m_team_size - 1 ] == lowest ranking team member
//
// 1) copy from lower to higher rank, initialize lowest rank to zero
// 2) prefix sum from lowest to highest rank, skipping lowest rank
type accum = 0 ;
if ( global_accum ) {
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_team_base[i]->scratch_memory());
accum += val ;
}
accum = atomic_fetch_add( global_accum , accum );
}
for ( int i = m_team_size ; i-- ; ) {
type & val = *((type*) m_team_base[i]->scratch_memory());
const type offset = accum ;
accum += val ;
val = offset ;
}
memory_fence();
}
team_fan_out();
return *work_value ;
}
#endif
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename ArgType >
KOKKOS_INLINE_FUNCTION ArgType team_scan( const ArgType & value ) const
{ return this-> template team_scan<ArgType>( value , 0 ); }
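// Editorial example (hypothetical values): with team ranks 0..3 contributing
// value = 2, 1, 4, 3 the exclusive scan returns 0, 2, 3, 7 respectively, and the highest
// rank recovers the reduction total as team_scan(value) + value = 7 + 3 = 10.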
//----------------------------------------
// Private for the driver
template< class ... Properties >
ThreadsExecTeamMember( Impl::ThreadsExec * exec
, const TeamPolicyInternal< Kokkos::Threads , Properties ... > & team
, const int shared_size )
: m_exec( exec )
, m_team_base(0)
, m_team_shared(0,0)
, m_team_shared_size( shared_size )
, m_team_size(team.team_size())
, m_team_rank(0)
, m_team_rank_rev(0)
, m_league_size(0)
, m_league_end(0)
, m_league_rank(0)
, m_chunk_size( team.chunk_size() )
, m_league_chunk_end(0)
, m_team_alloc( team.team_alloc())
{
if ( team.league_size() ) {
// Execution is using device-team interface:
const int pool_rank_rev = m_exec->pool_size() - ( m_exec->pool_rank() + 1 );
const int team_rank_rev = pool_rank_rev % team.team_alloc();
const size_t pool_league_size = m_exec->pool_size() / team.team_alloc() ;
const size_t pool_league_rank_rev = pool_rank_rev / team.team_alloc() ;
+ if(pool_league_rank_rev >= pool_league_size) {
+ m_invalid_thread = 1;
+ return;
+ }
const size_t pool_league_rank = pool_league_size - ( pool_league_rank_rev + 1 );
const int pool_num_teams = m_exec->pool_size()/team.team_alloc();
const int chunk_size = team.chunk_size()>0?team.chunk_size():team.team_iter();
const int chunks_per_team = ( team.league_size() + chunk_size*pool_num_teams-1 ) / (chunk_size*pool_num_teams);
int league_iter_end = team.league_size() - pool_league_rank_rev * chunks_per_team * chunk_size;
int league_iter_begin = league_iter_end - chunks_per_team * chunk_size;
if (league_iter_begin < 0) league_iter_begin = 0;
if (league_iter_end>team.league_size()) league_iter_end = team.league_size();
if ((team.team_alloc()>m_team_size)?
(team_rank_rev >= m_team_size):
(m_exec->pool_size() - pool_num_teams*m_team_size > m_exec->pool_rank())
)
m_invalid_thread = 1;
else
m_invalid_thread = 0;
// May be using fewer threads per team than a multiple of threads per core,
// so some threads will idle.
if ( team_rank_rev < team.team_size() && !m_invalid_thread) {
m_team_base = m_exec->pool_base() + team.team_alloc() * pool_league_rank_rev ;
m_team_size = team.team_size() ;
m_team_rank = team.team_size() - ( team_rank_rev + 1 );
m_team_rank_rev = team_rank_rev ;
m_league_size = team.league_size();
m_league_rank = ( team.league_size() * pool_league_rank ) / pool_league_size ;
m_league_end = ( team.league_size() * (pool_league_rank+1) ) / pool_league_size ;
set_team_shared();
}
if ( (m_team_rank_rev == 0) && (m_invalid_thread == 0) ) {
m_exec->set_work_range(m_league_rank,m_league_end,m_chunk_size);
m_exec->reset_steal_target(m_team_size);
}
if(std::is_same<typename TeamPolicyInternal<Kokkos::Threads, Properties ...>::schedule_type::type,Kokkos::Dynamic>::value) {
m_exec->barrier();
}
}
else
{ m_invalid_thread = 1; }
}
ThreadsExecTeamMember()
: m_exec(0)
, m_team_base(0)
, m_team_shared(0,0)
, m_team_shared_size(0)
, m_team_size(1)
, m_team_rank(0)
, m_team_rank_rev(0)
, m_league_size(1)
, m_league_end(0)
, m_league_rank(0)
, m_chunk_size(0)
, m_league_chunk_end(0)
, m_invalid_thread(0)
, m_team_alloc(0)
{}
inline
ThreadsExec & threads_exec_team_base() const { return m_team_base ? **m_team_base : *m_exec ; }
bool valid_static() const
{ return m_league_rank < m_league_end ; }
void next_static()
{
if ( m_league_rank < m_league_end ) {
team_barrier();
set_team_shared();
}
m_league_rank++;
}
bool valid_dynamic() {
if(m_invalid_thread)
return false;
if ((m_league_rank < m_league_chunk_end) && (m_league_rank < m_league_size)) {
return true;
}
if ( m_team_rank_rev == 0 ) {
m_team_base[0]->get_work_index(m_team_alloc);
}
team_barrier();
long work_index = m_team_base[0]->team_work_index();
m_league_rank = work_index * m_chunk_size;
m_league_chunk_end = (work_index +1 ) * m_chunk_size;
if(m_league_chunk_end > m_league_size) m_league_chunk_end = m_league_size;
if((m_league_rank>=0) && (m_league_rank < m_league_chunk_end))
return true;
return false;
}
void next_dynamic() {
if(m_invalid_thread)
return;
if ( m_league_rank < m_league_chunk_end ) {
team_barrier();
set_team_shared();
}
m_league_rank++;
}
void set_league_shmem( const int arg_league_rank
, const int arg_league_size
, const int arg_shmem_size
)
{
m_league_rank = arg_league_rank ;
m_league_size = arg_league_size ;
m_team_shared_size = arg_shmem_size ;
set_team_shared();
}
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< class ... Properties >
class TeamPolicyInternal< Kokkos::Threads , Properties ... >: public PolicyTraits<Properties ...>
{
private:
int m_league_size ;
int m_team_size ;
int m_team_alloc ;
int m_team_iter ;
size_t m_team_scratch_size[2];
size_t m_thread_scratch_size[2];
int m_chunk_size;
inline
void init( const int league_size_request
, const int team_size_request )
{
const int pool_size = traits::execution_space::thread_pool_size(0);
- const int team_max = traits::execution_space::thread_pool_size(1);
+ const int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ const int team_max = pool_size<max_host_team_size?pool_size:max_host_team_size;
const int team_grain = traits::execution_space::thread_pool_size(2);
m_league_size = league_size_request ;
m_team_size = team_size_request < team_max ?
team_size_request : team_max ;
// Round team size up to a multiple of 'team_grain'
const int team_size_grain = team_grain * ( ( m_team_size + team_grain - 1 ) / team_grain );
const int team_count = pool_size / team_size_grain ;
// Constraint : pool_size = m_team_alloc * team_count
m_team_alloc = pool_size / team_count ;
// Maximum number of iterations each team will take:
m_team_iter = ( m_league_size + team_count - 1 ) / team_count ;
set_auto_chunk_size();
}
public:
//! Tag this class as a kokkos execution policy
typedef TeamPolicyInternal execution_policy ;
typedef PolicyTraits<Properties ... > traits;
TeamPolicyInternal& operator = (const TeamPolicyInternal& p) {
m_league_size = p.m_league_size;
m_team_size = p.m_team_size;
m_team_alloc = p.m_team_alloc;
m_team_iter = p.m_team_iter;
m_team_scratch_size[0] = p.m_team_scratch_size[0];
m_thread_scratch_size[0] = p.m_thread_scratch_size[0];
m_team_scratch_size[1] = p.m_team_scratch_size[1];
m_thread_scratch_size[1] = p.m_thread_scratch_size[1];
m_chunk_size = p.m_chunk_size;
return *this;
}
//----------------------------------------
template< class FunctorType >
inline static
- int team_size_max( const FunctorType & )
- { return traits::execution_space::thread_pool_size(1); }
+ int team_size_max( const FunctorType & ) {
+ int pool_size = traits::execution_space::thread_pool_size(1);
+ int max_host_team_size = Impl::HostThreadTeamData::max_team_members;
+ return pool_size<max_host_team_size?pool_size:max_host_team_size;
+ }
+
template< class FunctorType >
static int team_size_recommended( const FunctorType & )
{ return traits::execution_space::thread_pool_size(2); }
template< class FunctorType >
inline static
int team_size_recommended( const FunctorType &, const int& )
{ return traits::execution_space::thread_pool_size(2); }
//----------------------------------------
inline int team_size() const { return m_team_size ; }
inline int team_alloc() const { return m_team_alloc ; }
inline int league_size() const { return m_league_size ; }
inline size_t scratch_size(const int& level, int team_size_ = -1 ) const {
if(team_size_ < 0)
team_size_ = m_team_size;
return m_team_scratch_size[level] + team_size_*m_thread_scratch_size[level] ;
}
inline int team_iter() const { return m_team_iter ; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, int team_size_request
, int vector_length_request = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,team_size_request); (void) vector_length_request; }
/** \brief Specify league size, request team size */
TeamPolicyInternal( typename traits::execution_space &
, int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,traits::execution_space::thread_pool_size(2)); }
TeamPolicyInternal( int league_size_request
, int team_size_request
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,team_size_request); }
TeamPolicyInternal( int league_size_request
, const Kokkos::AUTO_t & /* team_size_request */
, int /* vector_length_request */ = 1 )
: m_league_size(0)
, m_team_size(0)
, m_team_alloc(0)
, m_team_scratch_size { 0 , 0 }
, m_thread_scratch_size { 0 , 0 }
, m_chunk_size(0)
{ init(league_size_request,traits::execution_space::thread_pool_size(2)); }
inline int chunk_size() const { return m_chunk_size ; }
/** \brief set chunk_size to a discrete value*/
inline TeamPolicyInternal set_chunk_size(typename traits::index_type chunk_size_) const {
TeamPolicyInternal p = *this;
p.m_chunk_size = chunk_size_;
return p;
}
/** \brief set per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
return p;
};
/** \brief set per thread scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
/** \brief set per thread and per team scratch size for a specific level of the scratch hierarchy */
inline TeamPolicyInternal set_scratch_size(const int& level, const PerTeamValue& per_team, const PerThreadValue& per_thread) const {
TeamPolicyInternal p = *this;
p.m_team_scratch_size[level] = per_team.value;
p.m_thread_scratch_size[level] = per_thread.value;
return p;
};
private:
/** \brief finalize chunk_size if it was set to AUTO*/
inline void set_auto_chunk_size() {
int concurrency = traits::execution_space::thread_pool_size(0)/m_team_alloc;
if( concurrency==0 ) concurrency=1;
if(m_chunk_size > 0) {
if(!Impl::is_integral_power_of_two( m_chunk_size ))
Kokkos::abort("TeamPolicy blocking granularity must be power of two" );
}
int new_chunk_size = 1;
while(new_chunk_size*100*concurrency < m_league_size)
new_chunk_size *= 2;
if(new_chunk_size < 128) {
new_chunk_size = 1;
while( (new_chunk_size*40*concurrency < m_league_size ) && (new_chunk_size<128) )
new_chunk_size*=2;
}
m_chunk_size = new_chunk_size;
}
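// Editorial worked example (hypothetical numbers, not part of the original source):
// with a 16-thread pool, m_team_alloc == 4 (so concurrency == 4) and m_league_size == 10000,
// the first loop stops at 32 because 32*100*4 = 12800 >= 10000; since 32 < 128 the second
// loop re-grows the chunk until 64*40*4 = 10240 >= 10000, leaving m_chunk_size == 64.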
public:
typedef Impl::ThreadsExecTeamMember member_type ;
friend class Impl::ThreadsExecTeamMember ;
};
} /*namespace Impl */
} /* namespace Kokkos */
namespace Kokkos {
template< typename iType >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >
TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType& count )
{
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, count );
}
template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
Impl::ThreadsExecTeamMember>
TeamThreadRange( const Impl::ThreadsExecTeamMember& thread, const iType1 & begin, const iType2 & end )
{
typedef typename std::common_type< iType1, iType2 >::type iType;
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::ThreadsExecTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >
ThreadVectorRange(const Impl::ThreadsExecTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember> PerTeam(const Impl::ThreadsExecTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember> PerThread(const Impl::ThreadsExecTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>(thread);
}
} // namespace Kokkos
namespace Kokkos {
/** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries, const Lambda& lambda) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
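// Editorial usage sketch (assumes an enclosing TeamPolicy<Kokkos::Threads> parallel region;
// 'team', 'n' and 'data' are hypothetical):
// Kokkos::parallel_for( Kokkos::TeamThreadRange( team , n ) ,
//   [&]( const int i ) { data[i] = 2 * data[i]; } );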
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries,
const Lambda & lambda, ValueType& result) {
result = ValueType();
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
result+=tmp;
}
result = loop_boundaries.thread.team_reduce(result,Impl::JoinAdd<ValueType>());
}
/** \brief Inter-thread parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember>& loop_boundaries,
const Lambda & lambda, const JoinType& join, ValueType& init_result) {
ValueType result = init_result;
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
ValueType tmp = ValueType();
lambda(i,tmp);
join(result,tmp);
}
init_result = loop_boundaries.thread.team_reduce(result,Impl::JoinLambdaAdapter<ValueType,JoinType>(join));
}
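// Editorial usage sketch (hypothetical names; data assumed non-negative so the
// default-initialized temporary is a valid identity for the max-join):
// double team_max = 0.0;
// Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team , n ) ,
//   [&]( const int i , double & val ) { if ( data[i] > val ) val = data[i]; } ,
//   [&]( double & dst , const double & src ) { if ( src > dst ) dst = src; } ,
//   team_max );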
} //namespace Kokkos
namespace Kokkos {
/** \brief Intra-thread vector parallel_for. Executes lambda(iType i) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread.
* This functionality requires C++11 support.*/
template<typename iType, class Lambda>
KOKKOS_INLINE_FUNCTION
void parallel_for(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const Lambda& lambda) {
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
lambda(i);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const Lambda & lambda, ValueType& result) {
result = ValueType();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- result+=tmp;
+ lambda(i,result);
}
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
- loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& init_result) {
+ loop_boundaries, const Lambda & lambda, const JoinType& join, ValueType& result ) {
- ValueType result = init_result;
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
+ lambda(i,result);
}
- init_result = result;
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes in the thread and a scan operation is performed.
* Depending on the target execution space the operator might be called twice: once with final=false
* and once with final=true. When final==true val contains the prefix sum value. The contribution of this
* "i" needs to be added to val no matter whether final==true or not. In a serial execution
* (i.e. team_size==1) the operator is only called once with final==true. scan_val will be set
* to the final sum value over all vector lanes.
* This functionality requires C++11 support.*/
template< typename iType, class FunctorType >
KOKKOS_INLINE_FUNCTION
void parallel_scan(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ThreadsExecTeamMember >&
loop_boundaries, const FunctorType & lambda) {
typedef Kokkos::Impl::FunctorValueTraits< FunctorType , void > ValueTraits ;
typedef typename ValueTraits::value_type value_type ;
value_type scan_val = value_type();
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,scan_val,true);
}
}
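// Editorial usage sketch (hypothetical names): an exclusive prefix sum over a vector range;
// the lambda adds this iteration's contribution whether or not 'final' is set.
// Kokkos::parallel_scan( Kokkos::ThreadVectorRange( thread , n ) ,
//   [&]( const int i , long & partial , const bool final ) {
//     if ( final ) { offsets[i] = partial; }
//     partial += counts[i];
//   });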
} // namespace Kokkos
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda) {
lambda();
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda) {
if(single_struct.team_member.team_rank()==0) lambda();
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
lambda(val);
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::ThreadsExecTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
if(single_struct.team_member.team_rank()==0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
}
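// Editorial usage sketch (hypothetical names): execute a block once per team and broadcast
// its result to every member via the overload above.
// int token = 0;
// Kokkos::single( Kokkos::PerTeam( team ) ,
//   [&]( int & val ) { val = next_token(); } , // next_token() is hypothetical
//   token );                                   // every team member now sees the same token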
}
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #define KOKKOS_THREADSTEAM_HPP */
diff --git a/lib/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp b/lib/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp
new file mode 100644
index 000000000..c4db3e15e
--- /dev/null
+++ b/lib/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp
@@ -0,0 +1,2356 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_HOST_EXP_ITERATE_TILE_HPP
+#define KOKKOS_HOST_EXP_ITERATE_TILE_HPP
+
+#include <iostream>
+#include <algorithm>
+#include <stdio.h>
+
+#include <Kokkos_Macros.hpp>
+
+#if defined(KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION) && defined(KOKKOS_HAVE_PRAGMA_IVDEP) && !defined(__CUDA_ARCH__)
+#define KOKKOS_MDRANGE_IVDEP
+#endif
+
+
+#ifdef KOKKOS_MDRANGE_IVDEP
+ #define KOKKOS_ENABLE_IVDEP_MDRANGE _Pragma("ivdep")
+#else
+ #define KOKKOS_ENABLE_IVDEP_MDRANGE
+#endif
+
+
+
+namespace Kokkos { namespace Experimental { namespace Impl {
+
+// Temporary, for testing new loop macros
+#define KOKKOS_ENABLE_NEW_LOOP_MACROS 1
+
+
+#define LOOP_1L(type, tile) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0=0; i0<static_cast<type>(tile[0]); ++i0)
+
+#define LOOP_2L(type, tile) \
+ for( type i1=0; i1<static_cast<type>(tile[1]); ++i1) \
+ LOOP_1L(type, tile)
+
+#define LOOP_3L(type, tile) \
+ for( type i2=0; i2<static_cast<type>(tile[2]); ++i2) \
+ LOOP_2L(type, tile)
+
+#define LOOP_4L(type, tile) \
+ for( type i3=0; i3<static_cast<type>(tile[3]); ++i3) \
+ LOOP_3L(type, tile)
+
+#define LOOP_5L(type, tile) \
+ for( type i4=0; i4<static_cast<type>(tile[4]); ++i4) \
+ LOOP_4L(type, tile)
+
+#define LOOP_6L(type, tile) \
+ for( type i5=0; i5<static_cast<type>(tile[5]); ++i5) \
+ LOOP_5L(type, tile)
+
+#define LOOP_7L(type, tile) \
+ for( type i6=0; i6<static_cast<type>(tile[6]); ++i6) \
+ LOOP_6L(type, tile)
+
+#define LOOP_8L(type, tile) \
+ for( type i7=0; i7<static_cast<type>(tile[7]); ++i7) \
+ LOOP_7L(type, tile)
+
+
+#define LOOP_1R(type, tile) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for ( type i0=0; i0<static_cast<type>(tile[0]); ++i0 )
+
+#define LOOP_2R(type, tile) \
+ LOOP_1R(type, tile) \
+ for ( type i1=0; i1<static_cast<type>(tile[1]); ++i1 )
+
+#define LOOP_3R(type, tile) \
+ LOOP_2R(type, tile) \
+ for ( type i2=0; i2<static_cast<type>(tile[2]); ++i2 )
+
+#define LOOP_4R(type, tile) \
+ LOOP_3R(type, tile) \
+ for ( type i3=0; i3<static_cast<type>(tile[3]); ++i3 )
+
+#define LOOP_5R(type, tile) \
+ LOOP_4R(type, tile) \
+ for ( type i4=0; i4<static_cast<type>(tile[4]); ++i4 )
+
+#define LOOP_6R(type, tile) \
+ LOOP_5R(type, tile) \
+ for ( type i5=0; i5<static_cast<type>(tile[5]); ++i5 )
+
+#define LOOP_7R(type, tile) \
+ LOOP_6R(type, tile) \
+ for ( type i6=0; i6<static_cast<type>(tile[6]); ++i6 )
+
+#define LOOP_8R(type, tile) \
+ LOOP_7R(type, tile) \
+ for ( type i7=0; i7<static_cast<type>(tile[7]); ++i7 )
+
+
+#define LOOP_ARGS_1 i0 + m_offset[0]
+#define LOOP_ARGS_2 LOOP_ARGS_1, i1 + m_offset[1]
+#define LOOP_ARGS_3 LOOP_ARGS_2, i2 + m_offset[2]
+#define LOOP_ARGS_4 LOOP_ARGS_3, i3 + m_offset[3]
+#define LOOP_ARGS_5 LOOP_ARGS_4, i4 + m_offset[4]
+#define LOOP_ARGS_6 LOOP_ARGS_5, i5 + m_offset[5]
+#define LOOP_ARGS_7 LOOP_ARGS_6, i6 + m_offset[6]
+#define LOOP_ARGS_8 LOOP_ARGS_7, i7 + m_offset[7]
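+
+// LOOP_ARGS_<N> expands to the offset-shifted index list handed to the functor
+// in the legacy code path, e.g. LOOP_ARGS_3 roughly becomes
+//   i0 + m_offset[0], i1 + m_offset[1], i2 + m_offset[2]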
+
+
+
+// New Loop Macros...
+// parallel_for, non-tagged
+#define APPLY( func, ... ) \
+ func( __VA_ARGS__ );
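+
+// The new macros build the functor call by recursion over the variadic args:
+// LOOP_R_<N> iterates dimension d and recurses with d+1, appending
+// "i + m_offset[d]" to __VA_ARGS__, while LOOP_L_<N> recurses with d-1 and
+// prepends its index. For illustration, a rank-2 LayoutRight loop roughly
+// expands to
+//   for (i1 = 0; i1 < extent[0]; ++i1)
+//     for (i0 = 0; i0 < extent[1]; ++i0)
+//       func(i1 + m_offset[0], i0 + m_offset[1]);
+// so the functor always receives indices in dimension order, with the
+// layout-appropriate dimension varying fastest.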
+
+// LayoutRight
+// d = 0 to start
+#define LOOP_R_1( func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY( func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define LOOP_R_2( func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_R_1( func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define LOOP_R_3( func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_R_2( func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define LOOP_R_4( func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_R_3( func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define LOOP_R_5( func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_R_4( func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define LOOP_R_6( func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_R_5( func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define LOOP_R_7( func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_R_6( func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define LOOP_R_8( func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_R_7( func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define LOOP_L_1( func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY( func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_2( func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_L_1( func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_3( func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_L_2( func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_4( func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_L_3( func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_5( func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_L_4( func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_6( func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_L_5( func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_7( func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_L_6( func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_8( func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_L_7( func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+// TODO: rank does not need to be passed through; the values can be hardcoded
+#define LOOP_LAYOUT_1( func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ APPLY( func, i0 + m_offset[0] ) \
+ }
+
+#define LOOP_LAYOUT_2( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ LOOP_L_1( func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ LOOP_R_1( func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_3( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ LOOP_L_2( func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ LOOP_R_2( func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_4( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ LOOP_L_3( func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ LOOP_R_3( func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_5( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ LOOP_L_4( func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ LOOP_R_4( func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_6( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ LOOP_L_5( func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ LOOP_R_5( func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_7( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ LOOP_L_6( func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ LOOP_R_6( func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_8( func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ LOOP_L_7( func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ LOOP_R_7( func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
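+
+// LOOP_LAYOUT_<N> selects the nesting order at run time: when is_left is true
+// the outermost loop runs over extent[rank-1] and the recursion proceeds
+// towards dimension 0 (LayoutLeft, i0 fastest); otherwise the outermost loop
+// runs over extent[0] and the recursion proceeds towards dimension rank-1
+// (LayoutRight).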
+
+// Partial vs Full Tile
+#define TILE_LOOP_1( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_1( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_1( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_2( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_2( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_2( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_3( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_3( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_3( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_4( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_4( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_4( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_5( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_5( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_5( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_6( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_6( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_6( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_7( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_7( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_7( func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_8( func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_8( func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_8( func, type, is_left, m_offset, extent_partial, rank ) }
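+
+// TILE_LOOP_<N> picks the loop extents per tile: for a full interior tile
+// (cond true) it iterates over extent_full (the nominal tile dimensions),
+// otherwise over extent_partial (the clipped dimensions computed by
+// check_iteration_bounds further below).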
+
+
+// parallel_reduce, non-tagged
+// Reduction version
+#define APPLY_REDUX( val, func, ... ) \
+ func( __VA_ARGS__, val );
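+
+// The reduction variants thread the per-thread reduction value through the
+// same recursion and append it as the last argument, so the functor is
+// invoked roughly as func(index0, ..., indexN-1, val).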
+
+// LayoutRight
+// d = 0 to start
+#define LOOP_R_1_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY_REDUX( val, func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define LOOP_R_2_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_R_1_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define LOOP_R_3_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_R_2_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define LOOP_R_4_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_R_3_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define LOOP_R_5_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_R_4_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define LOOP_R_6_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_R_5_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define LOOP_R_7_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_R_6_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define LOOP_R_8_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_R_7_REDUX( val, func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define LOOP_L_1_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ APPLY_REDUX( val, func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_2_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ LOOP_L_1_REDUX( val, func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_3_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ LOOP_L_2_REDUX( val, func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_4_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ LOOP_L_3_REDUX( val, func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_5_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ LOOP_L_4_REDUX( val, func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_6_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ LOOP_L_5_REDUX( val, func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_7_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ LOOP_L_6_REDUX( val, func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define LOOP_L_8_REDUX( val, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ LOOP_L_7_REDUX( val, func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+#define LOOP_LAYOUT_1_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ APPLY_REDUX( val, func, i0 + m_offset[0] ) \
+ }
+
+#define LOOP_LAYOUT_2_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ LOOP_L_1_REDUX( val, func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ LOOP_R_1_REDUX( val, func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_3_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ LOOP_L_2_REDUX( val, func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ LOOP_R_2_REDUX( val, func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_4_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ LOOP_L_3_REDUX( val, func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ LOOP_R_3_REDUX( val, func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_5_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ LOOP_L_4_REDUX( val, func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ LOOP_R_4_REDUX( val, func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_6_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ LOOP_L_5_REDUX( val, func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ LOOP_R_5_REDUX( val, func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_7_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ LOOP_L_6_REDUX( val, func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ LOOP_R_6_REDUX( val, func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define LOOP_LAYOUT_8_REDUX( val, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ LOOP_L_7_REDUX( val, func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ LOOP_R_7_REDUX( val, func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
+
+// Partial vs Full Tile
+#define TILE_LOOP_1_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_1_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_1_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_2_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_2_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_2_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_3_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_3_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_3_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_4_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_4_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_4_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_5_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_5_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_5_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_6_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_6_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_6_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_7_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_7_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_7_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TILE_LOOP_8_REDUX( val, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { LOOP_LAYOUT_8_REDUX( val, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { LOOP_LAYOUT_8_REDUX( val, func, type, is_left, m_offset, extent_partial, rank ) }
+// end New Loop Macros
+
+
+// tagged macros
+#define TAGGED_APPLY( tag, func, ... ) \
+ func( tag, __VA_ARGS__ );
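+
+// Tagged variants prepend the functor's work tag as the first argument, so a
+// tagged rank-N call is roughly func(tag, index0, ..., indexN-1).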
+
+// LayoutRight
+// d = 0 to start
+#define TAGGED_LOOP_R_1( tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY( tag, func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_2( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_R_1( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_3( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_R_2( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_4( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_R_3( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_5( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_R_4( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_6( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_R_5( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_7( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_R_6( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_8( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_R_7( tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define TAGGED_LOOP_L_1( tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY( tag, func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_2( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_L_1( tag, func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_3( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_L_2( tag, func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_4( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_L_3( tag, func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_5( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_L_4( tag, func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_6( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_L_5( tag, func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_7( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_L_6( tag, func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_8( tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_L_7( tag, func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+// TODO: rank does not need to be passed through; the values can be hardcoded
+#define TAGGED_LOOP_LAYOUT_1( tag, func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ TAGGED_APPLY( tag, func, i0 + m_offset[0] ) \
+ }
+
+#define TAGGED_LOOP_LAYOUT_2( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ TAGGED_LOOP_L_1( tag, func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ TAGGED_LOOP_R_1( tag, func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_3( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ TAGGED_LOOP_L_2( tag, func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ TAGGED_LOOP_R_2( tag, func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_4( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ TAGGED_LOOP_L_3( tag, func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ TAGGED_LOOP_R_3( tag, func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_5( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ TAGGED_LOOP_L_4( tag, func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ TAGGED_LOOP_R_4( tag, func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_6( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ TAGGED_LOOP_L_5( tag, func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ TAGGED_LOOP_R_5( tag, func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_7( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ TAGGED_LOOP_L_6( tag, func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ TAGGED_LOOP_R_6( tag, func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_8( tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ TAGGED_LOOP_L_7( tag, func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ TAGGED_LOOP_R_7( tag, func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
+
+// Partial vs Full Tile
+#define TAGGED_TILE_LOOP_1( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_1( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_1( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_2( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_2( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_2( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_3( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_3( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_3( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_4( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_4( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_4( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_5( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_5( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_5( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_6( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_6( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_6( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_7( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_7( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_7( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_8( tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_8( tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_8( tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+
+// parallel_reduce, tagged
+// Reduction version
+#define TAGGED_APPLY_REDUX( val, tag, func, ... ) \
+ func( tag, __VA_ARGS__, val );
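+
+// Tagged reduction variants combine both conventions: the work tag comes
+// first and the reduction value last, i.e. roughly func(tag, indices..., val).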
+
+// LayoutRight
+// d = 0 to start
+#define TAGGED_LOOP_R_1_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY_REDUX( val, tag, func, __VA_ARGS__, i0 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_2_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_R_1_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i1 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_3_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_R_2_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i2 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_4_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_R_3_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i3 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_5_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_R_4_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i4 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_6_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_R_5_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i5 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_7_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_R_6_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i6 + m_offset[d] ) \
+ }
+
+#define TAGGED_LOOP_R_8_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_R_7_REDUX( val, tag, func, type, m_offset, extent, d+1 , __VA_ARGS__, i7 + m_offset[d] ) \
+ }
+
+// LayoutLeft
+// d = rank-1 to start
+#define TAGGED_LOOP_L_1_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[d]); ++i0) { \
+ TAGGED_APPLY_REDUX( val, tag, func, i0 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_2_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[d]); ++i1) { \
+ TAGGED_LOOP_L_1_REDUX( val, tag, func, type, m_offset, extent, d-1, i1 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_3_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[d]); ++i2) { \
+ TAGGED_LOOP_L_2_REDUX( val, tag, func, type, m_offset, extent, d-1, i2 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_4_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[d]); ++i3) { \
+ TAGGED_LOOP_L_3_REDUX( val, tag, func, type, m_offset, extent, d-1, i3 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_5_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[d]); ++i4) { \
+ TAGGED_LOOP_L_4_REDUX( val, tag, func, type, m_offset, extent, d-1, i4 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_6_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[d]); ++i5) { \
+ TAGGED_LOOP_L_5_REDUX( val, tag, func, type, m_offset, extent, d-1, i5 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_7_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[d]); ++i6) { \
+ TAGGED_LOOP_L_6_REDUX( val, tag, func, type, m_offset, extent, d-1, i6 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+#define TAGGED_LOOP_L_8_REDUX( val, tag, func, type, m_offset, extent, d, ... ) \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[d]); ++i7) { \
+ TAGGED_LOOP_L_7_REDUX( val, tag, func, type, m_offset, extent, d-1, i7 + m_offset[d] , __VA_ARGS__ ) \
+ }
+
+// Left vs Right
+#define TAGGED_LOOP_LAYOUT_1_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ KOKKOS_ENABLE_IVDEP_MDRANGE \
+ for( type i0 = (type)0; i0 < static_cast<type>(extent[0]); ++i0) { \
+ TAGGED_APPLY_REDUX( val, tag, func, i0 + m_offset[0] ) \
+ }
+
+#define TAGGED_LOOP_LAYOUT_2_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[rank-1]); ++i1) { \
+ TAGGED_LOOP_L_1_REDUX( val, tag, func, type, m_offset, extent, rank-2, i1 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i1 = (type)0; i1 < static_cast<type>(extent[0]); ++i1) { \
+ TAGGED_LOOP_R_1_REDUX( val, tag, func, type, m_offset, extent, 1 , i1 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_3_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[rank-1]); ++i2) { \
+ TAGGED_LOOP_L_2_REDUX( val, tag, func, type, m_offset, extent, rank-2, i2 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i2 = (type)0; i2 < static_cast<type>(extent[0]); ++i2) { \
+ TAGGED_LOOP_R_2_REDUX( val, tag, func, type, m_offset, extent, 1 , i2 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_4_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[rank-1]); ++i3) { \
+ TAGGED_LOOP_L_3_REDUX( val, tag, func, type, m_offset, extent, rank-2, i3 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i3 = (type)0; i3 < static_cast<type>(extent[0]); ++i3) { \
+ TAGGED_LOOP_R_3_REDUX( val, tag, func, type, m_offset, extent, 1 , i3 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_5_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[rank-1]); ++i4) { \
+ TAGGED_LOOP_L_4_REDUX( val, tag, func, type, m_offset, extent, rank-2, i4 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i4 = (type)0; i4 < static_cast<type>(extent[0]); ++i4) { \
+ TAGGED_LOOP_R_4_REDUX( val, tag, func, type, m_offset, extent, 1 , i4 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_6_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[rank-1]); ++i5) { \
+ TAGGED_LOOP_L_5_REDUX( val, tag, func, type, m_offset, extent, rank-2, i5 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i5 = (type)0; i5 < static_cast<type>(extent[0]); ++i5) { \
+ TAGGED_LOOP_R_5_REDUX( val, tag, func, type, m_offset, extent, 1 , i5 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_7_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[rank-1]); ++i6) { \
+ TAGGED_LOOP_L_6_REDUX( val, tag, func, type, m_offset, extent, rank-2, i6 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i6 = (type)0; i6 < static_cast<type>(extent[0]); ++i6) { \
+ TAGGED_LOOP_R_6_REDUX( val, tag, func, type, m_offset, extent, 1 , i6 + m_offset[0] ) \
+ } \
+ }
+
+#define TAGGED_LOOP_LAYOUT_8_REDUX( val, tag, func, type, is_left, m_offset, extent, rank ) \
+ if (is_left) { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[rank-1]); ++i7) { \
+ TAGGED_LOOP_L_7_REDUX( val, tag, func, type, m_offset, extent, rank-2, i7 + m_offset[rank-1] ) \
+ } \
+ } \
+ else { \
+ for( type i7 = (type)0; i7 < static_cast<type>(extent[0]); ++i7) { \
+ TAGGED_LOOP_R_7_REDUX( val, tag, func, type, m_offset, extent, 1 , i7 + m_offset[0] ) \
+ } \
+ }
+
+// Partial vs Full Tile
+#define TAGGED_TILE_LOOP_1_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_1_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_1_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_2_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_2_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_2_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_3_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_3_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_3_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_4_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_4_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_4_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_5_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_5_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_5_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_6_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_6_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_6_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_7_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_7_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_7_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+#define TAGGED_TILE_LOOP_8_REDUX( val, tag, func, type, is_left, cond, m_offset, extent_full, extent_partial, rank ) \
+ if (cond) { TAGGED_LOOP_LAYOUT_8_REDUX( val, tag, func, type, is_left, m_offset, extent_full, rank ) } \
+ else { TAGGED_LOOP_LAYOUT_8_REDUX( val, tag, func, type, is_left, m_offset, extent_partial, rank ) }
+
+// end tagged macros
+
+
+
+
+// Structs for calling loops
+template < int Rank, bool IsLeft, typename IType, typename Tagged, typename Enable = void >
+struct Tile_Loop_Type;
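+
+// Tile_Loop_Type dispatches from a run-time tile description to the rank- and
+// layout-specific macros above: Rank selects the macro arity, IsLeft the
+// layout, and a non-void Tagged type enables the tagged specializations below.
+// Each specialization provides two apply() overloads, one for parallel_for and
+// one (taking a reduction value) for parallel_reduce.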
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<1, IsLeft, IType, void, void >
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_1( func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_1_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<2, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_2( func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_2_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<3, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_3( func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_3_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<4, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_4( func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_4_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<5, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_5( func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_5_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<6, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_6( func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_6_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<7, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_7( func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_7_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+};
+
+template < bool IsLeft, typename IType >
+struct Tile_Loop_Type<8, IsLeft, IType, void, void>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_8( func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TILE_LOOP_8_REDUX( value, func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+};
+
+// tagged versions
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<1, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type >
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_1( Tagged(), func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_1_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 1 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<2, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_2( Tagged(), func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_2_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 2 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<3, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_3( Tagged(), func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_3_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 3 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<4, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_4( Tagged(), func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_4_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 4 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<5, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_5( Tagged(), func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_5_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 5 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<6, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_6( Tagged(), func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_6_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 6 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<7, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_7( Tagged(), func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_7_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 7 );
+ }
+};
+
+template < bool IsLeft, typename IType, typename Tagged >
+struct Tile_Loop_Type<8, IsLeft, IType, Tagged, typename std::enable_if< !std::is_same<Tagged,void>::value>::type>
+{
+ template < typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_8( Tagged(), func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+
+ template < typename ValType, typename Func, typename Offset, typename ExtentA, typename ExtentB >
+ static void apply(ValType &value, Func const& func, bool cond, Offset const& offset, ExtentA const& a, ExtentB const& b)
+ {
+ TAGGED_TILE_LOOP_8_REDUX( value, Tagged(), func, IType, IsLeft, cond, offset, a, b, 8 );
+ }
+};
+// end Structs for calling loops
+
+
+template <typename T>
+using is_void = std::is_same< T , void >;
+
+template < typename RP
+ , typename Functor
+ , typename Tag = void
+ , typename ValueType = void
+ , typename Enable = void
+ >
+struct HostIterateTile;
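+
+// HostIterateTile is the host-side driver: its operator() handles one tile,
+// converting the flat tile index into per-dimension offsets and forwarding to
+// Tile_Loop_Type. The enable_if on ValueType selects the specialization below
+// when ValueType is void, i.e. for parallel_for.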
+
+//For ParallelFor
+template < typename RP
+ , typename Functor
+ , typename Tag
+ , typename ValueType
+ >
+struct HostIterateTile < RP , Functor , Tag , ValueType , typename std::enable_if< is_void<ValueType >::value >::type >
+{
+ using index_type = typename RP::index_type;
+ using point_type = typename RP::point_type;
+
+ using value_type = ValueType;
+
+ inline
+ HostIterateTile( RP const& rp, Functor const& func )
+ : m_rp(rp)
+ , m_func(func)
+ {
+ }
+
+ inline
+ bool check_iteration_bounds( point_type& partial_tile , point_type& offset ) const {
+ bool is_full_tile = true;
+
+ for ( int i = 0; i < RP::rank; ++i ) {
+ if ((offset[i] + m_rp.m_tile[i]) <= m_rp.m_upper[i]) {
+ partial_tile[i] = m_rp.m_tile[i] ;
+ }
+ else {
+ is_full_tile = false ;
+ partial_tile[i] = (m_rp.m_upper[i] - 1 - offset[i]) == 0 ? 1
+ : (m_rp.m_upper[i] - m_rp.m_tile[i]) > 0 ? (m_rp.m_upper[i] - offset[i])
+ : (m_rp.m_upper[i] - m_rp.m_lower[i]) ; // when single tile encloses range
+ }
+ }
+
+ return is_full_tile ;
+ } // end check bounds
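+
+  // Example: with m_lower = 0, m_upper = 10, m_tile = 4 and offset = 8 the
+  // tile overruns the range, so the else branch clips it: (10 - 1 - 8) != 0
+  // and (10 - 4) > 0, giving partial_tile = 10 - 8 = 2 and is_full_tile = false.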
+
+
+ template <int Rank>
+ struct RankTag
+ {
+ typedef RankTag type;
+ enum { value = (int)Rank };
+ };
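+
+  // RankTag is a tag type for compile-time dispatch in the legacy (#else) code
+  // path: operator() forwards to the operator_impl overload whose RankTag<Rank>
+  // parameter matches RP::rank.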
+
+#if KOKKOS_ENABLE_NEW_LOOP_MACROS
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
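+    // Decode the flat tile index into per-dimension tile coordinates and turn
+    // each into a global starting offset: (tile_idx % m_tile_end[i]) is the
+    // tile coordinate in dimension i, scaled by the tile size and shifted by
+    // the lower bound. outer_direction controls whether dimension 0 or
+    // dimension rank-1 varies fastest across tiles.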
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ Tile_Loop_Type< RP::rank, (RP::inner_direction == RP::Left), index_type, Tag >::apply( m_func, full_tile, m_offset, m_rp.m_tile, m_tiledims );
+
+ }
+
+#else
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ { operator_impl( tile_idx , RankTag<RP::rank>() ); }
+  // Added because of a compiler error when using SFINAE to choose the operator based on rank with CUDA+Serial
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<2> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 2
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<3> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 3
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<4> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 4
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<5> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 5
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<6> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 6
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<7> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 7
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<8> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 8
+#endif
+
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func(args...);
+ }
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && !std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func( m_tag, args...);
+ }
+
+
+ RP const& m_rp;
+ Functor const& m_func;
+ typename std::conditional< std::is_same<Tag,void>::value,int,Tag>::type m_tag;
+// value_type & m_v;
+
+};
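+
+// Expected functor interface for the non-reducing HostIterateTile above
+// (illustrative sketch only, rank-2 case): apply() calls m_func(i0,i1) when
+// the policy has no work tag, or m_func(m_tag,i0,i1) when it has one.
+// A hypothetical untagged user functor would look like:
+//
+//   struct MyFunctor {
+//     KOKKOS_INLINE_FUNCTION
+//     void operator()( const int i , const int j ) const { /* ... */ }
+//   };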
+
+
+// ValueType: For reductions
+template < typename RP
+ , typename Functor
+ , typename Tag
+ , typename ValueType
+ >
+struct HostIterateTile < RP , Functor , Tag , ValueType , typename std::enable_if< !is_void<ValueType >::value >::type >
+{
+ using index_type = typename RP::index_type;
+ using point_type = typename RP::point_type;
+
+ using value_type = ValueType;
+
+ inline
+ HostIterateTile( RP const& rp, Functor const& func, value_type & v )
+ : m_rp(rp) //Cuda 7.0 does not like braces...
+ , m_func(func)
+ , m_v(v) // use with non-void ValueType struct
+ {
+// Errors due to braces rather than parenthesis for init (with cuda 7.0)
+// /home/ndellin/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp:1216:98: error: too many braces around initializer for ‘int’ [-fpermissive]
+// /home/ndellin/kokkos/core/src/impl/KokkosExp_Host_IterateTile.hpp:1216:98: error: aggregate value used where an integer was expected
+ }
+
+ inline
+ bool check_iteration_bounds( point_type& partial_tile , point_type& offset ) const {
+ bool is_full_tile = true;
+
+ for ( int i = 0; i < RP::rank; ++i ) {
+ if ((offset[i] + m_rp.m_tile[i]) <= m_rp.m_upper[i]) {
+ partial_tile[i] = m_rp.m_tile[i] ;
+ }
+ else {
+ is_full_tile = false ;
+ partial_tile[i] = (m_rp.m_upper[i] - 1 - offset[i]) == 0 ? 1
+ : (m_rp.m_upper[i] - m_rp.m_tile[i]) > 0 ? (m_rp.m_upper[i] - offset[i])
+ : (m_rp.m_upper[i] - m_rp.m_lower[i]) ; // when single tile encloses range
+ }
+ }
+
+ return is_full_tile ;
+ } // end check bounds
+
+
+ template <int Rank>
+ struct RankTag
+ {
+ typedef RankTag type;
+ enum { value = (int)Rank };
+ };
+
+
+#if KOKKOS_ENABLE_NEW_LOOP_MACROS
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ Tile_Loop_Type< RP::rank, (RP::inner_direction == RP::Left), index_type, Tag >::apply( m_v, m_func, full_tile, m_offset, m_rp.m_tile, m_tiledims );
+
+ }
+
+#else
+ template <typename IType>
+ inline
+ void
+ operator()(IType tile_idx) const
+ { operator_impl( tile_idx , RankTag<RP::rank>() ); }
+ // added due to a compiler error when using SFINAE to choose the operator based on rank
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<2> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ } else {
+// #pragma simd
+ LOOP_2R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_2 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 2
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<3> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ } else {
+// #pragma simd
+ LOOP_3R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_3 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 3
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<4> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ } else {
+// #pragma simd
+ LOOP_4R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_4 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 4
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<5> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ } else {
+// #pragma simd
+ LOOP_5R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_5 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 5
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<6> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ } else {
+// #pragma simd
+ LOOP_6R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_6 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 6
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<7> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ } else {
+// #pragma simd
+ LOOP_7R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_7 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 7
+
+
+ template <typename IType>
+ inline
+ void operator_impl( IType tile_idx , const RankTag<8> ) const
+ {
+ point_type m_offset;
+ point_type m_tiledims;
+
+ if (RP::outer_direction == RP::Left) {
+ for (int i=0; i<RP::rank; ++i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+ else {
+ for (int i=RP::rank-1; i>=0; --i) {
+ m_offset[i] = (tile_idx % m_rp.m_tile_end[i]) * m_rp.m_tile[i] + m_rp.m_lower[i] ;
+ tile_idx /= m_rp.m_tile_end[i];
+ }
+ }
+
+ //Check if offset+tiledim in bounds - if not, replace tile dims with the partial tile dims
+ const bool full_tile = check_iteration_bounds(m_tiledims , m_offset) ;
+
+ if (RP::inner_direction == RP::Left) {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8L(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Left
+ else {
+ if ( full_tile ) {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ } else {
+// #pragma simd
+ LOOP_8R(index_type, m_tiledims) {
+ apply( LOOP_ARGS_8 );
+ }
+ }
+ } // end RP::Right
+
+ } //end op() rank == 8
+#endif
+
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func(args... , m_v);
+ }
+
+ template <typename... Args>
+ typename std::enable_if<( sizeof...(Args) == RP::rank && !std::is_same<Tag,void>::value), void>::type
+ apply(Args &&... args) const
+ {
+ m_func( m_tag, args... , m_v);
+ }
+
+
+ RP const& m_rp;
+ Functor const& m_func;
+ value_type & m_v;
+ typename std::conditional< std::is_same<Tag,void>::value,int,Tag>::type m_tag;
+
+};
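+
+// In the reduction specialization above the thread-local value m_v is passed
+// as the trailing argument of every call, so a rank-2 reduction functor is
+// expected to look roughly like this (illustrative sketch, hypothetical
+// user code):
+//
+//   struct MySum {
+//     KOKKOS_INLINE_FUNCTION
+//     void operator()( const int i , const int j , double & update ) const
+//     { update += 1.0 ; }
+//   };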
+
+
+// ------------------------------------------------------------------ //
+
+// MDFunctor - wraps the range_policy and functor to pass to IterateTile.
+// Used by the host backends (Serial, Threads, OpenMP);
+// Cuda uses DeviceIterateTile directly within md_parallel_for.
+// ParallelReduce
+template < typename MDRange, typename Functor, typename ValueType = void >
+struct MDFunctor
+{
+ using range_policy = MDRange;
+ using functor_type = Functor;
+ using value_type = ValueType;
+ using work_tag = typename range_policy::work_tag;
+ using index_type = typename range_policy::index_type;
+ using iterate_type = typename Kokkos::Experimental::Impl::HostIterateTile< MDRange
+ , Functor
+ , work_tag
+ , value_type
+ >;
+
+
+ inline
+ MDFunctor( MDRange const& range, Functor const& f, ValueType & v )
+ : m_range( range )
+ , m_func( f )
+ {}
+
+ inline
+ MDFunctor( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor( MDFunctor && ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor && ) = default;
+
+// KOKKOS_FORCEINLINE_FUNCTION //Caused cuda warning - __host__ warning
+ inline
+ void operator()(index_type t, value_type & v) const
+ {
+ iterate_type(m_range, m_func, v)(t);
+ }
+
+ MDRange m_range;
+ Functor m_func;
+};
+
+// ParallelFor
+template < typename MDRange, typename Functor >
+struct MDFunctor< MDRange, Functor, void >
+{
+ using range_policy = MDRange;
+ using functor_type = Functor;
+ using work_tag = typename range_policy::work_tag;
+ using index_type = typename range_policy::index_type;
+ using iterate_type = typename Kokkos::Experimental::Impl::HostIterateTile< MDRange
+ , Functor
+ , work_tag
+ , void
+ >;
+
+
+ inline
+ MDFunctor( MDRange const& range, Functor const& f )
+ : m_range( range )
+ , m_func( f )
+ {}
+
+ inline
+ MDFunctor( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor const& ) = default;
+
+ inline
+ MDFunctor( MDFunctor && ) = default;
+
+ inline
+ MDFunctor& operator=( MDFunctor && ) = default;
+
+ inline
+ void operator()(index_type t) const
+ {
+ iterate_type(m_range, m_func)(t);
+ }
+
+ MDRange m_range;
+ Functor m_func;
+};
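+
+// Minimal usage sketch for the wrappers above (illustration only; assumes
+// the experimental MDRangePolicy / md_parallel_for interface of this Kokkos
+// version, with N0, N1 and the tile sizes as placeholder values):
+//
+//   using policy_t =
+//     Kokkos::Experimental::MDRangePolicy< Kokkos::Experimental::Rank<2> >;
+//   policy_t policy( {{0,0}} , {{N0,N1}} , {{4,4}} );
+//   Kokkos::Experimental::md_parallel_for( policy ,
+//     KOKKOS_LAMBDA( const int i , const int j ) { /* body */ } );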
+
+#undef KOKKOS_ENABLE_NEW_LOOP_MACROS
+
+} } } //end namespace Kokkos::Experimental::Impl
+
+
+#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp b/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp
index 0ffbc0548..7d7fd3d13 100644
--- a/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_BitOps.hpp
@@ -1,122 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_BITOPS_HPP
#define KOKKOS_BITOPS_HPP
#include <Kokkos_Macros.hpp>
#include <stdint.h>
#include <climits>
namespace Kokkos {
namespace Impl {
KOKKOS_FORCEINLINE_FUNCTION
int bit_scan_forward( unsigned i )
{
#if defined( __CUDA_ARCH__ )
return __ffs(i) - 1;
-#elif defined( __GNUC__ ) || defined( __GNUG__ )
- return __builtin_ffs(i) - 1;
-#elif defined( __INTEL_COMPILER )
+#elif defined( KOKKOS_COMPILER_INTEL )
return _bit_scan_forward(i);
+#elif defined( KOKKOS_COMPILER_IBM )
+ return __cnttz4(i);
+#elif defined( KOKKOS_COMPILER_GNU ) || defined( __GNUC__ ) || defined( __GNUG__ )
+ return __builtin_ffs(i) - 1;
#else
-
unsigned t = 1u;
int r = 0;
while ( i && ( ( i & t ) == 0 ) )
{
t = t << 1;
++r;
}
return r;
#endif
}
KOKKOS_FORCEINLINE_FUNCTION
int bit_scan_reverse( unsigned i )
{
enum { shift = static_cast<int>( sizeof(unsigned) * CHAR_BIT - 1 ) };
#if defined( __CUDA_ARCH__ )
return shift - __clz(i);
+#elif defined( KOKKOS_COMPILER_INTEL )
+ return _bit_scan_reverse(i);
+#elif defined( KOKKOS_COMPILER_IBM )
+ return shift - __cntlz4(i);
#elif defined( __GNUC__ ) || defined( __GNUG__ )
return shift - __builtin_clz(i);
-#elif defined( __INTEL_COMPILER )
- return _bit_scan_reverse(i);
#else
unsigned t = 1u << shift;
int r = 0;
while ( i && ( ( i & t ) == 0 ) )
{
t = t >> 1;
++r;
}
return shift - r;
#endif
}
/// Count the number of bits set.
KOKKOS_FORCEINLINE_FUNCTION
int bit_count( unsigned i )
{
#if defined( __CUDA_ARCH__ )
return __popc(i);
-#elif defined( __GNUC__ ) || defined( __GNUG__ )
- return __builtin_popcount(i);
#elif defined ( __INTEL_COMPILER )
return _popcnt32(i);
+#elif defined( KOKKOS_COMPILER_IBM )
+ return __popcnt4(i);
+#elif defined( __GNUC__ ) || defined( __GNUG__ )
+ return __builtin_popcount(i);
#else
// http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
i = i - ( ( i >> 1 ) & ~0u / 3u ); // temp
i = ( i & ~0u / 15u * 3u ) + ( ( i >> 2 ) & ~0u / 15u * 3u ); // temp
i = ( i + ( i >> 4 ) ) & ~0u / 255u * 15u; // temp
// count
return (int)( ( i * ( ~0u / 255u ) ) >> ( sizeof(unsigned) - 1 ) * CHAR_BIT );
#endif
}
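// Example values for the helpers above (illustration only): for i = 12,
// i.e. binary 1100,
//   bit_scan_forward(12) == 2   (index of the least-significant set bit)
//   bit_scan_reverse(12) == 3   (index of the most-significant set bit)
//   bit_count(12)        == 2   (number of set bits)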
} // namespace Impl
} // namespace Kokkos
#endif // KOKKOS_BITOPS_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_Core.cpp b/lib/kokkos/core/src/impl/Kokkos_Core.cpp
index cd38eaa9d..7c38430c4 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Core.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Core.cpp
@@ -1,453 +1,771 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Error.hpp>
#include <cctype>
#include <cstring>
#include <iostream>
#include <cstdlib>
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
namespace {
bool is_unsigned_int(const char* str)
{
const size_t len = strlen (str);
for (size_t i = 0; i < len; ++i) {
if (! isdigit (str[i])) {
return false;
}
}
return true;
}
void initialize_internal(const InitArguments& args)
{
// This is an experimental setting
// For KNL in Flat mode this variable should be set, so that
// memkind allocates high bandwidth memory correctly.
#ifdef KOKKOS_ENABLE_HBWSPACE
setenv("MEMKIND_HBW_NODES", "1", 0);
#endif
// Protect declarations, to prevent "unused variable" warnings.
#if defined( KOKKOS_ENABLE_OPENMP ) || defined( KOKKOS_ENABLE_PTHREAD )
const int num_threads = args.num_threads;
const int use_numa = args.num_numa;
#endif // defined( KOKKOS_ENABLE_OPENMP ) || defined( KOKKOS_ENABLE_PTHREAD )
#if defined( KOKKOS_ENABLE_CUDA )
const int use_gpu = args.device_id;
#endif // defined( KOKKOS_ENABLE_CUDA )
#if defined( KOKKOS_ENABLE_OPENMP )
if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
if(num_threads>0) {
if(use_numa>0) {
Kokkos::OpenMP::initialize(num_threads,use_numa);
}
else {
Kokkos::OpenMP::initialize(num_threads);
}
} else {
Kokkos::OpenMP::initialize();
}
//std::cout << "Kokkos::initialize() fyi: OpenMP enabled and initialized" << std::endl ;
}
else {
//std::cout << "Kokkos::initialize() fyi: OpenMP enabled but not initialized" << std::endl ;
}
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
if(num_threads>0) {
if(use_numa>0) {
Kokkos::Threads::initialize(num_threads,use_numa);
}
else {
Kokkos::Threads::initialize(num_threads);
}
} else {
Kokkos::Threads::initialize();
}
//std::cout << "Kokkos::initialize() fyi: Pthread enabled and initialized" << std::endl ;
}
else {
//std::cout << "Kokkos::initialize() fyi: Pthread enabled but not initialized" << std::endl ;
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
// Prevent "unused variable" warning for 'args' input struct. If
// Serial::initialize() ever needs to take arguments from the input
// struct, you may remove this line of code.
(void) args;
if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Serial::initialize();
}
#endif
#if defined( KOKKOS_ENABLE_CUDA )
if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || 0 < use_gpu ) {
if (use_gpu > -1) {
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice( use_gpu ) );
}
else {
Kokkos::Cuda::initialize();
}
//std::cout << "Kokkos::initialize() fyi: Cuda enabled and initialized" << std::endl ;
}
#endif
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::initialize();
#endif
}
void finalize_internal( const bool all_spaces = false )
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
Kokkos::Profiling::finalize();
#endif
#if defined( KOKKOS_ENABLE_CUDA )
if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value || all_spaces ) {
if(Kokkos::Cuda::is_initialized())
Kokkos::Cuda::finalize();
}
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::OpenMP::is_initialized())
Kokkos::OpenMP::finalize();
}
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::Threads::is_initialized())
Kokkos::Threads::finalize();
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ||
all_spaces ) {
if(Kokkos::Serial::is_initialized())
Kokkos::Serial::finalize();
}
#endif
}
void fence_internal()
{
#if defined( KOKKOS_ENABLE_CUDA )
if( std::is_same< Kokkos::Cuda , Kokkos::DefaultExecutionSpace >::value ) {
Kokkos::Cuda::fence();
}
#endif
#if defined( KOKKOS_ENABLE_OPENMP )
if( std::is_same< Kokkos::OpenMP , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::OpenMP , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::OpenMP::fence();
}
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
if( std::is_same< Kokkos::Threads , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Threads , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Threads::fence();
}
#endif
#if defined( KOKKOS_ENABLE_SERIAL )
if( std::is_same< Kokkos::Serial , Kokkos::DefaultExecutionSpace >::value ||
std::is_same< Kokkos::Serial , Kokkos::HostSpace::execution_space >::value ) {
Kokkos::Serial::fence();
}
#endif
}
} // namespace
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
namespace Kokkos {
void initialize(int& narg, char* arg[])
{
int num_threads = -1;
int numa = -1;
int device = -1;
int kokkos_threads_found = 0;
int kokkos_numa_found = 0;
int kokkos_device_found = 0;
int kokkos_ndevices_found = 0;
int iarg = 0;
while (iarg < narg) {
if ((strncmp(arg[iarg],"--kokkos-threads",16) == 0) || (strncmp(arg[iarg],"--threads",9) == 0)) {
//Find the number of threads (expecting --threads=XX)
if (!((strncmp(arg[iarg],"--kokkos-threads=",17) == 0) || (strncmp(arg[iarg],"--threads=",10) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--threads/--kokkos-threads'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--threads/--kokkos-threads'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-threads",16) == 0) || !kokkos_threads_found)
num_threads = atoi(number);
//Remove the --kokkos-threads argument from the list but leave --threads
if(strncmp(arg[iarg],"--kokkos-threads",16) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_threads_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-numa",13) == 0) || (strncmp(arg[iarg],"--numa",6) == 0)) {
//Find the number of NUMA regions (expecting --numa=XX)
if (!((strncmp(arg[iarg],"--kokkos-numa=",14) == 0) || (strncmp(arg[iarg],"--numa=",7) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--numa/--kokkos-numa'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--numa/--kokkos-numa'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-numa",13) == 0) || !kokkos_numa_found)
numa = atoi(number);
//Remove the --kokkos-numa argument from the list but leave --numa
if(strncmp(arg[iarg],"--kokkos-numa",13) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_numa_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-device",15) == 0) || (strncmp(arg[iarg],"--device",8) == 0)) {
//Find the device id (expecting --device=XX)
if (!((strncmp(arg[iarg],"--kokkos-device=",16) == 0) || (strncmp(arg[iarg],"--device=",9) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--device/--kokkos-device'. Raised by Kokkos::initialize(int narg, char* argc[]).");
char* number = strchr(arg[iarg],'=')+1;
if(!Impl::is_unsigned_int(number) || (strlen(number)==0))
Impl::throw_runtime_exception("Error: expecting an '=INT' after command line argument '--device/--kokkos-device'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-device",15) == 0) || !kokkos_device_found)
device = atoi(number);
//Remove the --kokkos-device argument from the list but leave --device
if(strncmp(arg[iarg],"--kokkos-device",15) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_device_found=1;
narg--;
} else {
iarg++;
}
} else if ((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || (strncmp(arg[iarg],"--ndevices",10) == 0)) {
//Find the number of devices (expecting --ndevices=XX[,XX])
if (!((strncmp(arg[iarg],"--kokkos-ndevices=",18) == 0) || (strncmp(arg[iarg],"--ndevices=",11) == 0)))
Impl::throw_runtime_exception("Error: expecting an '=INT[,INT]' after command line argument '--ndevices/--kokkos-ndevices'. Raised by Kokkos::initialize(int narg, char* argc[]).");
int ndevices=-1;
int skip_device = 9999;
char* num1 = strchr(arg[iarg],'=')+1;
char* num2 = strpbrk(num1,",");
int num1_len = num2==NULL?strlen(num1):num2-num1;
char* num1_only = new char[num1_len+1];
strncpy(num1_only,num1,num1_len);
num1_only[num1_len]=0;
if(!Impl::is_unsigned_int(num1_only) || (strlen(num1_only)==0)) {
Impl::throw_runtime_exception("Error: expecting an integer number after command line argument '--kokkos-ndevices'. Raised by Kokkos::initialize(int narg, char* argc[]).");
}
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found)
ndevices = atoi(num1_only);
if( num2 != NULL ) {
if(( !Impl::is_unsigned_int(num2+1) ) || (strlen(num2)==1) )
Impl::throw_runtime_exception("Error: expecting an integer number after command line argument '--kokkos-ndevices=XX,'. Raised by Kokkos::initialize(int narg, char* argc[]).");
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found)
skip_device = atoi(num2+1);
}
if((strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) || !kokkos_ndevices_found) {
char *str;
//if ((str = getenv("SLURM_LOCALID"))) {
// int local_rank = atoi(str);
// device = local_rank % ndevices;
// if (device >= skip_device) device++;
//}
if ((str = getenv("MV2_COMM_WORLD_LOCAL_RANK"))) {
int local_rank = atoi(str);
device = local_rank % ndevices;
if (device >= skip_device) device++;
}
if ((str = getenv("OMPI_COMM_WORLD_LOCAL_RANK"))) {
int local_rank = atoi(str);
device = local_rank % ndevices;
if (device >= skip_device) device++;
}
if(device==-1) {
device = 0;
if (device >= skip_device) device++;
}
}
//Remove the --kokkos-ndevices argument from the list but leave --ndevices
if(strncmp(arg[iarg],"--kokkos-ndevices",17) == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
kokkos_ndevices_found=1;
narg--;
} else {
iarg++;
}
} else if ((strcmp(arg[iarg],"--kokkos-help") == 0) || (strcmp(arg[iarg],"--help") == 0)) {
std::cout << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << "-------------Kokkos command line arguments--------------------------------------" << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << "The following arguments exist also without prefix 'kokkos' (e.g. --help)." << std::endl;
std::cout << "The prefixed arguments will be removed from the list by Kokkos::initialize()," << std::endl;
std::cout << "the non-prefixed ones are not removed. Prefixed versions take precedence over " << std::endl;
std::cout << "non prefixed ones, and the last occurence of an argument overwrites prior" << std::endl;
std::cout << "settings." << std::endl;
std::cout << std::endl;
std::cout << "--kokkos-help : print this message" << std::endl;
std::cout << "--kokkos-threads=INT : specify total number of threads or" << std::endl;
std::cout << " number of threads per NUMA region if " << std::endl;
std::cout << " used in conjunction with '--numa' option. " << std::endl;
std::cout << "--kokkos-numa=INT : specify number of NUMA regions used by process." << std::endl;
std::cout << "--kokkos-device=INT : specify device id to be used by Kokkos. " << std::endl;
std::cout << "--kokkos-ndevices=INT[,INT] : used when running MPI jobs. Specify number of" << std::endl;
std::cout << " devices per node to be used. Process to device" << std::endl;
std::cout << " mapping happens by obtaining the local MPI rank" << std::endl;
std::cout << " and assigning devices round-robin. The optional" << std::endl;
std::cout << " second argument allows for an existing device" << std::endl;
std::cout << " to be ignored. This is most useful on workstations" << std::endl;
std::cout << " with multiple GPUs of which one is used to drive" << std::endl;
std::cout << " screen output." << std::endl;
std::cout << std::endl;
std::cout << "--------------------------------------------------------------------------------" << std::endl;
std::cout << std::endl;
//Remove the --kokkos-help argument from the list but leave --help
if(strcmp(arg[iarg],"--kokkos-help") == 0) {
for(int k=iarg;k<narg-1;k++) {
arg[k] = arg[k+1];
}
narg--;
} else {
iarg++;
}
} else
iarg++;
}
InitArguments arguments;
arguments.num_threads = num_threads;
arguments.num_numa = numa;
arguments.device_id = device;
Impl::initialize_internal(arguments);
}
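// Typical usage of the command-line aware overload above (illustration only;
// the flag values below are placeholders):
//
//   int main( int argc , char* argv[] ) {
//     Kokkos::initialize( argc , argv );   // consumes the --kokkos-* flags
//     { /* ... parallel kernels ... */ }
//     Kokkos::finalize();
//   }
//
// invoked e.g. as:   ./my_app --kokkos-threads=8 --kokkos-numa=2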
void initialize(const InitArguments& arguments) {
Impl::initialize_internal(arguments);
}
void finalize()
{
Impl::finalize_internal();
}
void finalize_all()
{
enum { all_spaces = true };
Impl::finalize_internal( all_spaces );
}
void fence()
{
Impl::fence_internal();
}
+void print_configuration( std::ostream & out , const bool detail )
+{
+ std::ostringstream msg;
+
+ msg << "Compiler:" << std::endl;
+#ifdef KOKKOS_COMPILER_APPLECC
+ msg << " KOKKOS_COMPILER_APPLECC: " << KOKKOS_COMPILER_APPLECC << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_CLANG
+ msg << " KOKKOS_COMPILER_CLANG: " << KOKKOS_COMPILER_CLANG << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_CRAYC
+ msg << " KOKKOS_COMPILER_CRAYC: " << KOKKOS_COMPILER_CRAYC << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_GNU
+ msg << " KOKKOS_COMPILER_GNU: " << KOKKOS_COMPILER_GNU << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_IBM
+ msg << " KOKKOS_COMPILER_IBM: " << KOKKOS_COMPILER_IBM << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_INTEL
+ msg << " KOKKOS_COMPILER_INTEL: " << KOKKOS_COMPILER_INTEL << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_NVCC
+ msg << " KOKKOS_COMPILER_NVCC: " << KOKKOS_COMPILER_NVCC << std::endl;
+#endif
+#ifdef KOKKOS_COMPILER_PGI
+ msg << " KOKKOS_COMPILER_PGI: " << KOKKOS_COMPILER_PGI << std::endl;
+#endif
+
+
+ msg << "Architecture:" << std::endl;
+#ifdef KOKKOS_ENABLE_ISA_KNC
+ msg << " KOKKOS_ENABLE_ISA_KNC: yes" << std::endl;
+#else
+ msg << " KOKKOS_ENABLE_ISA_KNC: no" << std::endl;
+#endif
+#ifdef KOKKOS_ENABLE_ISA_POWERPCLE
+ msg << " KOKKOS_ENABLE_ISA_POWERPCLE: yes" << std::endl;
+#else
+ msg << " KOKKOS_ENABLE_ISA_POWERPCLE: no" << std::endl;
+#endif
+#ifdef KOKKOS_ENABLE_ISA_X86_64
+ msg << " KOKKOS_ENABLE_ISA_X86_64: yes" << std::endl;
+#else
+ msg << " KOKKOS_ENABLE_ISA_X86_64: no" << std::endl;
+#endif
+
+
+ msg << "Devices:" << std::endl;
+ msg << " KOKKOS_ENABLE_CUDA: ";
+#ifdef KOKKOS_ENABLE_CUDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_OPENMP: ";
+#ifdef KOKKOS_ENABLE_OPENMP
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PTHREAD: ";
+#ifdef KOKKOS_ENABLE_PTHREAD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_STDTHREAD: ";
+#ifdef KOKKOS_ENABLE_STDTHREAD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_WINTHREAD: ";
+#ifdef KOKKOS_ENABLE_WINTHREAD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_QTHREADS: ";
+#ifdef KOKKOS_ENABLE_QTHREADS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_SERIAL: ";
+#ifdef KOKKOS_ENABLE_SERIAL
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Default Device:" << std::endl;
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL: ";
+#ifdef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Atomics:" << std::endl;
+ msg << " KOKKOS_ENABLE_CUDA_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_CUDA_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_GNU_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_GNU_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_INTEL_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_INTEL_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_OPENMP_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_OPENMP_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_WINDOWS_ATOMICS: ";
+#ifdef KOKKOS_ENABLE_WINDOWS_ATOMICS
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Vectorization:" << std::endl;
+ msg << " KOKKOS_ENABLE_PRAGMA_IVDEP: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_LOOPCOUNT: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_LOOPCOUNT
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_SIMD: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_SIMD
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_UNROLL: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_UNROLL
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PRAGMA_VECTOR: ";
+#ifdef KOKKOS_ENABLE_PRAGMA_VECTOR
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+ msg << "Memory:" << std::endl;
+ msg << " KOKKOS_ENABLE_HBWSPACE: ";
+#ifdef KOKKOS_ENABLE_HBWSPACE
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_INTEL_MM_ALLOC: ";
+#ifdef KOKKOS_ENABLE_INTEL_MM_ALLOC
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_POSIX_MEMALIGN: ";
+#ifdef KOKKOS_ENABLE_POSIX_MEMALIGN
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+
+ msg << "Options:" << std::endl;
+ msg << " KOKKOS_ENABLE_ASM: ";
+#ifdef KOKKOS_ENABLE_ASM
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CXX1Z: ";
+#ifdef KOKKOS_ENABLE_CXX1Z
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK: ";
+#ifdef KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_HWLOC: ";
+#ifdef KOKKOS_ENABLE_HWLOC
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_LIBRT: ";
+#ifdef KOKKOS_ENABLE_LIBRT
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_MPI: ";
+#ifdef KOKKOS_ENABLE_MPI
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_PROFILING: ";
+#ifdef KOKKOS_ENABLE_PROFILING
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+#ifdef KOKKOS_ENABLE_CUDA
+ msg << "Cuda Options:" << std::endl;
+ msg << " KOKKOS_ENABLE_CUDA_LAMBDA: ";
+#ifdef KOKKOS_ENABLE_CUDA_LAMBDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUDA_LDG_INTRINSIC: ";
+#ifdef KOKKOS_ENABLE_CUDA_LDG_INTRINSIC
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE: ";
+#ifdef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUDA_UVM: ";
+#ifdef KOKKOS_ENABLE_CUDA_UVM
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CUSPARSE: ";
+#ifdef KOKKOS_ENABLE_CUSPARSE
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+ msg << " KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA: ";
+#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
+ msg << "yes" << std::endl;
+#else
+ msg << "no" << std::endl;
+#endif
+
+#endif
+
+ msg << "\nRuntime Configuration:" << std::endl;
+#ifdef KOKKOS_ENABLE_CUDA
+ Cuda::print_configuration(msg, detail);
+#endif
+#ifdef KOKKOS_ENABLE_OPENMP
+ OpenMP::print_configuration(msg, detail);
+#endif
+#if defined( KOKKOS_ENABLE_PTHREAD ) || defined( KOKKOS_ENABLE_WINTHREAD )
+ Threads::print_configuration(msg, detail);
+#endif
+#ifdef KOKKOS_ENABLE_QTHREADS
+ Qthreads::print_configuration(msg, detail);
+#endif
+#ifdef KOKKOS_ENABLE_SERIAL
+ Serial::print_configuration(msg, detail);
+#endif
+
+ out << msg.str() << std::endl;
+}
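+
+// Example call for the function above (illustration only); the second
+// argument is the 'detail' flag forwarded to the per-backend
+// print_configuration calls:
+//
+//   Kokkos::print_configuration( std::cout , true );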
+
} // namespace Kokkos
diff --git a/lib/kokkos/core/src/impl/Kokkos_FunctorAnalysis.hpp b/lib/kokkos/core/src/impl/Kokkos_FunctorAnalysis.hpp
new file mode 100644
index 000000000..b425b3f19
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_FunctorAnalysis.hpp
@@ -0,0 +1,653 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_FUNCTORANALYSIS_HPP
+#define KOKKOS_FUNCTORANALYSIS_HPP
+
+#include <cstddef>
+#include <Kokkos_Core_fwd.hpp>
+#include <impl/Kokkos_Traits.hpp>
+#include <impl/Kokkos_Tags.hpp>
+#include <impl/Kokkos_Reducer.hpp>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+struct FunctorPatternInterface {
+ struct FOR {};
+ struct REDUCE {};
+ struct SCAN {};
+};
+
+/** \brief Query Functor and execution policy argument tag for value type.
+ *
+ * If 'value_type' is not explicitly declared in the functor
+ * then attempt to deduce the type from FunctorType::operator()
+ * interface used by the pattern and policy.
+ *
+ * For the REDUCE pattern generate a Reducer and finalization function
+ * derived from what is available within the functor.
+ */
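+//
+// Illustration (hypothetical user functors, not part of this header): a
+// reduction functor that declares an explicit value_type,
+//
+//   struct A { using value_type = double ;
+//              KOKKOS_INLINE_FUNCTION
+//              void operator()( int i , double & update ) const ; };
+//
+// is handled by has_value_type below, whereas for
+//
+//   struct B { KOKKOS_INLINE_FUNCTION
+//              void operator()( int i , float & update ) const ; };
+//
+// deduce_value_type inspects operator() and deduces 'float' from its
+// reference argument.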
+template< typename PatternInterface , class Policy , class Functor >
+struct FunctorAnalysis {
+private:
+
+ using FOR = FunctorPatternInterface::FOR ;
+ using REDUCE = FunctorPatternInterface::REDUCE ;
+ using SCAN = FunctorPatternInterface::SCAN ;
+
+ //----------------------------------------
+
+ struct VOID {};
+
+ template< typename P = Policy , typename = std::false_type >
+ struct has_work_tag
+ {
+ using type = void ;
+ using wtag = VOID ;
+ };
+
+ template< typename P >
+ struct has_work_tag
+ < P , typename std::is_same< typename P::work_tag , void >::type >
+ {
+ using type = typename P::work_tag ;
+ using wtag = typename P::work_tag ;
+ };
+
+ using Tag = typename has_work_tag<>::type ;
+ using WTag = typename has_work_tag<>::wtag ;
+
+ //----------------------------------------
+ // Check for Functor::value_type, which is either a simple type T or T[]
+
+ template< typename F , typename = std::false_type >
+ struct has_value_type { using type = void ; };
+
+ template< typename F >
+ struct has_value_type
+ < F , typename std::is_same< typename F::value_type , void >::type >
+ {
+ using type = typename F::value_type ;
+
+ static_assert( ! std::is_reference< type >::value &&
+ std::rank< type >::value <= 1 &&
+ std::extent< type >::value == 0
+ , "Kokkos Functor::value_type is T or T[]" );
+ };
+
+ //----------------------------------------
+ // If Functor::value_type does not exist then evaluate operator(),
+ // depending upon the pattern and whether the policy has a work tag,
+ // to determine the reduction or scan value_type.
+
+ template< typename F
+ , typename P = PatternInterface
+ , typename V = typename has_value_type<F>::type
+ , bool T = std::is_same< Tag , void >::value
+ >
+ struct deduce_value_type { using type = V ; };
+
+ template< typename F >
+ struct deduce_value_type< F , REDUCE , void , true > {
+
+ template< typename M , typename A >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( M , A & ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ template< typename F >
+ struct deduce_value_type< F , REDUCE , void , false > {
+
+ template< typename M , typename A >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag , M , A & ) const );
+
+ template< typename M , typename A >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag const & , M , A & ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ template< typename F >
+ struct deduce_value_type< F , SCAN , void , true > {
+
+ template< typename M , typename A , typename I >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( M , A & , I ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ template< typename F >
+ struct deduce_value_type< F , SCAN , void , false > {
+
+ template< typename M , typename A , typename I >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag , M , A & , I ) const );
+
+ template< typename M , typename A , typename I >
+ KOKKOS_INLINE_FUNCTION static
+ A deduce( void (Functor::*)( WTag const & , M , A & , I ) const );
+
+ using type = decltype( deduce( & F::operator() ) );
+ };
+
+ //----------------------------------------
+
+ using candidate_type = typename deduce_value_type< Functor >::type ;
+
+ enum { candidate_is_void = std::is_same< candidate_type , void >::value
+ , candidate_is_array = std::rank< candidate_type >::value == 1 };
+
+ //----------------------------------------
+
+public:
+
+ using value_type = typename std::remove_extent< candidate_type >::type ;
+
+ static_assert( ! std::is_const< value_type >::value
+ , "Kokkos functor operator reduce argument cannot be const" );
+
+private:
+
+ // Stub to avoid defining a type 'void &'
+ using ValueType = typename
+ std::conditional< candidate_is_void , VOID , value_type >::type ;
+
+public:
+
+ using pointer_type = typename
+ std::conditional< candidate_is_void , void , ValueType * >::type ;
+
+ using reference_type = typename
+ std::conditional< candidate_is_array , ValueType * , typename
+ std::conditional< ! candidate_is_void , ValueType & , void >
+ ::type >::type ;
+
+private:
+
+ template< bool IsArray , class FF >
+ KOKKOS_INLINE_FUNCTION static
+ typename std::enable_if< IsArray , unsigned >::type
+ get_length( FF const & f ) { return f.value_count ; }
+
+ template< bool IsArray , class FF >
+ KOKKOS_INLINE_FUNCTION static
+ typename std::enable_if< ! IsArray , unsigned >::type
+ get_length( FF const & ) { return 1 ; }
+
+public:
+
+ enum { StaticValueSize = ! candidate_is_void &&
+ ! candidate_is_array
+ ? sizeof(ValueType) : 0 };
+
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_count( const Functor & f )
+ { return FunctorAnalysis::template get_length< candidate_is_array >(f); }
+
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_size( const Functor & f )
+ { return FunctorAnalysis::template get_length< candidate_is_array >(f) * sizeof(ValueType); }
+
+ //----------------------------------------
+
+ template< class Unknown >
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_count( const Unknown & )
+ { return 1 ; }
+
+ template< class Unknown >
+ KOKKOS_FORCEINLINE_FUNCTION static
+ unsigned value_size( const Unknown & )
+ { return sizeof(ValueType); }
+
+private:
+
+ enum INTERFACE : int
+ { DISABLE = 0
+ , NO_TAG_NOT_ARRAY = 1
+ , NO_TAG_IS_ARRAY = 2
+ , HAS_TAG_NOT_ARRAY = 3
+ , HAS_TAG_IS_ARRAY = 4
+ , DEDUCED =
+ ! std::is_same< PatternInterface , REDUCE >::value ? DISABLE : (
+ std::is_same<Tag,void>::value
+ ? (candidate_is_array ? NO_TAG_IS_ARRAY : NO_TAG_NOT_ARRAY)
+ : (candidate_is_array ? HAS_TAG_IS_ARRAY : HAS_TAG_NOT_ARRAY) )
+ };
+
+ //----------------------------------------
+ // parallel_reduce join operator
+
+ template< class F , INTERFACE >
+ struct has_join_function ;
+
+ template< class F >
+ struct has_join_function< F , NO_TAG_NOT_ARRAY >
+ {
+ typedef volatile ValueType & vref_type ;
+ typedef volatile const ValueType & cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( *dst , *src ); }
+ };
+
+ template< class F >
+ struct has_join_function< F , NO_TAG_IS_ARRAY >
+ {
+ typedef volatile ValueType * vref_type ;
+ typedef volatile const ValueType * cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( dst , src ); }
+ };
+
+ template< class F >
+ struct has_join_function< F , HAS_TAG_NOT_ARRAY >
+ {
+ typedef volatile ValueType & vref_type ;
+ typedef volatile const ValueType & cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( WTag() , *dst , *src ); }
+ };
+
+ template< class F >
+ struct has_join_function< F , HAS_TAG_IS_ARRAY >
+ {
+ typedef volatile ValueType * vref_type ;
+ typedef volatile const ValueType * cvref_type ;
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , vref_type , cvref_type ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ { f.join( WTag() , dst , src ); }
+ };
+
+
+ template< class F = Functor
+ , INTERFACE = DEDUCED
+ , typename = void >
+ struct DeduceJoin
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const & f
+ , ValueType volatile * dst
+ , ValueType volatile const * src )
+ {
+ const int n = FunctorAnalysis::value_count( f );
+ for ( int i = 0 ; i < n ; ++i ) dst[i] += src[i];
+ }
+ };
+
+ template< class F >
+ struct DeduceJoin< F , DISABLE , void >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void join( F const &
+ , ValueType volatile *
+ , ValueType volatile const * ) {}
+ };
+
+ template< class F , INTERFACE I >
+ struct DeduceJoin< F , I ,
+ decltype( has_join_function<F,I>::enable_if( & F::join ) ) >
+ : public has_join_function<F,I> {};
+
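+  // A minimal illustration (the functor shown is hypothetical): if the
+  // functor declares a join whose parameter types match one of the enable_if
+  // signatures above, e.g.
+  //
+  //   void join( double volatile & dst , double volatile const & src )
+  //     { if ( src < dst ) dst = src ; }
+  //
+  // the decltype-SFINAE specialization of DeduceJoin immediately above
+  // forwards to it; otherwise the primary DeduceJoin falls back to the
+  // element-wise 'dst[i] += src[i]' default.
+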
+ //----------------------------------------
+
+ template< class , INTERFACE >
+ struct has_init_function ;
+
+ template< class F >
+ struct has_init_function< F , NO_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( *dst ); }
+ };
+
+ template< class F >
+ struct has_init_function< F , NO_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( dst ); }
+ };
+
+ template< class F >
+ struct has_init_function< F , HAS_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( WTag(), *dst ); }
+ };
+
+ template< class F >
+ struct has_init_function< F , HAS_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & f , ValueType * dst )
+ { f.init( WTag(), dst ); }
+ };
+
+ template< class F = Functor
+ , INTERFACE = DEDUCED
+ , typename = void >
+ struct DeduceInit
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & , ValueType * dst ) { new(dst) ValueType(); }
+ };
+
+ template< class F >
+ struct DeduceInit< F , DISABLE , void >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void init( F const & , ValueType * ) {}
+ };
+
+ template< class F , INTERFACE I >
+ struct DeduceInit< F , I ,
+ decltype( has_init_function<F,I>::enable_if( & F::init ) ) >
+ : public has_init_function<F,I> {};
+
+ //----------------------------------------
+
+public:
+
+ struct Reducer
+ {
+ private:
+
+ Functor const & m_functor ;
+ ValueType * const m_result ;
+ int const m_length ;
+
+ public:
+
+ using reducer = Reducer ;
+ using value_type = FunctorAnalysis::value_type ;
+ using memory_space = void ;
+ using reference_type = FunctorAnalysis::reference_type ;
+
+ KOKKOS_INLINE_FUNCTION
+ void join( ValueType volatile * dst
+ , ValueType volatile const * src ) const noexcept
+ { DeduceJoin<>::join( m_functor , dst , src ); }
+
+ KOKKOS_INLINE_FUNCTION
+ void init( ValueType * dst ) const noexcept
+ { DeduceInit<>::init( m_functor , dst ); }
+
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer( Functor const & arg_functor
+ , ValueType * arg_value = 0
+ , int arg_length = 0 ) noexcept
+ : m_functor( arg_functor ), m_result(arg_value), m_length(arg_length) {}
+
+ KOKKOS_INLINE_FUNCTION
+ constexpr int length() const noexcept { return m_length ; }
+
+ KOKKOS_INLINE_FUNCTION
+ ValueType & operator[]( int i ) const noexcept
+ { return m_result[i]; }
+
+ private:
+
+ template< bool IsArray >
+ constexpr
+ typename std::enable_if< IsArray , ValueType * >::type
+ ref() const noexcept { return m_result ; }
+
+ template< bool IsArray >
+ constexpr
+ typename std::enable_if< ! IsArray , ValueType & >::type
+ ref() const noexcept { return *m_result ; }
+
+ public:
+
+ KOKKOS_INLINE_FUNCTION
+ auto result() const noexcept
+ -> decltype( Reducer::template ref< candidate_is_array >() )
+ { return Reducer::template ref< candidate_is_array >(); }
+ };
+
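+  // A minimal usage sketch ('analysis', 'functor', 'result', and
+  // 'contribution' are hypothetical placeholders; 'analysis' stands for an
+  // instantiation of this FunctorAnalysis class with the REDUCE pattern):
+  //
+  //   typename analysis::value_type result ;
+  //   typename analysis::Reducer    reducer( functor , & result , 1 );
+  //
+  //   reducer.init( & result );                  // functor.init() or default-construct
+  //   reducer.join( & result , & contribution ); // functor.join() or 'dst += src'
+  //   // reducer.result() yields 'result' by reference (by pointer for T[])
+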
+ //----------------------------------------
+
+private:
+
+ template< class , INTERFACE >
+ struct has_final_function ;
+
+ // No tag, not array
+ template< class F >
+ struct has_final_function< F , NO_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( *dst ); }
+ };
+
+ // No tag, is array
+ template< class F >
+ struct has_final_function< F , NO_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( dst ); }
+ };
+
+ // Has tag, not array
+ template< class F >
+ struct has_final_function< F , HAS_TAG_NOT_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType & ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( WTag(), *dst ); }
+ };
+
+ // Has tag, is array
+ template< class F >
+ struct has_final_function< F , HAS_TAG_IS_ARRAY >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (F::*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void enable_if( void (*)( WTag const & , ValueType * ) );
+
+ KOKKOS_INLINE_FUNCTION static
+ void final( F const & f , ValueType * dst )
+ { f.final( WTag(), dst ); }
+ };
+
+ template< class F = Functor
+ , INTERFACE = DEDUCED
+ , typename = void >
+ struct DeduceFinal
+ {
+ KOKKOS_INLINE_FUNCTION
+ static void final( F const & , ValueType * ) {}
+ };
+
+ template< class F , INTERFACE I >
+ struct DeduceFinal< F , I ,
+ decltype( has_final_function<F,I>::enable_if( & F::final ) ) >
+    : public has_final_function<F,I> {};
+
+public:
+
+ static void final( Functor const & f , ValueType * result )
+ { DeduceFinal<>::final( f , result ); }
+
+};
+
+} // namespace Impl
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* KOKKOS_FUNCTORANALYSIS_HPP */
+
diff --git a/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp b/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
index 96d30d0c4..eb1f5ce96 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_HBWSpace.cpp
@@ -1,399 +1,399 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <memory.h>
#include <iostream>
#include <sstream>
#include <cstring>
#include <algorithm>
#include <Kokkos_HBWSpace.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Kokkos_Atomic.hpp>
#ifdef KOKKOS_ENABLE_HBWSPACE
#include <memkind.h>
#endif
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#ifdef KOKKOS_ENABLE_HBWSPACE
#define MEMKIND_TYPE MEMKIND_HBW //hbw_get_kind(HBW_PAGESIZE_4KB)
namespace Kokkos {
namespace Experimental {
namespace {
static const int QUERY_SPACE_IN_PARALLEL_MAX = 16 ;
typedef int (* QuerySpaceInParallelPtr )();
QuerySpaceInParallelPtr s_in_parallel_query[ QUERY_SPACE_IN_PARALLEL_MAX ] ;
int s_in_parallel_query_count = 0 ;
} // namespace <empty>
void HBWSpace::register_in_parallel( int (*device_in_parallel)() )
{
if ( 0 == device_in_parallel ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel ERROR : given NULL" ) );
}
int i = -1 ;
if ( ! (device_in_parallel)() ) {
for ( i = 0 ; i < s_in_parallel_query_count && ! (*(s_in_parallel_query[i]))() ; ++i );
}
if ( i < s_in_parallel_query_count ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel_query ERROR : called in_parallel" ) );
}
if ( QUERY_SPACE_IN_PARALLEL_MAX <= i ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Experimental::HBWSpace::register_in_parallel_query ERROR : exceeded maximum" ) );
}
for ( i = 0 ; i < s_in_parallel_query_count && s_in_parallel_query[i] != device_in_parallel ; ++i );
if ( i == s_in_parallel_query_count ) {
s_in_parallel_query[s_in_parallel_query_count++] = device_in_parallel ;
}
}
int HBWSpace::in_parallel()
{
const int n = s_in_parallel_query_count ;
int i = 0 ;
while ( i < n && ! (*(s_in_parallel_query[i]))() ) { ++i ; }
return i < n ;
}
} // namespace Experimental
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Experimental {
/* Default allocation mechanism */
HBWSpace::HBWSpace()
: m_alloc_mech(
HBWSpace::STD_MALLOC
)
{
printf("Init\n");
setenv("MEMKIND_HBW_NODES", "1", 0);
}
/* Default allocation mechanism */
HBWSpace::HBWSpace( const HBWSpace::AllocationMechanism & arg_alloc_mech )
: m_alloc_mech( HBWSpace::STD_MALLOC )
{
printf("Init2\n");
setenv("MEMKIND_HBW_NODES", "1", 0);
if ( arg_alloc_mech == STD_MALLOC ) {
m_alloc_mech = HBWSpace::STD_MALLOC ;
}
}
void * HBWSpace::allocate( const size_t arg_alloc_size ) const
{
static_assert( sizeof(void*) == sizeof(uintptr_t)
, "Error sizeof(void*) != sizeof(uintptr_t)" );
static_assert( Kokkos::Impl::power_of_two< Kokkos::Impl::MEMORY_ALIGNMENT >::value
, "Memory alignment must be power of two" );
constexpr uintptr_t alignment = Kokkos::Impl::MEMORY_ALIGNMENT ;
constexpr uintptr_t alignment_mask = alignment - 1 ;
void * ptr = 0 ;
if ( arg_alloc_size ) {
if ( m_alloc_mech == STD_MALLOC ) {
// Over-allocate and round up to guarantee proper alignment.
size_t size_padded = arg_alloc_size + sizeof(void*) + alignment ;
void * alloc_ptr = memkind_malloc(MEMKIND_TYPE, size_padded );
if (alloc_ptr) {
uintptr_t address = reinterpret_cast<uintptr_t>(alloc_ptr);
// offset enough to record the alloc_ptr
address += sizeof(void *);
uintptr_t rem = address % alignment;
uintptr_t offset = rem ? (alignment - rem) : 0u;
address += offset;
ptr = reinterpret_cast<void *>(address);
// record the alloc'd pointer
address -= sizeof(void *);
*reinterpret_cast<void **>(address) = alloc_ptr;
}
}
}
if ( ( ptr == 0 ) || ( reinterpret_cast<uintptr_t>(ptr) == ~uintptr_t(0) )
|| ( reinterpret_cast<uintptr_t>(ptr) & alignment_mask ) ) {
std::ostringstream msg ;
msg << "Kokkos::Experimental::HBWSpace::allocate[ " ;
switch( m_alloc_mech ) {
case STD_MALLOC: msg << "STD_MALLOC" ; break ;
}
msg << " ]( " << arg_alloc_size << " ) FAILED" ;
- if ( ptr == NULL ) { msg << " NULL" ; }
+ if ( ptr == NULL ) { msg << " NULL" ; }
else { msg << " NOT ALIGNED " << ptr ; }
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return ptr;
}
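/* Illustrative arithmetic for the STD_MALLOC path above (all values assumed):
   with alignment = 64 and memkind_malloc returning alloc_ptr = 0x1008,
   address = 0x1008 + 8 = 0x1010, rem = 0x10, offset = 0x30, and the returned
   ptr = 0x1040 is 64-byte aligned; alloc_ptr is stored in the preceding
   8 bytes (at 0x1038) so deallocate() can recover and free it. */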
void HBWSpace::deallocate( void * const arg_alloc_ptr , const size_t arg_alloc_size ) const
{
if ( arg_alloc_ptr ) {
if ( m_alloc_mech == STD_MALLOC ) {
void * alloc_ptr = *(reinterpret_cast<void **>(arg_alloc_ptr) -1);
memkind_free(MEMKIND_TYPE, alloc_ptr );
- }
+ }
}
}
constexpr const char* HBWSpace::name() {
return m_name;
}
} // namespace Experimental
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::s_root_record ;
void
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::Experimental::HBWSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
SharedAllocationRecord( const Kokkos::Experimental::HBWSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information
RecordBase::m_alloc_ptr->m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
allocate_tracked( const Kokkos::Experimental::HBWSpace & arg_space
- , const std::string & arg_alloc_label
+ , const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<Kokkos::Experimental::HBWSpace,Kokkos::Experimental::HBWSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > *
SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::get_record( void * alloc_ptr )
{
typedef SharedAllocationHeader Header ;
typedef SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void > RecordHost ;
SharedAllocationHeader const * const head = alloc_ptr ? Header::get_header( alloc_ptr ) : (SharedAllocationHeader *)0 ;
RecordHost * const record = head ? static_cast< RecordHost * >( head->m_record ) : (RecordHost *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::get_record ERROR" ) );
}
return record ;
}
// Iterate records to print orphaned memory ...
void SharedAllocationRecord< Kokkos::Experimental::HBWSpace , void >::
print_records( std::ostream & s , const Kokkos::Experimental::HBWSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "HBWSpace" , & s_root_record , detail );
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Experimental {
namespace {
const unsigned HBW_SPACE_ATOMIC_MASK = 0xFFFF;
const unsigned HBW_SPACE_ATOMIC_XOR_MASK = 0x5A39;
static int HBW_SPACE_ATOMIC_LOCKS[HBW_SPACE_ATOMIC_MASK+1];
}
namespace Impl {
void init_lock_array_hbw_space() {
static int is_initialized = 0;
if(! is_initialized)
for(int i = 0; i < static_cast<int> (HBW_SPACE_ATOMIC_MASK+1); i++)
HBW_SPACE_ATOMIC_LOCKS[i] = 0;
}
bool lock_address_hbw_space(void* ptr) {
return 0 == atomic_compare_exchange( &HBW_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HBW_SPACE_ATOMIC_MASK) ^ HBW_SPACE_ATOMIC_XOR_MASK] ,
0 , 1);
}
void unlock_address_hbw_space(void* ptr) {
atomic_exchange( &HBW_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HBW_SPACE_ATOMIC_MASK) ^ HBW_SPACE_ATOMIC_XOR_MASK] ,
0);
}
}
}
}
#endif
diff --git a/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp b/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
index 3cd603728..67be86c9a 100644
--- a/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_HostSpace.cpp
@@ -1,505 +1,505 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <algorithm>
#include <Kokkos_Macros.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#endif
/*--------------------------------------------------------------------------*/
#if defined( __INTEL_COMPILER ) && ! defined ( KOKKOS_ENABLE_CUDA )
// Intel specialized allocator does not interoperate with CUDA memory allocation
#define KOKKOS_ENABLE_INTEL_MM_ALLOC
#endif
/*--------------------------------------------------------------------------*/
#if defined(KOKKOS_ENABLE_POSIX_MEMALIGN)
#include <unistd.h>
#include <sys/mman.h>
/* mmap flags for private anonymous memory allocation */
#if defined( MAP_ANONYMOUS ) && defined( MAP_PRIVATE )
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS)
#elif defined( MAP_ANON ) && defined( MAP_PRIVATE )
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS (MAP_PRIVATE | MAP_ANON)
#endif
// mmap flags for huge page tables
// the Cuda driver does not interoperate with MAP_HUGETLB
#if defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
#if defined( MAP_HUGETLB ) && ! defined( KOKKOS_ENABLE_CUDA )
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE (KOKKOS_IMPL_POSIX_MMAP_FLAGS | MAP_HUGETLB )
#else
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE KOKKOS_IMPL_POSIX_MMAP_FLAGS
#endif
#endif
#endif
/*--------------------------------------------------------------------------*/
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <memory.h>
#include <iostream>
#include <sstream>
#include <cstring>
#include <Kokkos_HostSpace.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Kokkos_Atomic.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace {
static const int QUERY_SPACE_IN_PARALLEL_MAX = 16 ;
typedef int (* QuerySpaceInParallelPtr )();
QuerySpaceInParallelPtr s_in_parallel_query[ QUERY_SPACE_IN_PARALLEL_MAX ] ;
int s_in_parallel_query_count = 0 ;
} // namespace <empty>
void HostSpace::register_in_parallel( int (*device_in_parallel)() )
{
if ( 0 == device_in_parallel ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel ERROR : given NULL" ) );
}
int i = -1 ;
if ( ! (device_in_parallel)() ) {
for ( i = 0 ; i < s_in_parallel_query_count && ! (*(s_in_parallel_query[i]))() ; ++i );
}
if ( i < s_in_parallel_query_count ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel_query ERROR : called in_parallel" ) );
}
if ( QUERY_SPACE_IN_PARALLEL_MAX <= i ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::HostSpace::register_in_parallel_query ERROR : exceeded maximum" ) );
}
for ( i = 0 ; i < s_in_parallel_query_count && s_in_parallel_query[i] != device_in_parallel ; ++i );
if ( i == s_in_parallel_query_count ) {
s_in_parallel_query[s_in_parallel_query_count++] = device_in_parallel ;
}
}
int HostSpace::in_parallel()
{
const int n = s_in_parallel_query_count ;
int i = 0 ;
while ( i < n && ! (*(s_in_parallel_query[i]))() ) { ++i ; }
return i < n ;
}
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
namespace Kokkos {
/* Default allocation mechanism */
HostSpace::HostSpace()
: m_alloc_mech(
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
HostSpace::INTEL_MM_ALLOC
#elif defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
HostSpace::POSIX_MMAP
#elif defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
HostSpace::POSIX_MEMALIGN
#else
HostSpace::STD_MALLOC
#endif
)
{}
/* Default allocation mechanism */
HostSpace::HostSpace( const HostSpace::AllocationMechanism & arg_alloc_mech )
: m_alloc_mech( HostSpace::STD_MALLOC )
{
if ( arg_alloc_mech == STD_MALLOC ) {
m_alloc_mech = HostSpace::STD_MALLOC ;
}
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
else if ( arg_alloc_mech == HostSpace::INTEL_MM_ALLOC ) {
m_alloc_mech = HostSpace::INTEL_MM_ALLOC ;
}
#elif defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
else if ( arg_alloc_mech == HostSpace::POSIX_MEMALIGN ) {
m_alloc_mech = HostSpace::POSIX_MEMALIGN ;
}
#elif defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
else if ( arg_alloc_mech == HostSpace::POSIX_MMAP ) {
m_alloc_mech = HostSpace::POSIX_MMAP ;
}
#endif
else {
const char * const mech =
( arg_alloc_mech == HostSpace::INTEL_MM_ALLOC ) ? "INTEL_MM_ALLOC" : (
( arg_alloc_mech == HostSpace::POSIX_MEMALIGN ) ? "POSIX_MEMALIGN" : (
( arg_alloc_mech == HostSpace::POSIX_MMAP ) ? "POSIX_MMAP" : "" ));
std::string msg ;
msg.append("Kokkos::HostSpace ");
msg.append(mech);
msg.append(" is not available" );
Kokkos::Impl::throw_runtime_exception( msg );
}
}
void * HostSpace::allocate( const size_t arg_alloc_size ) const
{
static_assert( sizeof(void*) == sizeof(uintptr_t)
, "Error sizeof(void*) != sizeof(uintptr_t)" );
static_assert( Kokkos::Impl::is_integral_power_of_two( Kokkos::Impl::MEMORY_ALIGNMENT )
, "Memory alignment must be power of two" );
constexpr uintptr_t alignment = Kokkos::Impl::MEMORY_ALIGNMENT ;
constexpr uintptr_t alignment_mask = alignment - 1 ;
void * ptr = 0 ;
if ( arg_alloc_size ) {
if ( m_alloc_mech == STD_MALLOC ) {
// Over-allocate and round up to guarantee proper alignment.
size_t size_padded = arg_alloc_size + sizeof(void*) + alignment ;
void * alloc_ptr = malloc( size_padded );
if (alloc_ptr) {
uintptr_t address = reinterpret_cast<uintptr_t>(alloc_ptr);
// offset enough to record the alloc_ptr
address += sizeof(void *);
uintptr_t rem = address % alignment;
uintptr_t offset = rem ? (alignment - rem) : 0u;
address += offset;
ptr = reinterpret_cast<void *>(address);
// record the alloc'd pointer
address -= sizeof(void *);
*reinterpret_cast<void **>(address) = alloc_ptr;
}
}
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
else if ( m_alloc_mech == INTEL_MM_ALLOC ) {
ptr = _mm_malloc( arg_alloc_size , alignment );
}
#endif
#if defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
else if ( m_alloc_mech == POSIX_MEMALIGN ) {
posix_memalign( & ptr, alignment , arg_alloc_size );
}
#endif
#if defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
else if ( m_alloc_mech == POSIX_MMAP ) {
constexpr size_t use_huge_pages = (1u << 27);
constexpr int prot = PROT_READ | PROT_WRITE ;
const int flags = arg_alloc_size < use_huge_pages
? KOKKOS_IMPL_POSIX_MMAP_FLAGS
: KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE ;
// read write access to private memory
ptr = mmap( NULL /* address hint, if NULL OS kernel chooses address */
, arg_alloc_size /* size in bytes */
, prot /* memory protection */
, flags /* visibility of updates */
, -1 /* file descriptor */
, 0 /* offset */
);
/* Associated reallocation:
ptr = mremap( old_ptr , old_size , new_size , MREMAP_MAYMOVE );
*/
}
#endif
}
if ( ( ptr == 0 ) || ( reinterpret_cast<uintptr_t>(ptr) == ~uintptr_t(0) )
|| ( reinterpret_cast<uintptr_t>(ptr) & alignment_mask ) ) {
std::ostringstream msg ;
msg << "Kokkos::HostSpace::allocate[ " ;
switch( m_alloc_mech ) {
case STD_MALLOC: msg << "STD_MALLOC" ; break ;
case POSIX_MEMALIGN: msg << "POSIX_MEMALIGN" ; break ;
case POSIX_MMAP: msg << "POSIX_MMAP" ; break ;
case INTEL_MM_ALLOC: msg << "INTEL_MM_ALLOC" ; break ;
}
msg << " ]( " << arg_alloc_size << " ) FAILED" ;
- if ( ptr == NULL ) { msg << " NULL" ; }
+ if ( ptr == NULL ) { msg << " NULL" ; }
else { msg << " NOT ALIGNED " << ptr ; }
std::cerr << msg.str() << std::endl ;
std::cerr.flush();
Kokkos::Impl::throw_runtime_exception( msg.str() );
}
return ptr;
}
void HostSpace::deallocate( void * const arg_alloc_ptr , const size_t arg_alloc_size ) const
{
if ( arg_alloc_ptr ) {
if ( m_alloc_mech == STD_MALLOC ) {
void * alloc_ptr = *(reinterpret_cast<void **>(arg_alloc_ptr) -1);
free( alloc_ptr );
- }
+ }
#if defined( KOKKOS_ENABLE_INTEL_MM_ALLOC )
else if ( m_alloc_mech == INTEL_MM_ALLOC ) {
_mm_free( arg_alloc_ptr );
}
#endif
#if defined( KOKKOS_ENABLE_POSIX_MEMALIGN )
else if ( m_alloc_mech == POSIX_MEMALIGN ) {
free( arg_alloc_ptr );
}
#endif
#if defined( KOKKOS_IMPL_POSIX_MMAP_FLAGS )
else if ( m_alloc_mech == POSIX_MMAP ) {
munmap( arg_alloc_ptr , arg_alloc_size );
}
#endif
}
}
constexpr const char* HostSpace::name() {
return m_name;
}
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
SharedAllocationRecord< void , void >
SharedAllocationRecord< Kokkos::HostSpace , void >::s_root_record ;
void
SharedAllocationRecord< Kokkos::HostSpace , void >::
deallocate( SharedAllocationRecord< void , void > * arg_rec )
{
delete static_cast<SharedAllocationRecord*>(arg_rec);
}
SharedAllocationRecord< Kokkos::HostSpace , void >::
~SharedAllocationRecord()
{
- #if (KOKKOS_ENABLE_PROFILING)
+ #if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::HostSpace::name()),RecordBase::m_alloc_ptr->m_label,
data(),size());
}
#endif
m_space.deallocate( SharedAllocationRecord< void , void >::m_alloc_ptr
, SharedAllocationRecord< void , void >::m_alloc_size
);
}
SharedAllocationRecord< Kokkos::HostSpace , void >::
SharedAllocationRecord( const Kokkos::HostSpace & arg_space
, const std::string & arg_label
, const size_t arg_alloc_size
, const SharedAllocationRecord< void , void >::function_type arg_dealloc
)
// Pass through allocated [ SharedAllocationHeader , user_memory ]
// Pass through deallocation function
: SharedAllocationRecord< void , void >
( & SharedAllocationRecord< Kokkos::HostSpace , void >::s_root_record
, reinterpret_cast<SharedAllocationHeader*>( arg_space.allocate( sizeof(SharedAllocationHeader) + arg_alloc_size ) )
, sizeof(SharedAllocationHeader) + arg_alloc_size
, arg_dealloc
)
, m_space( arg_space )
{
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::allocateData(Kokkos::Profiling::SpaceHandle(arg_space.name()),arg_label,data(),arg_alloc_size);
}
#endif
// Fill in the Header information
RecordBase::m_alloc_ptr->m_record = static_cast< SharedAllocationRecord< void , void > * >( this );
strncpy( RecordBase::m_alloc_ptr->m_label
, arg_label.c_str()
, SharedAllocationHeader::maximum_label_length
);
}
//----------------------------------------------------------------------------
void * SharedAllocationRecord< Kokkos::HostSpace , void >::
allocate_tracked( const Kokkos::HostSpace & arg_space
- , const std::string & arg_alloc_label
+ , const std::string & arg_alloc_label
, const size_t arg_alloc_size )
{
if ( ! arg_alloc_size ) return (void *) 0 ;
SharedAllocationRecord * const r =
allocate( arg_space , arg_alloc_label , arg_alloc_size );
RecordBase::increment( r );
return r->data();
}
void SharedAllocationRecord< Kokkos::HostSpace , void >::
deallocate_tracked( void * const arg_alloc_ptr )
{
if ( arg_alloc_ptr != 0 ) {
SharedAllocationRecord * const r = get_record( arg_alloc_ptr );
RecordBase::decrement( r );
}
}
void * SharedAllocationRecord< Kokkos::HostSpace , void >::
reallocate_tracked( void * const arg_alloc_ptr
, const size_t arg_alloc_size )
{
SharedAllocationRecord * const r_old = get_record( arg_alloc_ptr );
SharedAllocationRecord * const r_new = allocate( r_old->m_space , r_old->get_label() , arg_alloc_size );
Kokkos::Impl::DeepCopy<HostSpace,HostSpace>( r_new->data() , r_old->data()
, std::min( r_old->size() , r_new->size() ) );
RecordBase::increment( r_new );
RecordBase::decrement( r_old );
return r_new->data();
}
SharedAllocationRecord< Kokkos::HostSpace , void > *
SharedAllocationRecord< Kokkos::HostSpace , void >::get_record( void * alloc_ptr )
{
typedef SharedAllocationHeader Header ;
typedef SharedAllocationRecord< Kokkos::HostSpace , void > RecordHost ;
SharedAllocationHeader const * const head = alloc_ptr ? Header::get_header( alloc_ptr ) : (SharedAllocationHeader *)0 ;
RecordHost * const record = head ? static_cast< RecordHost * >( head->m_record ) : (RecordHost *) 0 ;
if ( ! alloc_ptr || record->m_alloc_ptr != head ) {
Kokkos::Impl::throw_runtime_exception( std::string("Kokkos::Impl::SharedAllocationRecord< Kokkos::HostSpace , void >::get_record ERROR" ) );
}
return record ;
}
// Iterate records to print orphaned memory ...
void SharedAllocationRecord< Kokkos::HostSpace , void >::
print_records( std::ostream & s , const Kokkos::HostSpace & space , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "HostSpace" , & s_root_record , detail );
}
} // namespace Impl
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace {
const unsigned HOST_SPACE_ATOMIC_MASK = 0xFFFF;
const unsigned HOST_SPACE_ATOMIC_XOR_MASK = 0x5A39;
static int HOST_SPACE_ATOMIC_LOCKS[HOST_SPACE_ATOMIC_MASK+1];
}
namespace Impl {
void init_lock_array_host_space() {
static int is_initialized = 0;
if(! is_initialized)
for(int i = 0; i < static_cast<int> (HOST_SPACE_ATOMIC_MASK+1); i++)
HOST_SPACE_ATOMIC_LOCKS[i] = 0;
}
bool lock_address_host_space(void* ptr) {
return 0 == atomic_compare_exchange( &HOST_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HOST_SPACE_ATOMIC_MASK) ^ HOST_SPACE_ATOMIC_XOR_MASK] ,
0 , 1);
}
void unlock_address_host_space(void* ptr) {
atomic_exchange( &HOST_SPACE_ATOMIC_LOCKS[
(( size_t(ptr) >> 2 ) & HOST_SPACE_ATOMIC_MASK) ^ HOST_SPACE_ATOMIC_XOR_MASK] ,
0);
}
}
}
diff --git a/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp
new file mode 100644
index 000000000..ac200209c
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp
@@ -0,0 +1,463 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <limits>
+#include <Kokkos_Macros.hpp>
+#include <impl/Kokkos_HostThreadTeam.hpp>
+#include <impl/Kokkos_Error.hpp>
+#include <impl/Kokkos_spinwait.hpp>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+void HostThreadTeamData::organize_pool
+ ( HostThreadTeamData * members[] , const int size )
+{
+ bool ok = true ;
+
+ // Verify not already a member of a pool:
+ for ( int rank = 0 ; rank < size && ok ; ++rank ) {
+ ok = ( 0 != members[rank] ) && ( 0 == members[rank]->m_pool_scratch );
+ }
+
+ if ( ok ) {
+
+ int64_t * const root_scratch = members[0]->m_scratch ;
+
+ for ( int i = m_pool_rendezvous ; i < m_pool_reduce ; ++i ) {
+ root_scratch[i] = 0 ;
+ }
+
+ {
+ HostThreadTeamData ** const pool =
+ (HostThreadTeamData **) (root_scratch + m_pool_members);
+
+ // team size == 1, league size == pool_size
+
+ for ( int rank = 0 ; rank < size ; ++rank ) {
+ HostThreadTeamData * const mem = members[ rank ] ;
+ mem->m_pool_scratch = root_scratch ;
+ mem->m_team_scratch = mem->m_scratch ;
+ mem->m_pool_rank = rank ;
+ mem->m_pool_size = size ;
+ mem->m_team_base = rank ;
+ mem->m_team_rank = 0 ;
+ mem->m_team_size = 1 ;
+ mem->m_team_alloc = 1 ;
+ mem->m_league_rank = rank ;
+ mem->m_league_size = size ;
+ mem->m_pool_rendezvous_step = 0 ;
+ mem->m_team_rendezvous_step = 0 ;
+ pool[ rank ] = mem ;
+ }
+ }
+
+ Kokkos::memory_fence();
+ }
+ else {
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::HostThreadTeamData::organize_pool ERROR pool already exists");
+ }
+}
+
+void HostThreadTeamData::disband_pool()
+{
+ m_work_range.first = -1 ;
+ m_work_range.second = -1 ;
+ m_pool_scratch = 0 ;
+ m_team_scratch = 0 ;
+ m_pool_rank = 0 ;
+ m_pool_size = 1 ;
+ m_team_base = 0 ;
+ m_team_rank = 0 ;
+ m_team_size = 1 ;
+ m_team_alloc = 1 ;
+ m_league_rank = 0 ;
+ m_league_size = 1 ;
+ m_pool_rendezvous_step = 0 ;
+ m_team_rendezvous_step = 0 ;
+}
+
+int HostThreadTeamData::organize_team( const int team_size )
+{
+ // Pool is initialized
+ const bool ok_pool = 0 != m_pool_scratch ;
+
+ // Team is not set
+ const bool ok_team =
+ m_team_scratch == m_scratch &&
+ m_team_base == m_pool_rank &&
+ m_team_rank == 0 &&
+ m_team_size == 1 &&
+ m_team_alloc == 1 &&
+ m_league_rank == m_pool_rank &&
+ m_league_size == m_pool_size ;
+
+ if ( ok_pool && ok_team ) {
+
+ if ( team_size <= 0 ) return 0 ; // No teams to organize
+
+ if ( team_size == 1 ) return 1 ; // Already organized in teams of one
+
+ HostThreadTeamData * const * const pool =
+ (HostThreadTeamData **) (m_pool_scratch + m_pool_members);
+
+ // "league_size" in this context is the number of concurrent teams
+ // that the pool can accommodate. Excess threads are idle.
+ const int league_size = m_pool_size / team_size ;
+ const int team_alloc_size = m_pool_size / league_size ;
+ const int team_alloc_rank = m_pool_rank % team_alloc_size ;
+ const int league_rank = m_pool_rank / team_alloc_size ;
+ const int team_base_rank = league_rank * team_alloc_size ;
+
+ m_team_scratch = pool[ team_base_rank ]->m_scratch ;
+ m_team_base = team_base_rank ;
+    // This needs to check for overflow: if m_pool_size % team_alloc_size != 0
+    // there are two corner cases:
+    // (i)  if team_alloc_size == team_size there might be a non-full
+    //      "zombie" team around (for example m_pool_size = 5 and team_size = 2)
+    // (ii) if team_alloc > team_size then the last team might have fewer
+    //      threads than the others
+ m_team_rank = ( team_base_rank + team_size <= m_pool_size ) &&
+ ( team_alloc_rank < team_size ) ?
+ team_alloc_rank : -1;
+ m_team_size = team_size ;
+ m_team_alloc = team_alloc_size ;
+ m_league_rank = league_rank ;
+ m_league_size = league_size ;
+ m_team_rendezvous_step = 0 ;
+
+ if ( team_base_rank == m_pool_rank ) {
+ // Initialize team's rendezvous memory
+ for ( int i = m_team_rendezvous ; i < m_pool_reduce ; ++i ) {
+ m_scratch[i] = 0 ;
+ }
+ // Make sure team's rendezvous memory initialized
+ // is written before proceeding.
+ Kokkos::memory_fence();
+ }
+
+ // Organizing threads into a team performs a barrier across the
+    // entire pool to ensure proper initialization of the team
+ // rendezvous mechanism before a team rendezvous can be performed.
+
+ if ( pool_rendezvous() ) {
+ pool_rendezvous_release();
+ }
+ }
+ else {
+ Kokkos::Impl::throw_runtime_exception("Kokkos::Impl::HostThreadTeamData::organize_team ERROR");
+ }
+
+ return 0 <= m_team_rank ;
+}
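+
+// Illustrative partitioning (assumed values): with m_pool_size = 5 and
+// team_size = 2, league_size = 2 and team_alloc_size = 2, so pool ranks
+// {0,1} form league rank 0, pool ranks {2,3} form league rank 1, and pool
+// rank 4 is assigned m_team_rank = -1 (idle, the "zombie" case noted above).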
+
+void HostThreadTeamData::disband_team()
+{
+ m_team_scratch = m_scratch ;
+ m_team_base = m_pool_rank ;
+ m_team_rank = 0 ;
+ m_team_size = 1 ;
+ m_team_alloc = 1 ;
+ m_league_rank = m_pool_rank ;
+ m_league_size = m_pool_size ;
+ m_team_rendezvous_step = 0 ;
+}
+
+//----------------------------------------------------------------------------
+/* pattern for rendezvous
+ *
+ * if ( rendezvous() ) {
+ * ... all other threads are still in team_rendezvous() ...
+ * rendezvous_release();
+ * ... all other threads are released from team_rendezvous() ...
+ * }
+ */
+
+int HostThreadTeamData::rendezvous( int64_t * const buffer
+ , int & rendezvous_step
+ , int const size
+ , int const rank ) noexcept
+{
+ enum : int { shift_byte = 3 };
+ enum : int { size_byte = ( 01 << shift_byte ) }; // == 8
+ enum : int { mask_byte = size_byte - 1 };
+
+ enum : int { shift_mem_cycle = 2 };
+ enum : int { size_mem_cycle = ( 01 << shift_mem_cycle ) }; // == 4
+ enum : int { mask_mem_cycle = size_mem_cycle - 1 };
+
+ // Cycle step values: 1 <= step <= size_val_cycle
+ // An odd multiple of memory cycle so that when a memory location
+ // is reused it has a different value.
+ // Must be representable within a single byte: size_val_cycle < 16
+
+ enum : int { size_val_cycle = 3 * size_mem_cycle };
+
+ // Requires:
+ // Called by rank = [ 0 .. size )
+ // buffer aligned to int64_t[4]
+
+ // A sequence of rendezvous uses four cycled locations in memory
+ // and non-equal cycled synchronization values to
+ // 1) prevent rendezvous from overtaking one another and
+ // 2) give each spin wait location an int64_t[4] span
+ // so that it has its own cache line.
+
+ const int step = ( rendezvous_step % size_val_cycle ) + 1 ;
+
+ rendezvous_step = step ;
+
+ // The leading int64_t[4] span is for thread 0 to write
+ // and all other threads to read spin-wait.
+ // sync_offset is the index into this array for this step.
+
+ const int sync_offset = ( step & mask_mem_cycle ) + size_mem_cycle ;
+
+ union {
+ int64_t full ;
+ int8_t byte[8] ;
+ } value ;
+
+ if ( rank ) {
+
+ const int group_begin = rank << shift_byte ; // == rank * size_byte
+
+ if ( group_begin < size ) {
+
+ // This thread waits for threads
+ // [ group_begin .. group_begin + 8 )
+ // [ rank*8 .. rank*8 + 8 )
+ // to write to their designated bytes.
+
+ const int end = group_begin + size_byte < size
+ ? size_byte : size - group_begin ;
+
+ value.full = 0 ;
+ for ( int i = 0 ; i < end ; ++i ) value.byte[i] = int8_t( step );
+
+ store_fence(); // This should not be needed but fixes #742
+
+ spinwait_until_equal( buffer[ (rank << shift_mem_cycle) + sync_offset ]
+ , value.full );
+ }
+
+ {
+ // This thread sets its designated byte.
+ // ( rank % size_byte ) +
+ // ( ( rank / size_byte ) * size_byte * size_mem_cycle ) +
+ // ( sync_offset * size_byte )
+ const int offset = ( rank & mask_byte )
+ + ( ( rank & ~mask_byte ) << shift_mem_cycle )
+ + ( sync_offset << shift_byte );
+
+ // All of this thread's previous memory stores must be complete before
+ // this thread stores the step value at this thread's designated byte
+ // in the shared synchronization array.
+
+ Kokkos::memory_fence();
+
+ ((volatile int8_t*) buffer)[ offset ] = int8_t( step );
+
+ // Memory fence to push the previous store out
+ Kokkos::memory_fence();
+ }
+
+ // Wait for thread 0 to release all other threads
+
+ spinwait_until_equal( buffer[ step & mask_mem_cycle ] , int64_t(step) );
+
+ }
+ else {
+ // Thread 0 waits for threads [1..7]
+ // to write to their designated bytes.
+
+ const int end = size_byte < size ? 8 : size ;
+
+ value.full = 0 ;
+ for ( int i = 1 ; i < end ; ++i ) value.byte[i] = int8_t( step );
+
+ spinwait_until_equal( buffer[ sync_offset ], value.full );
+ }
+
+ return rank ? 0 : 1 ;
+}
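+
+// Worked example of the indexing above (assumed values): size = 20 and
+// step = 5 give sync_offset = ( 5 & 3 ) + 4 = 5.  Rank 13 writes byte
+// 13 % 8 = 5 of word (13/8)*4 + 5 = 9; rank 1 spin-waits on word 1*4 + 5 = 9
+// until ranks 8..15 have written, rank 2 waits on word 13 for ranks 16..19,
+// and rank 0 waits on word 5 for ranks 1..7.  Finally every non-zero rank
+// waits on word step & 3 = 1 for the release value written by
+// rendezvous_release().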
+
+void HostThreadTeamData::
+ rendezvous_release( int64_t * const buffer
+ , int const rendezvous_step ) noexcept
+{
+ enum : int { shift_mem_cycle = 2 };
+ enum : int { size_mem_cycle = ( 01 << shift_mem_cycle ) }; // == 4
+ enum : int { mask_mem_cycle = size_mem_cycle - 1 };
+
+ // Requires:
+ // Called after team_rendezvous
+ // Called only by true == team_rendezvous(root)
+
+ // Memory fence to be sure all previous writes are complete:
+ Kokkos::memory_fence();
+
+ ((volatile int64_t*) buffer)[ rendezvous_step & mask_mem_cycle ] =
+ int64_t( rendezvous_step );
+
+ // Memory fence to push the store out
+ Kokkos::memory_fence();
+}
+
+//----------------------------------------------------------------------------
+
+int HostThreadTeamData::get_work_stealing() noexcept
+{
+ pair_int_t w( -1 , -1 );
+
+ if ( 1 == m_team_size || team_rendezvous() ) {
+
+ // Attempt first from beginning of my work range
+ for ( int attempt = m_work_range.first < m_work_range.second ; attempt ; ) {
+
+ // Query and attempt to update m_work_range
+ // from: [ w.first , w.second )
+ // to: [ w.first + 1 , w.second ) = w_new
+ //
+ // If w is invalid then is just a query.
+
+ const pair_int_t w_new( w.first + 1 , w.second );
+
+ w = Kokkos::atomic_compare_exchange( & m_work_range, w, w_new );
+
+ if ( w.first < w.second ) {
+ // m_work_range is viable
+
+ // If steal is successful then don't repeat attempt to steal
+ attempt = ! ( w_new.first == w.first + 1 &&
+ w_new.second == w.second );
+ }
+ else {
+ // m_work_range is not viable
+ w.first = -1 ;
+ w.second = -1 ;
+
+ attempt = 0 ;
+ }
+ }
+
+ if ( w.first == -1 && m_steal_rank != m_pool_rank ) {
+
+ HostThreadTeamData * const * const pool =
+ (HostThreadTeamData**)( m_pool_scratch + m_pool_members );
+
+      // Attempt from the beginning failed; try to steal from the end of a neighbor's range
+
+ pair_int_t volatile * steal_range =
+ & ( pool[ m_steal_rank ]->m_work_range );
+
+ for ( int attempt = true ; attempt ; ) {
+
+ // Query and attempt to update steal_work_range
+ // from: [ w.first , w.second )
+ // to: [ w.first , w.second - 1 ) = w_new
+ //
+ // If w is invalid then is just a query.
+
+ const pair_int_t w_new( w.first , w.second - 1 );
+
+ w = Kokkos::atomic_compare_exchange( steal_range, w, w_new );
+
+ if ( w.first < w.second ) {
+ // steal_work_range is viable
+
+ // If steal is successful then don't repeat attempt to steal
+ attempt = ! ( w_new.first == w.first &&
+ w_new.second == w.second - 1 );
+ }
+ else {
+ // steal_work_range is not viable, move to next member
+ w.first = -1 ;
+ w.second = -1 ;
+
+          // We need to figure out whether the next team is active.
+          // m_steal_rank + m_team_alloc could be the next base_rank to steal
+          // from, but only if there are at least m_team_size more threads
+          // available so that that base rank has a full team.
+ m_steal_rank = m_steal_rank + m_team_alloc + m_team_size <= m_pool_size ?
+ m_steal_rank + m_team_alloc : 0;
+
+ steal_range = & ( pool[ m_steal_rank ]->m_work_range );
+
+ // If tried all other members then don't repeat attempt to steal
+ attempt = m_steal_rank != m_pool_rank ;
+ }
+ }
+
+ if ( w.first != -1 ) w.first = w.second - 1 ;
+ }
+
+ if ( 1 < m_team_size ) {
+ // Must share the work index
+ *((int volatile *) team_reduce()) = w.first ;
+
+ team_rendezvous_release();
+ }
+ }
+ else if ( 1 < m_team_size ) {
+ w.first = *((int volatile *) team_reduce());
+ }
+
+ // May exit because successfully stole work and w is good.
+ // May exit because no work left to steal and w = (-1,-1).
+
+#if 0
+fprintf(stdout,"HostThreadTeamData::get_work_stealing() pool(%d of %d) %d\n"
+ , m_pool_rank , m_pool_size , w.first );
+fflush(stdout);
+#endif
+
+ return w.first ;
+}
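+
+// Sketch of the range semantics above (values assumed): a member owning
+// m_work_range = [3,10) claims index 3 by compare-exchanging the range to
+// [4,10); once its own range is exhausted it walks neighboring teams and
+// steals from the back, e.g. compare-exchanging a victim's [4,10) to [4,9)
+// and running index 9.  A failed compare-exchange simply re-reads the range
+// and retries, and -1 is returned when no work remains anywhere.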
+
+} // namespace Impl
+} // namespace Kokkos
+
diff --git a/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.hpp b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.hpp
new file mode 100644
index 000000000..6b5918eae
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_HostThreadTeam.hpp
@@ -0,0 +1,1090 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_IMPL_HOSTTHREADTEAM_HPP
+#define KOKKOS_IMPL_HOSTTHREADTEAM_HPP
+
+#include <Kokkos_Core_fwd.hpp>
+#include <Kokkos_Pair.hpp>
+#include <Kokkos_Atomic.hpp>
+#include <Kokkos_ExecPolicy.hpp>
+#include <impl/Kokkos_FunctorAdapter.hpp>
+#include <impl/Kokkos_Reducer.hpp>
+#include <impl/Kokkos_FunctorAnalysis.hpp>
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template< class HostExecSpace >
+class HostThreadTeamMember ;
+
+class HostThreadTeamData {
+public:
+
+ template< class > friend class HostThreadTeamMember ;
+
+ // Assume upper bounds on number of threads:
+ // pool size <= 1024 threads
+ // pool rendezvous <= ( 1024 / 8 ) * 4 + 4 = 2052
+ // team size <= 64 threads
+ // team rendezvous <= ( 64 / 8 ) * 4 + 4 = 36
+
+ enum : int { max_pool_members = 1024 };
+ enum : int { max_team_members = 64 };
+ enum : int { max_pool_rendezvous = ( max_pool_members / 8 ) * 4 + 4 };
+ enum : int { max_team_rendezvous = ( max_team_members / 8 ) * 4 + 4 };
+
+private:
+
+ // per-thread scratch memory buffer chunks:
+ //
+ // [ pool_members ] = [ m_pool_members .. m_pool_rendezvous )
+ // [ pool_rendezvous ] = [ m_pool_rendezvous .. m_team_rendezvous )
+ // [ team_rendezvous ] = [ m_team_rendezvous .. m_pool_reduce )
+ // [ pool_reduce ] = [ m_pool_reduce .. m_team_reduce )
+ // [ team_reduce ] = [ m_team_reduce .. m_team_shared )
+ // [ team_shared ] = [ m_team_shared .. m_thread_local )
+ // [ thread_local ] = [ m_thread_local .. m_scratch_size )
+
+ enum : int { m_pool_members = 0 };
+ enum : int { m_pool_rendezvous = m_pool_members + max_pool_members };
+ enum : int { m_team_rendezvous = m_pool_rendezvous + max_pool_rendezvous };
+ enum : int { m_pool_reduce = m_team_rendezvous + max_team_rendezvous };
+
+ using pair_int_t = Kokkos::pair<int,int> ;
+
+ pair_int_t m_work_range ;
+ int64_t m_work_end ;
+ int64_t * m_scratch ; // per-thread buffer
+ int64_t * m_pool_scratch ; // == pool[0]->m_scratch
+ int64_t * m_team_scratch ; // == pool[ 0 + m_team_base ]->m_scratch
+ int m_pool_rank ;
+ int m_pool_size ;
+ int m_team_reduce ;
+ int m_team_shared ;
+ int m_thread_local ;
+ int m_scratch_size ;
+ int m_team_base ;
+ int m_team_rank ;
+ int m_team_size ;
+ int m_team_alloc ;
+ int m_league_rank ;
+ int m_league_size ;
+ int m_work_chunk ;
+ int m_steal_rank ; // work stealing rank
+ int mutable m_pool_rendezvous_step ;
+ int mutable m_team_rendezvous_step ;
+
+ HostThreadTeamData * team_member( int r ) const noexcept
+ { return ((HostThreadTeamData**)(m_pool_scratch+m_pool_members))[m_team_base+r]; }
+
+ // Rendezvous pattern:
+ // if ( rendezvous(root) ) {
+ // ... only root thread here while all others wait ...
+ // rendezvous_release();
+ // }
+ // else {
+ // ... all other threads release here ...
+ // }
+ //
+ // Requires: buffer[ ( max_threads / 8 ) * 4 + 4 ]; 0 == max_threads % 8
+ //
+ static
+ int rendezvous( int64_t * const buffer
+ , int & rendezvous_step
+ , int const size
+ , int const rank ) noexcept ;
+
+ static
+ void rendezvous_release( int64_t * const buffer
+ , int const rendezvous_step ) noexcept ;
+
+public:
+
+ inline
+ int team_rendezvous( int const root ) const noexcept
+ {
+ return 1 == m_team_size ? 1 :
+ rendezvous( m_team_scratch + m_team_rendezvous
+ , m_team_rendezvous_step
+ , m_team_size
+ , ( m_team_rank + m_team_size - root ) % m_team_size );
+ }
+
+ inline
+ int team_rendezvous() const noexcept
+ {
+ return 1 == m_team_size ? 1 :
+ rendezvous( m_team_scratch + m_team_rendezvous
+ , m_team_rendezvous_step
+ , m_team_size
+ , m_team_rank );
+ }
+
+ inline
+ void team_rendezvous_release() const noexcept
+ {
+ if ( 1 < m_team_size ) {
+ rendezvous_release( m_team_scratch + m_team_rendezvous
+ , m_team_rendezvous_step );
+ }
+ }
+
+ inline
+ int pool_rendezvous() const noexcept
+ {
+ return 1 == m_pool_size ? 1 :
+ rendezvous( m_pool_scratch + m_pool_rendezvous
+ , m_pool_rendezvous_step
+ , m_pool_size
+ , m_pool_rank );
+ }
+
+ inline
+ void pool_rendezvous_release() const noexcept
+ {
+ if ( 1 < m_pool_size ) {
+ rendezvous_release( m_pool_scratch + m_pool_rendezvous
+ , m_pool_rendezvous_step );
+ }
+ }
+
+ //----------------------------------------
+
+ constexpr HostThreadTeamData() noexcept
+ : m_work_range(-1,-1)
+ , m_work_end(0)
+ , m_scratch(0)
+ , m_pool_scratch(0)
+ , m_team_scratch(0)
+ , m_pool_rank(0)
+ , m_pool_size(1)
+ , m_team_reduce(0)
+ , m_team_shared(0)
+ , m_thread_local(0)
+ , m_scratch_size(0)
+ , m_team_base(0)
+ , m_team_rank(0)
+ , m_team_size(1)
+ , m_team_alloc(1)
+ , m_league_rank(0)
+ , m_league_size(1)
+ , m_work_chunk(0)
+ , m_steal_rank(0)
+ , m_pool_rendezvous_step(0)
+ , m_team_rendezvous_step(0)
+ {}
+
+ //----------------------------------------
+ // Organize array of members into a pool.
+ // The 0th member is the root of the pool.
+ // Requires: members are not already in a pool.
+ // Requires: called by one thread.
+ // Pool members are ordered as "close" - sorted by NUMA and then CORE
+ // Each thread is its own team with team_size == 1.
+ static void organize_pool( HostThreadTeamData * members[]
+ , const int size );
+
+ // Called by each thread within the pool
+ void disband_pool();
+
+ //----------------------------------------
+ // Each thread within a pool organizes itself into a team.
+ // Must be called by all threads of the pool.
+ // Organizing threads into a team performs a barrier across the
+  // entire pool to ensure proper initialization of the team
+ // rendezvous mechanism before a team rendezvous can be performed.
+ //
+ // Return true if a valid member of a team.
+ // Return false if not a member and thread should be idled.
+ int organize_team( const int team_size );
+
+ // Each thread within a pool disbands itself from current team.
+ // Each thread becomes its own team with team_size == 1.
+ // Must be called by all threads of the pool.
+ void disband_team();
+
+ //----------------------------------------
+
+ constexpr int pool_rank() const { return m_pool_rank ; }
+ constexpr int pool_size() const { return m_pool_size ; }
+
+ HostThreadTeamData * pool_member( int r ) const noexcept
+ { return ((HostThreadTeamData**)(m_pool_scratch+m_pool_members))[r]; }
+
+ //----------------------------------------
+
+private:
+
+ enum : int { mask_to_16 = 0x0f }; // align to 16 bytes
+ enum : int { shift_to_8 = 3 }; // size to 8 bytes
+
+public:
+
+ static constexpr int align_to_int64( int n )
+ { return ( ( n + mask_to_16 ) & ~mask_to_16 ) >> shift_to_8 ; }
+
+ constexpr int pool_reduce_bytes() const
+ { return m_scratch_size ? sizeof(int64_t) * ( m_team_reduce - m_pool_reduce ) : 0 ; }
+
+ constexpr int team_reduce_bytes() const
+ { return sizeof(int64_t) * ( m_team_shared - m_team_reduce ); }
+
+ constexpr int team_shared_bytes() const
+ { return sizeof(int64_t) * ( m_thread_local - m_team_shared ); }
+
+ constexpr int thread_local_bytes() const
+ { return sizeof(int64_t) * ( m_scratch_size - m_thread_local ); }
+
+ constexpr int scratch_bytes() const
+ { return sizeof(int64_t) * m_scratch_size ; }
+
+ // Memory chunks:
+
+ int64_t * scratch_buffer() const noexcept
+ { return m_scratch ; }
+
+ int64_t * pool_reduce() const noexcept
+ { return m_pool_scratch + m_pool_reduce ; }
+
+ int64_t * pool_reduce_local() const noexcept
+ { return m_scratch + m_pool_reduce ; }
+
+ int64_t * team_reduce() const noexcept
+ { return m_team_scratch + m_team_reduce ; }
+
+ int64_t * team_reduce_local() const noexcept
+ { return m_scratch + m_team_reduce ; }
+
+ int64_t * team_shared() const noexcept
+ { return m_team_scratch + m_team_shared ; }
+
+ int64_t * local_scratch() const noexcept
+ { return m_scratch + m_thread_local ; }
+
+ // Given:
+ // pool_reduce_size = number bytes for pool reduce
+ // team_reduce_size = number bytes for team reduce
+ // team_shared_size = number bytes for team shared memory
+ // thread_local_size = number bytes for thread local memory
+ // Return:
+ // total number of bytes that must be allocated
+ static
+ size_t scratch_size( int pool_reduce_size
+ , int team_reduce_size
+ , int team_shared_size
+ , int thread_local_size )
+ {
+ pool_reduce_size = align_to_int64( pool_reduce_size );
+ team_reduce_size = align_to_int64( team_reduce_size );
+ team_shared_size = align_to_int64( team_shared_size );
+ thread_local_size = align_to_int64( thread_local_size );
+
+ const size_t total_bytes = (
+ m_pool_reduce +
+ pool_reduce_size +
+ team_reduce_size +
+ team_shared_size +
+ thread_local_size ) * sizeof(int64_t);
+
+ return total_bytes ;
+ }
+
+ // Given:
+ // alloc_ptr = pointer to allocated memory
+ // alloc_size = number bytes of allocated memory
+ // pool_reduce_size = number bytes for pool reduce/scan operations
+ // team_reduce_size = number bytes for team reduce/scan operations
+ // team_shared_size = number bytes for team-shared memory
+ // thread_local_size = number bytes for thread-local memory
+  // Effect:
+  //   assigns this thread's scratch buffer pointer and the chunk
+  //   offsets within the allocated memory
+ void scratch_assign( void * const alloc_ptr
+ , size_t const alloc_size
+ , int pool_reduce_size
+ , int team_reduce_size
+ , int team_shared_size
+ , int /* thread_local_size */ )
+ {
+ pool_reduce_size = align_to_int64( pool_reduce_size );
+ team_reduce_size = align_to_int64( team_reduce_size );
+ team_shared_size = align_to_int64( team_shared_size );
+ // thread_local_size = align_to_int64( thread_local_size );
+
+ m_scratch = (int64_t *) alloc_ptr ;
+ m_team_reduce = m_pool_reduce + pool_reduce_size ;
+ m_team_shared = m_team_reduce + team_reduce_size ;
+ m_thread_local = m_team_shared + team_shared_size ;
+ m_scratch_size = align_to_int64( alloc_size );
+
+#if 0
+fprintf(stdout,"HostThreadTeamData::scratch_assign { %d %d %d %d %d %d %d }\n"
+ , int(m_pool_members)
+ , int(m_pool_rendezvous)
+ , int(m_pool_reduce)
+ , int(m_team_reduce)
+ , int(m_team_shared)
+ , int(m_thread_local)
+ , int(m_scratch_size)
+ );
+fflush(stdout);
+#endif
+
+ }
+
+ //----------------------------------------
+ // Get a work index within the range.
+  // First try to steal from the beginning of the team's own partition.
+  // If that fails, try to steal from the end of another team's partition.
+ int get_work_stealing() noexcept ;
+
+ //----------------------------------------
+ // Set the initial work partitioning of [ 0 .. length ) among the teams
+ // with granularity of chunk
+
+ void set_work_partition( int64_t const length
+ , int const chunk ) noexcept
+ {
+    // Minimum chunk size to ensure that
+ // m_work_end < std::numeric_limits<int>::max() * m_work_chunk
+
+ int const chunk_min = ( length + std::numeric_limits<int>::max() )
+ / std::numeric_limits<int>::max();
+
+ m_work_end = length ;
+ m_work_chunk = std::max( chunk , chunk_min );
+
+ // Number of work chunks and partitioning of that number:
+ int const num = ( m_work_end + m_work_chunk - 1 ) / m_work_chunk ;
+ int const part = ( num + m_league_size - 1 ) / m_league_size ;
+
+ m_work_range.first = part * m_league_rank ;
+ m_work_range.second = m_work_range.first + part ;
+
+ // Steal from next team, round robin
+ // The next team is offset by m_team_alloc if it fits in the pool.
+
+ m_steal_rank = m_team_base + m_team_alloc + m_team_size <= m_pool_size ?
+ m_team_base + m_team_alloc : 0 ;
+ }
+
+ std::pair<int64_t,int64_t> get_work_partition() noexcept
+ {
+ return std::pair<int64_t,int64_t>
+ ( m_work_range.first * m_work_chunk
+ , m_work_range.second * m_work_chunk < m_work_end
+ ? m_work_range.second * m_work_chunk : m_work_end );
+ }
+
+ std::pair<int64_t,int64_t> get_work_stealing_chunk() noexcept
+ {
+ std::pair<int64_t,int64_t> x(-1,-1);
+
+ const int i = get_work_stealing();
+
+ if ( 0 <= i ) {
+ x.first = m_work_chunk * i ;
+ x.second = x.first + m_work_chunk < m_work_end
+ ? x.first + m_work_chunk : m_work_end ;
+ }
+
+ return x ;
+ }
+};
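
scratch_size() above converts the four requested byte counts into 16-byte-aligned int64_t word counts and adds them to the fixed header (the pool member pointer table plus the pool and team rendezvous buffers), and scratch_assign() records the resulting chunk offsets into the members shown in the layout comment. A stand-alone sketch of the same sizing arithmetic, with hypothetical byte requests (the 512/256/4096/1024 values are illustrative, not defaults taken from this code):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Same arithmetic as HostThreadTeamData::align_to_int64:
    // bytes -> int64_t words, rounded up to 16-byte multiples.
    constexpr int align_to_int64(int n) { return ((n + 0x0f) & ~0x0f) >> 3; }

    int main() {
      // Header words that precede the reduce/shared chunks, per the class constants:
      const int max_pool_members    = 1024;
      const int max_pool_rendezvous = (max_pool_members / 8) * 4 + 4; // 516
      const int max_team_rendezvous = (64 / 8) * 4 + 4;               // 36
      const int pool_reduce_offset  =
          max_pool_members + max_pool_rendezvous + max_team_rendezvous; // 1576 words

      // Hypothetical byte requests:
      const int pool_reduce     = align_to_int64(512);
      const int team_reduce     = align_to_int64(256);
      const int team_shared     = align_to_int64(4096);
      const int thread_local_sz = align_to_int64(1024);

      const std::size_t total_bytes =
          std::size_t(pool_reduce_offset + pool_reduce + team_reduce +
                      team_shared + thread_local_sz) * sizeof(std::int64_t);

      std::printf("per-thread scratch = %zu bytes\n", total_bytes);
      return 0;
    }
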
+
+//----------------------------------------------------------------------------
+
+template< class HostExecSpace >
+class HostThreadTeamMember {
+public:
+
+ using scratch_memory_space = typename HostExecSpace::scratch_memory_space ;
+
+private:
+
+ scratch_memory_space m_scratch ;
+ HostThreadTeamData & m_data ;
+ int const m_league_rank ;
+ int const m_league_size ;
+
+public:
+
+ constexpr HostThreadTeamMember( HostThreadTeamData & arg_data ) noexcept
+ : m_scratch( arg_data.team_shared() , arg_data.team_shared_bytes() )
+ , m_data( arg_data )
+ , m_league_rank(0)
+ , m_league_size(1)
+ {}
+
+ constexpr HostThreadTeamMember( HostThreadTeamData & arg_data
+ , int const arg_league_rank
+ , int const arg_league_size
+ ) noexcept
+ : m_scratch( arg_data.team_shared()
+ , arg_data.team_shared_bytes()
+ , arg_data.team_shared()
+ , arg_data.team_shared_bytes() )
+ , m_data( arg_data )
+ , m_league_rank( arg_league_rank )
+ , m_league_size( arg_league_size )
+ {}
+
+ ~HostThreadTeamMember() = default ;
+ HostThreadTeamMember() = delete ;
+ HostThreadTeamMember( HostThreadTeamMember && ) = default ;
+ HostThreadTeamMember( HostThreadTeamMember const & ) = default ;
+ HostThreadTeamMember & operator = ( HostThreadTeamMember && ) = default ;
+ HostThreadTeamMember & operator = ( HostThreadTeamMember const & ) = default ;
+
+ //----------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ int team_rank() const noexcept { return m_data.m_team_rank ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int team_size() const noexcept { return m_data.m_team_size ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int league_rank() const noexcept { return m_league_rank ; }
+
+ KOKKOS_INLINE_FUNCTION
+ int league_size() const noexcept { return m_league_size ; }
+
+ //----------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & team_shmem() const
+ { return m_scratch.set_team_thread_mode(0,1,0); }
+
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & team_scratch(int) const
+ { return m_scratch.set_team_thread_mode(0,1,0); }
+
+ KOKKOS_INLINE_FUNCTION
+ const scratch_memory_space & thread_scratch(int) const
+ { return m_scratch.set_team_thread_mode(0,m_data.m_team_size,m_data.m_team_rank); }
+
+ //----------------------------------------
+ // Team collectives
+
+ KOKKOS_INLINE_FUNCTION void team_barrier() const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( m_data.team_rendezvous() ) m_data.team_rendezvous_release();
+ }
+#else
+ {}
+#endif
+
+ template< class Closure >
+ KOKKOS_INLINE_FUNCTION
+ void team_barrier( Closure const & f ) const noexcept
+ {
+ if ( m_data.team_rendezvous() ) {
+
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+
+ f();
+
+ m_data.team_rendezvous_release();
+ }
+ }
+
+ //--------------------------------------------------------------------------
+
+ template< typename T >
+ KOKKOS_INLINE_FUNCTION
+ void team_broadcast( T & value , const int source_team_rank ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 1 < m_data.m_team_size ) {
+ T volatile * const shared_value = (T*) m_data.team_reduce();
+
+ // Don't overwrite shared memory until all threads arrive
+
+ if ( m_data.team_rendezvous( source_team_rank ) ) {
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+
+ *shared_value = value ;
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+ else {
+ value = *shared_value ;
+ }
+ }
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_broadcast\n"); }
+#endif
+
+ //--------------------------------------------------------------------------
+
+ template< class Closure , typename T >
+ KOKKOS_INLINE_FUNCTION
+ void team_broadcast( Closure const & f , T & value , const int source_team_rank) const noexcept
+ {
+ T volatile * const shared_value = (T*) m_data.team_reduce();
+
+ // Don't overwrite shared memory until all threads arrive
+
+ if ( m_data.team_rendezvous(source_team_rank) ) {
+
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+
+ f( value );
+
+ if ( 1 < m_data.m_team_size ) { *shared_value = value ; }
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+ else {
+ value = *shared_value ;
+ }
+ }
+
+ //--------------------------------------------------------------------------
+ // team_reduce( Sum(result) );
+ // team_reduce( Min(result) );
+ // team_reduce( Max(result) );
+
+ template< typename ReducerType >
+ KOKKOS_INLINE_FUNCTION
+ typename std::enable_if< is_reducer< ReducerType >::value >::type
+ team_reduce( ReducerType const & reducer ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 1 < m_data.m_team_size ) {
+
+ using value_type = typename ReducerType::value_type ;
+
+ if ( 0 != m_data.m_team_rank ) {
+ // Non-root copies to their local buffer:
+ reducer.copy( (value_type*) m_data.team_reduce_local()
+ , reducer.data() );
+ }
+
+ // Root does not overwrite shared memory until all threads arrive
+ // and copy to their local buffer.
+
+ if ( m_data.team_rendezvous() ) {
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+ //
+ // This thread sums contributed values
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ value_type * const src =
+ (value_type*) m_data.team_member(i)->team_reduce_local();
+
+ reducer.join( reducer.data() , src );
+ }
+
+ // Copy result to root member's buffer:
+ reducer.copy( (value_type*) m_data.team_reduce() , reducer.data() );
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+ else {
+ // Copy from root member's buffer:
+ reducer.copy( reducer.data() , (value_type*) m_data.team_reduce() );
+ }
+ }
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_reduce\n"); }
+#endif
+
+ //--------------------------------------------------------------------------
+
+ template< typename ValueType , class JoinOp >
+ KOKKOS_INLINE_FUNCTION
+ ValueType
+ team_reduce( ValueType const & value
+ , JoinOp const & join ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 0 != m_data.m_team_rank ) {
+ // Non-root copies to their local buffer:
+ *((ValueType*) m_data.team_reduce_local()) = value ;
+ }
+
+ // Root does not overwrite shared memory until all threads arrive
+ // and copy to their local buffer.
+
+ if ( m_data.team_rendezvous() ) {
+ const Impl::Reducer< ValueType , JoinOp > reducer( join );
+
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+ //
+ // This thread sums contributed values
+
+ ValueType * const dst = (ValueType*) m_data.team_reduce_local();
+
+ *dst = value ;
+
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ ValueType * const src =
+ (ValueType*) m_data.team_member(i)->team_reduce_local();
+
+ reducer.join( dst , src );
+ }
+
+ m_data.team_rendezvous_release();
+ // This thread released all other threads from 'team_rendezvous'
+ // with a return value of 'false'
+ }
+
+ return *((ValueType*) m_data.team_reduce());
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_reduce\n"); return ValueType(); }
+#endif
+
+
+ template< typename T >
+ KOKKOS_INLINE_FUNCTION
+ T team_scan( T const & value , T * const global = 0 ) const noexcept
+#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ {
+ if ( 0 != m_data.m_team_rank ) {
+ // Non-root copies to their local buffer:
+ ((T*) m_data.team_reduce_local())[1] = value ;
+ }
+
+ // Root does not overwrite shared memory until all threads arrive
+ // and copy to their local buffer.
+
+ if ( m_data.team_rendezvous() ) {
+ // All threads have entered 'team_rendezvous'
+ // only this thread returned from 'team_rendezvous'
+ // with a return value of 'true'
+ //
+ // This thread scans contributed values
+
+ {
+ T * prev = (T*) m_data.team_reduce_local();
+
+ prev[0] = 0 ;
+ prev[1] = value ;
+
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ T * const ptr = (T*) m_data.team_member(i)->team_reduce_local();
+
+ ptr[0] = prev[0] + prev[1] ;
+
+ prev = ptr ;
+ }
+ }
+
+ // If adding to global value then atomic_fetch_add to that value
+ // and sum previous value to every entry of the scan.
+ if ( global ) {
+ T * prev = (T*) m_data.team_reduce_local();
+
+ {
+ T * ptr = (T*) m_data.team_member( m_data.m_team_size - 1 )->team_reduce_local();
+ prev[0] = Kokkos::atomic_fetch_add( global , ptr[0] + ptr[1] );
+ }
+
+ for ( int i = 1 ; i < m_data.m_team_size ; ++i ) {
+ T * ptr = (T*) m_data.team_member(i)->team_reduce_local();
+ ptr[0] += prev[0] ;
+ }
+ }
+
+ m_data.team_rendezvous_release();
+ }
+
+ return ((T*) m_data.team_reduce_local())[0];
+ }
+#else
+ { Kokkos::abort("HostThreadTeamMember team_scan\n"); return T(); }
+#endif
+
+};
+
+
+}} /* namespace Kokkos::Impl */
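
team_scan() above returns, to each team member, the exclusive prefix sum of the members' contributions in team-rank order; when the optional global pointer is given, the team total is atomically added to *global and the previous value of *global is added to every member's result. A serial sketch of that contract (plain C++, hypothetical contribution values):

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
      std::vector<long> contrib = {3, 1, 4, 1, 5}; // one value per team member (rank order)
      long global = 100;                           // shared counter ("*global" above)

      // Exclusive prefix sum across the "team":
      std::vector<long> scan(contrib.size());
      long running = 0;
      for (std::size_t r = 0; r < contrib.size(); ++r) {
        scan[r]  = running;                        // excludes member r's own contribution
        running += contrib[r];
      }

      // With a global pointer: fetch the old value, add the team total to it,
      // and offset every member's result by the old value.
      const long offset = global;                  // what atomic_fetch_add would return
      global += running;
      for (auto & v : scan) v += offset;

      for (std::size_t r = 0; r < scan.size(); ++r)
        std::printf("rank %zu -> %ld\n", r, scan[r]);
      std::printf("global is now %ld\n", global);
      return 0;
    }
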
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+template<class Space,typename iType>
+KOKKOS_INLINE_FUNCTION
+Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+TeamThreadRange( Impl::HostThreadTeamMember<Space> const & member
+ , iType const & count )
+{
+ return
+ Impl::TeamThreadRangeBoundariesStruct
+ <iType,Impl::HostThreadTeamMember<Space> >(member,0,count);
+}
+
+template<class Space, typename iType1, typename iType2>
+KOKKOS_INLINE_FUNCTION
+Impl::TeamThreadRangeBoundariesStruct
+ < typename std::common_type< iType1, iType2 >::type
+ , Impl::HostThreadTeamMember<Space> >
+TeamThreadRange( Impl::HostThreadTeamMember<Space> const & member
+ , iType1 const & begin , iType2 const & end )
+{
+ return
+ Impl::TeamThreadRangeBoundariesStruct
+ < typename std::common_type< iType1, iType2 >::type
+ , Impl::HostThreadTeamMember<Space> >( member , begin , end );
+}
+
+template<class Space, typename iType>
+KOKKOS_INLINE_FUNCTION
+Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ThreadVectorRange
+ ( Impl::HostThreadTeamMember<Space> const & member
+ , const iType & count )
+{
+ return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >(member,count);
+}
+
+//----------------------------------------------------------------------------
+/** \brief Inter-thread parallel_for.
+ *
+ * Executes lambda(iType i) for each i=[0..N)
+ *
+ * The range [0..N) is mapped to all threads of the calling thread team.
+*/
+template<typename iType, class Space, class Closure>
+KOKKOS_INLINE_FUNCTION
+void parallel_for
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , Closure const & closure
+ )
+{
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure (i);
+ }
+}
+
+template<typename iType, class Space, class Closure>
+KOKKOS_INLINE_FUNCTION
+void parallel_for
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , Closure const & closure
+ )
+{
+ #ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+ #pragma ivdep
+ #endif
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure (i);
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename iType, class Space, class Closure, class Reducer >
+KOKKOS_INLINE_FUNCTION
+typename std::enable_if< Kokkos::is_reducer< Reducer >::value >::type
+parallel_reduce
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ const & loop_boundaries
+ , Closure const & closure
+ , Reducer const & reducer
+ )
+{
+ reducer.init( reducer.data() );
+
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure( i , reducer.reference() );
+ }
+
+ loop_boundaries.thread.team_reduce( reducer );
+}
+
+template< typename iType, class Space, typename Closure, typename ValueType >
+KOKKOS_INLINE_FUNCTION
+typename std::enable_if< ! Kokkos::is_reducer<ValueType>::value >::type
+parallel_reduce
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ const & loop_boundaries
+ , Closure const & closure
+ , ValueType & result
+ )
+{
+ Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > > reducer( & result );
+
+ reducer.init( reducer.data() );
+
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure( i , reducer.reference() );
+ }
+
+ loop_boundaries.thread.team_reduce( reducer );
+}
+
+template< typename iType, class Space
+ , class Closure, class Joiner , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void parallel_reduce
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >
+ const & loop_boundaries
+ , Closure const & closure
+ , Joiner const & joiner
+ , ValueType & result
+ )
+{
+ Impl::Reducer< ValueType , Joiner > reducer( joiner , & result );
+
+ reducer.init( reducer.data() );
+
+ for( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure( i , reducer.reference() );
+ }
+
+ loop_boundaries.thread.team_reduce( reducer );
+}
+
+//----------------------------------------------------------------------------
+/** \brief Intra-thread vector parallel_reduce.
+ *
+ * Executes lambda(iType i, ValueType & val) for each i=[0..N)
+ *
+ * The range [0..N) is mapped to the vector lanes of the
+ * calling thread and a summation of val is
+ * performed and put into result.
+ */
+template< typename iType, class Space , class Lambda, typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void parallel_reduce
+ (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >& loop_boundaries,
+ const Lambda & lambda,
+ ValueType& result)
+{
+ result = ValueType();
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for( iType i = loop_boundaries.start ;
+ i < loop_boundaries.end ;
+ i += loop_boundaries.increment) {
+ lambda(i,result);
+ }
+}
+
+/** \brief Intra-thread vector parallel_reduce.
+ *
+ * Executes lambda(iType i, ValueType & val) for each i=[0..N)
+ *
+ * The range [0..N) is mapped to all vector lanes of the
+ * calling thread and a reduction of val is performed using
+ * JoinType(ValueType& val, const ValueType& update)
+ * and put into result.
+ * The input value of result is used as the initializer for
+ * temporary variables of ValueType.  Therefore the input
+ * value should be the neutral element with respect to the
+ * join operation (e.g. '0' for '+' or '1' for '*').
+ */
+template< typename iType, class Space
+ , class Lambda, class JoinType , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void parallel_reduce
+ (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> >& loop_boundaries,
+ const Lambda & lambda,
+ const JoinType & join,
+ ValueType& result)
+{
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for( iType i = loop_boundaries.start ;
+ i < loop_boundaries.end ;
+ i += loop_boundaries.increment ) {
+ lambda(i,result);
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< typename iType, class Space, class Closure >
+KOKKOS_INLINE_FUNCTION
+void parallel_scan
+ ( Impl::TeamThreadRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , Closure const & closure
+ )
+{
+ // Extract ValueType from the closure
+
+ using value_type =
+ typename Kokkos::Impl::FunctorAnalysis
+ < Kokkos::Impl::FunctorPatternInterface::SCAN
+ , void
+ , Closure >::value_type ;
+
+ value_type accum = 0 ;
+
+ // Intra-member scan
+ for ( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure(i,accum,false);
+ }
+
+ // 'accum' output is the exclusive prefix sum
+ accum = loop_boundaries.thread.team_scan(accum);
+
+ for ( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure(i,accum,true);
+ }
+}
+
+
+template< typename iType, class Space, class ClosureType >
+KOKKOS_INLINE_FUNCTION
+void parallel_scan
+ ( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::HostThreadTeamMember<Space> > const & loop_boundaries
+ , ClosureType const & closure
+ )
+{
+ using value_type = typename
+ Kokkos::Impl::FunctorAnalysis
+ < Impl::FunctorPatternInterface::SCAN
+ , void
+ , ClosureType >::value_type ;
+
+ value_type scan_val = value_type();
+
+#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
+#pragma ivdep
+#endif
+ for ( iType i = loop_boundaries.start
+ ; i < loop_boundaries.end
+ ; i += loop_boundaries.increment ) {
+ closure(i,scan_val,true);
+ }
+}
+
+//----------------------------------------------------------------------------
+
+template< class Space >
+KOKKOS_INLINE_FUNCTION
+Impl::ThreadSingleStruct<Impl::HostThreadTeamMember<Space> >
+PerTeam(const Impl::HostThreadTeamMember<Space> & member )
+{
+ return Impl::ThreadSingleStruct<Impl::HostThreadTeamMember<Space> >(member);
+}
+
+template< class Space >
+KOKKOS_INLINE_FUNCTION
+Impl::VectorSingleStruct<Impl::HostThreadTeamMember<Space> >
+PerThread(const Impl::HostThreadTeamMember<Space> & member)
+{
+ return Impl::VectorSingleStruct<Impl::HostThreadTeamMember<Space> >(member);
+}
+
+template< class Space , class FunctorType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::ThreadSingleStruct< Impl::HostThreadTeamMember<Space> > & single , const FunctorType & functor )
+{
+ if ( single.team_member.team_rank() == 0 ) functor();
+ // 'single' does not perform a barrier.
+ // single.team_member.team_barrier( functor );
+}
+
+template< class Space , class FunctorType , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::ThreadSingleStruct< Impl::HostThreadTeamMember<Space> > & single , const FunctorType & functor , ValueType & val )
+{
+ single.team_member.team_broadcast( functor , val , 0 );
+}
+
+template< class Space , class FunctorType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::VectorSingleStruct< Impl::HostThreadTeamMember<Space> > & , const FunctorType & functor )
+{
+ functor();
+}
+
+template< class Space , class FunctorType , typename ValueType >
+KOKKOS_INLINE_FUNCTION
+void single( const Impl::VectorSingleStruct< Impl::HostThreadTeamMember<Space> > & , const FunctorType & functor , ValueType & val )
+{
+ functor(val);
+}
+
+} /* namespace Kokkos */
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+#endif /* #ifndef KOKKOS_IMPL_HOSTTHREADTEAM_HPP */
+
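
The TeamThreadRange, ThreadVectorRange, PerTeam/PerThread and single() overloads defined above back the public hierarchical-parallelism interface for host execution spaces. A small usage sketch against the public Kokkos API (a sketch under the assumption of a standard Kokkos build; the league size, N, and Kokkos::AUTO team sizing are illustrative values, not taken from this patch):

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char* argv[]) {
      Kokkos::initialize(argc, argv);
      {
        using policy_t = Kokkos::TeamPolicy<>;
        using member_t = policy_t::member_type;

        const int league = 8;    // number of teams
        const int N      = 100;  // work items per team
        double total = 0.0;

        Kokkos::parallel_reduce( policy_t( league, Kokkos::AUTO ),
          KOKKOS_LAMBDA( const member_t & team, double & update ) {
            double team_sum = 0.0;
            // Inter-thread range: [0..N) is split across the team's threads,
            // then the partial sums are combined across the team.
            Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, N ),
              [&]( const int i, double & sum ) { sum += double(i); }, team_sum );
            // Only one thread per team contributes the team's result.
            Kokkos::single( Kokkos::PerTeam( team ),
              [&]() { update += team_sum; } );
          }, total );

        std::printf( "total = %g (expect %g)\n",
                     total, double(league) * double(N) * double(N - 1) / 2.0 );
      }
      Kokkos::finalize();
      return 0;
    }

Each nested parallel_reduce combines the per-thread partial sums through the team_reduce() path above, and single(PerTeam(...)) makes exactly one thread per team contribute to the league-wide result.
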
diff --git a/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp b/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
index 84cf536bb..7489018ac 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Memory_Fence.hpp
@@ -1,107 +1,111 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ATOMIC_HPP ) && ! defined( KOKKOS_MEMORY_FENCE_HPP )
#define KOKKOS_MEMORY_FENCE_HPP
namespace Kokkos {
//----------------------------------------------------------------------------
KOKKOS_FORCEINLINE_FUNCTION
void memory_fence()
{
#if defined( __CUDA_ARCH__ )
__threadfence();
+#elif defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_ENABLE_ISA_X86_64 )
+ asm volatile (
+ "mfence" ::: "memory"
+ );
#elif defined( KOKKOS_ENABLE_GNU_ATOMICS ) || \
( defined( KOKKOS_COMPILER_NVCC ) && defined( KOKKOS_ENABLE_INTEL_ATOMICS ) )
__sync_synchronize();
#elif defined( KOKKOS_ENABLE_INTEL_ATOMICS )
_mm_mfence();
#elif defined( KOKKOS_ENABLE_OPENMP_ATOMICS )
#pragma omp flush
#elif defined( KOKKOS_ENABLE_WINDOWS_ATOMICS )
MemoryBarrier();
#else
#error "Error: memory_fence() not defined"
#endif
}
//////////////////////////////////////////////////////
// store_fence()
//
// If possible use a store fence on the architecture, if not run a full memory fence
KOKKOS_FORCEINLINE_FUNCTION
void store_fence()
{
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_ENABLE_ISA_X86_64 )
asm volatile (
- "sfence" ::: "memory"
- );
+ "sfence" ::: "memory"
+ );
#else
memory_fence();
#endif
}
//////////////////////////////////////////////////////
// load_fence()
//
// If possible use a load fence on the architecture, if not run a full memory fence
KOKKOS_FORCEINLINE_FUNCTION
void load_fence()
{
#if defined( KOKKOS_ENABLE_ASM ) && defined( KOKKOS_ENABLE_ISA_X86_64 )
asm volatile (
- "lfence" ::: "memory"
- );
+ "lfence" ::: "memory"
+ );
#else
memory_fence();
#endif
}
} // namespace kokkos
#endif
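
memory_fence(), store_fence() and load_fence() map to a full, store-only or load-only hardware fence where the ISA provides one, and fall back to a full fence otherwise. A portable-C++ analogue (a sketch using std::atomic_thread_fence rather than the Kokkos functions) shows the publish/observe pairing these fences are meant to support:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    int data = 0;
    std::atomic<int> flag{0};

    void producer() {
      data = 42;                                           // plain store
      std::atomic_thread_fence(std::memory_order_release); // "store fence"
      flag.store(1, std::memory_order_relaxed);
    }

    void consumer() {
      while (flag.load(std::memory_order_relaxed) == 0) {} // spin on the flag
      std::atomic_thread_fence(std::memory_order_acquire); // "load fence"
      std::printf("data = %d\n", data);                    // guaranteed to print 42
    }

    int main() {
      std::thread c(consumer), p(producer);
      p.join();
      c.join();
      return 0;
    }
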
diff --git a/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp b/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp
index da95c943f..5852efb01 100644
--- a/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_OldMacros.hpp
@@ -1,447 +1,447 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_OLD_MACROS_HPP
#define KOKKOS_IMPL_OLD_MACROS_HPP
#ifdef KOKKOS_ATOMICS_USE_CUDA
#ifndef KOKKOS_ENABLE_CUDA_ATOMICS
#define KOKKOS_ENABLE_CUDA_ATOMICS KOKKOS_ATOMICS_USE_CUDA
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_GCC
#ifndef KOKKOS_ENABLE_GNU_ATOMICS
#define KOKKOS_ENABLE_GNU_ATOMICS KOKKOS_ATOMICS_USE_GCC
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_GNU
#ifndef KOKKOS_ENABLE_GNU_ATOMICS
#define KOKKOS_ENABLE_GNU_ATOMICS KOKKOS_ATOMICS_USE_GNU
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_INTEL
#ifndef KOKKOS_ENABLE_INTEL_ATOMICS
#define KOKKOS_ENABLE_INTEL_ATOMICS KOKKOS_ATOMICS_USE_INTEL
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_OMP31
#ifndef KOKKOS_ENABLE_OPENMP_ATOMICS
#define KOKKOS_ENABLE_OPENMP_ATOMICS KOKKOS_ATOMICS_USE_OMP31
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_OPENMP31
#ifndef KOKKOS_ENABLE_OPENMP_ATOMICS
#define KOKKOS_ENABLE_OPENMP_ATOMICS KOKKOS_ATOMICS_USE_OPENMP31
#endif
#endif
#ifdef KOKKOS_ATOMICS_USE_WINDOWS
#ifndef KOKKOS_ENABLE_WINDOWS_ATOMICS
#define KOKKOS_ENABLE_WINDOWS_ATOMICS KOKKOS_ATOMICS_USE_WINDOWS
#endif
#endif
#ifdef KOKKOS_CUDA_CLANG_WORKAROUND
#ifndef KOKKOS_IMPL_CUDA_CLANG_WORKAROUND
#define KOKKOS_IMPL_CUDA_CLANG_WORKAROUND KOKKOS_CUDA_CLANG_WORKAROUND
#endif
#endif
#ifdef KOKKOS_CUDA_USE_LAMBDA
#ifndef KOKKOS_ENABLE_CUDA_LAMBDA
#define KOKKOS_ENABLE_CUDA_LAMBDA KOKKOS_CUDA_USE_LAMBDA
#endif
#endif
#ifdef KOKKOS_CUDA_USE_LDG_INTRINSIC
#ifndef KOKKOS_ENABLE_CUDA_LDG_INTRINSIC
#define KOKKOS_ENABLE_CUDA_LDG_INTRINSIC KOKKOS_CUDA_USE_LDG_INTRINSIC
#endif
#endif
#ifdef KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
#define KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE
#endif
#endif
#ifdef KOKKOS_CUDA_USE_UVM
#ifndef KOKKOS_ENABLE_CUDA_UVM
#define KOKKOS_ENABLE_CUDA_UVM KOKKOS_CUDA_USE_UVM
#endif
#endif
#ifdef KOKKOS_HAVE_CUDA
#ifndef KOKKOS_ENABLE_CUDA
#define KOKKOS_ENABLE_CUDA KOKKOS_HAVE_CUDA
#endif
#endif
#ifdef KOKKOS_HAVE_CUDA_LAMBDA
#ifndef KOKKOS_ENABLE_CUDA_LAMBDA
#define KOKKOS_ENABLE_CUDA_LAMBDA KOKKOS_HAVE_CUDA_LAMBDA
#endif
#endif
#ifdef KOKKOS_HAVE_CUDA_RDC
-#ifndef KOKKOS_ENABLE_CUDA_RDC
-#define KOKKOS_ENABLE_CUDA_RDC KOKKOS_HAVE_CUDA_RDC
+#ifndef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE
+#define KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE KOKKOS_HAVE_CUDA_RDC
#endif
#endif
#ifdef KOKKOS_HAVE_CUSPARSE
#ifndef KOKKOS_ENABLE_CUSPARSE
#define KOKKOS_ENABLE_CUSPARSE KOKKOS_HAVE_CUSPARSE
#endif
#endif
#ifdef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#ifndef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
#define KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#endif
#endif
#ifdef KOKKOS_HAVE_CXX1Z
#ifndef KOKKOS_ENABLE_CXX1Z
#define KOKKOS_ENABLE_CXX1Z KOKKOS_HAVE_CXX1Z
#endif
#endif
#ifdef KOKKOS_HAVE_DEBUG
#ifndef KOKKOS_DEBUG
#define KOKKOS_DEBUG KOKKOS_HAVE_DEBUG
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_CUDA
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_OPENMP
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_SERIAL
#endif
#endif
#ifdef KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS
#ifndef KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS
#define KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS KOKKOS_HAVE_DEFAULT_DEVICE_TYPE_THREADS
#endif
#endif
#ifdef KOKKOS_HAVE_HBWSPACE
#ifndef KOKKOS_ENABLE_HBWSPACE
#define KOKKOS_ENABLE_HBWSPACE KOKKOS_HAVE_HBWSPACE
#endif
#endif
#ifdef KOKKOS_HAVE_HWLOC
#ifndef KOKKOS_ENABLE_HWLOC
#define KOKKOS_ENABLE_HWLOC KOKKOS_HAVE_HWLOC
#endif
#endif
#ifdef KOKKOS_HAVE_MPI
#ifndef KOKKOS_ENABLE_MPI
#define KOKKOS_ENABLE_MPI KOKKOS_HAVE_MPI
#endif
#endif
#ifdef KOKKOS_HAVE_OPENMP
#ifndef KOKKOS_ENABLE_OPENMP
#define KOKKOS_ENABLE_OPENMP KOKKOS_HAVE_OPENMP
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_IVDEP
#ifndef KOKKOS_ENABLE_PRAGMA_IVDEP
#define KOKKOS_ENABLE_PRAGMA_IVDEP KOKKOS_HAVE_PRAGMA_IVDEP
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_LOOPCOUNT
#ifndef KOKKOS_ENABLE_PRAGMA_LOOPCOUNT
#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT KOKKOS_HAVE_PRAGMA_LOOPCOUNT
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_SIMD
#ifndef KOKKOS_ENABLE_PRAGMA_SIMD
#define KOKKOS_ENABLE_PRAGMA_SIMD KOKKOS_HAVE_PRAGMA_SIMD
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_UNROLL
#ifndef KOKKOS_ENABLE_PRAGMA_UNROLL
#define KOKKOS_ENABLE_PRAGMA_UNROLL KOKKOS_HAVE_PRAGMA_UNROLL
#endif
#endif
#ifdef KOKKOS_HAVE_PRAGMA_VECTOR
#ifndef KOKKOS_ENABLE_PRAGMA_VECTOR
#define KOKKOS_ENABLE_PRAGMA_VECTOR KOKKOS_HAVE_PRAGMA_VECTOR
#endif
#endif
#ifdef KOKKOS_HAVE_PTHREAD
#ifndef KOKKOS_ENABLE_PTHREAD
#define KOKKOS_ENABLE_PTHREAD KOKKOS_HAVE_PTHREAD
#endif
#endif
-#ifdef KOKKOS_HAVE_QTHREAD
-#ifndef KOKKOS_ENABLE_QTHREAD
-#define KOKKOS_ENABLE_QTHREAD KOKKOS_HAVE_QTHREAD
+#ifdef KOKKOS_HAVE_QTHREADS
+#ifndef KOKKOS_ENABLE_QTHREADS
+#define KOKKOS_ENABLE_QTHREADS KOKKOS_HAVE_QTHREADS
#endif
#endif
#ifdef KOKKOS_HAVE_SERIAL
#ifndef KOKKOS_ENABLE_SERIAL
#define KOKKOS_ENABLE_SERIAL KOKKOS_HAVE_SERIAL
#endif
#endif
#ifdef KOKKOS_HAVE_TYPE
#ifndef KOKKOS_IMPL_HAS_TYPE
#define KOKKOS_IMPL_HAS_TYPE KOKKOS_HAVE_TYPE
#endif
#endif
#ifdef KOKKOS_HAVE_WINTHREAD
#ifndef KOKKOS_ENABLE_WINTHREAD
#define KOKKOS_ENABLE_WINTHREAD KOKKOS_HAVE_WINTHREAD
#endif
#endif
#ifdef KOKKOS_HAVE_Winthread
#ifndef KOKKOS_ENABLE_WINTHREAD
#define KOKKOS_ENABLE_WINTHREAD KOKKOS_HAVE_Winthread
#endif
#endif
#ifdef KOKKOS_INTEL_MM_ALLOC_AVAILABLE
#ifndef KOKKOS_ENABLE_INTEL_MM_ALLOC
#define KOKKOS_ENABLE_INTEL_MM_ALLOC KOKKOS_INTEL_MM_ALLOC_AVAILABLE
#endif
#endif
#ifdef KOKKOS_MACRO_IMPL_TO_STRING
#ifndef KOKKOS_IMPL_MACRO_TO_STRING
#define KOKKOS_IMPL_MACRO_TO_STRING KOKKOS_MACRO_IMPL_TO_STRING
#endif
#endif
#ifdef KOKKOS_MACRO_TO_STRING
#ifndef KOKKOS_MACRO_TO_STRING
#define KOKKOS_MACRO_TO_STRING KOKKOS_MACRO_TO_STRING
#endif
#endif
#ifdef KOKKOS_MAY_ALIAS
#ifndef KOKKOS_IMPL_MAY_ALIAS
#define KOKKOS_IMPL_MAY_ALIAS KOKKOS_MAY_ALIAS
#endif
#endif
#ifdef KOKKOS_MDRANGE_IVDEP
#ifndef KOKKOS_IMPL_MDRANGE_IVDEP
#define KOKKOS_IMPL_MDRANGE_IVDEP KOKKOS_MDRANGE_IVDEP
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINTERR
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINTERR
#define KOKKOS_ENABLE_MEMPOOL_PRINTERR KOKKOS_MEMPOOL_PRINTERR
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
#define KOKKOS_ENABLE_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS KOKKOS_MEMPOOL_PRINT_ACTIVE_SUPERBLOCKS
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_BLOCKSIZE_INFO KOKKOS_MEMPOOL_PRINT_BLOCKSIZE_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_CONSTRUCTOR_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_CONSTRUCTOR_INFO KOKKOS_MEMPOOL_PRINT_CONSTRUCTOR_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO KOKKOS_MEMPOOL_PRINT_INDIVIDUAL_PAGE_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_INFO KOKKOS_MEMPOOL_PRINT_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_PAGE_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_PAGE_INFO KOKKOS_MEMPOOL_PRINT_PAGE_INFO
#endif
#endif
#ifdef KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
#ifndef KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO
#define KOKKOS_ENABLE_MEMPOOL_PRINT_SUPERBLOCK_INFO KOKKOS_MEMPOOL_PRINT_SUPERBLOCK_INFO
#endif
#endif
#ifdef KOKKOS_POSIX_MEMALIGN_AVAILABLE
#ifndef KOKKOS_ENABLE_POSIX_MEMALIGN
#define KOKKOS_ENABLE_POSIX_MEMALIGN KOKKOS_POSIX_MEMALIGN_AVAILABLE
#endif
#endif
#ifdef KOKKOS_POSIX_MMAP_FLAGS
#ifndef KOKKOS_IMPL_POSIX_MMAP_FLAGS
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS KOKKOS_POSIX_MMAP_FLAGS
#endif
#endif
#ifdef KOKKOS_POSIX_MMAP_FLAGS_HUGE
#ifndef KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE
#define KOKKOS_IMPL_POSIX_MMAP_FLAGS_HUGE KOKKOS_POSIX_MMAP_FLAGS_HUGE
#endif
#endif
#ifdef KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
#ifndef KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_DECREMENT
#define KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_DECREMENT KOKKOS_SHARED_ALLOCATION_TRACKER_DECREMENT
#endif
#endif
#ifdef KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
#ifndef KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_ENABLED
#define KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_ENABLED KOKKOS_SHARED_ALLOCATION_TRACKER_ENABLED
#endif
#endif
#ifdef KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
#ifndef KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_INCREMENT
#define KOKKOS_IMPL_SHARED_ALLOCATION_TRACKER_INCREMENT KOKKOS_SHARED_ALLOCATION_TRACKER_INCREMENT
#endif
#endif
#ifdef KOKKOS_USE_CUDA_UVM
#ifndef KOKKOS_ENABLE_CUDA_UVM
#define KOKKOS_ENABLE_CUDA_UVM KOKKOS_USE_CUDA_UVM
#endif
#endif
#ifdef KOKKOS_USE_ISA_KNC
#ifndef KOKKOS_ENABLE_ISA_KNC
#define KOKKOS_ENABLE_ISA_KNC KOKKOS_USE_ISA_KNC
#endif
#endif
#ifdef KOKKOS_USE_ISA_POWERPCLE
#ifndef KOKKOS_ENABLE_ISA_POWERPCLE
#define KOKKOS_ENABLE_ISA_POWERPCLE KOKKOS_USE_ISA_POWERPCLE
#endif
#endif
#ifdef KOKKOS_USE_ISA_X86_64
#ifndef KOKKOS_ENABLE_ISA_X86_64
#define KOKKOS_ENABLE_ISA_X86_64 KOKKOS_USE_ISA_X86_64
#endif
#endif
#ifdef KOKKOS_USE_LIBRT
#ifndef KOKKOS_ENABLE_LIBRT
#define KOKKOS_ENABLE_LIBRT KOKKOS_USE_LIBRT
#endif
#endif
#ifdef KOKKOS_VIEW_OPERATOR_VERIFY
#ifndef KOKKOS_IMPL_VIEW_OPERATOR_VERIFY
#define KOKKOS_IMPL_VIEW_OPERATOR_VERIFY KOKKOS_VIEW_OPERATOR_VERIFY
#endif
#endif
//------------------------------------------------------------------------------
// Deprecated macros
//------------------------------------------------------------------------------
#ifdef KOKKOS_HAVE_CXX11
#undef KOKKOS_HAVE_CXX11
#endif
#ifdef KOKKOS_ENABLE_CXX11
#undef KOKKOS_ENABLE_CXX11
#endif
#ifdef KOKKOS_USING_EXP_VIEW
#undef KOKKOS_USING_EXP_VIEW
#endif
#ifdef KOKKOS_USING_EXPERIMENTAL_VIEW
#undef KOKKOS_USING_EXPERIMENTAL_VIEW
#endif
#define KOKKOS_HAVE_CXX11 1
#define KOKKOS_ENABLE_CXX11 1
#define KOKKOS_USING_EXP_VIEW 1
#define KOKKOS_USING_EXPERIMENTAL_VIEW 1
#endif //KOKKOS_IMPL_OLD_MACROS_HPP
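
This header only maps old spellings onto new ones: when a legacy configuration defines a pre-rename macro such as KOKKOS_HAVE_OPENMP, the matching KOKKOS_ENABLE_* name is defined for it, so code written against the new names keeps working. A hypothetical translation unit (the explicit #define and the include path are illustrative assumptions, not recommended usage):

    #define KOKKOS_HAVE_OPENMP 1          // legacy spelling from an old build system
    #include <impl/Kokkos_OldMacros.hpp>  // assumed include path, matching the other impl/ headers

    #if defined(KOKKOS_ENABLE_OPENMP)
    // New-style code sees the backend even though only the old macro was defined.
    #endif

    int main() { return 0; }
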
diff --git a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
index 99c5df4db..0c006a8c0 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
@@ -1,237 +1,237 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <impl/Kokkos_Profiling_Interface.hpp>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <string.h>
namespace Kokkos {
namespace Profiling {
SpaceHandle::SpaceHandle(const char* space_name) {
strncpy(name,space_name,64);
}
bool profileLibraryLoaded() {
return (NULL != initProfileLibrary);
}
void beginParallelFor(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginForCallee) {
Kokkos::fence();
(*beginForCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
void endParallelFor(const uint64_t kernelID) {
if(NULL != endForCallee) {
Kokkos::fence();
(*endForCallee)(kernelID);
}
}
void beginParallelScan(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginScanCallee) {
Kokkos::fence();
(*beginScanCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
void endParallelScan(const uint64_t kernelID) {
if(NULL != endScanCallee) {
Kokkos::fence();
(*endScanCallee)(kernelID);
}
}
-
+
void beginParallelReduce(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID) {
if(NULL != beginReduceCallee) {
Kokkos::fence();
(*beginReduceCallee)(kernelPrefix.c_str(), devID, kernelID);
}
}
-
+
void endParallelReduce(const uint64_t kernelID) {
if(NULL != endReduceCallee) {
Kokkos::fence();
(*endReduceCallee)(kernelID);
}
}
-
+
void pushRegion(const std::string& kName) {
if( NULL != pushRegionCallee ) {
Kokkos::fence();
(*pushRegionCallee)(kName.c_str());
}
}
void popRegion() {
if( NULL != popRegionCallee ) {
Kokkos::fence();
(*popRegionCallee)();
}
}
void allocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size) {
if(NULL != allocateDataCallee) {
(*allocateDataCallee)(space,label.c_str(),ptr,size);
}
}
void deallocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size) {
if(NULL != allocateDataCallee) {
(*deallocateDataCallee)(space,label.c_str(),ptr,size);
}
}
void initialize() {
// Make sure initialize calls happens only once
static int is_initialized = 0;
if(is_initialized) return;
is_initialized = 1;
void* firstProfileLibrary;
char* envProfileLibrary = getenv("KOKKOS_PROFILE_LIBRARY");
// If we do not find a profiling library in the environment then exit
// early.
if( NULL == envProfileLibrary ) {
return ;
}
char* envProfileCopy = (char*) malloc(sizeof(char) * (strlen(envProfileLibrary) + 1));
sprintf(envProfileCopy, "%s", envProfileLibrary);
char* profileLibraryName = strtok(envProfileCopy, ";");
if( (NULL != profileLibraryName) && (strcmp(profileLibraryName, "") != 0) ) {
firstProfileLibrary = dlopen(profileLibraryName, RTLD_NOW | RTLD_GLOBAL);
if(NULL == firstProfileLibrary) {
std::cerr << "Error: Unable to load KokkosP library: " <<
profileLibraryName << std::endl;
} else {
std::cout << "KokkosP: Library Loaded: " << profileLibraryName << std::endl;
// dlsym returns a pointer to an object, while we want to assign to pointer to function
// A direct cast will give warnings hence, we have to workaround the issue by casting pointer to pointers.
auto p1 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_for");
beginForCallee = *((beginFunction*) &p1);
auto p2 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_scan");
beginScanCallee = *((beginFunction*) &p2);
auto p3 = dlsym(firstProfileLibrary, "kokkosp_begin_parallel_reduce");
beginReduceCallee = *((beginFunction*) &p3);
auto p4 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_scan");
endScanCallee = *((endFunction*) &p4);
auto p5 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_for");
endForCallee = *((endFunction*) &p5);
auto p6 = dlsym(firstProfileLibrary, "kokkosp_end_parallel_reduce");
endReduceCallee = *((endFunction*) &p6);
auto p7 = dlsym(firstProfileLibrary, "kokkosp_init_library");
initProfileLibrary = *((initFunction*) &p7);
auto p8 = dlsym(firstProfileLibrary, "kokkosp_finalize_library");
finalizeProfileLibrary = *((finalizeFunction*) &p8);
auto p9 = dlsym(firstProfileLibrary, "kokkosp_push_profile_region");
pushRegionCallee = *((pushFunction*) &p9);
auto p10 = dlsym(firstProfileLibrary, "kokkosp_pop_profile_region");
popRegionCallee = *((popFunction*) &p10);
auto p11 = dlsym(firstProfileLibrary, "kokkosp_allocate_data");
allocateDataCallee = *((allocateDataFunction*) &p11);
auto p12 = dlsym(firstProfileLibrary, "kokkosp_deallocate_data");
deallocateDataCallee = *((deallocateDataFunction*) &p12);
}
}
if(NULL != initProfileLibrary) {
(*initProfileLibrary)(0,
(uint64_t) KOKKOSP_INTERFACE_VERSION,
(uint32_t) 0,
NULL);
}
free(envProfileCopy);
}
void finalize() {
// Make sure finalize calls happens only once
static int is_finalized = 0;
if(is_finalized) return;
is_finalized = 1;
if(NULL != finalizeProfileLibrary) {
(*finalizeProfileLibrary)();
// Set all profile hooks to NULL to prevent
// any additional calls. Once we are told to
// finalize, we mean it
initProfileLibrary = NULL;
finalizeProfileLibrary = NULL;
beginForCallee = NULL;
beginScanCallee = NULL;
beginReduceCallee = NULL;
endScanCallee = NULL;
endForCallee = NULL;
endReduceCallee = NULL;
pushRegionCallee = NULL;
popRegionCallee = NULL;
allocateDataCallee = NULL;
deallocateDataCallee = NULL;
}
}
}
}
#endif
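
initialize() above dlopen()s the shared library named by the KOKKOS_PROFILE_LIBRARY environment variable and resolves the kokkosp_* entry points by name. A minimal sketch of such a tool, implementing only the init/finalize and parallel_for hooks (file and library names are hypothetical; the last parameter of kokkosp_init_library is taken as void* here, standing in for KokkosPDeviceInfo*, to keep the sketch self-contained):

    // simple_tool.cpp -- hypothetical KokkosP tool; build as a shared library.
    #include <cstdint>
    #include <cstdio>

    extern "C" void kokkosp_init_library(const int loadSeq,
                                         const std::uint64_t interfaceVer,
                                         const std::uint32_t devInfoCount,
                                         void* deviceInfo /* KokkosPDeviceInfo* */) {
      (void) loadSeq; (void) devInfoCount; (void) deviceInfo;
      std::printf("tool: init, interface version %llu\n",
                  (unsigned long long) interfaceVer);
    }

    extern "C" void kokkosp_finalize_library() {
      std::printf("tool: finalize\n");
    }

    extern "C" void kokkosp_begin_parallel_for(const char* name,
                                               const std::uint32_t devID,
                                               std::uint64_t* kernelID) {
      (void) devID;
      static std::uint64_t next = 0;
      *kernelID = next++;
      std::printf("tool: begin parallel_for '%s' id=%llu\n",
                  name, (unsigned long long) *kernelID);
    }

    extern "C" void kokkosp_end_parallel_for(const std::uint64_t kernelID) {
      std::printf("tool: end parallel_for id=%llu\n",
                  (unsigned long long) kernelID);
    }

Built as a shared library (for example g++ -shared -fPIC -o libsimple_tool.so simple_tool.cpp) and exported through KOKKOS_PROFILE_LIBRARY, the begin/end hooks fire around each Kokkos::parallel_for when profiling is enabled.
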
diff --git a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
index 3d6a38925..139a20d8f 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp
@@ -1,151 +1,151 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOSP_INTERFACE_HPP
#define KOKKOSP_INTERFACE_HPP
#include <cstddef>
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Macros.hpp>
#include <string>
#include <cinttypes>
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_DeviceInfo.hpp>
#include <dlfcn.h>
#include <iostream>
#include <stdlib.h>
#endif
#define KOKKOSP_INTERFACE_VERSION 20150628
-#if (KOKKOS_ENABLE_PROFILING)
+#if defined(KOKKOS_ENABLE_PROFILING)
namespace Kokkos {
namespace Profiling {
struct SpaceHandle {
SpaceHandle(const char* space_name);
char name[64];
};
typedef void (*initFunction)(const int,
const uint64_t,
const uint32_t,
KokkosPDeviceInfo*);
typedef void (*finalizeFunction)();
typedef void (*beginFunction)(const char*, const uint32_t, uint64_t*);
typedef void (*endFunction)(uint64_t);
typedef void (*pushFunction)(const char*);
typedef void (*popFunction)();
typedef void (*allocateDataFunction)(const SpaceHandle, const char*, const void*, const uint64_t);
typedef void (*deallocateDataFunction)(const SpaceHandle, const char*, const void*, const uint64_t);
static initFunction initProfileLibrary = NULL;
static finalizeFunction finalizeProfileLibrary = NULL;
static beginFunction beginForCallee = NULL;
static beginFunction beginScanCallee = NULL;
static beginFunction beginReduceCallee = NULL;
static endFunction endForCallee = NULL;
static endFunction endScanCallee = NULL;
static endFunction endReduceCallee = NULL;
static pushFunction pushRegionCallee = NULL;
static popFunction popRegionCallee = NULL;
static allocateDataFunction allocateDataCallee = NULL;
static deallocateDataFunction deallocateDataCallee = NULL;
bool profileLibraryLoaded();
void beginParallelFor(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelFor(const uint64_t kernelID);
void beginParallelScan(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelScan(const uint64_t kernelID);
void beginParallelReduce(const std::string& kernelPrefix, const uint32_t devID, uint64_t* kernelID);
void endParallelReduce(const uint64_t kernelID);
void pushRegion(const std::string& kName);
void popRegion();
void allocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size);
void deallocateData(const SpaceHandle space, const std::string label, const void* ptr, const uint64_t size);
void initialize();
void finalize();
//Define finalize_fake inline to get rid of warnings for unused static variables
inline void finalize_fake() {
if(NULL != finalizeProfileLibrary) {
(*finalizeProfileLibrary)();
// Set all profile hooks to NULL to prevent
// any additional calls. Once we are told to
// finalize, we mean it
beginForCallee = NULL;
beginScanCallee = NULL;
beginReduceCallee = NULL;
endScanCallee = NULL;
endForCallee = NULL;
endReduceCallee = NULL;
allocateDataCallee = NULL;
deallocateDataCallee = NULL;
initProfileLibrary = NULL;
finalizeProfileLibrary = NULL;
pushRegionCallee = NULL;
popRegionCallee = NULL;
}
}
}
}
#endif
#endif
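
The loading code earlier in this patch resolves each hook with dlsym() and casts it to one of the typedefs above, so a profiling tool is just a shared library exporting C symbols with matching signatures. The sketch below is not part of the patch: it shows a hypothetical minimal tool for the four hooks whose names appear in this hunk (push/pop region and allocate/deallocate); the environment variable used to point Kokkos at the library is assumed here to be KOKKOS_PROFILE_LIBRARY.

// minimal_tool.cpp -- hypothetical KokkosP tool; the signatures mirror the
// typedefs in Kokkos_Profiling_Interface.hpp above.
#include <cstdint>
#include <cstdio>

// Mirrors Kokkos::Profiling::SpaceHandle from the header above.
struct SpaceHandle { char name[64]; };

extern "C" void kokkosp_push_profile_region(const char* name) {
  std::printf("push region: %s\n", name);
}

extern "C" void kokkosp_pop_profile_region() {
  std::printf("pop region\n");
}

extern "C" void kokkosp_allocate_data(const SpaceHandle space, const char* label,
                                      const void* ptr, const uint64_t size) {
  std::printf("alloc %s in %s: %llu bytes at %p\n",
              label, space.name, (unsigned long long) size, ptr);
}

extern "C" void kokkosp_deallocate_data(const SpaceHandle space, const char* label,
                                        const void* ptr, const uint64_t size) {
  std::printf("free  %s in %s: %llu bytes at %p\n",
              label, space.name, (unsigned long long) size, ptr);
}

Built as a shared object (e.g. g++ -shared -fPIC minimal_tool.cpp -o minimal_tool.so), these are the symbols the dlsym() lookups above would find; any hook a tool does not export simply stays NULL and is never called.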
diff --git a/lib/kokkos/core/src/impl/Kokkos_Reducer.hpp b/lib/kokkos/core/src/impl/Kokkos_Reducer.hpp
new file mode 100644
index 000000000..b3ed5f151
--- /dev/null
+++ b/lib/kokkos/core/src/impl/Kokkos_Reducer.hpp
@@ -0,0 +1,317 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#ifndef KOKKOS_IMPL_REDUCER_HPP
+#define KOKKOS_IMPL_REDUCER_HPP
+
+#include <impl/Kokkos_Traits.hpp>
+
+//----------------------------------------------------------------------------
+/* Reducer abstraction:
+ * 1) Provides 'join' operation
+ * 2) Provides 'init' operation
+ * 3) Provides 'copy' operation
+ * 4) Optionally provides result value in a memory space
+ *
+ * Created from:
+ * 1) Functor::operator()( destination , source )
+ * 2) Functor::{ join , init }
+ */
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+namespace Impl {
+
+template< typename value_type >
+struct ReduceSum
+{
+ KOKKOS_INLINE_FUNCTION static
+ void copy( value_type & dest
+ , value_type const & src ) noexcept
+ { dest = src ; }
+
+ KOKKOS_INLINE_FUNCTION static
+ void init( value_type & dest ) noexcept
+ { new( &dest ) value_type(); }
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( value_type volatile & dest
+ , value_type const volatile & src ) noexcept
+ { dest += src ; }
+
+ KOKKOS_INLINE_FUNCTION static
+ void join( value_type & dest
+ , value_type const & src ) noexcept
+ { dest += src ; }
+};
+
+template< typename T
+ , class ReduceOp = ReduceSum< T >
+ , typename MemorySpace = void >
+struct Reducer
+ : private ReduceOp
+ , private integral_nonzero_constant
+ < int , ( std::rank<T>::value == 1 ? std::extent<T>::value : 1 )>
+{
+private:
+
+ // Determine if T is simple array
+
+ enum : int { rank = std::rank<T>::value };
+
+ static_assert( rank <= 1 , "Kokkos::Impl::Reducer type is at most rank-one" );
+
+ using length_t =
+ integral_nonzero_constant<int,( rank == 1 ? std::extent<T>::value : 1 )> ;
+
+public:
+
+ using reducer = Reducer ;
+ using memory_space = MemorySpace ;
+ using value_type = typename std::remove_extent<T>::type ;
+ using reference_type =
+ typename std::conditional< ( rank != 0 )
+ , value_type *
+ , value_type &
+ >::type ;
+private:
+
+ //--------------------------------------------------------------------------
+ // Determine what functions 'ReduceOp' provides:
+ // copy( destination , source )
+ // init( destination )
+ //
+ // operator()( destination , source )
+ // join( destination , source )
+ //
+ // Provide defaults for missing optional operations
+
+ template< class R , typename = void>
+ struct COPY {
+ KOKKOS_INLINE_FUNCTION static
+ void copy( R const &
+ , value_type * dst
+ , value_type const * src ) { *dst = *src ; }
+ };
+
+ template< class R >
+ struct COPY< R , decltype( ((R*)0)->copy( *((value_type*)0)
+ , *((value_type const *)0) ) ) >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void copy( R const & r
+ , value_type * dst
+ , value_type const * src ) { r.copy( *dst , *src ); }
+ };
+
+ template< class R , typename = void >
+ struct INIT {
+ KOKKOS_INLINE_FUNCTION static
+ void init( R const & , value_type * dst ) { new(dst) value_type(); }
+ };
+
+ template< class R >
+ struct INIT< R , decltype( ((R*)0)->init( *((value_type*)0 ) ) ) >
+ {
+ KOKKOS_INLINE_FUNCTION static
+ void init( R const & r , value_type * dst ) { r.init( *dst ); }
+ };
+
+ template< class R , typename V , typename = void > struct JOIN
+ {
+ // If no join function then try operator()
+ KOKKOS_INLINE_FUNCTION static
+ void join( R const & r , V * dst , V const * src )
+ { r.operator()(*dst,*src); }
+ };
+
+ template< class R , typename V >
+ struct JOIN< R , V , decltype( ((R*)0)->join ( *((V *)0) , *((V const *)0) ) ) >
+ {
+ // If has join function use it
+ KOKKOS_INLINE_FUNCTION static
+ void join( R const & r , V * dst , V const * src )
+ { r.join(*dst,*src); }
+ };
+
+ //--------------------------------------------------------------------------
+
+ value_type * const m_result ;
+
+ template< int Rank >
+ KOKKOS_INLINE_FUNCTION
+ static constexpr
+ typename std::enable_if< ( 0 != Rank ) , reference_type >::type
+ ref( value_type * p ) noexcept { return p ; }
+
+ template< int Rank >
+ KOKKOS_INLINE_FUNCTION
+ static constexpr
+ typename std::enable_if< ( 0 == Rank ) , reference_type >::type
+ ref( value_type * p ) noexcept { return *p ; }
+
+public:
+
+ //--------------------------------------------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ constexpr int length() const noexcept
+ { return length_t::value ; }
+
+ KOKKOS_INLINE_FUNCTION
+ value_type * data() const noexcept
+ { return m_result ; }
+
+ KOKKOS_INLINE_FUNCTION
+ reference_type reference() const noexcept
+ { return Reducer::template ref< rank >( m_result ); }
+
+ //--------------------------------------------------------------------------
+
+ KOKKOS_INLINE_FUNCTION
+ void copy( value_type * const dest
+ , value_type const * const src ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template COPY<ReduceOp>::copy( (ReduceOp &) *this , dest + i , src + i );
+ }
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void init( value_type * dest ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template INIT<ReduceOp>::init( (ReduceOp &) *this , dest + i );
+ }
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void join( value_type * const dest
+ , value_type const * const src ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template JOIN<ReduceOp,value_type>::join( (ReduceOp &) *this , dest + i , src + i );
+ }
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void join( value_type volatile * const dest
+ , value_type volatile const * const src ) const noexcept
+ {
+ for ( int i = 0 ; i < length() ; ++i ) {
+ Reducer::template JOIN<ReduceOp,value_type volatile>::join( (ReduceOp &) *this , dest + i , src + i );
+ }
+ }
+
+ //--------------------------------------------------------------------------
+
+ template< typename ArgT >
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer
+ ( ArgT * arg_value
+ , typename std::enable_if
+ < std::is_same<ArgT,value_type>::value &&
+ std::is_default_constructible< ReduceOp >::value
+ , int >::type arg_length = 1
+ ) noexcept
+ : ReduceOp(), length_t( arg_length ), m_result( arg_value ) {}
+
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer( ReduceOp const & arg_op
+ , value_type * arg_value = 0
+ , int arg_length = 1 ) noexcept
+ : ReduceOp( arg_op ), length_t( arg_length ), m_result( arg_value ) {}
+
+ KOKKOS_INLINE_FUNCTION explicit
+ constexpr Reducer( ReduceOp && arg_op
+ , value_type * arg_value = 0
+ , int arg_length = 1 ) noexcept
+ : ReduceOp( arg_op ), length_t( arg_length ), m_result( arg_value ) {}
+
+ Reducer( Reducer const & ) = default ;
+ Reducer( Reducer && ) = default ;
+ Reducer & operator = ( Reducer const & ) = default ;
+ Reducer & operator = ( Reducer && ) = default ;
+};
+
+} // namespace Impl
+} // namespace Kokkos
+
+//----------------------------------------------------------------------------
+//----------------------------------------------------------------------------
+
+namespace Kokkos {
+
+template< typename ValueType >
+constexpr
+Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > >
+Sum( ValueType & arg_value )
+{
+ static_assert( std::is_trivial<ValueType>::value
+ , "Kokkos reducer requires trivial value type" );
+ return Impl::Reducer< ValueType , Impl::ReduceSum< ValueType > >( & arg_value );
+}
+
+template< typename ValueType >
+constexpr
+Impl::Reducer< ValueType[] , Impl::ReduceSum< ValueType > >
+Sum( ValueType * arg_value , int arg_length )
+{
+ static_assert( std::is_trivial<ValueType>::value
+ , "Kokkos reducer requires trivial value type" );
+ return Impl::Reducer< ValueType[] , Impl::ReduceSum< ValueType > >( arg_value , arg_length );
+}
+
+//----------------------------------------------------------------------------
+
+template< typename ValueType , class JoinType >
+Impl::Reducer< ValueType , JoinType >
+reducer( ValueType & value , JoinType const & lambda )
+{
+ return Impl::Reducer< ValueType , JoinType >( lambda , & value );
+}
+
+} // namespace Kokkos
+
+#endif /* #ifndef KOKKOS_IMPL_REDUCER_HPP */
+
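
The new header expresses a reduction through three operations (join, init, copy) supplied by a ReduceOp such as ReduceSum, with Sum() building the default sum reducer. The following standalone sketch only illustrates that join/init pattern; it uses hypothetical names and none of the Kokkos machinery above.

// reducer_sketch.cpp -- standalone illustration of the join/init pattern
// wrapped by Kokkos::Impl::Reducer above; nothing here is Kokkos API.
#include <cstdio>

template <typename T>
struct SumOp {
  static void init(T& dst) { dst = T(); }            // identity element
  static void join(T& dst, const T& src) { dst += src; }
};

template <typename T, class Op = SumOp<T>>
T reduce(const T* values, int n) {
  T result;
  Op::init(result);                 // start from the identity
  for (int i = 0; i < n; ++i)
    Op::join(result, values[i]);    // fold each contribution in
  return result;
}

int main() {
  const double v[4] = {1.0, 2.5, 3.0, 4.5};
  std::printf("sum = %g\n", reduce(v, 4));            // prints sum = 11
  return 0;
}

In the header above, Kokkos::Sum(value) plays the role of SumOp here, and the static_assert on std::is_trivial keeps the reduced value bit-copyable between threads.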
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial.cpp b/lib/kokkos/core/src/impl/Kokkos_Serial.cpp
index 76161c10f..794961330 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial.cpp
@@ -1,119 +1,182 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <stdlib.h>
#include <sstream>
#include <Kokkos_Serial.hpp>
#include <impl/Kokkos_Traits.hpp>
#include <impl/Kokkos_Error.hpp>
#if defined( KOKKOS_ENABLE_SERIAL )
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
-namespace SerialImpl {
+namespace {
-Sentinel::Sentinel() : m_scratch(0), m_reduce_end(0), m_shared_end(0) {}
+HostThreadTeamData g_serial_thread_team_data ;
-Sentinel::~Sentinel()
-{
- if ( m_scratch ) { free( m_scratch ); }
- m_scratch = 0 ;
- m_reduce_end = 0 ;
- m_shared_end = 0 ;
}
-Sentinel & Sentinel::singleton()
+// Resize thread team data scratch memory
+void serial_resize_thread_team_data( size_t pool_reduce_bytes
+ , size_t team_reduce_bytes
+ , size_t team_shared_bytes
+ , size_t thread_local_bytes )
{
- static Sentinel s ; return s ;
+ if ( pool_reduce_bytes < 512 ) pool_reduce_bytes = 512 ;
+ if ( team_reduce_bytes < 512 ) team_reduce_bytes = 512 ;
+
+ const size_t old_pool_reduce = g_serial_thread_team_data.pool_reduce_bytes();
+ const size_t old_team_reduce = g_serial_thread_team_data.team_reduce_bytes();
+ const size_t old_team_shared = g_serial_thread_team_data.team_shared_bytes();
+ const size_t old_thread_local = g_serial_thread_team_data.thread_local_bytes();
+ const size_t old_alloc_bytes = g_serial_thread_team_data.scratch_bytes();
+
+ // Allocate if any of the old allocations is too small:
+
+ const bool allocate = ( old_pool_reduce < pool_reduce_bytes ) ||
+ ( old_team_reduce < team_reduce_bytes ) ||
+ ( old_team_shared < team_shared_bytes ) ||
+ ( old_thread_local < thread_local_bytes );
+
+ if ( allocate ) {
+
+ Kokkos::HostSpace space ;
+
+ if ( old_alloc_bytes ) {
+ g_serial_thread_team_data.disband_team();
+ g_serial_thread_team_data.disband_pool();
+
+ space.deallocate( g_serial_thread_team_data.scratch_buffer()
+ , g_serial_thread_team_data.scratch_bytes() );
+ }
+
+ if ( pool_reduce_bytes < old_pool_reduce ) { pool_reduce_bytes = old_pool_reduce ; }
+ if ( team_reduce_bytes < old_team_reduce ) { team_reduce_bytes = old_team_reduce ; }
+ if ( team_shared_bytes < old_team_shared ) { team_shared_bytes = old_team_shared ; }
+ if ( thread_local_bytes < old_thread_local ) { thread_local_bytes = old_thread_local ; }
+
+ const size_t alloc_bytes =
+ HostThreadTeamData::scratch_size( pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
+
+ void * const ptr = space.allocate( alloc_bytes );
+
+ g_serial_thread_team_data.
+ scratch_assign( ((char *)ptr)
+ , alloc_bytes
+ , pool_reduce_bytes
+ , team_reduce_bytes
+ , team_shared_bytes
+ , thread_local_bytes );
+
+ HostThreadTeamData * pool[1] = { & g_serial_thread_team_data };
+
+ g_serial_thread_team_data.organize_pool( pool , 1 );
+ g_serial_thread_team_data.organize_team(1);
+ }
}
-inline
-unsigned align( unsigned n )
+// Get the thread team data structure for the serial execution space
+HostThreadTeamData * serial_get_thread_team_data()
{
- enum { ALIGN = 0x0100 /* 256 */ , MASK = ALIGN - 1 };
- return ( n + MASK ) & ~MASK ;
+ return & g_serial_thread_team_data ;
}
-} // namespace
+} // namespace Impl
+} // namespace Kokkos
-SerialTeamMember::SerialTeamMember( int arg_league_rank
- , int arg_league_size
- , int arg_shared_size
- )
- : m_space( ((char *) SerialImpl::Sentinel::singleton().m_scratch) + SerialImpl::Sentinel::singleton().m_reduce_end
- , arg_shared_size )
- , m_league_rank( arg_league_rank )
- , m_league_size( arg_league_size )
-{}
+/*--------------------------------------------------------------------------*/
-} // namespace Impl
+namespace Kokkos {
-void * Serial::scratch_memory_resize( unsigned reduce_size , unsigned shared_size )
+int Serial::is_initialized()
{
- static Impl::SerialImpl::Sentinel & s = Impl::SerialImpl::Sentinel::singleton();
+ return 1 ;
+}
- reduce_size = Impl::SerialImpl::align( reduce_size );
- shared_size = Impl::SerialImpl::align( shared_size );
+void Serial::initialize( unsigned threads_count
+ , unsigned use_numa_count
+ , unsigned use_cores_per_numa
+ , bool allow_asynchronous_threadpool )
+{
+ (void) threads_count;
+ (void) use_numa_count;
+ (void) use_cores_per_numa;
+ (void) allow_asynchronous_threadpool;
+
+ // Init the array of locks used for arbitrarily sized atomics
+ Impl::init_lock_array_host_space();
+ #if defined(KOKKOS_ENABLE_PROFILING)
+ Kokkos::Profiling::initialize();
+ #endif
+}
- if ( ( s.m_reduce_end < reduce_size ) ||
- ( s.m_shared_end < s.m_reduce_end + shared_size ) ) {
+void Serial::finalize()
+{
+ if ( Impl::g_serial_thread_team_data.scratch_buffer() ) {
+ Impl::g_serial_thread_team_data.disband_team();
+ Impl::g_serial_thread_team_data.disband_pool();
- if ( s.m_scratch ) { free( s.m_scratch ); }
+ Kokkos::HostSpace space ;
- if ( s.m_reduce_end < reduce_size ) s.m_reduce_end = reduce_size ;
- if ( s.m_shared_end < s.m_reduce_end + shared_size ) s.m_shared_end = s.m_reduce_end + shared_size ;
+ space.deallocate( Impl::g_serial_thread_team_data.scratch_buffer()
+ , Impl::g_serial_thread_team_data.scratch_bytes() );
- s.m_scratch = malloc( s.m_shared_end );
+ Impl::g_serial_thread_team_data.scratch_assign( (void*) 0, 0, 0, 0, 0, 0 );
}
- return s.m_scratch ;
+ #if defined(KOKKOS_ENABLE_PROFILING)
+ Kokkos::Profiling::finalize();
+ #endif
}
} // namespace Kokkos
#endif // defined( KOKKOS_ENABLE_SERIAL )
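
This rewrite replaces the old Sentinel scratch singleton with HostThreadTeamData and a grow-only resize: requested sizes are clamped to a 512-byte floor, compared against the current allocation, and the buffer is reallocated only when some request exceeds what is already held. A minimal sketch of that policy, using hypothetical names and plain malloc rather than Kokkos::HostSpace:

// grow_only_scratch.cpp -- illustrates the reallocate-only-when-larger
// policy used by serial_resize_thread_team_data() above.
#include <cstdlib>
#include <algorithm>

struct Scratch {
  void*  ptr   = nullptr;
  size_t bytes = 0;

  // Grow to hold at least 'requested' bytes; never shrink.
  void resize(size_t requested) {
    requested = std::max<size_t>(requested, 512);  // clamp to a floor, as above
    if (requested <= bytes) return;                // current buffer already suffices
    std::free(ptr);                                // release the old buffer
    ptr   = std::malloc(requested);                // allocate the larger one
    bytes = requested;
  }

  ~Scratch() { std::free(ptr); }
};

int main() {
  Scratch s;
  s.resize(100);    // rounded up to the 512-byte floor
  s.resize(1024);   // grows
  s.resize(600);    // no-op: existing buffer is already large enough
  return 0;
}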
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
index 19f3abe71..d22d604fb 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
@@ -1,148 +1,152 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_SERIAL ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_Serial_Task.hpp>
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template class TaskQueue< Kokkos::Serial > ;
void TaskQueueSpecialization< Kokkos::Serial >::execute
( TaskQueue< Kokkos::Serial > * const queue )
{
using execution_space = Kokkos::Serial ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
- using Member = TaskExec< execution_space > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- Member exec ;
+ Impl::HostThreadTeamData * const data = Impl::serial_get_thread_team_data();
+
+ Member exec( *data );
// Loop until all queues are empty
while ( 0 < queue->m_ready_count ) {
task_root_type * task = end ;
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
}
}
if ( end != task ) {
- // pop_task resulted in lock == task->m_next
+ // pop_ready_task resulted in lock == task->m_next
// In the executing state
(*task->m_apply)( task , & exec );
#if 0
printf( "TaskQueue<Serial>::executed: 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
// If a respawn then re-enqueue otherwise the task is complete
// and all tasks waiting on this task are updated.
queue->complete( task );
}
else if ( 0 != queue->m_ready_count ) {
Kokkos::abort("TaskQueue<Serial>::execute ERROR: ready_count");
}
}
}
void TaskQueueSpecialization< Kokkos::Serial > ::
iff_single_thread_recursive_execute(
TaskQueue< Kokkos::Serial > * const queue )
{
using execution_space = Kokkos::Serial ;
using queue_type = TaskQueue< execution_space > ;
using task_root_type = TaskBase< execution_space , void , void > ;
- using Member = TaskExec< execution_space > ;
+ using Member = Impl::HostThreadTeamMember< execution_space > ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- Member exec ;
+ Impl::HostThreadTeamData * const data = Impl::serial_get_thread_team_data();
+
+ Member exec( *data );
// Loop until no runnable task
task_root_type * task = end ;
do {
task = end ;
for ( int i = 0 ; i < queue_type::NumQueue && end == task ; ++i ) {
for ( int j = 0 ; j < 2 && end == task ; ++j ) {
- task = queue_type::pop_task( & queue->m_ready[i][j] );
+ task = queue_type::pop_ready_task( & queue->m_ready[i][j] );
}
}
if ( end == task ) break ;
(*task->m_apply)( task , & exec );
queue->complete( task );
} while(1);
}
}} /* namespace Kokkos::Impl */
#endif /* #if defined( KOKKOS_ENABLE_SERIAL ) && defined( KOKKOS_ENABLE_TASKDAG ) */
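
execute() above drains the queue by scanning NumQueue priority levels, each holding two ready queues, popping the next runnable task with pop_ready_task(), running it through its apply function, and calling complete() until m_ready_count drops to zero. The sketch below is a heavily simplified, Kokkos-free version of that draining loop; the task type and queue layout are illustrative only.

// drain_sketch.cpp -- simplified analogue of the ready-queue draining loop
// in TaskQueueSpecialization<Kokkos::Serial>::execute() above.
#include <array>
#include <deque>
#include <functional>

using Task = std::function<void()>;
constexpr int NumQueue = 3;                           // priority levels

// Two ready queues per priority level, as in the real queue.
using ReadyQueues = std::array<std::array<std::deque<Task>, 2>, NumQueue>;

void drain(ReadyQueues& ready, int& ready_count) {
  while (ready_count > 0) {
    Task task;                                        // empty task == "end" sentinel
    for (int i = 0; i < NumQueue && !task; ++i)
      for (int j = 0; j < 2 && !task; ++j)
        if (!ready[i][j].empty()) {                   // pop_ready_task() analogue
          task = std::move(ready[i][j].front());
          ready[i][j].pop_front();
        }
    if (!task) break;                                 // nothing runnable
    task();                                           // (*task->m_apply)(task, &exec) analogue
    --ready_count;                                    // stand-in for queue->complete(task)
  }
}

int main() {
  ReadyQueues ready{};
  int count = 0;
  ready[1][0].push_back([] { /* work */ });  ++count;
  ready[2][1].push_back([] { /* work */ });  ++count;
  drain(ready, count);
  return 0;
}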
diff --git a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
index 178305c5d..ac7f17c0e 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Serial_Task.hpp
@@ -1,308 +1,91 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_IMPL_SERIAL_TASK_HPP
#define KOKKOS_IMPL_SERIAL_TASK_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
#include <impl/Kokkos_TaskQueue.hpp>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template<>
class TaskQueueSpecialization< Kokkos::Serial >
{
public:
using execution_space = Kokkos::Serial ;
using memory_space = Kokkos::HostSpace ;
using queue_type = Kokkos::Impl::TaskQueue< execution_space > ;
using task_base_type = Kokkos::Impl::TaskBase< execution_space , void , void > ;
+ using member_type = Kokkos::Impl::HostThreadTeamMember< execution_space > ;
static
void iff_single_thread_recursive_execute( queue_type * const );
static
void execute( queue_type * const );
- template< typename FunctorType >
+ template< typename TaskType >
static
- void proc_set_apply( task_base_type::function_type * ptr )
- {
- using TaskType = TaskBase< Kokkos::Serial
- , typename FunctorType::value_type
- , FunctorType
- > ;
- *ptr = TaskType::apply ;
- }
+ typename TaskType::function_type
+ get_function_pointer() { return TaskType::apply ; }
};
extern template class TaskQueue< Kokkos::Serial > ;
-//----------------------------------------------------------------------------
-
-template<>
-class TaskExec< Kokkos::Serial >
-{
-public:
-
- KOKKOS_INLINE_FUNCTION void team_barrier() const {}
- KOKKOS_INLINE_FUNCTION int team_rank() const { return 0 ; }
- KOKKOS_INLINE_FUNCTION int team_size() const { return 1 ; }
-};
-
-template<typename iType>
-struct TeamThreadRangeBoundariesStruct<iType, TaskExec< Kokkos::Serial > >
-{
- typedef iType index_type;
- const iType start ;
- const iType end ;
- enum {increment = 1};
- //const TaskExec< Kokkos::Serial > & thread;
- TaskExec< Kokkos::Serial > & thread;
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct
- //( const TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
- ( TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
- : start(0)
- , end(arg_count)
- , thread(arg_thread)
- {}
-
- KOKKOS_INLINE_FUNCTION
- TeamThreadRangeBoundariesStruct
- //( const TaskExec< Kokkos::Serial > & arg_thread
- ( TaskExec< Kokkos::Serial > & arg_thread
- , const iType& arg_start
- , const iType & arg_end
- )
- : start( arg_start )
- , end( arg_end)
- , thread( arg_thread )
- {}
-};
-
-//----------------------------------------------------------------------------
-
-template<typename iType>
-struct ThreadVectorRangeBoundariesStruct<iType, TaskExec< Kokkos::Serial > >
-{
- typedef iType index_type;
- const iType start ;
- const iType end ;
- enum {increment = 1};
- TaskExec< Kokkos::Serial > & thread;
-
- KOKKOS_INLINE_FUNCTION
- ThreadVectorRangeBoundariesStruct
- ( TaskExec< Kokkos::Serial > & arg_thread, const iType& arg_count)
- : start( 0 )
- , end(arg_count)
- , thread(arg_thread)
- {}
-};
-
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
-namespace Kokkos {
-
-// OMP version needs non-const TaskExec
-template< typename iType >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >
-TeamThreadRange( Impl::TaskExec< Kokkos::Serial > & thread, const iType & count )
-{
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >( thread, count );
-}
-
-// OMP version needs non-const TaskExec
-template< typename iType1, typename iType2 >
-KOKKOS_INLINE_FUNCTION
-Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
- Impl::TaskExec< Kokkos::Serial > >
-TeamThreadRange( Impl::TaskExec< Kokkos::Serial > & thread, const iType1 & start, const iType2 & end )
-{
- typedef typename std::common_type< iType1, iType2 >::type iType;
- return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::TaskExec< Kokkos::Serial > >(
- thread, iType(start), iType(end) );
-}
-
-// OMP version needs non-const TaskExec
-template<typename iType>
-KOKKOS_INLINE_FUNCTION
-Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >
-ThreadVectorRange
- ( Impl::TaskExec< Kokkos::Serial > & thread
- , const iType & count )
-{
- return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >(thread,count);
-}
-
- /** \brief Inter-thread parallel_for. Executes lambda(iType i) for each i=0..N-1.
- *
- * The range i=0..N-1 is mapped to all threads of the calling thread team.
- * This functionality requires C++11 support.*/
-template<typename iType, class Lambda>
-KOKKOS_INLINE_FUNCTION
-void parallel_for(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries, const Lambda& lambda) {
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i);
-}
-
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- ValueType& initialized_result)
-{
-
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i, result);
-
- initialized_result = result;
-}
-
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
- ValueType result = initialized_result;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment)
- lambda(i, result);
-
- initialized_result = result;
-}
-
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- ValueType& initialized_result)
-{
- initialized_result = ValueType();
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- initialized_result+=tmp;
- }
-}
-
-template< typename iType, class Lambda, typename ValueType, class JoinType >
-KOKKOS_INLINE_FUNCTION
-void parallel_reduce
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda,
- const JoinType & join,
- ValueType& initialized_result)
-{
- ValueType result = initialized_result;
-#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
-#pragma ivdep
-#endif
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- ValueType tmp = ValueType();
- lambda(i,tmp);
- join(result,tmp);
- }
- initialized_result = result;
-}
-
-template< typename ValueType, typename iType, class Lambda >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda)
-{
- ValueType accum = 0 ;
- ValueType val, local_total;
-
- for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
- local_total = 0;
- lambda(i,local_total,false);
- val = accum;
- lambda(i,val,true);
- accum += local_total;
- }
-
-}
-
-// placeholder for future function
-template< typename iType, class Lambda, typename ValueType >
-KOKKOS_INLINE_FUNCTION
-void parallel_scan
- (const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Serial > >& loop_boundaries,
- const Lambda & lambda)
-{
-}
-
-} /* namespace Kokkos */
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_SERIAL_TASK_HPP */
diff --git a/lib/kokkos/core/src/impl/Kokkos_Synchronic.hpp b/lib/kokkos/core/src/impl/Kokkos_Synchronic.hpp
deleted file mode 100644
index b2aea14df..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Synchronic.hpp
+++ /dev/null
@@ -1,693 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef KOKKOS_SYNCHRONIC_HPP
-#define KOKKOS_SYNCHRONIC_HPP
-
-#include <impl/Kokkos_Synchronic_Config.hpp>
-
-#include <atomic>
-#include <chrono>
-#include <thread>
-#include <functional>
-#include <algorithm>
-
-namespace Kokkos {
-namespace Impl {
-
-enum notify_hint {
- notify_all,
- notify_one,
- notify_none
-};
-enum expect_hint {
- expect_urgent,
- expect_delay
-};
-
-namespace Details {
-
-template <class S, class T>
-bool __synchronic_spin_wait_for_update(S const& arg, T const& nval, int attempts) noexcept {
- int i = 0;
- for(;i < __SYNCHRONIC_SPIN_RELAX(attempts); ++i)
- if(__builtin_expect(arg.load(std::memory_order_relaxed) != nval,1))
- return true;
- else
- __synchronic_relax();
- for(;i < attempts; ++i)
- if(__builtin_expect(arg.load(std::memory_order_relaxed) != nval,1))
- return true;
- else
- __synchronic_yield();
- return false;
-}
-
-struct __exponential_backoff {
- __exponential_backoff(int arg_maximum=512) : maximum(arg_maximum), microseconds(8), x(123456789), y(362436069), z(521288629) {
- }
- static inline void sleep_for(std::chrono::microseconds const& time) {
- auto t = time.count();
- if(__builtin_expect(t > 75,0)) {
- portable_sleep(time);
- }
- else if(__builtin_expect(t > 25,0))
- __synchronic_yield();
- else
- __synchronic_relax();
- }
- void sleep_for_step() {
- sleep_for(step());
- }
- std::chrono::microseconds step() {
- float const f = ranfu();
- int const t = int(microseconds * f);
- if(__builtin_expect(f >= 0.95f,0))
- microseconds = 8;
- else
- microseconds = (std::min)(microseconds>>1,maximum);
- return std::chrono::microseconds(t);
- }
-private :
- int maximum, microseconds, x, y, z;
- int xorshf96() {
- int t;
- x ^= x << 16; x ^= x >> 5; x ^= x << 1;
- t = x; x = y; y = z; z = t ^ x ^ y;
- return z;
- }
- float ranfu() {
- return (float)(xorshf96()&(~0UL>>1)) / (float)(~0UL>>1);
- }
-};
-
-template <class T, class Enable = void>
-struct __synchronic_base {
-
-protected:
- std::atomic<T> atom;
-
- void notify(notify_hint = notify_all) noexcept {
- }
- void notify(notify_hint = notify_all) volatile noexcept {
- }
-
-public :
- __synchronic_base() noexcept = default;
- constexpr __synchronic_base(T v) noexcept : atom(v) { }
- __synchronic_base(const __synchronic_base&) = delete;
- ~__synchronic_base() { }
- __synchronic_base& operator=(const __synchronic_base&) = delete;
- __synchronic_base& operator=(const __synchronic_base&) volatile = delete;
-
- void expect_update(T val, expect_hint = expect_urgent) const noexcept {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- while(atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- }
- }
- void expect_update(T val, expect_hint = expect_urgent) const volatile noexcept {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- while(atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- }
- }
-
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const volatile {
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_A))
- return;
- __exponential_backoff b;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val) {
- __do_backoff(b);
- if(__synchronic_spin_wait_for_update(atom, val, __SYNCHRONIC_SPIN_COUNT_B))
- return;
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
-};
-
-#ifdef __SYNCHRONIC_COMPATIBLE
-template <class T>
-struct __synchronic_base<T, typename std::enable_if<__SYNCHRONIC_COMPATIBLE(T)>::type> {
-
-public:
- std::atomic<T> atom;
-
- void notify(notify_hint hint = notify_all) noexcept {
- if(__builtin_expect(hint == notify_none,1))
- return;
- auto const x = count.fetch_add(0,std::memory_order_acq_rel);
- if(__builtin_expect(x,0)) {
- if(__builtin_expect(hint == notify_all,1))
- __synchronic_wake_all(&atom);
- else
- __synchronic_wake_one(&atom);
- }
- }
- void notify(notify_hint hint = notify_all) volatile noexcept {
- if(__builtin_expect(hint == notify_none,1))
- return;
- auto const x = count.fetch_add(0,std::memory_order_acq_rel);
- if(__builtin_expect(x,0)) {
- if(__builtin_expect(hint == notify_all,1))
- __synchronic_wake_all_volatile(&atom);
- else
- __synchronic_wake_one_volatile(&atom);
- }
- }
-
-public :
- __synchronic_base() noexcept : count(0) { }
- constexpr __synchronic_base(T v) noexcept : atom(v), count(0) { }
- __synchronic_base(const __synchronic_base&) = delete;
- ~__synchronic_base() { }
- __synchronic_base& operator=(const __synchronic_base&) = delete;
- __synchronic_base& operator=(const __synchronic_base&) volatile = delete;
-
- void expect_update(T val, expect_hint = expect_urgent) const noexcept {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- while(__builtin_expect(atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait(&atom,val);
- count.fetch_add(-1,std::memory_order_acquire);
- }
- }
- void expect_update(T val, expect_hint = expect_urgent) const volatile noexcept {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- while(__builtin_expect(atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait_volatile(&atom,val);
- count.fetch_add(-1,std::memory_order_acquire);
- }
- }
-
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(__builtin_expect(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait_timed(&atom,val,remains);
- count.fetch_add(-1,std::memory_order_acquire);
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
- template <class Clock, class Duration>
- void expect_update_until(T val, std::chrono::time_point<Clock,Duration> const& then, expect_hint = expect_urgent) const volatile {
- if(__builtin_expect(__synchronic_spin_wait_for_update(atom, val,__SYNCHRONIC_SPIN_COUNT_A),1))
- return;
- std::chrono::milliseconds remains = then - std::chrono::high_resolution_clock::now();
- while(__builtin_expect(remains > std::chrono::milliseconds::zero() && atom.load(std::memory_order_relaxed) == val,1)) {
- count.fetch_add(1,std::memory_order_release);
- __synchronic_wait_timed_volatile(&atom,val,remains);
- count.fetch_add(-1,std::memory_order_acquire);
- remains = then - std::chrono::high_resolution_clock::now();
- }
- }
-private:
- mutable std::atomic<int> count;
-};
-#endif
-
-template <class T, class Enable = void>
-struct __synchronic : public __synchronic_base<T> {
-
- __synchronic() noexcept = default;
- constexpr __synchronic(T v) noexcept : __synchronic_base<T>(v) { }
- __synchronic(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) volatile = delete;
-};
-
-template <class T>
-struct __synchronic<T,typename std::enable_if<std::is_integral<T>::value>::type> : public __synchronic_base<T> {
-
- T fetch_add(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T fetch_add(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T fetch_sub(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
- T fetch_sub(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
- T fetch_and(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_and(v,m);
- this->notify(n);
- return t;
- }
- T fetch_and(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_and(v,m);
- this->notify(n);
- return t;
- }
- T fetch_or(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_or(v,m);
- this->notify(n);
- return t;
- }
- T fetch_or(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_or(v,m);
- this->notify(n);
- return t;
- }
- T fetch_xor(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_xor(v,m);
- this->notify(n);
- return t;
- }
- T fetch_xor(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_xor(v,m);
- this->notify(n);
- return t;
- }
-
- __synchronic() noexcept = default;
- constexpr __synchronic(T v) noexcept : __synchronic_base<T>(v) { }
- __synchronic(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) volatile = delete;
-
- T operator=(T v) volatile noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T operator=(T v) noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T operator++(int) volatile noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T operator++(int) noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T operator--(int) volatile noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T operator--(int) noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T operator++() volatile noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T operator++() noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T operator--() volatile noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T operator--() noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T operator+=(T v) volatile noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T operator+=(T v) noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T operator-=(T v) volatile noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
- T operator-=(T v) noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
- T operator&=(T v) volatile noexcept {
- auto const t = this->atom &= v;
- this->notify();
- return t;
- }
- T operator&=(T v) noexcept {
- auto const t = this->atom &= v;
- this->notify();
- return t;
- }
- T operator|=(T v) volatile noexcept {
- auto const t = this->atom |= v;
- this->notify();
- return t;
- }
- T operator|=(T v) noexcept {
- auto const t = this->atom |= v;
- this->notify();
- return t;
- }
- T operator^=(T v) volatile noexcept {
- auto const t = this->atom ^= v;
- this->notify();
- return t;
- }
- T operator^=(T v) noexcept {
- auto const t = this->atom ^= v;
- this->notify();
- return t;
- }
-};
-
-template <class T>
-struct __synchronic<T*> : public __synchronic_base<T*> {
-
- T* fetch_add(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T* fetch_add(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_add(v,m);
- this->notify(n);
- return t;
- }
- T* fetch_sub(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
- T* fetch_sub(ptrdiff_t v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.fetch_sub(v,m);
- this->notify(n);
- return t;
- }
-
- __synchronic() noexcept = default;
- constexpr __synchronic(T* v) noexcept : __synchronic_base<T*>(v) { }
- __synchronic(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) = delete;
- __synchronic& operator=(const __synchronic&) volatile = delete;
-
- T* operator=(T* v) volatile noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T* operator=(T* v) noexcept {
- auto const t = this->atom = v;
- this->notify();
- return t;
- }
- T* operator++(int) volatile noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T* operator++(int) noexcept {
- auto const t = ++this->atom;
- this->notify();
- return t;
- }
- T* operator--(int) volatile noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T* operator--(int) noexcept {
- auto const t = --this->atom;
- this->notify();
- return t;
- }
- T* operator++() volatile noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T* operator++() noexcept {
- auto const t = this->atom++;
- this->notify();
- return t;
- }
- T* operator--() volatile noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T* operator--() noexcept {
- auto const t = this->atom--;
- this->notify();
- return t;
- }
- T* operator+=(ptrdiff_t v) volatile noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T* operator+=(ptrdiff_t v) noexcept {
- auto const t = this->atom += v;
- this->notify();
- return t;
- }
- T* operator-=(ptrdiff_t v) volatile noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
- T* operator-=(ptrdiff_t v) noexcept {
- auto const t = this->atom -= v;
- this->notify();
- return t;
- }
-};
-
-} //namespace Details
-
-template <class T>
-struct synchronic : public Details::__synchronic<T> {
-
- bool is_lock_free() const volatile noexcept { return this->atom.is_lock_free(); }
- bool is_lock_free() const noexcept { return this->atom.is_lock_free(); }
- void store(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- this->atom.store(v,m);
- this->notify(n);
- }
- void store(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- this->atom.store(v,m);
- this->notify(n);
- }
- T load(std::memory_order m = std::memory_order_seq_cst) const volatile noexcept { return this->atom.load(m); }
- T load(std::memory_order m = std::memory_order_seq_cst) const noexcept { return this->atom.load(m); }
-
- operator T() const volatile noexcept { return (T)this->atom; }
- operator T() const noexcept { return (T)this->atom; }
-
- T exchange(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.exchange(v,m);
- this->notify(n);
- return t;
- }
- T exchange(T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.exchange(v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m1,m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m1, m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m1,m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m1, std::memory_order m2, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m1,m2);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_weak(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_weak(r,v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) volatile noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m);
- this->notify(n);
- return t;
- }
- bool compare_exchange_strong(T& r, T v, std::memory_order m = std::memory_order_seq_cst, notify_hint n = notify_all) noexcept {
- auto const t = this->atom.compare_exchange_strong(r,v,m);
- this->notify(n);
- return t;
- }
-
- synchronic() noexcept = default;
- constexpr synchronic(T val) noexcept : Details::__synchronic<T>(val) { }
- synchronic(const synchronic&) = delete;
- ~synchronic() { }
- synchronic& operator=(const synchronic&) = delete;
- synchronic& operator=(const synchronic&) volatile = delete;
- T operator=(T val) noexcept {
- return Details::__synchronic<T>::operator=(val);
- }
- T operator=(T val) volatile noexcept {
- return Details::__synchronic<T>::operator=(val);
- }
-
- T load_when_not_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const noexcept {
- Details::__synchronic<T>::expect_update(val,h);
- return load(order);
- }
- T load_when_not_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const volatile noexcept {
- Details::__synchronic<T>::expect_update(val,h);
- return load(order);
- }
- T load_when_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const noexcept {
- for(T nval = load(std::memory_order_relaxed); nval != val; nval = load(std::memory_order_relaxed))
- Details::__synchronic<T>::expect_update(nval,h);
- return load(order);
- }
- T load_when_equal(T val, std::memory_order order = std::memory_order_seq_cst, expect_hint h = expect_urgent) const volatile noexcept {
- for(T nval = load(std::memory_order_relaxed); nval != val; nval = load(std::memory_order_relaxed))
- expect_update(nval,h);
- return load(order);
- }
- template <class Rep, class Period>
- void expect_update_for(T val, std::chrono::duration<Rep,Period> const& delta, expect_hint h = expect_urgent) const {
- Details::__synchronic<T>::expect_update_until(val, std::chrono::high_resolution_clock::now() + delta,h);
- }
- template < class Rep, class Period>
- void expect_update_for(T val, std::chrono::duration<Rep,Period> const& delta, expect_hint h = expect_urgent) const volatile {
- Details::__synchronic<T>::expect_update_until(val, std::chrono::high_resolution_clock::now() + delta,h);
- }
-};
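As a reading aid (not part of this change set): a minimal usage sketch of the synchronic<T> interface removed above. It assumes the class lives in Kokkos::Impl, as in the companion headers, and that the header is still reachable under the include path used by Kokkos_Synchronic_n3998.hpp; the thread bodies are illustrative only. The consumer blocks in load_when_not_equal() until the producer publishes a new value, instead of spinning on load().

#include <impl/Kokkos_Synchronic.hpp>
#include <atomic>
#include <thread>
#include <cstdio>

int main() {
  Kokkos::Impl::synchronic<int> value(0);
  std::thread consumer([&] {
    // Spins briefly, then yields, then falls back to an OS wait while the value is still 0.
    int v = value.load_when_not_equal(0, std::memory_order_acquire);
    std::printf("saw %d\n", v);
  });
  std::thread producer([&] {
    value.store(42, std::memory_order_release);  // store() also notifies waiters
  });
  consumer.join();
  producer.join();
  return 0;
}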
-
-#include <inttypes.h>
-
-typedef synchronic<char> synchronic_char;
-typedef synchronic<char> synchronic_schar;
-typedef synchronic<unsigned char> synchronic_uchar;
-typedef synchronic<short> synchronic_short;
-typedef synchronic<unsigned short> synchronic_ushort;
-typedef synchronic<int> synchronic_int;
-typedef synchronic<unsigned int> synchronic_uint;
-typedef synchronic<long> synchronic_long;
-typedef synchronic<unsigned long> synchronic_ulong;
-typedef synchronic<long long> synchronic_llong;
-typedef synchronic<unsigned long long> synchronic_ullong;
-//typedef synchronic<char16_t> synchronic_char16_t;
-//typedef synchronic<char32_t> synchronic_char32_t;
-typedef synchronic<wchar_t> synchronic_wchar_t;
-
-typedef synchronic<int_least8_t> synchronic_int_least8_t;
-typedef synchronic<uint_least8_t> synchronic_uint_least8_t;
-typedef synchronic<int_least16_t> synchronic_int_least16_t;
-typedef synchronic<uint_least16_t> synchronic_uint_least16_t;
-typedef synchronic<int_least32_t> synchronic_int_least32_t;
-typedef synchronic<uint_least32_t> synchronic_uint_least32_t;
-//typedef synchronic<int_least_64_t> synchronic_int_least_64_t;
-typedef synchronic<uint_least64_t> synchronic_uint_least64_t;
-typedef synchronic<int_fast8_t> synchronic_int_fast8_t;
-typedef synchronic<uint_fast8_t> synchronic_uint_fast8_t;
-typedef synchronic<int_fast16_t> synchronic_int_fast16_t;
-typedef synchronic<uint_fast16_t> synchronic_uint_fast16_t;
-typedef synchronic<int_fast32_t> synchronic_int_fast32_t;
-typedef synchronic<uint_fast32_t> synchronic_uint_fast32_t;
-typedef synchronic<int_fast64_t> synchronic_int_fast64_t;
-typedef synchronic<uint_fast64_t> synchronic_uint_fast64_t;
-typedef synchronic<intptr_t> synchronic_intptr_t;
-typedef synchronic<uintptr_t> synchronic_uintptr_t;
-typedef synchronic<size_t> synchronic_size_t;
-typedef synchronic<ptrdiff_t> synchronic_ptrdiff_t;
-typedef synchronic<intmax_t> synchronic_intmax_t;
-typedef synchronic<uintmax_t> synchronic_uintmax_t;
-
-}
-}
-
-#endif //__SYNCHRONIC_H
diff --git a/lib/kokkos/core/src/impl/Kokkos_Synchronic_Config.hpp b/lib/kokkos/core/src/impl/Kokkos_Synchronic_Config.hpp
deleted file mode 100644
index 0a6dd6e71..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Synchronic_Config.hpp
+++ /dev/null
@@ -1,169 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef KOKKOS_SYNCHRONIC_CONFIG_H
-#define KOKKOS_SYNCHRONIC_CONFIG_H
-
-#include <thread>
-#include <chrono>
-
-namespace Kokkos {
-namespace Impl {
-
-//the default yield function used inside the implementation is the Standard one
-#define __synchronic_yield std::this_thread::yield
-#define __synchronic_relax __synchronic_yield
-
-#if defined(_MSC_VER)
- //this is a handy GCC optimization that I use inside the implementation
- #define __builtin_expect(condition,common) condition
- #if _MSC_VER <= 1800
- //using certain keywords that VC++ temporarily doesn't support
- #define _ALLOW_KEYWORD_MACROS
- #define noexcept
- #define constexpr
- #endif
- //yes, I define multiple assignment operators
- #pragma warning(disable:4522)
- //I don't understand how Windows is so bad at timing functions, but is OK
- //with straight-up yield loops
- #define __do_backoff(b) __synchronic_yield()
-#else
-#define __do_backoff(b) b.sleep_for_step()
-#endif
-
-//certain platforms have efficient support for spin-waiting built into the operating system
-#if defined(__linux__) || (defined(_WIN32_WINNT) && _WIN32_WINNT >= 0x0602)
-#if defined(_WIN32_WINNT)
-#include <winsock2.h>
-#include <Windows.h>
- //the combination of WaitOnAddress and WakeByAddressAll is supported on Windows 8.1+
- #define __synchronic_wait(x,v) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),-1)
- #define __synchronic_wait_timed(x,v,t) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),std::chrono::duration_cast<std::chrono::milliseconds>(t).count())
- #define __synchronic_wake_one(x) WakeByAddressSingle((PVOID)x)
- #define __synchronic_wake_all(x) WakeByAddressAll((PVOID)x)
- #define __synchronic_wait_volatile(x,v) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),-1)
- #define __synchronic_wait_timed_volatile(x,v,t) WaitOnAddress((PVOID)x,(PVOID)&v,sizeof(v),std::chrono::duration_cast<std::chrono::milliseconds>(t).count())
- #define __synchronic_wake_one_volatile(x) WakeByAddressSingle((PVOID)x)
- #define __synchronic_wake_all_volatile(x) WakeByAddressAll((PVOID)x)
- #define __SYNCHRONIC_COMPATIBLE(x) (std::is_pod<x>::value && (sizeof(x) <= 8))
-
- inline void native_sleep(unsigned long microseconds)
- {
- // What to do if microseconds is < 1000?
- Sleep(microseconds / 1000);
- }
-
- inline void native_yield()
- {
- SwitchToThread();
- }
-#elif defined(__linux__)
- #include <chrono>
- #include <time.h>
- #include <unistd.h>
- #include <pthread.h>
- #include <linux/futex.h>
- #include <sys/syscall.h>
- #include <climits>
- #include <cassert>
- template < class Rep, class Period>
- inline timespec to_timespec(std::chrono::duration<Rep,Period> const& delta) {
- struct timespec ts;
- ts.tv_sec = static_cast<long>(std::chrono::duration_cast<std::chrono::seconds>(delta).count());
- assert(!ts.tv_sec);
- ts.tv_nsec = static_cast<long>(std::chrono::duration_cast<std::chrono::nanoseconds>(delta).count());
- return ts;
- }
- inline long futex(void const* addr1, int op, int val1) {
- return syscall(SYS_futex, addr1, op, val1, 0, 0, 0);
- }
- inline long futex(void const* addr1, int op, int val1, struct timespec timeout) {
- return syscall(SYS_futex, addr1, op, val1, &timeout, 0, 0);
- }
- inline void native_sleep(unsigned long microseconds)
- {
- usleep(microseconds);
- }
- inline void native_yield()
- {
- pthread_yield();
- }
-
- //the combination of SYS_futex(WAIT) and SYS_futex(WAKE) is supported on all recent Linux distributions
- #define __synchronic_wait(x,v) futex(x, FUTEX_WAIT_PRIVATE, v)
- #define __synchronic_wait_timed(x,v,t) futex(x, FUTEX_WAIT_PRIVATE, v, to_timespec(t))
- #define __synchronic_wake_one(x) futex(x, FUTEX_WAKE_PRIVATE, 1)
- #define __synchronic_wake_all(x) futex(x, FUTEX_WAKE_PRIVATE, INT_MAX)
- #define __synchronic_wait_volatile(x,v) futex(x, FUTEX_WAIT, v)
- #define __synchronic_wait_volatile_timed(x,v,t) futex(x, FUTEX_WAIT, v, to_timespec(t))
- #define __synchronic_wake_one_volatile(x) futex(x, FUTEX_WAKE, 1)
- #define __synchronic_wake_all_volatile(x) futex(x, FUTEX_WAKE, INT_MAX)
- #define __SYNCHRONIC_COMPATIBLE(x) (std::is_integral<x>::value && (sizeof(x) <= 4))
-
- //the yield function on Linux is better replaced by sched_yield, which is tuned for spin-waiting
- #undef __synchronic_yield
- #define __synchronic_yield sched_yield
-
- //for extremely short wait times, just let another hyper-thread run
- #undef __synchronic_relax
- #define __synchronic_relax() asm volatile("rep; nop" ::: "memory")
-
-#endif
-#endif
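For readers unfamiliar with the futex wrappers above, here is a hedged, Linux-only sketch of the bare wait/wake pattern they build on; it is a standalone demo with illustrative names (flag, futex_call), not Kokkos code.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <climits>
#include <atomic>
#include <thread>
#include <cstdio>

static std::atomic<int> flag(0);

static long futex_call(void* addr, int op, int val) {
  return syscall(SYS_futex, addr, op, val, nullptr, nullptr, 0);
}

int main() {
  std::thread waiter([] {
    // Re-check after every wake-up: futex wake-ups may be spurious.
    while (flag.load(std::memory_order_acquire) == 0)
      futex_call(&flag, FUTEX_WAIT_PRIVATE, 0);     // sleep only while still 0
    std::printf("woken\n");
  });
  std::thread setter([] {
    flag.store(1, std::memory_order_release);
    futex_call(&flag, FUTEX_WAKE_PRIVATE, INT_MAX); // wake any waiters
  });
  waiter.join();
  setter.join();
  return 0;
}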
-
-#ifdef _GLIBCXX_USE_NANOSLEEP
-inline void portable_sleep(std::chrono::microseconds const& time)
-{ std::this_thread::sleep_for(time); }
-#else
-inline void portable_sleep(std::chrono::microseconds const& time)
-{ native_sleep(time.count()); }
-#endif
-
-#ifdef _GLIBCXX_USE_SCHED_YIELD
-inline void portable_yield()
-{ std::this_thread::yield(); }
-#else
-inline void portable_yield()
-{ native_yield(); }
-#endif
-
-//this is the number of times we initially spin, on the first wait attempt
-#define __SYNCHRONIC_SPIN_COUNT_A 16
-
-//this is how we decide to yield instead of just spinning, 'c' is the current trip count
-//#define __SYNCHRONIC_SPIN_YIELD(c) true
-#define __SYNCHRONIC_SPIN_RELAX(c) (c>>3)
-
-//this is the number of times we normally spin, on every subsequent wait attempt
-#define __SYNCHRONIC_SPIN_COUNT_B 8
-
-}
-}
-
-#endif //__SYNCHRONIC_CONFIG_H
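Taken together, the macros in this (removed) header encode a three-stage waiting policy: spin a little, then relax/yield, then sleep or use the OS wait primitive. A hedged sketch of that policy in plain C++ (the constants and wait_until name are illustrative, not the exact Kokkos tuning):

#include <chrono>
#include <thread>

template <class Pred>
void wait_until(Pred ready) {
  int trips = 0;
  while (!ready()) {
    if (trips < 16) {                    // roughly __SYNCHRONIC_SPIN_COUNT_A: pure spin
      ++trips;
    } else if (trips < 16 + 128) {       // then let a sibling hyper-thread or peer run
      ++trips;
      std::this_thread::yield();
    } else {                             // finally sleep so a long wait does not burn a core
      std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
  }
}

// Example: std::atomic<bool> done{false}; ... wait_until([&]{ return done.load(); });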
diff --git a/lib/kokkos/core/src/impl/Kokkos_Synchronic_n3998.hpp b/lib/kokkos/core/src/impl/Kokkos_Synchronic_n3998.hpp
deleted file mode 100644
index facc8d6d8..000000000
--- a/lib/kokkos/core/src/impl/Kokkos_Synchronic_n3998.hpp
+++ /dev/null
@@ -1,162 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef KOKKOS_SYNCHRONIC_N3998_HPP
-#define KOKKOS_SYNCHRONIC_N3998_HPP
-
-#include <impl/Kokkos_Synchronic.hpp>
-#include <functional>
-
-/*
-In the section below, a synchronization point represents a point at which a
-thread may block until a given synchronization condition has been reached or
-at which it may notify other threads that a synchronization condition has
-been achieved.
-*/
-namespace Kokkos { namespace Impl {
-
- /*
- A latch maintains an internal counter that is initialized when the latch
- is created. The synchronization condition is reached when the counter is
- decremented to 0. Threads may block at a synchronization point waiting
- for the condition to be reached. When the condition is reached, any such
- blocked threads will be released.
- */
- struct latch {
- latch(int val) : count(val), released(false) { }
- latch(const latch&) = delete;
- latch& operator=(const latch&) = delete;
- ~latch( ) { }
- void arrive( ) {
- __arrive( );
- }
- void arrive_and_wait( ) {
- if(!__arrive( ))
- wait( );
- }
- void wait( ) {
- while(!released.load_when_not_equal(false,std::memory_order_acquire))
- ;
- }
- bool try_wait( ) {
- return released.load(std::memory_order_acquire);
- }
- private:
- bool __arrive( ) {
- if(count.fetch_add(-1,std::memory_order_release)!=1)
- return false;
- released.store(true,std::memory_order_release);
- return true;
- }
- std::atomic<int> count;
- synchronic<bool> released;
- };
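A brief usage sketch for the latch above (illustrative only; latch_demo is a hypothetical helper and the header is assumed to be on the include path): N workers call arrive() when they are ready, and the main thread blocks in wait() until the internal counter reaches zero.

#include <impl/Kokkos_Synchronic_n3998.hpp>
#include <thread>
#include <vector>
#include <cstdio>

void latch_demo() {
  constexpr int N = 4;
  Kokkos::Impl::latch ready(N);
  std::vector<std::thread> workers;
  for (int i = 0; i < N; ++i)
    workers.emplace_back([&ready, i] {
      std::printf("worker %d ready\n", i);
      ready.arrive();                  // decrement the internal counter
    });
  ready.wait();                        // released once all N have arrived
  std::printf("all workers ready\n");
  for (auto& t : workers) t.join();
}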
-
- /*
- A barrier is created with an initial value representing the number of threads
- that can arrive at the synchronization point. When that many threads have
- arrived, the synchronization condition is reached and the threads are
- released. The barrier will then reset, and may be reused for a new cycle, in
- which the same set of threads may arrive again at the synchronization point.
- The same set of threads shall arrive at the barrier in each cycle, otherwise
- the behaviour is undefined.
- */
- struct barrier {
- barrier(int val) : expected(val), arrived(0), nexpected(val), epoch(0) { }
- barrier(const barrier&) = delete;
- barrier& operator=(const barrier&) = delete;
- ~barrier() { }
- void arrive_and_wait() {
- int const myepoch = epoch.load(std::memory_order_relaxed);
- if(!__arrive(myepoch))
- while(epoch.load_when_not_equal(myepoch,std::memory_order_acquire) == myepoch)
- ;
- }
- void arrive_and_drop() {
- nexpected.fetch_add(-1,std::memory_order_relaxed);
- __arrive(epoch.load(std::memory_order_relaxed));
- }
- private:
- bool __arrive(int const myepoch) {
- int const myresult = arrived.fetch_add(1,std::memory_order_acq_rel) + 1;
- if(__builtin_expect(myresult == expected,0)) {
- expected = nexpected.load(std::memory_order_relaxed);
- arrived.store(0,std::memory_order_relaxed);
- epoch.store(myepoch+1,std::memory_order_release);
- return true;
- }
- return false;
- }
- int expected;
- std::atomic<int> arrived, nexpected;
- synchronic<int> epoch;
- };
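Similarly, a hedged usage sketch for the barrier above (illustrative only): the same two threads rendezvous once per step, and the epoch counter lets the barrier be reused across cycles.

#include <impl/Kokkos_Synchronic_n3998.hpp>
#include <thread>
#include <cstdio>

void barrier_demo() {
  Kokkos::Impl::barrier sync(2);
  auto work = [&sync](int id) {
    for (int step = 0; step < 3; ++step) {
      std::printf("thread %d finished step %d\n", id, step);
      sync.arrive_and_wait();          // nobody starts step+1 until both have arrived
    }
  };
  std::thread a(work, 0), b(work, 1);
  a.join();
  b.join();
}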
-
- /*
- A notifying barrier behaves as a barrier, but is constructed with a callable
- completion function that is invoked after all threads have arrived at the
- synchronization point, and before the synchronization condition is reached.
- The completion may modify the set of threads that arrives at the barrier in
- each cycle.
- */
- struct notifying_barrier {
- template <typename T>
- notifying_barrier(int val, T && f) : expected(val), arrived(0), nexpected(val), epoch(0), completion(std::forward<T>(f)) { }
- notifying_barrier(const notifying_barrier&) = delete;
- notifying_barrier& operator=(const notifying_barrier&) = delete;
- ~notifying_barrier( ) { }
- void arrive_and_wait() {
- int const myepoch = epoch.load(std::memory_order_relaxed);
- if(!__arrive(myepoch))
- while(epoch.load_when_not_equal(myepoch,std::memory_order_acquire) == myepoch)
- ;
- }
- void arrive_and_drop() {
- nexpected.fetch_add(-1,std::memory_order_relaxed);
- __arrive(epoch.load(std::memory_order_relaxed));
- }
- private:
- bool __arrive(int const myepoch) {
- int const myresult = arrived.fetch_add(1,std::memory_order_acq_rel) + 1;
- if(__builtin_expect(myresult == expected,0)) {
- int const newexpected = completion();
- expected = newexpected ? newexpected : nexpected.load(std::memory_order_relaxed);
- arrived.store(0,std::memory_order_relaxed);
- epoch.store(myepoch+1,std::memory_order_release);
- return true;
- }
- return false;
- }
- int expected;
- std::atomic<int> arrived, nexpected;
- synchronic<int> epoch;
- std::function<int()> completion;
- };
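And a matching sketch for notifying_barrier (illustrative only): the completion function runs exactly once per cycle, after the last arrival and before the waiters are released; returning 0 keeps the expected count unchanged.

#include <impl/Kokkos_Synchronic_n3998.hpp>
#include <thread>
#include <cstdio>

void notifying_barrier_demo() {
  int cycle = 0;
  Kokkos::Impl::notifying_barrier sync(2, [&cycle]() -> int {
    std::printf("cycle %d complete\n", cycle++);
    return 0;                          // 0 => keep the current expected count
  });
  auto work = [&sync] {
    for (int step = 0; step < 3; ++step) sync.arrive_and_wait();
  };
  std::thread a(work), b(work);
  a.join();
  b.join();
}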
-}}
-
-#endif //__N3998_H
diff --git a/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp b/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
index afa01d0cd..b514df351 100644
--- a/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_TaskQueue.hpp
@@ -1,546 +1,614 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
// Experimental unified task-data parallel manycore LDRD
#ifndef KOKKOS_IMPL_TASKQUEUE_HPP
#define KOKKOS_IMPL_TASKQUEUE_HPP
#if defined( KOKKOS_ENABLE_TASKDAG )
#include <string>
#include <typeinfo>
#include <stdexcept>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
/*\brief Implementation data for task data management, access, and execution.
*
* Curiously recurring template pattern (CRTP)
* to allow static_cast from the
* task root type and a task's FunctorType.
*
* TaskBase< Space , ResultType , FunctorType >
* : TaskBase< Space , ResultType , void >
* , FunctorType
* { ... };
*
* TaskBase< Space , ResultType , void >
* : TaskBase< Space , void , void >
* { ... };
*/
template< typename Space , typename ResultType , typename FunctorType >
class TaskBase ;
-template< typename Space >
-class TaskExec ;
-
} /* namespace Impl */
} /* namespace Kokkos */
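The CRTP-style layering documented in the comment above is easier to see in a small self-contained sketch. The types below (Root, Task, crtp_demo) are hypothetical and far simpler than the real TaskBase hierarchy; they only show how a type-erased function pointer plus static_cast recovers the functor from the task root, which is the mechanism apply()/apply_functor rely on further down.

#include <cstdio>

struct Root {
  void (*apply)(Root*);                // type-erased "run me" entry point
};

template <class Functor>
struct Task : Root, Functor {
  explicit Task(Functor f) : Functor(f) { this->apply = &run; }
  static void run(Root* r) {
    // Root* -> Task* -> Functor&: recover the functor without virtual dispatch.
    static_cast<Functor&>(*static_cast<Task*>(r))();
  }
};

inline void crtp_demo() {
  auto hello = [] { std::printf("task body\n"); };
  Task<decltype(hello)> t(hello);
  Root* erased = &t;
  erased->apply(erased);               // dispatches to the lambda
}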
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< typename Space >
class TaskQueueSpecialization ;
/** \brief Manage task allocation, deallocation, and scheduling.
*
* Task execution is deferred to the TaskQueueSpecialization.
* All other aspects of task management have shared implementation.
*/
template< typename ExecSpace >
class TaskQueue {
private:
friend class TaskQueueSpecialization< ExecSpace > ;
friend class Kokkos::TaskScheduler< ExecSpace > ;
using execution_space = ExecSpace ;
using specialization = TaskQueueSpecialization< execution_space > ;
using memory_space = typename specialization::memory_space ;
using device_type = Kokkos::Device< execution_space , memory_space > ;
using memory_pool = Kokkos::Experimental::MemoryPool< device_type > ;
using task_root_type = Kokkos::Impl::TaskBase<execution_space,void,void> ;
struct Destroy {
TaskQueue * m_queue ;
void destroy_shared_allocation();
};
//----------------------------------------
enum : int { NumQueue = 3 };
// Queue is organized as [ priority ][ type ]
memory_pool m_memory ;
task_root_type * volatile m_ready[ NumQueue ][ 2 ];
long m_accum_alloc ; // Accumulated number of allocations
int m_count_alloc ; // Current number of allocations
int m_max_alloc ; // Maximum number of allocations
int m_ready_count ; // Number of ready or executing
//----------------------------------------
~TaskQueue();
TaskQueue() = delete ;
TaskQueue( TaskQueue && ) = delete ;
TaskQueue( TaskQueue const & ) = delete ;
TaskQueue & operator = ( TaskQueue && ) = delete ;
TaskQueue & operator = ( TaskQueue const & ) = delete ;
TaskQueue
( const memory_space & arg_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_superblock_capacity_log2
);
// Schedule a task
// Precondition:
// task is not executing
// task->m_next is the dependence or zero
// Postcondition:
// task->m_next is linked list membership
- KOKKOS_FUNCTION
- void schedule( task_root_type * const );
+ KOKKOS_FUNCTION void schedule_runnable( task_root_type * const );
+ KOKKOS_FUNCTION void schedule_aggregate( task_root_type * const );
// Reschedule a task
// Precondition:
// task is in Executing state
// task->m_next == LockTag
// Postcondition:
// task is in Executing-Respawn state
// task->m_next == 0 (no dependence)
KOKKOS_FUNCTION
void reschedule( task_root_type * );
// Complete a task
// Precondition:
// task is not executing
// task->m_next == LockTag => task is complete
// task->m_next != LockTag => task is respawn
// Postcondition:
// task->m_wait == LockTag => task is complete
// task->m_wait != LockTag => task is waiting
KOKKOS_FUNCTION
void complete( task_root_type * );
KOKKOS_FUNCTION
static bool push_task( task_root_type * volatile * const
, task_root_type * const );
KOKKOS_FUNCTION
- static task_root_type * pop_task( task_root_type * volatile * const );
+ static task_root_type * pop_ready_task( task_root_type * volatile * const );
KOKKOS_FUNCTION static
void decrement( task_root_type * task );
public:
// If and only if the execution space is a single thread
// then execute ready tasks.
KOKKOS_INLINE_FUNCTION
void iff_single_thread_recursive_execute()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
specialization::iff_single_thread_recursive_execute( this );
#endif
}
void execute() { specialization::execute( this ); }
template< typename FunctorType >
void proc_set_apply( typename task_root_type::function_type * ptr )
{
specialization::template proc_set_apply< FunctorType >( ptr );
}
// Assign task pointer with reference counting of assigned tasks
template< typename LV , typename RV >
KOKKOS_FUNCTION static
void assign( TaskBase< execution_space,LV,void> ** const lhs
, TaskBase< execution_space,RV,void> * const rhs )
{
using task_lhs = TaskBase< execution_space,LV,void> ;
#if 0
{
printf( "assign( 0x%lx { 0x%lx %d %d } , 0x%lx { 0x%lx %d %d } )\n"
, uintptr_t( lhs ? *lhs : 0 )
, uintptr_t( lhs && *lhs ? (*lhs)->m_next : 0 )
, int( lhs && *lhs ? (*lhs)->m_task_type : 0 )
, int( lhs && *lhs ? (*lhs)->m_ref_count : 0 )
, uintptr_t(rhs)
, uintptr_t( rhs ? rhs->m_next : 0 )
, int( rhs ? rhs->m_task_type : 0 )
, int( rhs ? rhs->m_ref_count : 0 )
);
fflush( stdout );
}
#endif
if ( *lhs ) decrement( *lhs );
if ( rhs ) { Kokkos::atomic_increment( &(rhs->m_ref_count) ); }
// Force write of *lhs
*static_cast< task_lhs * volatile * >(lhs) = rhs ;
Kokkos::memory_fence();
}
KOKKOS_FUNCTION
size_t allocate_block_size( size_t n ); ///< Actual block size allocated
KOKKOS_FUNCTION
void * allocate( size_t n ); ///< Allocate from the memory pool
KOKKOS_FUNCTION
void deallocate( void * p , size_t n ); ///< Deallocate to the memory pool
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<>
class TaskBase< void , void , void > {
public:
enum : int16_t { TaskTeam = 0 , TaskSingle = 1 , Aggregate = 2 };
enum : uintptr_t { LockTag = ~uintptr_t(0) , EndTag = ~uintptr_t(1) };
};
/** \brief Base class for task management, access, and execution.
*
* Inheritance structure to allow static_cast from the task root type
* and a task's FunctorType.
*
* // Enable a Future to access result data
* TaskBase< Space , ResultType , void >
* : TaskBase< void , void , void >
* { ... };
*
* // Enable a functor to access the base class
* TaskBase< Space , ResultType , FunctorType >
* : TaskBase< Space , ResultType , void >
* , FunctorType
* { ... };
*
*
* States of a task:
*
* Constructing State, NOT IN a linked list
* m_wait == 0
* m_next == 0
*
* Scheduling transition : Constructing -> Waiting
* before:
* m_wait == 0
* m_next == this task's initial dependence, 0 if none
* after:
* m_wait == EndTag
* m_next == EndTag
*
* Waiting State, IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == next of linked list of tasks
*
* transition : Waiting -> Executing
* before:
* m_next == EndTag
 *    after:
* m_next == LockTag
*
* Executing State, NOT IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == LockTag
*
* Respawn transition : Executing -> Executing-Respawn
* before:
* m_next == LockTag
* after:
* m_next == this task's updated dependence, 0 if none
*
* Executing-Respawn State, NOT IN a linked list
* m_apply != 0
* m_queue != 0
* m_ref_count > 0
* m_wait == head of linked list of tasks waiting on this task
* m_next == this task's updated dependence, 0 if none
*
* transition : Executing -> Complete
* before:
* m_wait == head of linked list
* after:
* m_wait == LockTag
*
* Complete State, NOT IN a linked list
* m_wait == LockTag: cannot add dependence
* m_next == LockTag: not a member of a wait queue
*
*/
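A hedged illustration (not Kokkos code) of the sentinel-pointer encoding used by the state machine above: LockTag and EndTag are impossible addresses, so a single pointer field can mean "real task", "end of list", or "locked / complete". DemoTask, describe_wait and describe_next are hypothetical names.

#include <cstdint>

struct DemoTask;
constexpr std::uintptr_t LockTag = ~std::uintptr_t(0);
constexpr std::uintptr_t EndTag  = ~std::uintptr_t(1);

inline const char* describe_wait(DemoTask* wait_field) {   // the task's m_wait
  if (wait_field == nullptr)                               return "constructing";
  if (wait_field == reinterpret_cast<DemoTask*>(LockTag))  return "complete";
  return "waiting or executing (head of this task's wait list)";
}

inline const char* describe_next(DemoTask* next_field) {   // the task's m_next
  if (next_field == reinterpret_cast<DemoTask*>(LockTag))  return "executing or complete";
  if (next_field == reinterpret_cast<DemoTask*>(EndTag))   return "tail of a wait queue";
  if (next_field == nullptr)                               return "no dependence";
  return "dependence, or next member of a wait queue";
}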
template< typename ExecSpace >
class TaskBase< ExecSpace , void , void >
{
public:
enum : int16_t { TaskTeam = TaskBase<void,void,void>::TaskTeam
, TaskSingle = TaskBase<void,void,void>::TaskSingle
, Aggregate = TaskBase<void,void,void>::Aggregate };
enum : uintptr_t { LockTag = TaskBase<void,void,void>::LockTag
, EndTag = TaskBase<void,void,void>::EndTag };
using execution_space = ExecSpace ;
using queue_type = TaskQueue< execution_space > ;
template< typename > friend class Kokkos::TaskScheduler ;
typedef void (* function_type) ( TaskBase * , void * );
// sizeof(TaskBase) == 48
function_type m_apply ; ///< Apply function pointer
queue_type * m_queue ; ///< Queue in which this task resides
TaskBase * m_wait ; ///< Linked list of tasks waiting on this
TaskBase * m_next ; ///< Waiting linked-list next
int32_t m_ref_count ; ///< Reference count
int32_t m_alloc_size ; ///< Allocation size
int32_t m_dep_count ; ///< Aggregate's number of dependences
int16_t m_task_type ; ///< Type of task
int16_t m_priority ; ///< Priority of runnable task
+ TaskBase() = delete ;
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
+ // Constructor for a runnable task
KOKKOS_INLINE_FUNCTION
- constexpr TaskBase() noexcept
- : m_apply(0)
- , m_queue(0)
- , m_wait(0)
- , m_next(0)
- , m_ref_count(0)
- , m_alloc_size(0)
- , m_dep_count(0)
- , m_task_type( TaskSingle )
- , m_priority( 1 /* TaskRegularPriority */ )
+ constexpr TaskBase( function_type arg_apply
+ , queue_type * arg_queue
+ , TaskBase * arg_dependence
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_task_type
+ , int arg_priority
+ ) noexcept
+ : m_apply( arg_apply )
+ , m_queue( arg_queue )
+ , m_wait( 0 )
+ , m_next( arg_dependence )
+ , m_ref_count( arg_ref_count )
+ , m_alloc_size( arg_alloc_size )
+ , m_dep_count( 0 )
+ , m_task_type( arg_task_type )
+ , m_priority( arg_priority )
+ {}
+
+ // Constructor for an aggregate task
+ KOKKOS_INLINE_FUNCTION
+ constexpr TaskBase( queue_type * arg_queue
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_dep_count
+ ) noexcept
+ : m_apply( 0 )
+ , m_queue( arg_queue )
+ , m_wait( 0 )
+ , m_next( 0 )
+ , m_ref_count( arg_ref_count )
+ , m_alloc_size( arg_alloc_size )
+ , m_dep_count( arg_dep_count )
+ , m_task_type( Aggregate )
+ , m_priority( 0 )
{}
//----------------------------------------
KOKKOS_INLINE_FUNCTION
TaskBase ** aggregate_dependences()
{ return reinterpret_cast<TaskBase**>( this + 1 ); }
KOKKOS_INLINE_FUNCTION
bool requested_respawn()
{
// This should only be called when a task has finished executing and is
// in the transition to either the complete or executing-respawn state.
TaskBase * const lock = reinterpret_cast< TaskBase * >( LockTag );
return lock != m_next;
}
KOKKOS_INLINE_FUNCTION
void add_dependence( TaskBase* dep )
{
+ // Precondition: lock == m_next
+
+ TaskBase * const lock = (TaskBase *) LockTag ;
+
// Assign dependence to m_next. It will be processed in the subsequent
// call to schedule. Error if the dependence is reset.
- if ( 0 != Kokkos::atomic_exchange( & m_next, dep ) ) {
+ if ( lock != Kokkos::atomic_exchange( & m_next, dep ) ) {
Kokkos::abort("TaskScheduler ERROR: resetting task dependence");
}
if ( 0 != dep ) {
// The future may be destroyed upon returning from this call
// so increment reference count to track this assignment.
Kokkos::atomic_increment( &(dep->m_ref_count) );
}
}
using get_return_type = void ;
KOKKOS_INLINE_FUNCTION
get_return_type get() const {}
};
template < typename ExecSpace , typename ResultType >
class TaskBase< ExecSpace , ResultType , void >
: public TaskBase< ExecSpace , void , void >
{
private:
- static_assert( sizeof(TaskBase<ExecSpace,void,void>) == 48 , "" );
+ using root_type = TaskBase<ExecSpace,void,void> ;
+ using function_type = typename root_type::function_type ;
+ using queue_type = typename root_type::queue_type ;
+ static_assert( sizeof(root_type) == 48 , "" );
+
+ TaskBase() = delete ;
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
public:
ResultType m_result ;
KOKKOS_INLINE_FUNCTION ~TaskBase() = default ;
+ // Constructor for runnable task
KOKKOS_INLINE_FUNCTION
- TaskBase()
- : TaskBase< ExecSpace , void , void >()
+ constexpr TaskBase( function_type arg_apply
+ , queue_type * arg_queue
+ , root_type * arg_dependence
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_task_type
+ , int arg_priority
+ )
+ : root_type( arg_apply
+ , arg_queue
+ , arg_dependence
+ , arg_ref_count
+ , arg_alloc_size
+ , arg_task_type
+ , arg_priority
+ )
, m_result()
{}
using get_return_type = ResultType const & ;
KOKKOS_INLINE_FUNCTION
get_return_type get() const { return m_result ; }
};
template< typename ExecSpace , typename ResultType , typename FunctorType >
class TaskBase
: public TaskBase< ExecSpace , ResultType , void >
, public FunctorType
{
private:
TaskBase() = delete ;
TaskBase( TaskBase && ) = delete ;
TaskBase( const TaskBase & ) = delete ;
TaskBase & operator = ( TaskBase && ) = delete ;
TaskBase & operator = ( const TaskBase & ) = delete ;
public:
- using root_type = TaskBase< ExecSpace , void , void > ;
- using base_type = TaskBase< ExecSpace , ResultType , void > ;
- using member_type = TaskExec< ExecSpace > ;
- using functor_type = FunctorType ;
- using result_type = ResultType ;
+ using root_type = TaskBase< ExecSpace , void , void > ;
+ using base_type = TaskBase< ExecSpace , ResultType , void > ;
+ using specialization = TaskQueueSpecialization< ExecSpace > ;
+ using function_type = typename root_type::function_type ;
+ using queue_type = typename root_type::queue_type ;
+ using member_type = typename specialization::member_type ;
+ using functor_type = FunctorType ;
+ using result_type = ResultType ;
template< typename Type >
KOKKOS_INLINE_FUNCTION static
void apply_functor
( Type * const task
, typename std::enable_if
< std::is_same< typename Type::result_type , void >::value
, member_type * const
>::type member
)
{
using fType = typename Type::functor_type ;
static_cast<fType*>(task)->operator()( *member );
}
template< typename Type >
KOKKOS_INLINE_FUNCTION static
void apply_functor
( Type * const task
, typename std::enable_if
< ! std::is_same< typename Type::result_type , void >::value
, member_type * const
>::type member
)
{
using fType = typename Type::functor_type ;
static_cast<fType*>(task)->operator()( *member , task->m_result );
}
KOKKOS_FUNCTION static
void apply( root_type * root , void * exec )
{
TaskBase * const task = static_cast< TaskBase * >( root );
member_type * const member = reinterpret_cast< member_type * >( exec );
TaskBase::template apply_functor( task , member );
// Task may be serial or team.
// If team then must synchronize before querying if respawn was requested.
// If team then only one thread calls destructor.
member->team_barrier();
if ( 0 == member->team_rank() && !(task->requested_respawn()) ) {
// Did not respawn, destroy the functor to free memory.
static_cast<functor_type*>(task)->~functor_type();
- // Cannot destroy the task until its dependences have been processed.
+ // Cannot destroy and deallocate the task until its dependences
+ // have been processed.
}
}
+ // Constructor for runnable task
KOKKOS_INLINE_FUNCTION
- TaskBase( functor_type const & arg_functor )
- : base_type()
+ constexpr TaskBase( function_type arg_apply
+ , queue_type * arg_queue
+ , root_type * arg_dependence
+ , int arg_ref_count
+ , int arg_alloc_size
+ , int arg_task_type
+ , int arg_priority
+ , FunctorType && arg_functor
+ )
+ : base_type( arg_apply
+ , arg_queue
+ , arg_dependence
+ , arg_ref_count
+ , arg_alloc_size
+ , arg_task_type
+ , arg_priority
+ )
, functor_type( arg_functor )
{}
KOKKOS_INLINE_FUNCTION
~TaskBase() {}
};
} /* namespace Impl */
} /* namespace Kokkos */
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
#endif /* #ifndef KOKKOS_IMPL_TASKQUEUE_HPP */
diff --git a/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp b/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
index fefbbad8b..23f5d3cd3 100644
--- a/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_TaskQueue_impl.hpp
@@ -1,590 +1,661 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#if defined( KOKKOS_ENABLE_TASKDAG )
namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
template< typename ExecSpace >
void TaskQueue< ExecSpace >::Destroy::destroy_shared_allocation()
{
m_queue->~TaskQueue();
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
TaskQueue< ExecSpace >::TaskQueue
( const TaskQueue< ExecSpace >::memory_space & arg_space
, unsigned const arg_memory_pool_capacity
, unsigned const arg_memory_pool_superblock_capacity_log2
)
: m_memory( arg_space
, arg_memory_pool_capacity
, arg_memory_pool_superblock_capacity_log2 )
, m_ready()
, m_accum_alloc(0)
, m_count_alloc(0)
, m_max_alloc(0)
, m_ready_count(0)
{
for ( int i = 0 ; i < NumQueue ; ++i ) {
m_ready[i][0] = (task_root_type *) task_root_type::EndTag ;
m_ready[i][1] = (task_root_type *) task_root_type::EndTag ;
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
TaskQueue< ExecSpace >::~TaskQueue()
{
// Verify that queues are empty and ready count is zero
for ( int i = 0 ; i < NumQueue ; ++i ) {
for ( int j = 0 ; j < 2 ; ++j ) {
if ( m_ready[i][j] != (task_root_type *) task_root_type::EndTag ) {
Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready tasks");
}
}
}
if ( 0 != m_ready_count ) {
Kokkos::abort("TaskQueue::~TaskQueue ERROR: has ready or executing tasks");
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::decrement
( TaskQueue< ExecSpace >::task_root_type * task )
{
const int count = Kokkos::atomic_fetch_add(&(task->m_ref_count),-1);
#if 0
if ( 1 == count ) {
printf( "decrement-destroy( 0x%lx { 0x%lx %d %d } )\n"
, uintptr_t( task )
, uintptr_t( task->m_next )
, int( task->m_task_type )
, int( task->m_ref_count )
);
}
#endif
if ( ( 1 == count ) &&
( task->m_next == (task_root_type *) task_root_type::LockTag ) ) {
// Reference count is zero and task is complete, deallocate.
task->m_queue->deallocate( task , task->m_alloc_size );
}
else if ( count <= 1 ) {
Kokkos::abort("TaskScheduler task has negative reference count or is incomplete" );
}
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
size_t TaskQueue< ExecSpace >::allocate_block_size( size_t n )
{
return m_memory.allocate_block_size( n );
}
template< typename ExecSpace >
KOKKOS_FUNCTION
void * TaskQueue< ExecSpace >::allocate( size_t n )
{
void * const p = m_memory.allocate(n);
if ( p ) {
Kokkos::atomic_increment( & m_accum_alloc );
Kokkos::atomic_increment( & m_count_alloc );
if ( m_max_alloc < m_count_alloc ) m_max_alloc = m_count_alloc ;
}
return p ;
}
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::deallocate( void * p , size_t n )
{
m_memory.deallocate( p , n );
Kokkos::atomic_decrement( & m_count_alloc );
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
bool TaskQueue< ExecSpace >::push_task
( TaskQueue< ExecSpace >::task_root_type * volatile * const queue
, TaskQueue< ExecSpace >::task_root_type * const task
)
{
// Push task into a concurrently pushed and popped queue.
+ // The queue can be either a ready task queue or a waiting task queue.
// The queue is a linked list where 'task->m_next' form the links.
// Fail the push attempt if the queue is locked;
// otherwise retry until the push succeeds.
#if 0
printf( "push_task( 0x%lx { 0x%lx } 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
, uintptr_t(queue)
, uintptr_t(*queue)
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * volatile * const next = & task->m_next ;
if ( zero != *next ) {
Kokkos::abort("TaskQueue::push_task ERROR: already a member of another queue" );
}
task_root_type * y = *queue ;
while ( lock != y ) {
*next = y ;
// Do not proceed until '*next' has been stored.
Kokkos::memory_fence();
task_root_type * const x = y ;
y = Kokkos::atomic_compare_exchange(queue,y,task);
if ( x == y ) return true ;
}
// Failed, replace 'task->m_next' value since 'task' remains
// not a member of a queue.
*next = zero ;
// Do not proceed until '*next' has been stored.
Kokkos::memory_fence();
return false ;
}
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
typename TaskQueue< ExecSpace >::task_root_type *
-TaskQueue< ExecSpace >::pop_task
+TaskQueue< ExecSpace >::pop_ready_task
( TaskQueue< ExecSpace >::task_root_type * volatile * const queue )
{
- // Pop task from a concurrently pushed and popped queue.
+ // Pop task from a concurrently pushed and popped ready task queue.
// The queue is a linked list where 'task->m_next' form the links.
- task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
// *queue is
// end => an empty queue
// lock => a locked queue
// valid
// Retry until the lock is acquired or the queue is empty.
task_root_type * task = *queue ;
while ( end != task ) {
// The only possible values for the queue are
// (1) lock, (2) end, or (3) a valid task.
// Thus zero will never appear in the queue.
//
- // If queue is locked then just read by guaranteeing
- // the CAS will fail.
+ // If queue is locked then just read by guaranteeing the CAS will fail.
if ( lock == task ) task = 0 ;
task_root_type * const x = task ;
- task = Kokkos::atomic_compare_exchange(queue,task,lock);
-
- if ( x == task ) break ; // CAS succeeded and queue is locked
- }
+ task = Kokkos::atomic_compare_exchange(queue,x,lock);
- if ( end != task ) {
+ if ( x == task ) {
+ // CAS succeeded and queue is locked
+ //
+ // This thread has locked the queue and removed 'task' from the queue.
+ // Extract the next entry of the queue from 'task->m_next'
+ // and mark 'task' as popped from a queue by setting
+ // 'task->m_next = lock'.
+ //
+ // Place the next entry in the head of the queue,
+ // which also unlocks the queue.
+ //
+ // This thread has exclusive access to
+ // the queue and the popped task's m_next.
- // This thread has locked the queue and removed 'task' from the queue.
- // Extract the next entry of the queue from 'task->m_next'
- // and mark 'task' as popped from a queue by setting
- // 'task->m_next = lock'.
+ *queue = task->m_next ; task->m_next = lock ;
- task_root_type * const next =
- Kokkos::atomic_exchange( & task->m_next , lock );
+ Kokkos::memory_fence();
- // Place the next entry in the head of the queue,
- // which also unlocks the queue.
-
- task_root_type * const unlock =
- Kokkos::atomic_exchange( queue , next );
+#if 0
+ printf( "pop_ready_task( 0x%lx 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
+ , uintptr_t(queue)
+ , uintptr_t(task)
+ , uintptr_t(task->m_wait)
+ , uintptr_t(task->m_next)
+ , int(task->m_task_type)
+ , int(task->m_priority)
+ , int(task->m_ref_count) );
+#endif
- if ( next == zero || next == lock || lock != unlock ) {
- Kokkos::abort("TaskQueue::pop_task ERROR");
+ return task ;
}
}
-#if 0
- if ( end != task ) {
- printf( "pop_task( 0x%lx 0x%lx { 0x%lx 0x%lx %d %d %d } )\n"
- , uintptr_t(queue)
- , uintptr_t(task)
- , uintptr_t(task->m_wait)
- , uintptr_t(task->m_next)
- , int(task->m_task_type)
- , int(task->m_priority)
- , int(task->m_ref_count) );
- }
-#endif
-
- return task ;
+ return end ;
}
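A hedged sketch, in plain std::atomic rather than Kokkos primitives, of the pop protocol implemented above: the popper CAS-es the queue head to the LockTag sentinel, which simultaneously claims the front node and blocks concurrent push/pop attempts until the new head is published. Node, kLock, kEnd and pop_front are illustrative names only.

#include <atomic>
#include <cstdint>

struct Node { Node* next; };

inline Node* pop_front(std::atomic<Node*>& head) {
  Node* const kLock = reinterpret_cast<Node*>(~std::uintptr_t(0));  // LockTag
  Node* const kEnd  = reinterpret_cast<Node*>(~std::uintptr_t(1));  // EndTag
  Node* h = head.load();
  while (h != kEnd) {
    if (h == kLock) { h = head.load(); continue; }  // another thread holds the lock
    if (head.compare_exchange_weak(h, kLock)) {     // claim 'h' and lock the queue
      Node* next = h->next;                         // exclusive access to 'h' now
      h->next = kLock;                              // mark 'h' as popped / not queued
      head.store(next);                             // publish new head = unlock
      return h;
    }                                               // CAS failure reloads 'h'
  }
  return kEnd;                                      // queue was empty
}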
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
-void TaskQueue< ExecSpace >::schedule
+void TaskQueue< ExecSpace >::schedule_runnable
( TaskQueue< ExecSpace >::task_root_type * const task )
{
- // Schedule a runnable or when_all task upon construction / spawn
+ // Schedule a runnable task upon construction / spawn
// and upon completion of other tasks that 'task' is waiting on.
-
- // Precondition on runnable task state:
- // task is either constructing or executing
+ //
+ // Precondition:
+ // - called by a single thread for the input task
+ // - calling thread has exclusive access to the task
+ // - task is not a member of a queue
+ // - if runnable then task is either constructing or respawning
//
// Constructing state:
// task->m_wait == 0
- // task->m_next == dependence
- // Executing-respawn state:
- // task->m_wait == head of linked list
- // task->m_next == dependence
+ // task->m_next == dependence or 0
+ // Respawn state:
+ // task->m_wait == head of linked list: 'end' or valid task
+ // task->m_next == dependence or 0
//
// Task state transition:
- // Constructing -> Waiting
- // Executing-respawn -> Waiting
+ // Constructing -> Waiting
+ // Respawn -> Waiting
//
// Postcondition on task state:
- // task->m_wait == head of linked list
- // task->m_next == member of linked list
+ // task->m_wait == head of linked list (queue)
+ // task->m_next == member of linked list (queue)
#if 0
- printf( "schedule( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
+ printf( "schedule_runnable( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
#endif
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- //----------------------------------------
- {
- // If Constructing then task->m_wait == 0
- // Change to waiting by task->m_wait = EndTag
-
- task_root_type * const init =
- Kokkos::atomic_compare_exchange( & task->m_wait , zero , end );
+ bool respawn = false ;
- // Precondition
+ //----------------------------------------
- if ( lock == init ) {
- Kokkos::abort("TaskQueue::schedule ERROR: task is complete");
- }
+ if ( zero == task->m_wait ) {
+ // Task in Constructing state
+ // - Transition to Waiting state
+ // Preconditions:
+ // - call occurs exclusively within a single thread
- // if ( init == 0 ) Constructing -> Waiting
- // else Executing-Respawn -> Waiting
+ task->m_wait = end ;
+ // Task in Waiting state
}
+ else if ( lock != task->m_wait ) {
+ // Task in Executing state with Respawn request
+ // - Update dependence
+ // - Transition to Waiting state
+ respawn = true ;
+ }
+ else {
+ // Task in Complete state
+ Kokkos::abort("TaskQueue::schedule_runnable ERROR: task is complete");
+ }
+
//----------------------------------------
+  // Scheduling a runnable task which may have a dependency 'dep'.
+ // Extract dependence, if any, from task->m_next.
+ // If 'dep' is not null then attempt to push 'task'
+ // into the wait queue of 'dep'.
+ // If the push succeeds then 'task' may be
+ // processed or executed by another thread at any time.
+ // If the push fails then 'dep' is complete and 'task'
+ // is ready to execute.
+
+ // Exclusive access so don't need an atomic exchange
+ // task_root_type * dep = Kokkos::atomic_exchange( & task->m_next , zero );
+ task_root_type * dep = task->m_next ; task->m_next = zero ;
+
+ const bool is_ready =
+ ( 0 == dep ) || ( ! push_task( & dep->m_wait , task ) );
+
+ if ( ( 0 != dep ) && respawn ) {
+ // Reference count for dep was incremented when
+ // respawn assigned dependency to task->m_next
+ // so that if dep completed prior to the
+ // above push_task dep would not be destroyed.
+ // dep reference count can now be decremented,
+ // which may deallocate the task.
+ TaskQueue::assign( & dep , (task_root_type *)0 );
+ }
- if ( task_root_type::Aggregate != task->m_task_type ) {
+ if ( is_ready ) {
- // Scheduling a runnable task which may have a depencency 'dep'.
- // Extract dependence, if any, from task->m_next.
- // If 'dep' is not null then attempt to push 'task'
- // into the wait queue of 'dep'.
- // If the push succeeds then 'task' may be
- // processed or executed by another thread at any time.
- // If the push fails then 'dep' is complete and 'task'
- // is ready to execute.
+ // No dependence or 'dep' is complete so push task into ready queue.
+ // Increment the ready count before pushing into ready queue
+ // to track number of ready + executing tasks.
+ // The ready count will be decremented when the task is complete.
- task_root_type * dep = Kokkos::atomic_exchange( & task->m_next , zero );
+ Kokkos::atomic_increment( & m_ready_count );
- const bool is_ready =
- ( 0 == dep ) || ( ! push_task( & dep->m_wait , task ) );
+ task_root_type * volatile * const ready_queue =
+ & m_ready[ task->m_priority ][ task->m_task_type ];
- // Reference count for dep was incremented when assigned
- // to task->m_next so that if it completed prior to the
- // above push_task dep would not be destroyed.
- // dep reference count can now be decremented,
- // which may deallocate the task.
- TaskQueue::assign( & dep , (task_root_type *)0 );
+ // A push_task fails if the ready queue is locked.
+ // A ready queue is only locked during a push or pop;
+ // i.e., it is never permanently locked.
+ // Retry push to ready queue until it succeeds.
+ // When the push succeeds then 'task' may be
+ // processed or executed by another thread at any time.
- if ( is_ready ) {
+ while ( ! push_task( ready_queue , task ) );
+ }
- // No dependence or 'dep' is complete so push task into ready queue.
- // Increment the ready count before pushing into ready queue
- // to track number of ready + executing tasks.
- // The ready count will be decremented when the task is complete.
+ //----------------------------------------
+ // Postcondition:
+ // - A runnable 'task' was pushed into a wait or ready queue.
+ // - Concurrent execution may have already popped 'task'
+ // from a queue and processed it as appropriate.
+}
- Kokkos::atomic_increment( & m_ready_count );
+template< typename ExecSpace >
+KOKKOS_FUNCTION
+void TaskQueue< ExecSpace >::schedule_aggregate
+ ( TaskQueue< ExecSpace >::task_root_type * const task )
+{
+ // Schedule an aggregate task upon construction
+ // and upon completion of other tasks that 'task' is waiting on.
+ //
+ // Precondition:
+ // - called by a single thread for the input task
+ // - calling thread has exclusive access to the task
+ // - task is not a member of a queue
+ //
+ // Constructing state:
+ // task->m_wait == 0
+ // task->m_next == dependence or 0
+ //
+ // Task state transition:
+ // Constructing -> Waiting
+ //
+ // Postcondition on task state:
+ // task->m_wait == head of linked list (queue)
+ // task->m_next == member of linked list (queue)
+
+#if 0
+ printf( "schedule_aggregate( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
+ , uintptr_t(task)
+ , uintptr_t(task->m_wait)
+ , uintptr_t(task->m_next)
+ , task->m_task_type
+ , task->m_priority
+ , task->m_ref_count );
+#endif
- task_root_type * volatile * const queue =
- & m_ready[ task->m_priority ][ task->m_task_type ];
+ task_root_type * const zero = (task_root_type *) 0 ;
+ task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
+ task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
- // A push_task fails if the ready queue is locked.
- // A ready queue is only locked during a push or pop;
- // i.e., it is never permanently locked.
- // Retry push to ready queue until it succeeds.
- // When the push succeeds then 'task' may be
- // processed or executed by another thread at any time.
+ //----------------------------------------
- while ( ! push_task( queue , task ) );
- }
+ if ( zero == task->m_wait ) {
+ // Task in Constructing state
+ // - Transition to Waiting state
+ // Preconditions:
+ // - call occurs exclusively within a single thread
+
+ task->m_wait = end ;
+ // Task in Waiting state
+ }
+ else if ( lock == task->m_wait ) {
+ // Task in Complete state
+ Kokkos::abort("TaskQueue::schedule_aggregate ERROR: task is complete");
}
+
//----------------------------------------
- else {
- // Scheduling a 'when_all' task with multiple dependences.
- // This scheduling may be called when the 'when_all' is
- // (1) created or
- // (2) being removed from a completed task's wait list.
+ // Scheduling a 'when_all' task with multiple dependences.
+ // This scheduling may be called when the 'when_all' is
+ // (1) created or
+ // (2) being removed from a completed task's wait list.
- task_root_type ** const aggr = task->aggregate_dependences();
+ task_root_type ** const aggr = task->aggregate_dependences();
- // Assume the 'when_all' is complete until a dependence is
- // found that is not complete.
+ // Assume the 'when_all' is complete until a dependence is
+ // found that is not complete.
- bool is_complete = true ;
+ bool is_complete = true ;
- for ( int i = task->m_dep_count ; 0 < i && is_complete ; ) {
+ for ( int i = task->m_dep_count ; 0 < i && is_complete ; ) {
- --i ;
+ --i ;
- // Loop dependences looking for an incomplete task.
- // Add this task to the incomplete task's wait queue.
+ // Loop dependences looking for an incomplete task.
+ // Add this task to the incomplete task's wait queue.
- // Remove a task 'x' from the dependence list.
- // The reference count of 'x' was incremented when
- // it was assigned into the dependence list.
+ // Remove a task 'x' from the dependence list.
+ // The reference count of 'x' was incremented when
+ // it was assigned into the dependence list.
- task_root_type * x = Kokkos::atomic_exchange( aggr + i , zero );
+ // Exclusive access so don't need an atomic exchange
+ // task_root_type * x = Kokkos::atomic_exchange( aggr + i , zero );
+ task_root_type * x = aggr[i] ; aggr[i] = zero ;
- if ( x ) {
+ if ( x ) {
- // If x->m_wait is not locked then push succeeds
- // and the aggregate is not complete.
- // If the push succeeds then this when_all 'task' may be
- // processed by another thread at any time.
- // For example, 'x' may be completeed by another
- // thread and then re-schedule this when_all 'task'.
+ // If x->m_wait is not locked then push succeeds
+ // and the aggregate is not complete.
+ // If the push succeeds then this when_all 'task' may be
+ // processed by another thread at any time.
+    // For example, 'x' may be completed by another
+ // thread and then re-schedule this when_all 'task'.
- is_complete = ! push_task( & x->m_wait , task );
+ is_complete = ! push_task( & x->m_wait , task );
- // Decrement reference count which had been incremented
- // when 'x' was added to the dependence list.
+ // Decrement reference count which had been incremented
+ // when 'x' was added to the dependence list.
- TaskQueue::assign( & x , zero );
- }
+ TaskQueue::assign( & x , zero );
}
+ }
- if ( is_complete ) {
- // The when_all 'task' was not added to a wait queue because
- // all dependences were complete so this aggregate is complete.
- // Complete the when_all 'task' to schedule other tasks
- // that are waiting for the when_all 'task' to complete.
+ if ( is_complete ) {
+ // The when_all 'task' was not added to a wait queue because
+ // all dependences were complete so this aggregate is complete.
+ // Complete the when_all 'task' to schedule other tasks
+ // that are waiting for the when_all 'task' to complete.
- task->m_next = lock ;
+ task->m_next = lock ;
- complete( task );
+ complete( task );
- // '*task' may have been deleted upon completion
- }
+ // '*task' may have been deleted upon completion
}
+
//----------------------------------------
// Postcondition:
- // A runnable 'task' was pushed into a wait or ready queue.
- // An aggregate 'task' was either pushed to a wait queue
- // or completed.
- // Concurrent execution may have already popped 'task'
- // from a queue and processed it as appropriate.
+ // - An aggregate 'task' was either pushed to a wait queue or completed.
+ // - Concurrent execution may have already popped 'task'
+ // from a queue and processed it as appropriate.
}
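
Illustration only (not from the patch): the dependence scan above, reduced to a single-threaded sketch. 'Node', 'try_push_wait_queue' and 'schedule_when_all' are made-up stand-ins for task_root_type, push_task and the logic above; the point is the "assume complete until an incomplete dependence is found" loop.

  // Hypothetical, simplified model of the when_all dependence scan.
  struct Node {
    Node * wait_head = nullptr;   // stand-in for m_wait
    Node * next      = nullptr;   // stand-in for m_next
    bool   complete  = false;
  };

  // Returns false when 'dep' already completed (its wait queue is conceptually
  // locked); returns true when 'waiter' was parked on 'dep's wait queue.
  inline bool try_push_wait_queue( Node * dep, Node * waiter )
  {
    if ( dep->complete ) return false;
    waiter->next   = dep->wait_head;   // LIFO push; single-threaded here
    dep->wait_head = waiter;
    return true;
  }

  // Returns true when every dependence has finished, i.e. the aggregate can be
  // completed immediately instead of waiting.
  inline bool schedule_when_all( Node * const * deps, int dep_count, Node * aggregate )
  {
    bool is_complete = true;
    for ( int i = dep_count; 0 < i && is_complete; ) {
      --i;
      is_complete = ! try_push_wait_queue( deps[i], aggregate );
    }
    return is_complete;
  }
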
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::reschedule( task_root_type * task )
{
// Precondition:
// task is in Executing state
// task->m_next == LockTag
//
// Postcondition:
// task is in Executing-Respawn state
// task->m_next == 0 (no dependence)
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
if ( lock != Kokkos::atomic_exchange( & task->m_next, zero ) ) {
Kokkos::abort("TaskScheduler::respawn ERROR: already respawned");
}
}
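
The LockTag idiom used by reschedule() can be pictured with plain std::atomic; 'LOCK_TAG' and 'reschedule_once' are made-up names for this sketch only.

  #include <atomic>
  #include <cstdint>
  #include <cstdlib>

  constexpr std::uintptr_t LOCK_TAG = ~std::uintptr_t(0);

  inline void reschedule_once( std::atomic< std::uintptr_t > & next )
  {
    // Precondition: 'next' holds LOCK_TAG while the task is executing.
    // Exchanging it for 0 ("no dependence") must succeed exactly once.
    if ( LOCK_TAG != next.exchange( 0 ) ) {
      std::abort();   // already respawned
    }
  }
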
//----------------------------------------------------------------------------
template< typename ExecSpace >
KOKKOS_FUNCTION
void TaskQueue< ExecSpace >::complete
( TaskQueue< ExecSpace >::task_root_type * task )
{
// Complete a runnable task that has finished executing
  // or a when_all task when all of its dependences are complete.
task_root_type * const zero = (task_root_type *) 0 ;
task_root_type * const lock = (task_root_type *) task_root_type::LockTag ;
task_root_type * const end = (task_root_type *) task_root_type::EndTag ;
#if 0
printf( "complete( 0x%lx { 0x%lx 0x%lx %d %d %d }\n"
, uintptr_t(task)
, uintptr_t(task->m_wait)
, uintptr_t(task->m_next)
, task->m_task_type
, task->m_priority
, task->m_ref_count );
fflush( stdout );
#endif
const bool runnable = task_root_type::Aggregate != task->m_task_type ;
//----------------------------------------
if ( runnable && lock != task->m_next ) {
    // A runnable task has finished executing and requested respawn.
// Schedule the task for subsequent execution.
- schedule( task );
+ schedule_runnable( task );
}
//----------------------------------------
else {
    // Either an aggregate task, or a runnable task that executed
// and did not respawn. Transition this task to complete.
// If 'task' is an aggregate then any of the runnable tasks that
// it depends upon may be attempting to complete this 'task'.
// Must only transition a task once to complete status.
    // This is controlled by atomically locking the wait queue.
// Stop other tasks from adding themselves to this task's wait queue
// by locking the head of this task's wait queue.
task_root_type * x = Kokkos::atomic_exchange( & task->m_wait , lock );
if ( x != (task_root_type *) lock ) {
// This thread has transitioned this 'task' to complete.
// 'task' is no longer in a queue and is not executing
// so decrement the reference count from 'task's creation.
// If no other references to this 'task' then it will be deleted.
TaskQueue::assign( & task , zero );
// This thread has exclusive access to the wait list so
- // the concurrency-safe pop_task function is not needed.
+ // the concurrency-safe pop_ready_task function is not needed.
// Schedule the tasks that have been waiting on the input 'task',
// which may have been deleted.
while ( x != end ) {
+ // Have exclusive access to 'x' until it is scheduled
+ // Set x->m_next = zero <= no dependence, not a respawn
- // Set x->m_next = zero <= no dependence
-
- task_root_type * const next =
- (task_root_type *) Kokkos::atomic_exchange( & x->m_next , zero );
+ task_root_type * const next = x->m_next ; x->m_next = 0 ;
- schedule( x );
+ if ( task_root_type::Aggregate != x->m_task_type ) {
+ schedule_runnable( x );
+ }
+ else {
+ schedule_aggregate( x );
+ }
x = next ;
}
}
}
if ( runnable ) {
// A runnable task was popped from a ready queue and executed.
// If respawned into a ready queue then the ready count was incremented
// so decrement whether respawned or not.
Kokkos::atomic_decrement( & m_ready_count );
}
}
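
The hand-off in the completion branch above can likewise be pictured with plain std::atomic: swap the wait-list head with a LOCKED sentinel, and whoever receives the old head owns the whole list and may drain it without further atomics. 'Waiter', 'LOCKED' and 'complete_and_drain' are stand-in names for this sketch only.

  #include <atomic>
  #include <cstdint>

  struct Waiter { Waiter * next = nullptr; };

  static Waiter * const LOCKED = reinterpret_cast< Waiter * >( ~std::uintptr_t(0) );

  inline void complete_and_drain( std::atomic< Waiter * > & wait_head,
                                  void (*schedule)( Waiter * ) )
  {
    Waiter * x = wait_head.exchange( LOCKED );
    if ( x == LOCKED ) return;        // another thread already completed it
    while ( x != nullptr ) {          // exclusive access: no atomics required
      Waiter * const n = x->next;
      x->next = nullptr;              // "no dependence, not a respawn"
      schedule( x );
      x = n;
    }
  }
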
//----------------------------------------------------------------------------
} /* namespace Impl */
} /* namespace Kokkos */
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
diff --git a/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp b/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
index ff503cb27..d72cde03f 100644
--- a/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_Utilities.hpp
@@ -1,414 +1,415 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CORE_IMPL_UTILITIES_HPP
#define KOKKOS_CORE_IMPL_UTILITIES_HPP
#include <Kokkos_Macros.hpp>
+#include <stdint.h>
#include <type_traits>
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos { namespace Impl {
// same as std::forward
// needed to allow perfect forwarding on the device
template <typename T>
KOKKOS_INLINE_FUNCTION
constexpr
T&& forward( typename std::remove_reference<T>::type& arg ) noexcept
{ return static_cast<T&&>(arg); }
template <typename T>
KOKKOS_INLINE_FUNCTION
constexpr
T&& forward( typename std::remove_reference<T>::type&& arg ) noexcept
{ return static_cast<T&&>(arg); }
// same as std::move
// needed to allow moving on the device
template <typename T>
KOKKOS_INLINE_FUNCTION
constexpr
typename std::remove_reference<T>::type&& move( T&& arg ) noexcept
{ return static_cast<typename std::remove_reference<T>::type&&>(arg); }
// empty function to allow expanding a variadic argument pack
template<typename... Args>
KOKKOS_INLINE_FUNCTION
void expand_variadic(Args &&...) {}
//----------------------------------------
// C++14 integer sequence
template< typename T , T ... Ints >
struct integer_sequence {
using value_type = T ;
static constexpr std::size_t size() noexcept { return sizeof...(Ints); }
};
template< typename T , std::size_t N >
struct make_integer_sequence_helper ;
template< typename T , T N >
using make_integer_sequence =
typename make_integer_sequence_helper<T,N>::type ;
template< typename T >
struct make_integer_sequence_helper< T , 0 >
{ using type = integer_sequence<T> ; };
template< typename T >
struct make_integer_sequence_helper< T , 1 >
{ using type = integer_sequence<T,0> ; };
template< typename T >
struct make_integer_sequence_helper< T , 2 >
{ using type = integer_sequence<T,0,1> ; };
template< typename T >
struct make_integer_sequence_helper< T , 3 >
{ using type = integer_sequence<T,0,1,2> ; };
template< typename T >
struct make_integer_sequence_helper< T , 4 >
{ using type = integer_sequence<T,0,1,2,3> ; };
template< typename T >
struct make_integer_sequence_helper< T , 5 >
{ using type = integer_sequence<T,0,1,2,3,4> ; };
template< typename T >
struct make_integer_sequence_helper< T , 6 >
{ using type = integer_sequence<T,0,1,2,3,4,5> ; };
template< typename T >
struct make_integer_sequence_helper< T , 7 >
{ using type = integer_sequence<T,0,1,2,3,4,5,6> ; };
template< typename T >
struct make_integer_sequence_helper< T , 8 >
{ using type = integer_sequence<T,0,1,2,3,4,5,6,7> ; };
template< typename X , typename Y >
struct make_integer_sequence_concat ;
template< typename T , T ... x , T ... y >
struct make_integer_sequence_concat< integer_sequence<T,x...>
, integer_sequence<T,y...> >
{ using type = integer_sequence< T , x ... , (sizeof...(x)+y)... > ; };
template< typename T , std::size_t N >
struct make_integer_sequence_helper {
using type = typename make_integer_sequence_concat
< typename make_integer_sequence_helper< T , N/2 >::type
, typename make_integer_sequence_helper< T , N - N/2 >::type
>::type ;
};
//----------------------------------------
template <std::size_t... Indices>
using index_sequence = integer_sequence<std::size_t, Indices...>;
template< std::size_t N >
using make_index_sequence = make_integer_sequence< std::size_t, N>;
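
A couple of compile-time spot checks for the aliases above, written for a separate translation unit that includes this header; N = 10 exercises the divide-and-concatenate path, the small cases hit the direct specializations.

  #include <impl/Kokkos_Utilities.hpp>
  #include <type_traits>

  static_assert( std::is_same< Kokkos::Impl::make_index_sequence< 3 >,
                               Kokkos::Impl::index_sequence< 0, 1, 2 > >::value, "" );

  // N = 10 goes through make_integer_sequence_concat of two length-5 halves.
  static_assert( std::is_same< Kokkos::Impl::make_index_sequence< 10 >,
                               Kokkos::Impl::index_sequence< 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 > >::value, "" );
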
//----------------------------------------
template <unsigned I, typename IntegerSequence>
struct integer_sequence_at;
template <unsigned I, typename T, T h0, T... tail>
struct integer_sequence_at<I, integer_sequence<T, h0, tail...> >
: public integer_sequence_at<I-1u, integer_sequence<T,tail...> >
{
static_assert( 8 <= I , "Reasoning Error" );
static_assert( I < integer_sequence<T, h0, tail...>::size(), "Error: Index out of bounds");
};
template < typename T, T h0, T... tail>
struct integer_sequence_at<0u, integer_sequence<T,h0, tail...> >
{
using type = T;
static constexpr T value = h0;
};
template < typename T, T h0, T h1, T... tail>
struct integer_sequence_at<1u, integer_sequence<T, h0, h1, tail...> >
{
using type = T;
static constexpr T value = h1;
};
template < typename T, T h0, T h1, T h2, T... tail>
struct integer_sequence_at<2u, integer_sequence<T, h0, h1, h2, tail...> >
{
using type = T;
static constexpr T value = h2;
};
template < typename T, T h0, T h1, T h2, T h3, T... tail>
struct integer_sequence_at<3u, integer_sequence<T, h0, h1, h2, h3, tail...> >
{
using type = T;
static constexpr T value = h3;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T... tail>
struct integer_sequence_at<4u, integer_sequence<T, h0, h1, h2, h3, h4, tail...> >
{
using type = T;
static constexpr T value = h4;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T... tail>
struct integer_sequence_at<5u, integer_sequence<T, h0, h1, h2, h3, h4, h5, tail...> >
{
using type = T;
static constexpr T value = h5;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T... tail>
struct integer_sequence_at<6u, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, tail...> >
{
using type = T;
static constexpr T value = h6;
};
template < typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T h7, T... tail>
struct integer_sequence_at<7u, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, h7, tail...> >
{
using type = T;
static constexpr T value = h7;
};
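
For example (again in a translation unit that includes this header), integer_sequence_at resolves to the direct specializations for indices 0-7 and recurses, one element at a time, for larger indices.

  #include <impl/Kokkos_Utilities.hpp>

  // Direct specialization: element 1 of <5, 7, 9> is 7.
  static_assert( Kokkos::Impl::integer_sequence_at<
                   1, Kokkos::Impl::integer_sequence< int, 5, 7, 9 > >::value == 7, "" );

  // Index 8 takes the recursive case (guarded by static_assert( 8 <= I ) above)
  // once, then lands on the index-7 specialization.
  static_assert( Kokkos::Impl::integer_sequence_at<
                   8, Kokkos::Impl::integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 > >::value == 8, "" );
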
//----------------------------------------
template <typename T>
constexpr
T at( const unsigned, integer_sequence<T> ) noexcept
{ return ~static_cast<T>(0); }
template <typename T, T h0, T... tail>
constexpr
T at( const unsigned i, integer_sequence<T, h0> ) noexcept
{ return i==0u ? h0 : ~static_cast<T>(0); }
template <typename T, T h0, T h1>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4, T h5>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 :
i==5u ? h5 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 :
i==5u ? h5 :
i==6u ? h6 : ~static_cast<T>(0);
}
template <typename T, T h0, T h1, T h2, T h3, T h4, T h5, T h6, T h7, T... tail>
constexpr
T at( const unsigned i, integer_sequence<T, h0, h1, h2, h3, h4, h5, h6, h7, tail...> ) noexcept
{ return i==0u ? h0 :
i==1u ? h1 :
i==2u ? h2 :
i==3u ? h3 :
i==4u ? h4 :
i==5u ? h5 :
i==6u ? h6 :
i==7u ? h7 : at(i-8u, integer_sequence<T, tail...>{} );
}
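
The constexpr lookup above can be checked the same way; an out-of-range index yields the all-ones sentinel ~T(0) (assuming the header is included as before).

  #include <impl/Kokkos_Utilities.hpp>

  static_assert( Kokkos::Impl::at( 2, Kokkos::Impl::integer_sequence< int, 5, 7, 9 >() ) == 9, "" );
  static_assert( Kokkos::Impl::at( 3, Kokkos::Impl::integer_sequence< int, 5, 7, 9 >() )
                 == ~static_cast<int>(0), "" );
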
//----------------------------------------
template < typename IntegerSequence
, typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
>
struct reverse_integer_sequence_helper;
template <typename T, T h0, T... tail, T... results>
struct reverse_integer_sequence_helper< integer_sequence<T, h0, tail...>, integer_sequence<T, results...> >
: public reverse_integer_sequence_helper< integer_sequence<T, tail...>, integer_sequence<T, h0, results...> >
{};
template <typename T, T... results>
struct reverse_integer_sequence_helper< integer_sequence<T>, integer_sequence<T, results...> >
{
using type = integer_sequence<T, results...>;
};
template <typename IntegerSequence>
using reverse_integer_sequence = typename reverse_integer_sequence_helper<IntegerSequence>::type;
//----------------------------------------
template < typename IntegerSequence
, typename Result
, typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
>
struct exclusive_scan_integer_sequence_helper;
template <typename T, T h0, T... tail, typename Result, T... results>
struct exclusive_scan_integer_sequence_helper
< integer_sequence<T, h0, tail...>
, Result
, integer_sequence<T, results...> >
: public exclusive_scan_integer_sequence_helper
< integer_sequence<T, tail...>
, std::integral_constant<T,Result::value+h0>
, integer_sequence<T, 0, (results+h0)...> >
{};
template <typename T, typename Result, T... results>
struct exclusive_scan_integer_sequence_helper
< integer_sequence<T>, Result, integer_sequence<T, results...> >
{
using type = integer_sequence<T, results...>;
static constexpr T value = Result::value ;
};
template <typename IntegerSequence>
struct exclusive_scan_integer_sequence
{
using value_type = typename IntegerSequence::value_type;
using helper =
exclusive_scan_integer_sequence_helper
< reverse_integer_sequence<IntegerSequence>
, std::integral_constant< value_type , 0 >
> ;
using type = typename helper::type ;
static constexpr value_type value = helper::value ;
};
//----------------------------------------
template < typename IntegerSequence
, typename Result
, typename ResultSequence = integer_sequence<typename IntegerSequence::value_type>
>
struct inclusive_scan_integer_sequence_helper;
template <typename T, T h0, T... tail, typename Result, T... results>
struct inclusive_scan_integer_sequence_helper
< integer_sequence<T, h0, tail...>
, Result
, integer_sequence<T, results...> >
: public inclusive_scan_integer_sequence_helper
< integer_sequence<T, tail...>
, std::integral_constant<T,Result::value+h0>
, integer_sequence<T, h0, (results+h0)...> >
{};
template <typename T, typename Result, T... results>
struct inclusive_scan_integer_sequence_helper
< integer_sequence<T>, Result, integer_sequence<T, results...> >
{
using type = integer_sequence<T, results...>;
static constexpr T value = Result::value ;
};
template <typename IntegerSequence>
struct inclusive_scan_integer_sequence
{
using value_type = typename IntegerSequence::value_type;
using helper =
inclusive_scan_integer_sequence_helper
< reverse_integer_sequence<IntegerSequence>
, std::integral_constant< value_type , 0 >
> ;
using type = typename helper::type ;
static constexpr value_type value = helper::value ;
};
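
Worked example of the three sequence transforms above for the sequence <1, 2, 3>, written as static_asserts in a translation unit that includes this header.

  #include <impl/Kokkos_Utilities.hpp>
  #include <type_traits>

  namespace {
  using Kokkos::Impl::integer_sequence;
  using seq = integer_sequence< int, 1, 2, 3 >;

  // Reversal: <1, 2, 3> -> <3, 2, 1>.
  static_assert( std::is_same< Kokkos::Impl::reverse_integer_sequence< seq >,
                               integer_sequence< int, 3, 2, 1 > >::value, "" );

  // Exclusive scan: offsets <0, 1, 3>, total 6.
  static_assert( std::is_same< Kokkos::Impl::exclusive_scan_integer_sequence< seq >::type,
                               integer_sequence< int, 0, 1, 3 > >::value, "" );
  static_assert( Kokkos::Impl::exclusive_scan_integer_sequence< seq >::value == 6, "" );

  // Inclusive scan: running totals <1, 3, 6>, total 6.
  static_assert( std::is_same< Kokkos::Impl::inclusive_scan_integer_sequence< seq >::type,
                               integer_sequence< int, 1, 3, 6 > >::value, "" );
  static_assert( Kokkos::Impl::inclusive_scan_integer_sequence< seq >::value == 6, "" );
  } // namespace
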
}} // namespace Kokkos::Impl
#endif //KOKKOS_CORE_IMPL_UTILITIES_HPP
diff --git a/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp b/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp
index ad1b6dce3..93ff6c48a 100644
--- a/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp
+++ b/lib/kokkos/core/src/impl/Kokkos_spinwait.cpp
@@ -1,89 +1,181 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
+
#include <impl/Kokkos_spinwait.hpp>
+#include <Kokkos_Atomic.hpp>
+#include <impl/Kokkos_BitOps.hpp>
+
/*--------------------------------------------------------------------------*/
-#if ( KOKKOS_ENABLE_ASM )
- #if defined( __arm__ ) || defined( __aarch64__ )
- /* No-operation instruction to idle the thread. */
- #define YIELD asm volatile("nop")
+#if !defined( _WIN32 )
+ #if defined( KOKKOS_ENABLE_ASM )
+ #if defined( __arm__ ) || defined( __aarch64__ )
+ /* No-operation instruction to idle the thread. */
+ #define KOKKOS_INTERNAL_PAUSE
+ #else
+ /* Pause instruction to prevent excess processor bus usage */
+ #define KOKKOS_INTERNAL_PAUSE asm volatile("pause\n":::"memory")
+ #endif
+ #define KOKKOS_INTERNAL_NOP2 asm volatile("nop\n" "nop\n")
+ #define KOKKOS_INTERNAL_NOP4 KOKKOS_INTERNAL_NOP2; KOKKOS_INTERNAL_NOP2
+ #define KOKKOS_INTERNAL_NOP8 KOKKOS_INTERNAL_NOP4; KOKKOS_INTERNAL_NOP4;
+ #define KOKKOS_INTERNAL_NOP16 KOKKOS_INTERNAL_NOP8; KOKKOS_INTERNAL_NOP8;
+ #define KOKKOS_INTERNAL_NOP32 KOKKOS_INTERNAL_NOP16; KOKKOS_INTERNAL_NOP16;
+ namespace {
+ inline void kokkos_internal_yield( const unsigned i ) noexcept {
+ switch (Kokkos::Impl::bit_scan_reverse((i >> 2)+1u)) {
+ case 0u: KOKKOS_INTERNAL_NOP2; break;
+ case 1u: KOKKOS_INTERNAL_NOP4; break;
+ case 2u: KOKKOS_INTERNAL_NOP8; break;
+ case 3u: KOKKOS_INTERNAL_NOP16; break;
+ default: KOKKOS_INTERNAL_NOP32;
+ }
+ KOKKOS_INTERNAL_PAUSE;
+ }
+ }
#else
- /* Pause instruction to prevent excess processor bus usage */
- #define YIELD asm volatile("pause\n":::"memory")
+ #include <sched.h>
+ namespace {
+ inline void kokkos_internal_yield( const unsigned ) noexcept {
+ sched_yield();
+ }
+ }
+ #endif
+#else // defined( _WIN32 )
+ #if defined ( KOKKOS_ENABLE_WINTHREAD )
+ #include <process.h>
+ namespace {
+ inline void kokkos_internal_yield( const unsigned ) noexcept {
+ Sleep(0);
+ }
+ }
+ #elif defined( _MSC_VER )
+ #define NOMINMAX
+ #include <winsock2.h>
+ #include <windows.h>
+ namespace {
+ inline void kokkos_internal_yield( const unsigned ) noexcept {
+ YieldProcessor();
+ }
+ }
+ #else
+ #define KOKKOS_INTERNAL_PAUSE __asm__ __volatile__("pause\n":::"memory")
+ #define KOKKOS_INTERNAL_NOP2 __asm__ __volatile__("nop\n" "nop")
+ #define KOKKOS_INTERNAL_NOP4 KOKKOS_INTERNAL_NOP2; KOKKOS_INTERNAL_NOP2
+ #define KOKKOS_INTERNAL_NOP8 KOKKOS_INTERNAL_NOP4; KOKKOS_INTERNAL_NOP4;
+ #define KOKKOS_INTERNAL_NOP16 KOKKOS_INTERNAL_NOP8; KOKKOS_INTERNAL_NOP8;
+ #define KOKKOS_INTERNAL_NOP32 KOKKOS_INTERNAL_NOP16; KOKKOS_INTERNAL_NOP16;
+ namespace {
+ inline void kokkos_internal_yield( const unsigned i ) noexcept {
+ switch (Kokkos::Impl::bit_scan_reverse((i >> 2)+1u)) {
+ case 0: KOKKOS_INTERNAL_NOP2; break;
+ case 1: KOKKOS_INTERNAL_NOP4; break;
+ case 2: KOKKOS_INTERNAL_NOP8; break;
+ case 3: KOKKOS_INTERNAL_NOP16; break;
+ default: KOKKOS_INTERNAL_NOP32;
+ }
+ KOKKOS_INTERNAL_PAUSE;
+ }
+ }
#endif
-#elif defined ( KOKKOS_ENABLE_WINTHREAD )
- #include <process.h>
- #define YIELD Sleep(0)
-#elif defined ( _WIN32) && defined (_MSC_VER)
- /* Windows w/ Visual Studio */
- #define NOMINMAX
- #include <winsock2.h>
- #include <windows.h>
-#define YIELD YieldProcessor();
-#elif defined ( _WIN32 )
- /* Windows w/ Intel*/
- #define YIELD __asm__ __volatile__("pause\n":::"memory")
-#else
- #include <sched.h>
- #define YIELD sched_yield()
#endif
+
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-void spinwait( volatile int & flag , const int value )
+
+void spinwait_while_equal( volatile int32_t & flag , const int32_t value )
+{
+ Kokkos::store_fence();
+ unsigned i = 0;
+ while ( value == flag ) {
+ kokkos_internal_yield(i);
+ ++i;
+ }
+ Kokkos::load_fence();
+}
+
+void spinwait_until_equal( volatile int32_t & flag , const int32_t value )
+{
+ Kokkos::store_fence();
+ unsigned i = 0;
+ while ( value != flag ) {
+ kokkos_internal_yield(i);
+ ++i;
+ }
+ Kokkos::load_fence();
+}
+
+void spinwait_while_equal( volatile int64_t & flag , const int64_t value )
{
+ Kokkos::store_fence();
+ unsigned i = 0;
while ( value == flag ) {
- YIELD ;
+ kokkos_internal_yield(i);
+ ++i;
+ }
+ Kokkos::load_fence();
+}
+
+void spinwait_until_equal( volatile int64_t & flag , const int64_t value )
+{
+ Kokkos::store_fence();
+ unsigned i = 0;
+ while ( value != flag ) {
+ kokkos_internal_yield(i);
+ ++i;
}
+ Kokkos::load_fence();
}
+
#endif
} /* namespace Impl */
} /* namespace Kokkos */
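
A minimal host-side usage sketch for the new entry points, assuming a host build (KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST defined) and accepting a plain volatile flag for the demonstration, as the callers inside Kokkos do.

  #include <impl/Kokkos_spinwait.hpp>
  #include <cstdint>
  #include <thread>

  int main()
  {
    volatile int32_t flag = 0;

    // Another thread eventually releases the waiter by writing 1.
    std::thread releaser( [&flag]() { flag = 1; } );

    // Spins (with the internal pause/nop/sched_yield backoff) until flag == 1,
    // then issues a load fence before returning.
    Kokkos::Impl::spinwait_until_equal( flag, 1 );

    releaser.join();
    return 0;
  }
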
diff --git a/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp b/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp
index cc87771fa..6e34b8a94 100644
--- a/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp
+++ b/lib/kokkos/core/src/impl/Kokkos_spinwait.hpp
@@ -1,64 +1,80 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_SPINWAIT_HPP
#define KOKKOS_SPINWAIT_HPP
#include <Kokkos_Macros.hpp>
+#include <cstdint>
+
namespace Kokkos {
namespace Impl {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
-void spinwait( volatile int & flag , const int value );
+
+void spinwait_while_equal( volatile int32_t & flag , const int32_t value );
+void spinwait_until_equal( volatile int32_t & flag , const int32_t value );
+
+void spinwait_while_equal( volatile int64_t & flag , const int64_t value );
+void spinwait_until_equal( volatile int64_t & flag , const int64_t value );
#else
+
+KOKKOS_INLINE_FUNCTION
+void spinwait_while_equal( volatile int32_t & , const int32_t ) {}
+KOKKOS_INLINE_FUNCTION
+void spinwait_until_equal( volatile int32_t & , const int32_t ) {}
+
+KOKKOS_INLINE_FUNCTION
+void spinwait_while_equal( volatile int64_t & , const int64_t ) {}
KOKKOS_INLINE_FUNCTION
-void spinwait( volatile int & , const int ) {}
+void spinwait_until_equal( volatile int64_t & , const int64_t ) {}
+
#endif
} /* namespace Impl */
} /* namespace Kokkos */
#endif /* #ifndef KOKKOS_SPINWAIT_HPP */
diff --git a/lib/kokkos/core/unit_test/CMakeLists.txt b/lib/kokkos/core/unit_test/CMakeLists.txt
index 795657fe8..caf6c5012 100644
--- a/lib/kokkos/core/unit_test/CMakeLists.txt
+++ b/lib/kokkos/core/unit_test/CMakeLists.txt
@@ -1,197 +1,217 @@
#
# Add test-only library for gtest to be reused by all the subpackages
#
SET(GTEST_SOURCE_DIR ${${PARENT_PACKAGE_NAME}_SOURCE_DIR}/tpls/gtest)
INCLUDE_DIRECTORIES(${GTEST_SOURCE_DIR})
TRIBITS_ADD_LIBRARY(
kokkos_gtest
HEADERS ${GTEST_SOURCE_DIR}/gtest/gtest.h
SOURCES ${GTEST_SOURCE_DIR}/gtest/gtest-all.cc
TESTONLY
)
#
# Define the tests
#
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
IF(Kokkos_ENABLE_Serial)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Serial
SOURCES
UnitTestMain.cpp
serial/TestSerial_Atomics.cpp
serial/TestSerial_Other.cpp
serial/TestSerial_Reductions.cpp
serial/TestSerial_SubView_a.cpp
serial/TestSerial_SubView_b.cpp
serial/TestSerial_SubView_c01.cpp
serial/TestSerial_SubView_c02.cpp
serial/TestSerial_SubView_c03.cpp
serial/TestSerial_SubView_c04.cpp
serial/TestSerial_SubView_c05.cpp
serial/TestSerial_SubView_c06.cpp
serial/TestSerial_SubView_c07.cpp
serial/TestSerial_SubView_c08.cpp
serial/TestSerial_SubView_c09.cpp
serial/TestSerial_SubView_c10.cpp
serial/TestSerial_SubView_c11.cpp
serial/TestSerial_SubView_c12.cpp
serial/TestSerial_Team.cpp
serial/TestSerial_ViewAPI_a.cpp
serial/TestSerial_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Pthread)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Threads
SOURCES
UnitTestMain.cpp
threads/TestThreads_Atomics.cpp
threads/TestThreads_Other.cpp
threads/TestThreads_Reductions.cpp
threads/TestThreads_SubView_a.cpp
threads/TestThreads_SubView_b.cpp
threads/TestThreads_SubView_c01.cpp
threads/TestThreads_SubView_c02.cpp
threads/TestThreads_SubView_c03.cpp
threads/TestThreads_SubView_c04.cpp
threads/TestThreads_SubView_c05.cpp
threads/TestThreads_SubView_c06.cpp
threads/TestThreads_SubView_c07.cpp
threads/TestThreads_SubView_c08.cpp
threads/TestThreads_SubView_c09.cpp
threads/TestThreads_SubView_c10.cpp
threads/TestThreads_SubView_c11.cpp
threads/TestThreads_SubView_c12.cpp
threads/TestThreads_Team.cpp
threads/TestThreads_ViewAPI_a.cpp
threads/TestThreads_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_OpenMP)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_OpenMP
SOURCES
UnitTestMain.cpp
openmp/TestOpenMP_Atomics.cpp
openmp/TestOpenMP_Other.cpp
openmp/TestOpenMP_Reductions.cpp
openmp/TestOpenMP_SubView_a.cpp
openmp/TestOpenMP_SubView_b.cpp
openmp/TestOpenMP_SubView_c01.cpp
openmp/TestOpenMP_SubView_c02.cpp
openmp/TestOpenMP_SubView_c03.cpp
openmp/TestOpenMP_SubView_c04.cpp
openmp/TestOpenMP_SubView_c05.cpp
openmp/TestOpenMP_SubView_c06.cpp
openmp/TestOpenMP_SubView_c07.cpp
openmp/TestOpenMP_SubView_c08.cpp
openmp/TestOpenMP_SubView_c09.cpp
openmp/TestOpenMP_SubView_c10.cpp
openmp/TestOpenMP_SubView_c11.cpp
openmp/TestOpenMP_SubView_c12.cpp
openmp/TestOpenMP_Team.cpp
openmp/TestOpenMP_ViewAPI_a.cpp
openmp/TestOpenMP_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
-IF(Kokkos_ENABLE_QTHREAD)
+IF(Kokkos_ENABLE_Qthreads)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
- UnitTest_Qthread
- SOURCES UnitTestMain.cpp TestQthread.cpp
+ UnitTest_Qthreads
+ SOURCES
+ UnitTestMain.cpp
+ qthreads/TestQthreads_Atomics.cpp
+ qthreads/TestQthreads_Other.cpp
+ qthreads/TestQthreads_Reductions.cpp
+ qthreads/TestQthreads_SubView_a.cpp
+ qthreads/TestQthreads_SubView_b.cpp
+ qthreads/TestQthreads_SubView_c01.cpp
+ qthreads/TestQthreads_SubView_c02.cpp
+ qthreads/TestQthreads_SubView_c03.cpp
+ qthreads/TestQthreads_SubView_c04.cpp
+ qthreads/TestQthreads_SubView_c05.cpp
+ qthreads/TestQthreads_SubView_c06.cpp
+ qthreads/TestQthreads_SubView_c07.cpp
+ qthreads/TestQthreads_SubView_c08.cpp
+ qthreads/TestQthreads_SubView_c09.cpp
+ qthreads/TestQthreads_SubView_c10.cpp
+ qthreads/TestQthreads_SubView_c11.cpp
+ qthreads/TestQthreads_SubView_c12.cpp
+ qthreads/TestQthreads_Team.cpp
+ qthreads/TestQthreads_ViewAPI_a.cpp
+ qthreads/TestQthreads_ViewAPI_b.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
IF(Kokkos_ENABLE_Cuda)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Cuda
SOURCES
UnitTestMain.cpp
cuda/TestCuda_Atomics.cpp
cuda/TestCuda_Other.cpp
cuda/TestCuda_Reductions_a.cpp
cuda/TestCuda_Reductions_b.cpp
cuda/TestCuda_Spaces.cpp
cuda/TestCuda_SubView_a.cpp
cuda/TestCuda_SubView_b.cpp
cuda/TestCuda_SubView_c01.cpp
cuda/TestCuda_SubView_c02.cpp
cuda/TestCuda_SubView_c03.cpp
cuda/TestCuda_SubView_c04.cpp
cuda/TestCuda_SubView_c05.cpp
cuda/TestCuda_SubView_c06.cpp
cuda/TestCuda_SubView_c07.cpp
cuda/TestCuda_SubView_c08.cpp
cuda/TestCuda_SubView_c09.cpp
cuda/TestCuda_SubView_c10.cpp
cuda/TestCuda_SubView_c11.cpp
cuda/TestCuda_SubView_c12.cpp
cuda/TestCuda_Team.cpp
cuda/TestCuda_ViewAPI_a.cpp
cuda/TestCuda_ViewAPI_b.cpp
cuda/TestCuda_ViewAPI_c.cpp
cuda/TestCuda_ViewAPI_d.cpp
cuda/TestCuda_ViewAPI_e.cpp
cuda/TestCuda_ViewAPI_f.cpp
cuda/TestCuda_ViewAPI_g.cpp
cuda/TestCuda_ViewAPI_h.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
ENDIF()
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_Default
SOURCES UnitTestMain.cpp TestDefaultDeviceType.cpp TestDefaultDeviceType_a.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
foreach(INITTESTS_NUM RANGE 1 16)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_DefaultInit_${INITTESTS_NUM}
SOURCES UnitTestMain.cpp TestDefaultDeviceTypeInit_${INITTESTS_NUM}.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
endforeach(INITTESTS_NUM)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
UnitTest_HWLOC
SOURCES UnitTestMain.cpp TestHWLOC.cpp
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
)
-
diff --git a/lib/kokkos/core/unit_test/Makefile b/lib/kokkos/core/unit_test/Makefile
index cc59825fb..d93830a28 100644
--- a/lib/kokkos/core/unit_test/Makefile
+++ b/lib/kokkos/core/unit_test/Makefile
@@ -1,196 +1,196 @@
KOKKOS_PATH = ../..
GTEST_PATH = ../../tpls/gtest
vpath %.cpp ${KOKKOS_PATH}/core/unit_test
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/serial
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/threads
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/openmp
+vpath %.cpp ${KOKKOS_PATH}/core/unit_test/qthreads
vpath %.cpp ${KOKKOS_PATH}/core/unit_test/cuda
TEST_HEADERS = $(wildcard $(KOKKOS_PATH)/core/unit_test/*.hpp)
TEST_HEADERS += $(wildcard $(KOKKOS_PATH)/core/unit_test/*/*.hpp)
default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
else
CXX = g++
endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/unit_test
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda_Other.o TestCuda_Reductions_a.o TestCuda_Reductions_b.o TestCuda_Atomics.o TestCuda_Team.o TestCuda_Spaces.o
OBJ_CUDA += TestCuda_SubView_a.o TestCuda_SubView_b.o
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
- OBJ_OPENMP += TestCuda_SubView_c_all.o
+ OBJ_OPENMP += TestCuda_SubView_c_all.o
else
OBJ_CUDA += TestCuda_SubView_c01.o TestCuda_SubView_c02.o TestCuda_SubView_c03.o
- OBJ_CUDA += TestCuda_SubView_c04.o TestCuda_SubView_c05.o TestCuda_SubView_c06.o
- OBJ_CUDA += TestCuda_SubView_c07.o TestCuda_SubView_c08.o TestCuda_SubView_c09.o
+ OBJ_CUDA += TestCuda_SubView_c04.o TestCuda_SubView_c05.o TestCuda_SubView_c06.o
+ OBJ_CUDA += TestCuda_SubView_c07.o TestCuda_SubView_c08.o TestCuda_SubView_c09.o
OBJ_CUDA += TestCuda_SubView_c10.o TestCuda_SubView_c11.o TestCuda_SubView_c12.o
endif
- OBJ_CUDA += TestCuda_ViewAPI_a.o TestCuda_ViewAPI_b.o TestCuda_ViewAPI_c.o TestCuda_ViewAPI_d.o
- OBJ_CUDA += TestCuda_ViewAPI_e.o TestCuda_ViewAPI_f.o TestCuda_ViewAPI_g.o TestCuda_ViewAPI_h.o
+ OBJ_CUDA += TestCuda_ViewAPI_a.o TestCuda_ViewAPI_b.o TestCuda_ViewAPI_c.o TestCuda_ViewAPI_d.o
+ OBJ_CUDA += TestCuda_ViewAPI_e.o TestCuda_ViewAPI_f.o TestCuda_ViewAPI_g.o TestCuda_ViewAPI_h.o
OBJ_CUDA += TestCuda_ViewAPI_s.o
OBJ_CUDA += UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Cuda
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
- OBJ_THREADS = TestThreads_Other.o TestThreads_Reductions.o TestThreads_Atomics.o TestThreads_Team.o
- OBJ_THREADS += TestThreads_SubView_a.o TestThreads_SubView_b.o
+ OBJ_THREADS = TestThreads_Other.o TestThreads_Reductions.o TestThreads_Atomics.o TestThreads_Team.o
+ OBJ_THREADS += TestThreads_SubView_a.o TestThreads_SubView_b.o
OBJ_THREADS += TestThreads_SubView_c01.o TestThreads_SubView_c02.o TestThreads_SubView_c03.o
- OBJ_THREADS += TestThreads_SubView_c04.o TestThreads_SubView_c05.o TestThreads_SubView_c06.o
- OBJ_THREADS += TestThreads_SubView_c07.o TestThreads_SubView_c08.o TestThreads_SubView_c09.o
+ OBJ_THREADS += TestThreads_SubView_c04.o TestThreads_SubView_c05.o TestThreads_SubView_c06.o
+ OBJ_THREADS += TestThreads_SubView_c07.o TestThreads_SubView_c08.o TestThreads_SubView_c09.o
OBJ_THREADS += TestThreads_SubView_c10.o TestThreads_SubView_c11.o TestThreads_SubView_c12.o
- OBJ_THREADS += TestThreads_ViewAPI_a.o TestThreads_ViewAPI_b.o UnitTestMain.o gtest-all.o
+ OBJ_THREADS += TestThreads_ViewAPI_a.o TestThreads_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Threads
TEST_TARGETS += test-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
OBJ_OPENMP = TestOpenMP_Other.o TestOpenMP_Reductions.o TestOpenMP_Atomics.o TestOpenMP_Team.o
OBJ_OPENMP += TestOpenMP_SubView_a.o TestOpenMP_SubView_b.o
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
- OBJ_OPENMP += TestOpenMP_SubView_c_all.o
+ OBJ_OPENMP += TestOpenMP_SubView_c_all.o
else
OBJ_OPENMP += TestOpenMP_SubView_c01.o TestOpenMP_SubView_c02.o TestOpenMP_SubView_c03.o
- OBJ_OPENMP += TestOpenMP_SubView_c04.o TestOpenMP_SubView_c05.o TestOpenMP_SubView_c06.o
- OBJ_OPENMP += TestOpenMP_SubView_c07.o TestOpenMP_SubView_c08.o TestOpenMP_SubView_c09.o
+ OBJ_OPENMP += TestOpenMP_SubView_c04.o TestOpenMP_SubView_c05.o TestOpenMP_SubView_c06.o
+ OBJ_OPENMP += TestOpenMP_SubView_c07.o TestOpenMP_SubView_c08.o TestOpenMP_SubView_c09.o
OBJ_OPENMP += TestOpenMP_SubView_c10.o TestOpenMP_SubView_c11.o TestOpenMP_SubView_c12.o
endif
OBJ_OPENMP += TestOpenMP_ViewAPI_a.o TestOpenMP_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_OpenMP
TEST_TARGETS += test-openmp
endif
+ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
+ OBJ_QTHREADS = TestQthreads_Other.o TestQthreads_Reductions.o TestQthreads_Atomics.o TestQthreads_Team.o
+ OBJ_QTHREADS += TestQthreads_SubView_a.o TestQthreads_SubView_b.o
+ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
+ OBJ_QTHREADS += TestQthreads_SubView_c_all.o
+else
+ OBJ_QTHREADS += TestQthreads_SubView_c01.o TestQthreads_SubView_c02.o TestQthreads_SubView_c03.o
+ OBJ_QTHREADS += TestQthreads_SubView_c04.o TestQthreads_SubView_c05.o TestQthreads_SubView_c06.o
+ OBJ_QTHREADS += TestQthreads_SubView_c07.o TestQthreads_SubView_c08.o TestQthreads_SubView_c09.o
+ OBJ_QTHREADS += TestQthreads_SubView_c10.o TestQthreads_SubView_c11.o TestQthreads_SubView_c12.o
+endif
+ OBJ_QTHREADS += TestQthreads_ViewAPI_a.o TestQthreads_ViewAPI_b.o UnitTestMain.o gtest-all.o
+ TARGETS += KokkosCore_UnitTest_Qthreads
+ TEST_TARGETS += test-qthreads
+endif
+
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
- OBJ_SERIAL = TestSerial_Other.o TestSerial_Reductions.o TestSerial_Atomics.o TestSerial_Team.o
- OBJ_SERIAL += TestSerial_SubView_a.o TestSerial_SubView_b.o
+ OBJ_SERIAL = TestSerial_Other.o TestSerial_Reductions.o TestSerial_Atomics.o TestSerial_Team.o
+ OBJ_SERIAL += TestSerial_SubView_a.o TestSerial_SubView_b.o
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
- OBJ_OPENMP += TestSerial_SubView_c_all.o
+ OBJ_OPENMP += TestSerial_SubView_c_all.o
else
OBJ_SERIAL += TestSerial_SubView_c01.o TestSerial_SubView_c02.o TestSerial_SubView_c03.o
- OBJ_SERIAL += TestSerial_SubView_c04.o TestSerial_SubView_c05.o TestSerial_SubView_c06.o
- OBJ_SERIAL += TestSerial_SubView_c07.o TestSerial_SubView_c08.o TestSerial_SubView_c09.o
+ OBJ_SERIAL += TestSerial_SubView_c04.o TestSerial_SubView_c05.o TestSerial_SubView_c06.o
+ OBJ_SERIAL += TestSerial_SubView_c07.o TestSerial_SubView_c08.o TestSerial_SubView_c09.o
OBJ_SERIAL += TestSerial_SubView_c10.o TestSerial_SubView_c11.o TestSerial_SubView_c12.o
endif
- OBJ_SERIAL += TestSerial_ViewAPI_a.o TestSerial_ViewAPI_b.o UnitTestMain.o gtest-all.o
+ OBJ_SERIAL += TestSerial_ViewAPI_a.o TestSerial_ViewAPI_b.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Serial
TEST_TARGETS += test-serial
endif
-ifeq ($(KOKKOS_INTERNAL_USE_QTHREAD), 1)
- OBJ_QTHREAD = TestQthread.o UnitTestMain.o gtest-all.o
- TARGETS += KokkosCore_UnitTest_Qthread
- TEST_TARGETS += test-qthread
-endif
-
OBJ_HWLOC = TestHWLOC.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_HWLOC
TEST_TARGETS += test-hwloc
OBJ_DEFAULT = TestDefaultDeviceType.o TestDefaultDeviceType_a.o TestDefaultDeviceType_b.o TestDefaultDeviceType_c.o TestDefaultDeviceType_d.o UnitTestMain.o gtest-all.o
TARGETS += KokkosCore_UnitTest_Default
TEST_TARGETS += test-default
NUM_INITTESTS = 16
INITTESTS_NUMBERS := $(shell seq 1 ${NUM_INITTESTS})
INITTESTS_TARGETS := $(addprefix KokkosCore_UnitTest_DefaultDeviceTypeInit_,${INITTESTS_NUMBERS})
TARGETS += ${INITTESTS_TARGETS}
INITTESTS_TEST_TARGETS := $(addprefix test-default-init-,${INITTESTS_NUMBERS})
TEST_TARGETS += ${INITTESTS_TEST_TARGETS}
-OBJ_SYNCHRONIC = TestSynchronic.o UnitTestMain.o gtest-all.o
-TARGETS += KokkosCore_UnitTest_Synchronic
-TEST_TARGETS += test-synchronic
-
KokkosCore_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Cuda
KokkosCore_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Threads
KokkosCore_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_OpenMP
KokkosCore_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Serial
-KokkosCore_UnitTest_Qthread: $(OBJ_QTHREAD) $(KOKKOS_LINK_DEPENDS)
- $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_QTHREAD) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Qthread
+KokkosCore_UnitTest_Qthreads: $(OBJ_QTHREADS) $(KOKKOS_LINK_DEPENDS)
+ $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_QTHREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Qthreads
KokkosCore_UnitTest_HWLOC: $(OBJ_HWLOC) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_HWLOC) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_HWLOC
KokkosCore_UnitTest_AllocationTracker: $(OBJ_ALLOCATIONTRACKER) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ALLOCATIONTRACKER) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_AllocationTracker
KokkosCore_UnitTest_Default: $(OBJ_DEFAULT) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_DEFAULT) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Default
${INITTESTS_TARGETS}: KokkosCore_UnitTest_DefaultDeviceTypeInit_%: TestDefaultDeviceTypeInit_%.o UnitTestMain.o gtest-all.o $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) TestDefaultDeviceTypeInit_$*.o UnitTestMain.o gtest-all.o $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_DefaultDeviceTypeInit_$*
-KokkosCore_UnitTest_Synchronic: $(OBJ_SYNCHRONIC) $(KOKKOS_LINK_DEPENDS)
- $(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SYNCHRONIC) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_UnitTest_Synchronic
-
test-cuda: KokkosCore_UnitTest_Cuda
./KokkosCore_UnitTest_Cuda
test-threads: KokkosCore_UnitTest_Threads
./KokkosCore_UnitTest_Threads
test-openmp: KokkosCore_UnitTest_OpenMP
./KokkosCore_UnitTest_OpenMP
test-serial: KokkosCore_UnitTest_Serial
./KokkosCore_UnitTest_Serial
-test-qthread: KokkosCore_UnitTest_Qthread
- ./KokkosCore_UnitTest_Qthread
+test-qthreads: KokkosCore_UnitTest_Qthreads
+ ./KokkosCore_UnitTest_Qthreads
test-hwloc: KokkosCore_UnitTest_HWLOC
./KokkosCore_UnitTest_HWLOC
test-allocationtracker: KokkosCore_UnitTest_AllocationTracker
./KokkosCore_UnitTest_AllocationTracker
test-default: KokkosCore_UnitTest_Default
./KokkosCore_UnitTest_Default
${INITTESTS_TEST_TARGETS}: test-default-init-%: KokkosCore_UnitTest_DefaultDeviceTypeInit_%
./KokkosCore_UnitTest_DefaultDeviceTypeInit_$*
-test-synchronic: KokkosCore_UnitTest_Synchronic
- ./KokkosCore_UnitTest_Synchronic
-
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(TEST_HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
-
diff --git a/lib/kokkos/core/unit_test/TestAggregate.hpp b/lib/kokkos/core/unit_test/TestAggregate.hpp
index d22837f3e..f09cc5018 100644
--- a/lib/kokkos/core/unit_test/TestAggregate.hpp
+++ b/lib/kokkos/core/unit_test/TestAggregate.hpp
@@ -1,109 +1,124 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef TEST_AGGREGATE_HPP
#define TEST_AGGREGATE_HPP
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
#include <impl/Kokkos_ViewArray.hpp>
namespace Test {
template< class DeviceType >
void TestViewAggregate()
{
- typedef Kokkos::Array<double,32> value_type ;
-
- typedef Kokkos::Experimental::Impl::
- ViewDataAnalysis< value_type * , Kokkos::LayoutLeft , value_type >
- analysis_1d ;
+ typedef Kokkos::Array< double, 32 > value_type;
+ typedef Kokkos::Experimental::Impl::ViewDataAnalysis< value_type *, Kokkos::LayoutLeft, value_type > analysis_1d;
- static_assert( std::is_same< typename analysis_1d::specialize , Kokkos::Array<> >::value , "" );
+ static_assert( std::is_same< typename analysis_1d::specialize, Kokkos::Array<> >::value, "" );
+ typedef Kokkos::ViewTraits< value_type **, DeviceType > a32_traits;
+ typedef Kokkos::ViewTraits< typename a32_traits::scalar_array_type, DeviceType > flat_traits;
- typedef Kokkos::ViewTraits< value_type ** , DeviceType > a32_traits ;
- typedef Kokkos::ViewTraits< typename a32_traits::scalar_array_type , DeviceType > flat_traits ;
+ static_assert( std::is_same< typename a32_traits::specialize, Kokkos::Array<> >::value, "" );
+ static_assert( std::is_same< typename a32_traits::value_type, value_type >::value, "" );
+ static_assert( a32_traits::rank == 2, "" );
+ static_assert( a32_traits::rank_dynamic == 2, "" );
- static_assert( std::is_same< typename a32_traits::specialize , Kokkos::Array<> >::value , "" );
- static_assert( std::is_same< typename a32_traits::value_type , value_type >::value , "" );
- static_assert( a32_traits::rank == 2 , "" );
- static_assert( a32_traits::rank_dynamic == 2 , "" );
+ static_assert( std::is_same< typename flat_traits::specialize, void >::value, "" );
+ static_assert( flat_traits::rank == 3, "" );
+ static_assert( flat_traits::rank_dynamic == 2, "" );
+ static_assert( flat_traits::dimension::N2 == 32, "" );
- static_assert( std::is_same< typename flat_traits::specialize , void >::value , "" );
- static_assert( flat_traits::rank == 3 , "" );
- static_assert( flat_traits::rank_dynamic == 2 , "" );
- static_assert( flat_traits::dimension::N2 == 32 , "" );
+ typedef Kokkos::View< Kokkos::Array< double, 32 > **, DeviceType > a32_type;
+ typedef typename a32_type::array_type a32_flat_type;
+ static_assert( std::is_same< typename a32_type::value_type, value_type >::value, "" );
+ static_assert( std::is_same< typename a32_type::pointer_type, double * >::value, "" );
+ static_assert( a32_type::Rank == 2, "" );
+ static_assert( a32_flat_type::Rank == 3, "" );
- typedef Kokkos::View< Kokkos::Array<double,32> ** , DeviceType > a32_type ;
-
- typedef typename a32_type::array_type a32_flat_type ;
-
- static_assert( std::is_same< typename a32_type::value_type , value_type >::value , "" );
- static_assert( std::is_same< typename a32_type::pointer_type , double * >::value , "" );
- static_assert( a32_type::Rank == 2 , "" );
- static_assert( a32_flat_type::Rank == 3 , "" );
-
- a32_type x("test",4,5);
+ a32_type x( "test", 4, 5 );
a32_flat_type y( x );
- ASSERT_EQ( x.extent(0) , 4 );
- ASSERT_EQ( x.extent(1) , 5 );
- ASSERT_EQ( y.extent(0) , 4 );
- ASSERT_EQ( y.extent(1) , 5 );
- ASSERT_EQ( y.extent(2) , 32 );
-}
-
+ ASSERT_EQ( x.extent( 0 ), 4 );
+ ASSERT_EQ( x.extent( 1 ), 5 );
+ ASSERT_EQ( y.extent( 0 ), 4 );
+ ASSERT_EQ( y.extent( 1 ), 5 );
+ ASSERT_EQ( y.extent( 2 ), 32 );
+
+ // Initialize arrays from brace-init-list as for std::array.
+ //
+ // Comment: Clang will issue the following warning if we don't use double
+ // braces here (one for initializing the Kokkos::Array and one for
+  // initializing the sub-aggregate C-array data member),
+ //
+ // warning: suggest braces around initialization of subobject
+ //
+ // but single brace syntax would be valid as well.
+ Kokkos::Array< float, 2 > aggregate_initialization_syntax_1 = { { 1.41, 3.14 } };
+ ASSERT_FLOAT_EQ( aggregate_initialization_syntax_1[0], 1.41 );
+ ASSERT_FLOAT_EQ( aggregate_initialization_syntax_1[1], 3.14 );
+
+ Kokkos::Array< int, 3 > aggregate_initialization_syntax_2{ { 0, 1, 2 } }; // since C++11
+ for ( int i = 0; i < 3; ++i ) {
+ ASSERT_EQ( aggregate_initialization_syntax_2[i], i );
+ }
+
+ // Note that this is a valid initialization.
+ Kokkos::Array< double, 3 > initialized_with_one_argument_missing = { { 255, 255 } };
+ for (int i = 0; i < 2; ++i) {
+ ASSERT_DOUBLE_EQ( initialized_with_one_argument_missing[i], 255 );
+ }
+ // But the following line would not compile
+// Kokkos::Array< double, 3 > initialized_with_too_many{ { 1, 2, 3, 4 } };
}
-/*--------------------------------------------------------------------------*/
-/*--------------------------------------------------------------------------*/
+} // namespace Test
#endif /* #ifndef TEST_AGGREGATE_HPP */
diff --git a/lib/kokkos/core/unit_test/TestAtomic.hpp b/lib/kokkos/core/unit_test/TestAtomic.hpp
index e94872357..ff77b8dca 100644
--- a/lib/kokkos/core/unit_test/TestAtomic.hpp
+++ b/lib/kokkos/core/unit_test/TestAtomic.hpp
@@ -1,402 +1,433 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace TestAtomic {
-// Struct for testing arbitrary size atomics
+// Struct for testing arbitrary size atomics.
-template<int N>
+template< int N >
struct SuperScalar {
double val[N];
KOKKOS_INLINE_FUNCTION
SuperScalar() {
- for(int i=0; i<N; i++)
+ for ( int i = 0; i < N; i++ ) {
val[i] = 0.0;
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar(const SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar( const SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar(const volatile SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar( const volatile SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator = (const SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar& operator=( const SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator = (const volatile SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar& operator=( const volatile SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- void operator = (const SuperScalar& src) volatile {
- for(int i=0; i<N; i++)
+ void operator=( const SuperScalar & src ) volatile {
+ for ( int i = 0; i < N; i++ ) {
val[i] = src.val[i];
+ }
}
KOKKOS_INLINE_FUNCTION
- SuperScalar operator + (const SuperScalar& src) {
+ SuperScalar operator+( const SuperScalar & src ) {
SuperScalar tmp = *this;
- for(int i=0; i<N; i++)
+ for ( int i = 0; i < N; i++ ) {
tmp.val[i] += src.val[i];
+ }
return tmp;
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator += (const double& src) {
- for(int i=0; i<N; i++)
- val[i] += 1.0*(i+1)*src;
+ SuperScalar& operator+=( const double & src ) {
+ for ( int i = 0; i < N; i++ ) {
+ val[i] += 1.0 * ( i + 1 ) * src;
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- SuperScalar& operator += (const SuperScalar& src) {
- for(int i=0; i<N; i++)
+ SuperScalar& operator+=( const SuperScalar & src ) {
+ for ( int i = 0; i < N; i++ ) {
val[i] += src.val[i];
+ }
return *this;
}
KOKKOS_INLINE_FUNCTION
- bool operator == (const SuperScalar& src) {
+ bool operator==( const SuperScalar & src ) {
bool compare = true;
- for(int i=0; i<N; i++)
- compare = compare && ( val[i] == src.val[i]);
+ for( int i = 0; i < N; i++ ) {
+ compare = compare && ( val[i] == src.val[i] );
+ }
return compare;
}
KOKKOS_INLINE_FUNCTION
- bool operator != (const SuperScalar& src) {
+ bool operator!=( const SuperScalar & src ) {
bool compare = true;
- for(int i=0; i<N; i++)
- compare = compare && ( val[i] == src.val[i]);
+ for ( int i = 0; i < N; i++ ) {
+ compare = compare && ( val[i] == src.val[i] );
+ }
return !compare;
}
-
-
KOKKOS_INLINE_FUNCTION
- SuperScalar(const double& src) {
- for(int i=0; i<N; i++)
- val[i] = 1.0 * (i+1) * src;
+ SuperScalar( const double & src ) {
+ for ( int i = 0; i < N; i++ ) {
+ val[i] = 1.0 * ( i + 1 ) * src;
+ }
}
-
};
-template<int N>
-std::ostream& operator<<(std::ostream& os, const SuperScalar<N>& dt)
+template< int N >
+std::ostream & operator<<( std::ostream & os, const SuperScalar< N > & dt )
{
- os << "{ ";
- for(int i=0;i<N-1;i++)
- os << dt.val[i] << ", ";
- os << dt.val[N-1] << "}";
- return os;
+ os << "{ ";
+ for ( int i = 0; i < N - 1; i++ ) {
+ os << dt.val[i] << ", ";
+ }
+ os << dt.val[N-1] << "}";
+
+ return os;
}
-template<class T,class DEVICE_TYPE>
+template< class T, class DEVICE_TYPE >
struct ZeroFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<T,execution_space> type;
- typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
+ typedef typename Kokkos::View< T, execution_space > type;
+ typedef typename Kokkos::View< T, execution_space >::HostMirror h_type;
+
type data;
+
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
+ void operator()( int ) const {
data() = 0;
}
};
//---------------------------------------------------
//--------------atomic_fetch_add---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct AddFunctor{
+template< class T, class DEVICE_TYPE >
+struct AddFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_add(&data(),(T)1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_add( &data(), (T) 1 );
}
};
-template<class T, class execution_space >
-T AddLoop(int loop) {
- struct ZeroFunctor<T,execution_space> f_zero;
- typename ZeroFunctor<T,execution_space>::type data("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T AddLoop( int loop ) {
+ struct ZeroFunctor< T, execution_space > f_zero;
+ typename ZeroFunctor< T, execution_space >::type data( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_zero.data = data;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- struct AddFunctor<T,execution_space> f_add;
+ struct AddFunctor< T, execution_space > f_add;
+
f_add.data = data;
- Kokkos::parallel_for(loop,f_add);
+ Kokkos::parallel_for( loop, f_add );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T AddLoopSerial(int loop) {
+template< class T >
+T AddLoopSerial( int loop ) {
T* data = new T[1];
data[0] = 0;
- for(int i=0;i<loop;i++)
- *data+=(T)1;
+ for ( int i = 0; i < loop; i++ ) {
+ *data += (T) 1;
+ }
T val = *data;
delete [] data;
+
return val;
}
//------------------------------------------------------
//--------------atomic_compare_exchange-----------------
//------------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct CASFunctor{
+template< class T, class DEVICE_TYPE >
+struct CASFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- T old = data();
- T newval, assumed;
- do {
- assumed = old;
- newval = assumed + (T)1;
- old = Kokkos::atomic_compare_exchange(&data(), assumed, newval);
- }
- while( old != assumed );
+ void operator()( int ) const {
+ T old = data();
+ T newval, assumed;
+
+ do {
+ assumed = old;
+ newval = assumed + (T) 1;
+ old = Kokkos::atomic_compare_exchange( &data(), assumed, newval );
+ } while( old != assumed );
}
};
-template<class T, class execution_space >
-T CASLoop(int loop) {
- struct ZeroFunctor<T,execution_space> f_zero;
- typename ZeroFunctor<T,execution_space>::type data("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T CASLoop( int loop ) {
+ struct ZeroFunctor< T, execution_space > f_zero;
+ typename ZeroFunctor< T, execution_space >::type data( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_zero.data = data;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- struct CASFunctor<T,execution_space> f_cas;
+ struct CASFunctor< T, execution_space > f_cas;
+
f_cas.data = data;
- Kokkos::parallel_for(loop,f_cas);
+ Kokkos::parallel_for( loop, f_cas );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
return val;
}
-template<class T>
-T CASLoopSerial(int loop) {
+template< class T >
+T CASLoopSerial( int loop ) {
T* data = new T[1];
data[0] = 0;
- for(int i=0;i<loop;i++) {
- T assumed;
- T newval;
- T old;
- do {
- assumed = *data;
- newval = assumed + (T)1;
- old = *data;
- *data = newval;
- }
- while(!(assumed==old));
+ for ( int i = 0; i < loop; i++ ) {
+ T assumed;
+ T newval;
+ T old;
+
+ do {
+ assumed = *data;
+ newval = assumed + (T) 1;
+ old = *data;
+ *data = newval;
+ } while( !( assumed == old ) );
}
T val = *data;
delete [] data;
+
return val;
}
//----------------------------------------------
//--------------atomic_exchange-----------------
//----------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct ExchFunctor{
+template< class T, class DEVICE_TYPE >
+struct ExchFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data, data2;
KOKKOS_INLINE_FUNCTION
- void operator()(int i) const {
- T old = Kokkos::atomic_exchange(&data(),(T)i);
- Kokkos::atomic_fetch_add(&data2(),old);
+ void operator()( int i ) const {
+ T old = Kokkos::atomic_exchange( &data(), (T) i );
+ Kokkos::atomic_fetch_add( &data2(), old );
}
};
-template<class T, class execution_space >
-T ExchLoop(int loop) {
- struct ZeroFunctor<T,execution_space> f_zero;
- typename ZeroFunctor<T,execution_space>::type data("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T ExchLoop( int loop ) {
+ struct ZeroFunctor< T, execution_space > f_zero;
+ typename ZeroFunctor< T, execution_space >::type data( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_zero.data = data;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- typename ZeroFunctor<T,execution_space>::type data2("Data");
- typename ZeroFunctor<T,execution_space>::h_type h_data2("HData");
+ typename ZeroFunctor< T, execution_space >::type data2( "Data" );
+ typename ZeroFunctor< T, execution_space >::h_type h_data2( "HData" );
+
f_zero.data = data2;
- Kokkos::parallel_for(1,f_zero);
+ Kokkos::parallel_for( 1, f_zero );
execution_space::fence();
- struct ExchFunctor<T,execution_space> f_exch;
+ struct ExchFunctor< T, execution_space > f_exch;
+
f_exch.data = data;
f_exch.data2 = data2;
- Kokkos::parallel_for(loop,f_exch);
+ Kokkos::parallel_for( loop, f_exch );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
- Kokkos::deep_copy(h_data2,data2);
+ Kokkos::deep_copy( h_data, data );
+ Kokkos::deep_copy( h_data2, data2 );
T val = h_data() + h_data2();
return val;
}
-template<class T>
-T ExchLoopSerial(typename std::conditional<!std::is_same<T,Kokkos::complex<double> >::value,int,void>::type loop) {
+template< class T >
+T ExchLoopSerial( typename std::conditional< !std::is_same< T, Kokkos::complex<double> >::value, int, void >::type loop ) {
T* data = new T[1];
T* data2 = new T[1];
data[0] = 0;
data2[0] = 0;
- for(int i=0;i<loop;i++) {
- T old = *data;
- *data=(T) i;
- *data2+=old;
+
+ for ( int i = 0; i < loop; i++ ) {
+ T old = *data;
+ *data = (T) i;
+ *data2 += old;
}
T val = *data2 + *data;
delete [] data;
delete [] data2;
+
return val;
}
-template<class T>
-T ExchLoopSerial(typename std::conditional<std::is_same<T,Kokkos::complex<double> >::value,int,void>::type loop) {
+template< class T >
+T ExchLoopSerial( typename std::conditional< std::is_same< T, Kokkos::complex<double> >::value, int, void >::type loop ) {
T* data = new T[1];
T* data2 = new T[1];
data[0] = 0;
data2[0] = 0;
- for(int i=0;i<loop;i++) {
- T old = *data;
- data->real() = (static_cast<double>(i));
- data->imag() = 0;
- *data2+=old;
+
+ for ( int i = 0; i < loop; i++ ) {
+ T old = *data;
+ data->real() = ( static_cast<double>( i ) );
+ data->imag() = 0;
+ *data2 += old;
}
T val = *data2 + *data;
delete [] data;
delete [] data2;
+
return val;
}
-template<class T, class DeviceType >
-T LoopVariant(int loop, int test) {
- switch (test) {
- case 1: return AddLoop<T,DeviceType>(loop);
- case 2: return CASLoop<T,DeviceType>(loop);
- case 3: return ExchLoop<T,DeviceType>(loop);
+template< class T, class DeviceType >
+T LoopVariant( int loop, int test ) {
+ switch ( test ) {
+ case 1: return AddLoop< T, DeviceType >( loop );
+ case 2: return CASLoop< T, DeviceType >( loop );
+ case 3: return ExchLoop< T, DeviceType >( loop );
}
+
return 0;
}
-template<class T>
-T LoopVariantSerial(int loop, int test) {
- switch (test) {
- case 1: return AddLoopSerial<T>(loop);
- case 2: return CASLoopSerial<T>(loop);
- case 3: return ExchLoopSerial<T>(loop);
+template< class T >
+T LoopVariantSerial( int loop, int test ) {
+ switch ( test ) {
+ case 1: return AddLoopSerial< T >( loop );
+ case 2: return CASLoopSerial< T >( loop );
+ case 3: return ExchLoopSerial< T >( loop );
}
+
return 0;
}
-template<class T,class DeviceType>
-bool Loop(int loop, int test)
+template< class T, class DeviceType >
+bool Loop( int loop, int test )
{
- T res = LoopVariant<T,DeviceType>(loop,test);
- T resSerial = LoopVariantSerial<T>(loop,test);
+ T res = LoopVariant< T, DeviceType >( loop, test );
+ T resSerial = LoopVariantSerial< T >( loop, test );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = "
<< test << " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
-
- return passed ;
-}
-
+ return passed;
}
+} // namespace TestAtomic
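
The loops above all follow the same shape: initialize a device-resident scalar, update it concurrently from a Kokkos::parallel_for, and compare the result against a serial reference. As a minimal, self-contained sketch of that pattern (not part of the patch; the view name "count", the loop bound, and the use of long are illustrative assumptions only):

#include <Kokkos_Core.hpp>
#include <cassert>

int main( int argc, char* argv[] ) {
  Kokkos::initialize( argc, argv );
  {
    const int loop = 100000;
    Kokkos::View< long > count( "count" );        // rank-0, device-resident scalar (zero-initialized)
    Kokkos::parallel_for( loop, KOKKOS_LAMBDA( int ) {
      Kokkos::atomic_fetch_add( &count(), 1L );   // contended atomic add from many iterations
    });
    Kokkos::fence();
    auto h_count = Kokkos::create_mirror_view( count );
    Kokkos::deep_copy( h_count, count );          // bring the result back to the host
    assert( h_count() == loop );                  // serial expectation: one increment per iteration
  }
  Kokkos::finalize();
  return 0;
}
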
diff --git a/lib/kokkos/core/unit_test/TestAtomicOperations.hpp b/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
index 7f1519045..e3ceca404 100644
--- a/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
+++ b/lib/kokkos/core/unit_test/TestAtomicOperations.hpp
@@ -1,985 +1,1059 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace TestAtomicOperations {
//-----------------------------------------------
//--------------zero_functor---------------------
//-----------------------------------------------
-template<class T,class DEVICE_TYPE>
+template< class T, class DEVICE_TYPE >
struct ZeroFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<T,execution_space> type;
- typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
+ typedef typename Kokkos::View< T, execution_space > type;
+ typedef typename Kokkos::View< T, execution_space >::HostMirror h_type;
+
type data;
+
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
+ void operator()( int ) const {
data() = 0;
}
};
//-----------------------------------------------
//--------------init_functor---------------------
//-----------------------------------------------
-template<class T,class DEVICE_TYPE>
+template< class T, class DEVICE_TYPE >
struct InitFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<T,execution_space> type;
- typedef typename Kokkos::View<T,execution_space>::HostMirror h_type;
+ typedef typename Kokkos::View< T, execution_space > type;
+ typedef typename Kokkos::View< T, execution_space >::HostMirror h_type;
+
type data;
- T init_value ;
+ T init_value;
+
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
+ void operator()( int ) const {
data() = init_value;
}
- InitFunctor(T _init_value) : init_value(_init_value) {}
+ InitFunctor( T _init_value ) : init_value( _init_value ) {}
};
-
//---------------------------------------------------
//--------------atomic_fetch_max---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct MaxFunctor{
+template< class T, class DEVICE_TYPE >
+struct MaxFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- //Kokkos::atomic_fetch_max(&data(),(T)1);
- Kokkos::atomic_fetch_max(&data(),(T)i1);
+ void operator()( int ) const {
+ //Kokkos::atomic_fetch_max( &data(), (T) 1 );
+ Kokkos::atomic_fetch_max( &data(), (T) i1 );
}
- MaxFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+ MaxFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T MaxAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T MaxAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct MaxFunctor<T,execution_space> f(i0,i1);
+ struct MaxFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T MaxAtomicCheck(T i0 , T i1) {
+template< class T >
+T MaxAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = (i0 > i1 ? i0 : i1) ;
+ *data = ( i0 > i1 ? i0 : i1 );
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool MaxAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool MaxAtomicTest( T i0, T i1 )
{
- T res = MaxAtomic<T,DeviceType>(i0,i1);
- T resSerial = MaxAtomicCheck<T>(i0,i1);
+ T res = MaxAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = MaxAtomicCheck<T>( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MaxAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_min---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct MinFunctor{
+template< class T, class DEVICE_TYPE >
+struct MinFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_min(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_min( &data(), (T) i1 );
}
- MinFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ MinFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T MinAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T MinAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct MinFunctor<T,execution_space> f(i0,i1);
+ struct MinFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T MinAtomicCheck(T i0 , T i1) {
+template< class T >
+T MinAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = (i0 < i1 ? i0 : i1) ;
+ *data = ( i0 < i1 ? i0 : i1 );
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool MinAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool MinAtomicTest( T i0, T i1 )
{
- T res = MinAtomic<T,DeviceType>(i0,i1);
- T resSerial = MinAtomicCheck<T>(i0,i1);
+ T res = MinAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = MinAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MinAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_increment---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct IncFunctor{
+template< class T, class DEVICE_TYPE >
+struct IncFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_increment(&data());
+ void operator()( int ) const {
+ Kokkos::atomic_increment( &data() );
}
- IncFunctor( T _i0 ) : i0(_i0) {}
+
+ IncFunctor( T _i0 ) : i0( _i0 ) {}
};
-template<class T, class execution_space >
-T IncAtomic(T i0) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T IncAtomic( T i0 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct IncFunctor<T,execution_space> f(i0);
+ struct IncFunctor< T, execution_space > f( i0 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T IncAtomicCheck(T i0) {
+template< class T >
+T IncAtomicCheck( T i0 ) {
T* data = new T[1];
data[0] = 0;
*data = i0 + 1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool IncAtomicTest(T i0)
+template< class T, class DeviceType >
+bool IncAtomicTest( T i0 )
{
- T res = IncAtomic<T,DeviceType>(i0);
- T resSerial = IncAtomicCheck<T>(i0);
+ T res = IncAtomic< T, DeviceType >( i0 );
+ T resSerial = IncAtomicCheck< T >( i0 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = IncAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_decrement---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct DecFunctor{
+template< class T, class DEVICE_TYPE >
+struct DecFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_decrement(&data());
+ void operator()( int ) const {
+ Kokkos::atomic_decrement( &data() );
}
- DecFunctor( T _i0 ) : i0(_i0) {}
+
+ DecFunctor( T _i0 ) : i0( _i0 ) {}
};
-template<class T, class execution_space >
-T DecAtomic(T i0) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T DecAtomic( T i0 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct DecFunctor<T,execution_space> f(i0);
+ struct DecFunctor< T, execution_space > f( i0 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T DecAtomicCheck(T i0) {
+template< class T >
+T DecAtomicCheck( T i0 ) {
T* data = new T[1];
data[0] = 0;
*data = i0 - 1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool DecAtomicTest(T i0)
+template< class T, class DeviceType >
+bool DecAtomicTest( T i0 )
{
- T res = DecAtomic<T,DeviceType>(i0);
- T resSerial = DecAtomicCheck<T>(i0);
+ T res = DecAtomic< T, DeviceType >( i0 );
+ T resSerial = DecAtomicCheck< T >( i0 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = DecAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_mul---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct MulFunctor{
+template< class T, class DEVICE_TYPE >
+struct MulFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_mul(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_mul( &data(), (T) i1 );
}
- MulFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ MulFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T MulAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T MulAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct MulFunctor<T,execution_space> f(i0,i1);
+ struct MulFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T MulAtomicCheck(T i0 , T i1) {
+template< class T >
+T MulAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0*i1 ;
+ *data = i0 * i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool MulAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool MulAtomicTest( T i0, T i1 )
{
- T res = MulAtomic<T,DeviceType>(i0,i1);
- T resSerial = MulAtomicCheck<T>(i0,i1);
+ T res = MulAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = MulAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MulAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_div---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct DivFunctor{
+template< class T, class DEVICE_TYPE >
+struct DivFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_div(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_div( &data(), (T) i1 );
}
- DivFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ DivFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T DivAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T DivAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct DivFunctor<T,execution_space> f(i0,i1);
+ struct DivFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T DivAtomicCheck(T i0 , T i1) {
+template< class T >
+T DivAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0/i1 ;
+ *data = i0 / i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool DivAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool DivAtomicTest( T i0, T i1 )
{
- T res = DivAtomic<T,DeviceType>(i0,i1);
- T resSerial = DivAtomicCheck<T>(i0,i1);
+ T res = DivAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = DivAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = DivAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_mod---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct ModFunctor{
+template< class T, class DEVICE_TYPE >
+struct ModFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_mod(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_mod( &data(), (T) i1 );
}
- ModFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ ModFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T ModAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T ModAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct ModFunctor<T,execution_space> f(i0,i1);
+ struct ModFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T ModAtomicCheck(T i0 , T i1) {
+template< class T >
+T ModAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0%i1 ;
+ *data = i0 % i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool ModAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool ModAtomicTest( T i0, T i1 )
{
- T res = ModAtomic<T,DeviceType>(i0,i1);
- T resSerial = ModAtomicCheck<T>(i0,i1);
+ T res = ModAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = ModAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = ModAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_and---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct AndFunctor{
+template< class T, class DEVICE_TYPE >
+struct AndFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_and(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_and( &data(), (T) i1 );
}
- AndFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ AndFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T AndAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T AndAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct AndFunctor<T,execution_space> f(i0,i1);
+ struct AndFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T AndAtomicCheck(T i0 , T i1) {
+template< class T >
+T AndAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0&i1 ;
+ *data = i0 & i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool AndAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool AndAtomicTest( T i0, T i1 )
{
- T res = AndAtomic<T,DeviceType>(i0,i1);
- T resSerial = AndAtomicCheck<T>(i0,i1);
+ T res = AndAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = AndAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = AndAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_or----------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct OrFunctor{
+template< class T, class DEVICE_TYPE >
+struct OrFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_or(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_or( &data(), (T) i1 );
}
- OrFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ OrFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T OrAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T OrAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct OrFunctor<T,execution_space> f(i0,i1);
+ struct OrFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T OrAtomicCheck(T i0 , T i1) {
+template< class T >
+T OrAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0|i1 ;
+ *data = i0 | i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool OrAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool OrAtomicTest( T i0, T i1 )
{
- T res = OrAtomic<T,DeviceType>(i0,i1);
- T resSerial = OrAtomicCheck<T>(i0,i1);
+ T res = OrAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = OrAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = OrAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_xor---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct XorFunctor{
+template< class T, class DEVICE_TYPE >
+struct XorFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_xor(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_xor( &data(), (T) i1 );
}
- XorFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ XorFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T XorAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T XorAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct XorFunctor<T,execution_space> f(i0,i1);
+ struct XorFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T XorAtomicCheck(T i0 , T i1) {
+template< class T >
+T XorAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0^i1 ;
+ *data = i0 ^ i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool XorAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool XorAtomicTest( T i0, T i1 )
{
- T res = XorAtomic<T,DeviceType>(i0,i1);
- T resSerial = XorAtomicCheck<T>(i0,i1);
+ T res = XorAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = XorAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = XorAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_lshift---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct LShiftFunctor{
+template< class T, class DEVICE_TYPE >
+struct LShiftFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_lshift(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_lshift( &data(), (T) i1 );
}
- LShiftFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ LShiftFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T LShiftAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T LShiftAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct LShiftFunctor<T,execution_space> f(i0,i1);
+ struct LShiftFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T LShiftAtomicCheck(T i0 , T i1) {
+template< class T >
+T LShiftAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0<<i1 ;
+ *data = i0 << i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool LShiftAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool LShiftAtomicTest( T i0, T i1 )
{
- T res = LShiftAtomic<T,DeviceType>(i0,i1);
- T resSerial = LShiftAtomicCheck<T>(i0,i1);
+ T res = LShiftAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = LShiftAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = LShiftAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
//---------------------------------------------------
//--------------atomic_fetch_rshift---------------------
//---------------------------------------------------
-template<class T,class DEVICE_TYPE>
-struct RShiftFunctor{
+template< class T, class DEVICE_TYPE >
+struct RShiftFunctor {
typedef DEVICE_TYPE execution_space;
- typedef Kokkos::View<T,execution_space> type;
+ typedef Kokkos::View< T, execution_space > type;
+
type data;
T i0;
T i1;
KOKKOS_INLINE_FUNCTION
- void operator()(int) const {
- Kokkos::atomic_fetch_rshift(&data(),(T)i1);
+ void operator()( int ) const {
+ Kokkos::atomic_fetch_rshift( &data(), (T) i1 );
}
- RShiftFunctor( T _i0 , T _i1 ) : i0(_i0) , i1(_i1) {}
+
+ RShiftFunctor( T _i0, T _i1 ) : i0( _i0 ), i1( _i1 ) {}
};
-template<class T, class execution_space >
-T RShiftAtomic(T i0 , T i1) {
- struct InitFunctor<T,execution_space> f_init(i0);
- typename InitFunctor<T,execution_space>::type data("Data");
- typename InitFunctor<T,execution_space>::h_type h_data("HData");
+template< class T, class execution_space >
+T RShiftAtomic( T i0, T i1 ) {
+ struct InitFunctor< T, execution_space > f_init( i0 );
+ typename InitFunctor< T, execution_space >::type data( "Data" );
+ typename InitFunctor< T, execution_space >::h_type h_data( "HData" );
+
f_init.data = data;
- Kokkos::parallel_for(1,f_init);
+ Kokkos::parallel_for( 1, f_init );
execution_space::fence();
- struct RShiftFunctor<T,execution_space> f(i0,i1);
+ struct RShiftFunctor< T, execution_space > f( i0, i1 );
+
f.data = data;
- Kokkos::parallel_for(1,f);
+ Kokkos::parallel_for( 1, f );
execution_space::fence();
- Kokkos::deep_copy(h_data,data);
+ Kokkos::deep_copy( h_data, data );
T val = h_data();
+
return val;
}
-template<class T>
-T RShiftAtomicCheck(T i0 , T i1) {
+template< class T >
+T RShiftAtomicCheck( T i0, T i1 ) {
T* data = new T[1];
data[0] = 0;
- *data = i0>>i1 ;
+ *data = i0 >> i1;
T val = *data;
delete [] data;
+
return val;
}
-template<class T,class DeviceType>
-bool RShiftAtomicTest(T i0, T i1)
+template< class T, class DeviceType >
+bool RShiftAtomicTest( T i0, T i1 )
{
- T res = RShiftAtomic<T,DeviceType>(i0,i1);
- T resSerial = RShiftAtomicCheck<T>(i0,i1);
+ T res = RShiftAtomic< T, DeviceType >( i0, i1 );
+ T resSerial = RShiftAtomicCheck< T >( i0, i1 );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = RShiftAtomicTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//--------------atomic_test_control------------------
//---------------------------------------------------
-template<class T,class DeviceType>
-bool AtomicOperationsTestIntegralType( int i0 , int i1 , int test )
+template< class T, class DeviceType >
+bool AtomicOperationsTestIntegralType( int i0, int i1, int test )
{
- switch (test) {
- case 1: return MaxAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 2: return MinAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 3: return MulAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 4: return DivAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 5: return ModAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 6: return AndAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 7: return OrAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 8: return XorAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 9: return LShiftAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 10: return RShiftAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 11: return IncAtomicTest<T,DeviceType>( (T)i0 );
- case 12: return DecAtomicTest<T,DeviceType>( (T)i0 );
+ switch ( test ) {
+ case 1: return MaxAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 2: return MinAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 3: return MulAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 4: return DivAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 5: return ModAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 6: return AndAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 7: return OrAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 8: return XorAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 9: return LShiftAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 10: return RShiftAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 11: return IncAtomicTest< T, DeviceType >( (T) i0 );
+ case 12: return DecAtomicTest< T, DeviceType >( (T) i0 );
}
+
return 0;
}
-template<class T,class DeviceType>
-bool AtomicOperationsTestNonIntegralType( int i0 , int i1 , int test )
+template< class T, class DeviceType >
+bool AtomicOperationsTestNonIntegralType( int i0, int i1, int test )
{
- switch (test) {
- case 1: return MaxAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 2: return MinAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 3: return MulAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
- case 4: return DivAtomicTest<T,DeviceType>( (T)i0 , (T)i1 );
+ switch ( test ) {
+ case 1: return MaxAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 2: return MinAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 3: return MulAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
+ case 4: return DivAtomicTest< T, DeviceType >( (T) i0, (T) i1 );
}
+
return 0;
}
-} // namespace
-
+} // namespace TestAtomicOperations
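
Each fetch-style operation tested above (max, min, mul, div, mod, and, or, xor, shifts) can also be written by hand with Kokkos::atomic_compare_exchange(), following the same retry loop that CASFunctor uses earlier in this patch. A short sketch of that idiom (not part of the patch; the helper name, the use of double, and the loop bound are illustrative assumptions):

#include <Kokkos_Core.hpp>
#include <cstdio>

// Apply "max" atomically to *dest using only atomic_compare_exchange():
// read the current value, compute the update, and retry if another
// thread changed *dest in the meantime.
KOKKOS_INLINE_FUNCTION
double atomic_max_via_cas( double* dest, double val ) {
  double old = *dest;
  double assumed;
  do {
    assumed = old;
    const double desired = ( assumed < val ) ? val : assumed;
    old = Kokkos::atomic_compare_exchange( dest, assumed, desired );
  } while ( old != assumed );   // another update slipped in; retry
  return old;                   // value observed just before our update landed
}

int main( int argc, char* argv[] ) {
  Kokkos::initialize( argc, argv );
  {
    Kokkos::View< double > m( "m" );                  // rank-0 device scalar, zero-initialized
    Kokkos::parallel_for( 1000, KOKKOS_LAMBDA( const int i ) {
      atomic_max_via_cas( &m(), static_cast< double >( i ) );
    });
    Kokkos::fence();
    auto h_m = Kokkos::create_mirror_view( m );
    Kokkos::deep_copy( h_m, m );
    std::printf( "max = %g (expected 999)\n", h_m() ); // serial expectation
  }
  Kokkos::finalize();
  return 0;
}
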
diff --git a/lib/kokkos/core/unit_test/TestAtomicViews.hpp b/lib/kokkos/core/unit_test/TestAtomicViews.hpp
index 739492d32..71080e5c8 100644
--- a/lib/kokkos/core/unit_test/TestAtomicViews.hpp
+++ b/lib/kokkos/core/unit_test/TestAtomicViews.hpp
@@ -1,1532 +1,1439 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
namespace TestAtomicViews {
//-------------------------------------------------
//-----------atomic view api tests-----------------
//-------------------------------------------------
-template< class T , class ... P >
-size_t allocation_count( const Kokkos::View<T,P...> & view )
+template< class T, class ... P >
+size_t allocation_count( const Kokkos::View< T, P... > & view )
{
const size_t card = view.size();
const size_t alloc = view.span();
- const int memory_span = Kokkos::View<int*>::required_allocation_size(100);
+ const int memory_span = Kokkos::View< int* >::required_allocation_size( 100 );
- return (card <= alloc && memory_span == 400) ? alloc : 0 ;
+ return ( card <= alloc && memory_span == 400 ) ? alloc : 0;
}
-template< class DataType ,
- class DeviceType ,
+template< class DataType,
+ class DeviceType,
unsigned Rank = Kokkos::ViewTraits< DataType >::rank >
-struct TestViewOperator_LeftAndRight ;
+struct TestViewOperator_LeftAndRight;
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 1 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 1 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
+ { update = 0; }
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > left_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space, Kokkos::MemoryTraits< Kokkos::Atomic > > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > right_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space, Kokkos::MemoryTraits< Kokkos::Atomic > > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space, Kokkos::MemoryTraits< Kokkos::Atomic >> stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
- // below checks that values match, but unable to check the references
- // - should this be able to be checked?
- if ( left(i0) != left(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( right(i0) != right(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( left(i0) != left_stride(i0) ) { update |= 4 ; }
- if ( right(i0) != right_stride(i0) ) { update |= 8 ; }
- /*
- if ( & left(i0) != & left(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0) != & right(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & left(i0) != & left_stride(i0) ) { update |= 4 ; }
- if ( & right(i0) != & right_stride(i0) ) { update |= 8 ; }
- */
+ // Below checks that values match; the references themselves cannot be checked here.
+ // Should it be possible to check them?
+ if ( left( i0 ) != left( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( right( i0 ) != right( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( left( i0 ) != left_stride( i0 ) ) { update |= 4; }
+ if ( right( i0 ) != right_stride( i0 ) ) { update |= 8; }
+/*
+ if ( &left( i0 ) != &left( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( &right( i0 ) != &right( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( &left( i0 ) != &left_stride( i0 ) ) { update |= 4; }
+ if ( &right( i0 ) != &right_stride( i0 ) ) { update |= 8; }
+*/
}
}
};
-
template< typename T, class DeviceType >
class TestAtomicViewAPI
{
public:
- typedef DeviceType device ;
+ typedef DeviceType device;
- enum { N0 = 1000 ,
- N1 = 3 ,
- N2 = 5 ,
+ enum { N0 = 1000,
+ N1 = 3,
+ N2 = 5,
N3 = 7 };
- typedef Kokkos::View< T , device > dView0 ;
- typedef Kokkos::View< T* , device > dView1 ;
- typedef Kokkos::View< T*[N1] , device > dView2 ;
- typedef Kokkos::View< T*[N1][N2] , device > dView3 ;
- typedef Kokkos::View< T*[N1][N2][N3] , device > dView4 ;
- typedef Kokkos::View< const T*[N1][N2][N3] , device > const_dView4 ;
- typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged ;
- typedef typename dView0::host_mirror_space host ;
+ typedef Kokkos::View< T, device > dView0;
+ typedef Kokkos::View< T*, device > dView1;
+ typedef Kokkos::View< T*[N1], device > dView2;
+ typedef Kokkos::View< T*[N1][N2], device > dView3;
+ typedef Kokkos::View< T*[N1][N2][N3], device > dView4;
+ typedef Kokkos::View< const T*[N1][N2][N3], device > const_dView4;
+ typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged;
+ typedef typename dView0::host_mirror_space host;
- typedef Kokkos::View< T , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView0 ;
- typedef Kokkos::View< T* , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView1 ;
- typedef Kokkos::View< T*[N1] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView2 ;
- typedef Kokkos::View< T*[N1][N2] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView3 ;
- typedef Kokkos::View< T*[N1][N2][N3] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > aView4 ;
- typedef Kokkos::View< const T*[N1][N2][N3] , device , Kokkos::MemoryTraits< Kokkos::Atomic > > const_aView4 ;
+ typedef Kokkos::View< T, device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView0;
+ typedef Kokkos::View< T*, device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView1;
+ typedef Kokkos::View< T*[N1], device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView2;
+ typedef Kokkos::View< T*[N1][N2], device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView3;
+ typedef Kokkos::View< T*[N1][N2][N3], device, Kokkos::MemoryTraits< Kokkos::Atomic > > aView4;
+ typedef Kokkos::View< const T*[N1][N2][N3], device, Kokkos::MemoryTraits< Kokkos::Atomic > > const_aView4;
- typedef Kokkos::View< T****, device, Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::Atomic > > aView4_unmanaged ;
+ typedef Kokkos::View< T****, device, Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::Atomic > > aView4_unmanaged;
- typedef typename aView0::host_mirror_space host_atomic ;
+ typedef typename aView0::host_mirror_space host_atomic;
TestAtomicViewAPI()
{
- TestViewOperator_LeftAndRight< int[2] , device >::testit();
+ TestViewOperator_LeftAndRight< int[2], device >::testit();
run_test_rank0();
run_test_rank4();
run_test_const();
}
-
static void run_test_rank0()
{
- dView0 dx , dy ;
- aView0 ax , ay , az ;
+ dView0 dx, dy;
+ aView0 ax, ay, az;
dx = dView0( "dx" );
dy = dView0( "dy" );
- ASSERT_EQ( dx.use_count() , size_t(1) );
- ASSERT_EQ( dy.use_count() , size_t(1) );
-
- ax = dx ;
- ay = dy ;
- ASSERT_EQ( dx.use_count() , size_t(2) );
- ASSERT_EQ( dy.use_count() , size_t(2) );
- ASSERT_EQ( dx.use_count() , ax.use_count() );
-
- az = ax ;
- ASSERT_EQ( dx.use_count() , size_t(3) );
- ASSERT_EQ( ax.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , ax.use_count() );
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 1 ) );
+
+ ax = dx;
+ ay = dy;
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dx.use_count(), ax.use_count() );
+
+ az = ax;
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), ax.use_count() );
}
static void run_test_rank4()
{
- dView4 dx , dy ;
- aView4 ax , ay , az ;
+ dView4 dx, dy;
+ aView4 ax, ay, az;
- dx = dView4( "dx" , N0 );
- dy = dView4( "dy" , N0 );
- ASSERT_EQ( dx.use_count() , size_t(1) );
- ASSERT_EQ( dy.use_count() , size_t(1) );
+ dx = dView4( "dx", N0 );
+ dy = dView4( "dy", N0 );
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 1 ) );
- ax = dx ;
- ay = dy ;
- ASSERT_EQ( dx.use_count() , size_t(2) );
- ASSERT_EQ( dy.use_count() , size_t(2) );
- ASSERT_EQ( dx.use_count() , ax.use_count() );
+ ax = dx;
+ ay = dy;
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dy.use_count(), size_t( 2 ) );
+ ASSERT_EQ( dx.use_count(), ax.use_count() );
dView4_unmanaged unmanaged_dx = dx;
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
- az = ax ;
- ASSERT_EQ( dx.use_count() , size_t(3) );
- ASSERT_EQ( ax.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , size_t(3) );
- ASSERT_EQ( az.use_count() , ax.use_count() );
+ az = ax;
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), size_t( 3 ) );
+ ASSERT_EQ( az.use_count(), ax.use_count() );
aView4_unmanaged unmanaged_ax = ax;
- ASSERT_EQ( ax.use_count() , size_t(3) );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
- aView4_unmanaged unmanaged_ax_from_ptr_dx = aView4_unmanaged(dx.data(),
- dx.dimension_0(),
- dx.dimension_1(),
- dx.dimension_2(),
- dx.dimension_3());
- ASSERT_EQ( ax.use_count() , size_t(3) );
+ aView4_unmanaged unmanaged_ax_from_ptr_dx =
+ aView4_unmanaged( dx.data(), dx.dimension_0(), dx.dimension_1(), dx.dimension_2(), dx.dimension_3() );
+ ASSERT_EQ( ax.use_count(), size_t( 3 ) );
- const_aView4 const_ax = ax ;
- ASSERT_EQ( ax.use_count() , size_t(4) );
- ASSERT_EQ( const_ax.use_count() , ax.use_count() );
+ const_aView4 const_ax = ax;
+ ASSERT_EQ( ax.use_count(), size_t( 4 ) );
+ ASSERT_EQ( const_ax.use_count(), ax.use_count() );
ASSERT_FALSE( ax.data() == 0 );
ASSERT_FALSE( const_ax.data() == 0 ); // referenceable ptr
ASSERT_FALSE( unmanaged_ax.data() == 0 );
ASSERT_FALSE( unmanaged_ax_from_ptr_dx.data() == 0 );
ASSERT_FALSE( ay.data() == 0 );
-// ASSERT_NE( ax , ay );
+// ASSERT_NE( ax, ay );
// Above test results in following runtime error from gtest:
// Expected: (ax) != (ay), actual: 32-byte object <30-01 D0-A0 D8-7F 00-00 00-31 44-0C 01-00 00-00 E8-03 00-00 00-00 00-00 69-00 00-00 00-00 00-00> vs 32-byte object <80-01 D0-A0 D8-7F 00-00 00-A1 4A-0C 01-00 00-00 E8-03 00-00 00-00 00-00 69-00 00-00 00-00 00-00>
- ASSERT_EQ( ax.dimension_0() , unsigned(N0) );
- ASSERT_EQ( ax.dimension_1() , unsigned(N1) );
- ASSERT_EQ( ax.dimension_2() , unsigned(N2) );
- ASSERT_EQ( ax.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( ax.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( ax.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( ax.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( ax.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( ay.dimension_0() , unsigned(N0) );
- ASSERT_EQ( ay.dimension_1() , unsigned(N1) );
- ASSERT_EQ( ay.dimension_2() , unsigned(N2) );
- ASSERT_EQ( ay.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( ay.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( ay.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( ay.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( ay.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( unmanaged_ax_from_ptr_dx.capacity(),unsigned(N0)*unsigned(N1)*unsigned(N2)*unsigned(N3) );
+ ASSERT_EQ( unmanaged_ax_from_ptr_dx.capacity(), unsigned( N0 ) * unsigned( N1 ) * unsigned( N2 ) * unsigned( N3 ) );
}
- typedef T DataType[2] ;
+ typedef T DataType[2];
static void
check_auto_conversion_to_const(
- const Kokkos::View< const DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > & arg_const ,
- const Kokkos::View< const DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > & arg )
+ const Kokkos::View< const DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > & arg_const,
+ const Kokkos::View< const DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > & arg )
{
ASSERT_TRUE( arg_const == arg );
}
static void run_test_const()
{
- typedef Kokkos::View< DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > typeX ;
- typedef Kokkos::View< const DataType , device , Kokkos::MemoryTraits< Kokkos::Atomic> > const_typeX ;
+ typedef Kokkos::View< DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > typeX;
+ typedef Kokkos::View< const DataType, device, Kokkos::MemoryTraits<Kokkos::Atomic> > const_typeX;
typeX x( "X" );
- const_typeX xc = x ;
+ const_typeX xc = x;
//ASSERT_TRUE( xc == x ); // const xc is referenceable, non-const x is not
//ASSERT_TRUE( x == xc );
- check_auto_conversion_to_const( x , xc );
+ check_auto_conversion_to_const( x, xc );
}
-
};
-
//---------------------------------------------------
//-----------initialization functors-----------------
//---------------------------------------------------
template<class T, class execution_space >
struct InitFunctor_Seq {
+ typedef Kokkos::View< T*, execution_space > view_type;
- typedef Kokkos::View< T* , execution_space > view_type ;
-
- view_type input ;
- const long length ;
+ view_type input;
+ const long length;
- InitFunctor_Seq( view_type & input_ , const long length_ )
- : input(input_)
- , length(length_)
+ InitFunctor_Seq( view_type & input_, const long length_ )
+ : input( input_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
void operator()( const long i ) const {
if ( i < length ) {
- input(i) = (T) i ;
+ input( i ) = (T) i;
}
}
-
};
-
template<class T, class execution_space >
struct InitFunctor_ModTimes {
+ typedef Kokkos::View< T*, execution_space > view_type;
- typedef Kokkos::View< T* , execution_space > view_type ;
-
- view_type input ;
- const long length ;
- const long remainder ;
+ view_type input;
+ const long length;
+ const long remainder;
- InitFunctor_ModTimes( view_type & input_ , const long length_ , const long remainder_ )
- : input(input_)
- , length(length_)
- , remainder(remainder_)
+ InitFunctor_ModTimes( view_type & input_, const long length_, const long remainder_ )
+ : input( input_ )
+ , length( length_ )
+ , remainder( remainder_ )
{}
KOKKOS_INLINE_FUNCTION
void operator()( const long i ) const {
if ( i < length ) {
- if ( i % (remainder+1) == remainder ) {
- input(i) = (T)2 ;
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input( i ) = (T) 2;
}
else {
- input(i) = (T)1 ;
+ input( i ) = (T) 1;
}
}
}
};
-
template<class T, class execution_space >
struct InitFunctor_ModShift {
+ typedef Kokkos::View< T*, execution_space > view_type;
- typedef Kokkos::View< T* , execution_space > view_type ;
-
- view_type input ;
- const long length ;
- const long remainder ;
+ view_type input;
+ const long length;
+ const long remainder;
- InitFunctor_ModShift( view_type & input_ , const long length_ , const long remainder_ )
- : input(input_)
- , length(length_)
- , remainder(remainder_)
+ InitFunctor_ModShift( view_type & input_, const long length_, const long remainder_ )
+ : input( input_ )
+ , length( length_ )
+ , remainder( remainder_ )
{}
KOKKOS_INLINE_FUNCTION
void operator()( const long i ) const {
if ( i < length ) {
- if ( i % (remainder+1) == remainder ) {
- input(i) = 1 ;
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input( i ) = 1;
}
}
}
};
-
//---------------------------------------------------
//-----------atomic view plus-equal------------------
//---------------------------------------------------
template<class T, class execution_space >
struct PlusEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
// Wrap the result view in an atomic view, use this for operator
- PlusEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ PlusEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) += input(i);
+ even_odd_result( 0 ) += input( i );
}
else {
- even_odd_result(1) += input(i);
+ even_odd_result( 1 ) += input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T PlusEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T PlusEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for(Kokkos::RangePolicy<execution_space>(0, length) , init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- PlusEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ PlusEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0) + h_result_view(1) ) ;
+ return (T) ( h_result_view( 0 ) + h_result_view( 1 ) );
}
-template<class T>
+template< class T >
T PlusEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
T result[2];
+
if ( N % 2 == 0 ) {
- const long half_sum_end = (N/2) - 1;
+ const long half_sum_end = ( N / 2 ) - 1;
const long full_sum_end = N - 1;
- result[0] = half_sum_end*(half_sum_end + 1)/2 ; //even sum
- result[1] = ( full_sum_end*(full_sum_end + 1)/2 ) - result[0] ; // odd sum
+ result[0] = half_sum_end * ( half_sum_end + 1 ) / 2; // Even sum.
+ result[1] = ( full_sum_end * ( full_sum_end + 1 ) / 2 ) - result[0]; // Odd sum.
}
else {
- const long half_sum_end = (T)(N/2) ;
+ const long half_sum_end = (T) ( N / 2 );
const long full_sum_end = N - 2;
- result[0] = half_sum_end*(half_sum_end - 1)/2 ; //even sum
- result[1] = ( full_sum_end*(full_sum_end - 1)/2 ) - result[0] ; // odd sum
+ result[0] = half_sum_end * ( half_sum_end - 1 ) / 2; // Even sum.
+ result[1] = ( full_sum_end * ( full_sum_end - 1 ) / 2 ) - result[0]; // Odd sum.
}
- return (T)(result[0] + result[1]);
+ return (T) ( result[0] + result[1] );
}
-template<class T,class DeviceType>
-bool PlusEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool PlusEqualAtomicViewTest( long input_length )
{
- T res = PlusEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = PlusEqualAtomicViewCheck<T>(input_length);
+ T res = PlusEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = PlusEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = PlusEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view minus-equal-----------------
//---------------------------------------------------
template<class T, class execution_space >
struct MinusEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- MinusEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ MinusEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) -= input(i);
+ even_odd_result( 0 ) -= input( i );
}
else {
- even_odd_result(1) -= input(i);
+ even_odd_result( 1 ) -= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T MinusEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T MinusEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- MinusEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ MinusEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0) + h_result_view(1) ) ;
+ return (T) ( h_result_view( 0 ) + h_result_view( 1 ) );
}
-template<class T>
+template< class T >
T MinusEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
T result[2];
+
if ( N % 2 == 0 ) {
- const long half_sum_end = (N/2) - 1;
+ const long half_sum_end = ( N / 2 ) - 1;
const long full_sum_end = N - 1;
- result[0] = -1*( half_sum_end*(half_sum_end + 1)/2 ) ; //even sum
- result[1] = -1*( ( full_sum_end*(full_sum_end + 1)/2 ) + result[0] ) ; // odd sum
+ result[0] = -1 * ( half_sum_end * ( half_sum_end + 1 ) / 2 ); // Even sum.
+ result[1] = -1 * ( ( full_sum_end * ( full_sum_end + 1 ) / 2 ) + result[0] ); // Odd sum.
}
else {
- const long half_sum_end = (long)(N/2) ;
+ const long half_sum_end = (long) ( N / 2 );
const long full_sum_end = N - 2;
- result[0] = -1*( half_sum_end*(half_sum_end - 1)/2 ) ; //even sum
- result[1] = -1*( ( full_sum_end*(full_sum_end - 1)/2 ) + result[0] ) ; // odd sum
+ result[0] = -1 * ( half_sum_end * ( half_sum_end - 1 ) / 2 ); // Even sum.
+ result[1] = -1 * ( ( full_sum_end * ( full_sum_end - 1 ) / 2 ) + result[0] ); // Odd sum.
}
- return (result[0] + result[1]);
+ return ( result[0] + result[1] );
}
-template<class T,class DeviceType>
-bool MinusEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool MinusEqualAtomicViewTest( long input_length )
{
- T res = MinusEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = MinusEqualAtomicViewCheck<T>(input_length);
+ T res = MinusEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = MinusEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = MinusEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view times-equal-----------------
//---------------------------------------------------
template<class T, class execution_space >
struct TimesEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type result;
const long length;
// Wrap the result view in an atomic view, use this for operator
- TimesEqualAtomicViewFunctor( const view_type & input_ , view_type & result_ , const long length_)
- : input(input_)
- , result(result_)
- , length(length_)
+ TimesEqualAtomicViewFunctor( const view_type & input_, view_type & result_, const long length_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length && i > 0 ) {
- result(0) *= (double)input(i);
+ result( 0 ) *= (double) input( i );
}
}
-
};
-
-template<class T, class execution_space >
-T TimesEqualAtomicView(const long input_length, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T TimesEqualAtomicView( const long input_length, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",1) ;
- deep_copy(result_view, 1.0);
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 1 );
+ deep_copy( result_view, 1.0 );
- InitFunctor_ModTimes<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModTimes< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- TimesEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ TimesEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T TimesEqualAtomicViewCheck( const long input_length, const long remainder ) {
-
- //Analytical result
+ // Analytical result.
const long N = input_length;
T result = 1.0;
for ( long i = 2; i < N; ++i ) {
- if ( i % (remainder+1) == remainder ) {
+ if ( i % ( remainder + 1 ) == remainder ) {
result *= 2.0;
}
else {
result *= 1.0;
}
}
- return (T)result;
+ return (T) result;
}
-template<class T, class DeviceType>
-bool TimesEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool TimesEqualAtomicViewTest( const long input_length )
{
const long remainder = 23;
- T res = TimesEqualAtomicView<T,DeviceType>(input_length, remainder);
- T resSerial = TimesEqualAtomicViewCheck<T>(input_length, remainder);
+ T res = TimesEqualAtomicView< T, DeviceType >( input_length, remainder );
+ T resSerial = TimesEqualAtomicViewCheck< T >( input_length, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = TimesEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view div-equal------------------
//---------------------------------------------------
template<class T, class execution_space >
struct DivEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
+ typedef Kokkos::View< T, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
view_type input;
atomic_view_type result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- DivEqualAtomicViewFunctor( const view_type & input_ , scalar_view_type & result_ , const long length_)
- : input(input_)
- , result(result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ DivEqualAtomicViewFunctor( const view_type & input_, scalar_view_type & result_, const long length_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length && i > 0 ) {
- result() /= (double)(input(i));
+ result() /= (double) ( input( i ) );
}
}
-
};
-
-template<class T, class execution_space >
-T DivEqualAtomicView(const long input_length, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
- typedef typename scalar_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T DivEqualAtomicView( const long input_length, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
+ typedef typename scalar_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- scalar_view_type result_view("result_view") ;
- Kokkos::deep_copy(result_view, 12121212121);
+ view_type input( "input_view", length );
+ scalar_view_type result_view( "result_view" );
+ Kokkos::deep_copy( result_view, 12121212121 );
- InitFunctor_ModTimes<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModTimes< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- DivEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ DivEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view()) ;
+ return (T) ( h_result_view() );
}
-template<class T>
-T DivEqualAtomicViewCheck( const long input_length , const long remainder ) {
-
+template< class T >
+T DivEqualAtomicViewCheck( const long input_length, const long remainder ) {
const long N = input_length;
T result = 12121212121.0;
for ( long i = 2; i < N; ++i ) {
- if ( i % (remainder+1) == remainder ) {
+ if ( i % ( remainder + 1 ) == remainder ) {
result /= 1.0;
}
else {
result /= 2.0;
}
-
}
- return (T)result;
+ return (T) result;
}
-template<class T, class DeviceType>
-bool DivEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool DivEqualAtomicViewTest( const long input_length )
{
const long remainder = 23;
- T res = DivEqualAtomicView<T,DeviceType>(input_length, remainder);
- T resSerial = DivEqualAtomicViewCheck<T>(input_length, remainder);
+ T res = DivEqualAtomicView< T, DeviceType >( input_length, remainder );
+ T resSerial = DivEqualAtomicViewCheck< T >( input_length, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = DivEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view mod-equal------------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct ModEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
+ typedef Kokkos::View< T, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
view_type input;
atomic_view_type result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- ModEqualAtomicViewFunctor( const view_type & input_ , scalar_view_type & result_ , const long length_)
- : input(input_)
- , result(result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ ModEqualAtomicViewFunctor( const view_type & input_, scalar_view_type & result_, const long length_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length && i > 0 ) {
- result() %= (double)(input(i));
+ result() %= (double) ( input( i ) );
}
}
-
};
-
-template<class T, class execution_space >
-T ModEqualAtomicView(const long input_length, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T , execution_space > scalar_view_type ;
- typedef typename scalar_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T ModEqualAtomicView( const long input_length, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T, execution_space > scalar_view_type;
+ typedef typename scalar_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- scalar_view_type result_view("result_view") ;
- Kokkos::deep_copy(result_view, 12121212121);
+ view_type input( "input_view", length );
+ scalar_view_type result_view( "result_view" );
+ Kokkos::deep_copy( result_view, 12121212121 );
- InitFunctor_ModTimes<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModTimes< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- ModEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ ModEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view()) ;
+ return (T) ( h_result_view() );
}
-template<class T>
-T ModEqualAtomicViewCheck( const long input_length , const long remainder ) {
-
+template< class T >
+T ModEqualAtomicViewCheck( const long input_length, const long remainder ) {
const long N = input_length;
T result = 12121212121;
for ( long i = 2; i < N; ++i ) {
- if ( i % (remainder+1) == remainder ) {
+ if ( i % ( remainder + 1 ) == remainder ) {
result %= 1;
}
else {
result %= 2;
}
}
- return (T)result;
+ return (T) result;
}
-template<class T, class DeviceType>
-bool ModEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool ModEqualAtomicViewTest( const long input_length )
{
-
- static_assert( std::is_integral<T>::value, "ModEqualAtomicView Error: Type must be integral type for this unit test");
+ static_assert( std::is_integral< T >::value, "ModEqualAtomicView Error: Type must be integral type for this unit test" );
const long remainder = 23;
- T res = ModEqualAtomicView<T,DeviceType>(input_length, remainder);
- T resSerial = ModEqualAtomicViewCheck<T>(input_length, remainder);
+ T res = ModEqualAtomicView< T, DeviceType >( input_length, remainder );
+ T resSerial = ModEqualAtomicViewCheck< T >( input_length, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = ModEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view rs-equal------------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct RSEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T**** , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
+ typedef Kokkos::View< T****, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
const view_type input;
atomic_view_type result;
const long length;
const long value;
- // Wrap the result view in an atomic view, use this for operator
- RSEqualAtomicViewFunctor( const view_type & input_ , result_view_type & result_ , const long & length_ , const long & value_ )
- : input(input_)
- , result(result_)
- , length(length_)
- , value(value_)
+ // Wrap the result view in an atomic view, use this for operator.
+ RSEqualAtomicViewFunctor( const view_type & input_, result_view_type & result_, const long & length_, const long & value_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
+ , value( value_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 4 == 0 ) {
- result(1,0,0,0) >>= input(i);
+ result( 1, 0, 0, 0 ) >>= input( i );
}
else if ( i % 4 == 1 ) {
- result(0,1,0,0) >>= input(i);
+ result( 0, 1, 0, 0 ) >>= input( i );
}
else if ( i % 4 == 2 ) {
- result(0,0,1,0) >>= input(i);
+ result( 0, 0, 1, 0 ) >>= input( i );
}
else if ( i % 4 == 3 ) {
- result(0,0,0,1) >>= input(i);
+ result( 0, 0, 0, 1 ) >>= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T RSEqualAtomicView(const long input_length, const long value, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
- typedef typename result_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T RSEqualAtomicView( const long input_length, const long value, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
+ typedef typename result_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- result_view_type result_view("result_view",2,2,2,2) ;
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- h_result_view(1,0,0,0) = value;
- h_result_view(0,1,0,0) = value;
- h_result_view(0,0,1,0) = value;
- h_result_view(0,0,0,1) = value;
- Kokkos::deep_copy( result_view , h_result_view );
+ view_type input( "input_view", length );
+ result_view_type result_view( "result_view", 2, 2, 2, 2 );
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ h_result_view( 1, 0, 0, 0 ) = value;
+ h_result_view( 0, 1, 0, 0 ) = value;
+ h_result_view( 0, 0, 1, 0 ) = value;
+ h_result_view( 0, 0, 0, 1 ) = value;
+ Kokkos::deep_copy( result_view, h_result_view );
+ InitFunctor_ModShift< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- InitFunctor_ModShift<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
-
- RSEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length, value);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ RSEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length, value );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- Kokkos::deep_copy(h_result_view, result_view);
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(1,0,0,0)) ;
+ return (T) ( h_result_view( 1, 0, 0, 0 ) );
}
-template<class T>
+template< class T >
T RSEqualAtomicViewCheck( const long input_length, const long value, const long remainder ) {
-
- T result[4] ;
- result[0] = value ;
- result[1] = value ;
- result[2] = value ;
- result[3] = value ;
+ T result[4];
+ result[0] = value;
+ result[1] = value;
+ result[2] = value;
+ result[3] = value;
T * input = new T[input_length];
for ( long i = 0; i < input_length; ++i ) {
- if ( i % (remainder+1) == remainder ) {
- input[i] = 1;
- }
- else {
- input[i] = 0;
- }
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input[i] = 1;
+ }
+ else {
+ input[i] = 0;
+ }
}
for ( long i = 0; i < input_length; ++i ) {
- if ( i % 4 == 0 ) {
- result[0] >>= input[i];
- }
- else if ( i % 4 == 1 ) {
- result[1] >>= input[i];
- }
- else if ( i % 4 == 2 ) {
- result[2] >>= input[i];
- }
- else if ( i % 4 == 3 ) {
- result[3] >>= input[i];
- }
+ if ( i % 4 == 0 ) {
+ result[0] >>= input[i];
+ }
+ else if ( i % 4 == 1 ) {
+ result[1] >>= input[i];
+ }
+ else if ( i % 4 == 2 ) {
+ result[2] >>= input[i];
+ }
+ else if ( i % 4 == 3 ) {
+ result[3] >>= input[i];
+ }
}
+
delete [] input;
- return (T)result[0];
+ return (T) result[0];
}
-template<class T, class DeviceType>
-bool RSEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool RSEqualAtomicViewTest( const long input_length )
{
-
- static_assert( std::is_integral<T>::value, "RSEqualAtomicViewTest: Must be integral type for test");
+ static_assert( std::is_integral< T >::value, "RSEqualAtomicViewTest: Must be integral type for test" );
const long remainder = 61042; //prime - 1
- const long value = 1073741825; // 2^30+1
- T res = RSEqualAtomicView<T,DeviceType>(input_length, value, remainder);
- T resSerial = RSEqualAtomicViewCheck<T>(input_length, value, remainder);
+ const long value = 1073741825; // 2^30+1
+ T res = RSEqualAtomicView< T, DeviceType >( input_length, value, remainder );
+ T resSerial = RSEqualAtomicViewCheck< T >( input_length, value, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = RSEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//------------atomic view ls-equal------------------
//---------------------------------------------------
template<class T, class execution_space >
struct LSEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T**** , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
+ typedef Kokkos::View< T****, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
view_type input;
atomic_view_type result;
const long length;
const long value;
- // Wrap the result view in an atomic view, use this for operator
- LSEqualAtomicViewFunctor( const view_type & input_ , result_view_type & result_ , const long & length_ , const long & value_ )
- : input(input_)
- , result(result_)
- , length(length_)
- , value(value_)
+ // Wrap the result view in an atomic view, use this for operator.
+ LSEqualAtomicViewFunctor( const view_type & input_, result_view_type & result_, const long & length_, const long & value_ )
+ : input( input_ )
+ , result( result_ )
+ , length( length_ )
+ , value( value_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 4 == 0 ) {
- result(1,0,0,0) <<= input(i);
+ result( 1, 0, 0, 0 ) <<= input( i );
}
else if ( i % 4 == 1 ) {
- result(0,1,0,0) <<= input(i);
+ result( 0, 1, 0, 0 ) <<= input( i );
}
else if ( i % 4 == 2 ) {
- result(0,0,1,0) <<= input(i);
+ result( 0, 0, 1, 0 ) <<= input( i );
}
else if ( i % 4 == 3 ) {
- result(0,0,0,1) <<= input(i);
+ result( 0, 0, 0, 1 ) <<= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T LSEqualAtomicView(const long input_length, const long value, const long remainder) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef Kokkos::View< T**** , execution_space > result_view_type ;
- typedef typename result_view_type::HostMirror host_scalar_view_type ;
+template< class T, class execution_space >
+T LSEqualAtomicView( const long input_length, const long value, const long remainder ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef Kokkos::View< T****, execution_space > result_view_type;
+ typedef typename result_view_type::HostMirror host_scalar_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- result_view_type result_view("result_view",2,2,2,2) ;
- host_scalar_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- h_result_view(1,0,0,0) = value;
- h_result_view(0,1,0,0) = value;
- h_result_view(0,0,1,0) = value;
- h_result_view(0,0,0,1) = value;
- Kokkos::deep_copy( result_view , h_result_view );
+ view_type input( "input_view", length );
+ result_view_type result_view( "result_view", 2, 2, 2, 2 );
+ host_scalar_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ h_result_view( 1, 0, 0, 0 ) = value;
+ h_result_view( 0, 1, 0, 0 ) = value;
+ h_result_view( 0, 0, 1, 0 ) = value;
+ h_result_view( 0, 0, 0, 1 ) = value;
+ Kokkos::deep_copy( result_view, h_result_view );
- InitFunctor_ModShift<T, execution_space> init_f( input , length , remainder ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_ModShift< T, execution_space > init_f( input, length, remainder );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- LSEqualAtomicViewFunctor<T,execution_space> functor(input, result_view, length, value);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ LSEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length, value );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- Kokkos::deep_copy(h_result_view, result_view);
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(1,0,0,0)) ;
+ return (T) ( h_result_view( 1, 0, 0, 0 ) );
}
-template<class T>
+template< class T >
T LSEqualAtomicViewCheck( const long input_length, const long value, const long remainder ) {
-
- T result[4] ;
- result[0] = value ;
- result[1] = value ;
- result[2] = value ;
- result[3] = value ;
+ T result[4];
+ result[0] = value;
+ result[1] = value;
+ result[2] = value;
+ result[3] = value;
T * input = new T[input_length];
for ( long i = 0; i < input_length; ++i ) {
- if ( i % (remainder+1) == remainder ) {
- input[i] = 1;
- }
- else {
- input[i] = 0;
- }
+ if ( i % ( remainder + 1 ) == remainder ) {
+ input[i] = 1;
+ }
+ else {
+ input[i] = 0;
+ }
}
for ( long i = 0; i < input_length; ++i ) {
- if ( i % 4 == 0 ) {
- result[0] <<= input[i];
- }
- else if ( i % 4 == 1 ) {
- result[1] <<= input[i];
- }
- else if ( i % 4 == 2 ) {
- result[2] <<= input[i];
- }
- else if ( i % 4 == 3 ) {
- result[3] <<= input[i];
- }
+ if ( i % 4 == 0 ) {
+ result[0] <<= input[i];
+ }
+ else if ( i % 4 == 1 ) {
+ result[1] <<= input[i];
+ }
+ else if ( i % 4 == 2 ) {
+ result[2] <<= input[i];
+ }
+ else if ( i % 4 == 3 ) {
+ result[3] <<= input[i];
+ }
}
delete [] input;
- return (T)result[0];
+ return (T) result[0];
}
-template<class T, class DeviceType>
-bool LSEqualAtomicViewTest(const long input_length)
+template< class T, class DeviceType >
+bool LSEqualAtomicViewTest( const long input_length )
{
-
- static_assert( std::is_integral<T>::value, "LSEqualAtomicViewTest: Must be integral type for test");
+ static_assert( std::is_integral< T >::value, "LSEqualAtomicViewTest: Must be integral type for test" );
const long remainder = 61042; //prime - 1
- const long value = 1; // 2^30+1
- T res = LSEqualAtomicView<T,DeviceType>(input_length, value, remainder);
- T resSerial = LSEqualAtomicViewCheck<T>(input_length, value, remainder);
+ const long value = 1; // Start from 1 for the left-shift test.
+ T res = LSEqualAtomicView< T, DeviceType >( input_length, value, remainder );
+ T resSerial = LSEqualAtomicViewCheck< T >( input_length, value, remainder );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
( test = RSEqual">
<< ">( test = LSEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view and-equal-----------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct AndEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- AndEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ AndEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) &= input(i);
+ even_odd_result( 0 ) &= input( i );
}
else {
- even_odd_result(1) &= input(i);
+ even_odd_result( 1 ) &= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T AndEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T AndEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
- Kokkos::deep_copy(result_view, 1);
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
+ Kokkos::deep_copy( result_view, 1 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- AndEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ AndEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T AndEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
- T result[2] = {1};
+ T result[2] = { 1 };
for ( long i = 0; i < N; ++i ) {
if ( N % 2 == 0 ) {
- result[0] &= (T)i;
+ result[0] &= (T) i;
}
else {
- result[1] &= (T)i;
+ result[1] &= (T) i;
}
}
- return (result[0]);
+ return ( result[0] );
}
-template<class T,class DeviceType>
-bool AndEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool AndEqualAtomicViewTest( long input_length )
{
+ static_assert( std::is_integral< T >::value, "AndEqualAtomicViewTest: Must be integral type for test" );
- static_assert( std::is_integral<T>::value, "AndEqualAtomicViewTest: Must be integral type for test");
-
- T res = AndEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = AndEqualAtomicViewCheck<T>(input_length);
+ T res = AndEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = AndEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = AndEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view or-equal-----------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct OrEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- OrEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ OrEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) |= input(i);
+ even_odd_result( 0 ) |= input( i );
}
else {
- even_odd_result(1) |= input(i);
+ even_odd_result( 1 ) |= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T OrEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T OrEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- OrEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ OrEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T OrEqualAtomicViewCheck( const long input_length ) {
const long N = input_length;
- T result[2] = {0};
+ T result[2] = { 0 };
for ( long i = 0; i < N; ++i ) {
if ( i % 2 == 0 ) {
- result[0] |= (T)i;
+ result[0] |= (T) i;
}
else {
- result[1] |= (T)i;
+ result[1] |= (T) i;
}
}
- return (T)(result[0]);
+ return (T) ( result[0] );
}
-template<class T,class DeviceType>
-bool OrEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool OrEqualAtomicViewTest( long input_length )
{
-
- static_assert( std::is_integral<T>::value, "OrEqualAtomicViewTest: Must be integral type for test");
+ static_assert( std::is_integral< T >::value, "OrEqualAtomicViewTest: Must be integral type for test" );
- T res = OrEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = OrEqualAtomicViewCheck<T>(input_length);
+ T res = OrEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = OrEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = OrEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
//---------------------------------------------------
//-----------atomic view xor-equal-----------------
//---------------------------------------------------
-template<class T, class execution_space >
+template< class T, class execution_space >
struct XOrEqualAtomicViewFunctor {
-
- typedef Kokkos::View< T* , execution_space , Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type ;
-
- typedef Kokkos::View< T* , execution_space > view_type ;
+ typedef Kokkos::View< T*, execution_space, Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_view_type;
+ typedef Kokkos::View< T*, execution_space > view_type;
view_type input;
atomic_view_type even_odd_result;
const long length;
- // Wrap the result view in an atomic view, use this for operator
- XOrEqualAtomicViewFunctor( const view_type & input_ , view_type & even_odd_result_ , const long length_)
- : input(input_)
- , even_odd_result(even_odd_result_)
- , length(length_)
+ // Wrap the result view in an atomic view, use this for operator.
+ XOrEqualAtomicViewFunctor( const view_type & input_, view_type & even_odd_result_, const long length_ )
+ : input( input_ )
+ , even_odd_result( even_odd_result_ )
+ , length( length_ )
{}
KOKKOS_INLINE_FUNCTION
- void operator()(const long i) const {
+ void operator()( const long i ) const {
if ( i < length ) {
if ( i % 2 == 0 ) {
- even_odd_result(0) ^= input(i);
+ even_odd_result( 0 ) ^= input( i );
}
else {
- even_odd_result(1) ^= input(i);
+ even_odd_result( 1 ) ^= input( i );
}
}
}
-
};
-
-template<class T, class execution_space >
-T XOrEqualAtomicView(const long input_length) {
-
- typedef Kokkos::View< T* , execution_space > view_type ;
- typedef typename view_type::HostMirror host_view_type ;
+template< class T, class execution_space >
+T XOrEqualAtomicView( const long input_length ) {
+ typedef Kokkos::View< T*, execution_space > view_type;
+ typedef typename view_type::HostMirror host_view_type;
const long length = input_length;
- view_type input("input_view",length) ;
- view_type result_view("result_view",2) ;
+ view_type input( "input_view", length );
+ view_type result_view( "result_view", 2 );
- InitFunctor_Seq<T, execution_space> init_f( input , length ) ;
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), init_f );
+ InitFunctor_Seq< T, execution_space > init_f( input, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), init_f );
- XOrEqualAtomicViewFunctor<T,execution_space> functor(input, result_view,length);
- Kokkos::parallel_for( Kokkos::RangePolicy<execution_space>(0, length), functor);
+ XOrEqualAtomicViewFunctor< T, execution_space > functor( input, result_view, length );
+ Kokkos::parallel_for( Kokkos::RangePolicy< execution_space >( 0, length ), functor );
Kokkos::fence();
- host_view_type h_result_view = Kokkos::create_mirror_view(result_view);
- Kokkos::deep_copy(h_result_view, result_view);
+ host_view_type h_result_view = Kokkos::create_mirror_view( result_view );
+ Kokkos::deep_copy( h_result_view, result_view );
- return (T) (h_result_view(0)) ;
+ return (T) ( h_result_view( 0 ) );
}
-template<class T>
+template< class T >
T XOrEqualAtomicViewCheck( const long input_length ) {
-
const long N = input_length;
- T result[2] = {0};
+ T result[2] = { 0 };
for ( long i = 0; i < N; ++i ) {
if ( i % 2 == 0 ) {
- result[0] ^= (T)i;
+ result[0] ^= (T) i;
}
else {
- result[1] ^= (T)i;
+ result[1] ^= (T) i;
}
}
- return (T)(result[0]);
+ return (T) ( result[0] );
}
-template<class T,class DeviceType>
-bool XOrEqualAtomicViewTest(long input_length)
+template< class T, class DeviceType >
+bool XOrEqualAtomicViewTest( long input_length )
{
+ static_assert( std::is_integral< T >::value, "XOrEqualAtomicViewTest: Must be integral type for test" );
- static_assert( std::is_integral<T>::value, "XOrEqualAtomicViewTest: Must be integral type for test");
-
- T res = XOrEqualAtomicView<T,DeviceType>(input_length);
- T resSerial = XOrEqualAtomicViewCheck<T>(input_length);
+ T res = XOrEqualAtomicView< T, DeviceType >( input_length );
+ T resSerial = XOrEqualAtomicViewCheck< T >( input_length );
bool passed = true;
if ( resSerial != res ) {
passed = false;
std::cout << "Loop<"
- << typeid(T).name()
+ << typeid( T ).name()
<< ">( test = XOrEqualAtomicViewTest"
<< " FAILED : "
<< resSerial << " != " << res
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
}
-
// inc/dec?
-
//---------------------------------------------------
//--------------atomic_test_control------------------
//---------------------------------------------------
-template<class T,class DeviceType>
-bool AtomicViewsTestIntegralType( const int length , int test )
+template< class T, class DeviceType >
+bool AtomicViewsTestIntegralType( const int length, int test )
{
- static_assert( std::is_integral<T>::value, "TestAtomicViews Error: Non-integral type passed into IntegralType tests");
-
- switch (test) {
- case 1: return PlusEqualAtomicViewTest<T,DeviceType>( length );
- case 2: return MinusEqualAtomicViewTest<T,DeviceType>( length );
- case 3: return RSEqualAtomicViewTest<T,DeviceType>( length );
- case 4: return LSEqualAtomicViewTest<T,DeviceType>( length );
- case 5: return ModEqualAtomicViewTest<T,DeviceType>( length );
- case 6: return AndEqualAtomicViewTest<T,DeviceType>( length );
- case 7: return OrEqualAtomicViewTest<T,DeviceType>( length );
- case 8: return XOrEqualAtomicViewTest<T,DeviceType>( length );
+ static_assert( std::is_integral< T >::value, "TestAtomicViews Error: Non-integral type passed into IntegralType tests" );
+
+ switch ( test ) {
+ case 1: return PlusEqualAtomicViewTest< T, DeviceType >( length );
+ case 2: return MinusEqualAtomicViewTest< T, DeviceType >( length );
+ case 3: return RSEqualAtomicViewTest< T, DeviceType >( length );
+ case 4: return LSEqualAtomicViewTest< T, DeviceType >( length );
+ case 5: return ModEqualAtomicViewTest< T, DeviceType >( length );
+ case 6: return AndEqualAtomicViewTest< T, DeviceType >( length );
+ case 7: return OrEqualAtomicViewTest< T, DeviceType >( length );
+ case 8: return XOrEqualAtomicViewTest< T, DeviceType >( length );
}
+
return 0;
}
-
-template<class T,class DeviceType>
-bool AtomicViewsTestNonIntegralType( const int length , int test )
+template< class T, class DeviceType >
+bool AtomicViewsTestNonIntegralType( const int length, int test )
{
- switch (test) {
- case 1: return PlusEqualAtomicViewTest<T,DeviceType>( length );
- case 2: return MinusEqualAtomicViewTest<T,DeviceType>( length );
- case 3: return TimesEqualAtomicViewTest<T,DeviceType>( length );
- case 4: return DivEqualAtomicViewTest<T,DeviceType>( length );
+ switch ( test ) {
+ case 1: return PlusEqualAtomicViewTest< T, DeviceType >( length );
+ case 2: return MinusEqualAtomicViewTest< T, DeviceType >( length );
+ case 3: return TimesEqualAtomicViewTest< T, DeviceType >( length );
+ case 4: return DivEqualAtomicViewTest< T, DeviceType >( length );
}
+
return 0;
}
-} // namespace
-
+} // namespace TestAtomicViews
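
The hunks above only reformat the atomic-view tests; the mechanism they exercise is Kokkos' Atomic memory trait, which turns every element update on a wrapped View into an atomic read-modify-write. A standalone sketch of that pattern, assuming a host backend with lambda dispatch enabled (view names and sizes below are illustrative, not taken from the test code):

    #include <Kokkos_Core.hpp>

    int main( int argc, char* argv[] ) {
      Kokkos::initialize( argc, argv );
      {
        typedef Kokkos::DefaultExecutionSpace Space;
        typedef Kokkos::View< long*, Space > view_type;
        // Same allocation, but every access through this alias is atomic.
        typedef Kokkos::View< long*, Space, Kokkos::MemoryTraits< Kokkos::Atomic > > atomic_view_type;

        const long N = 1000;
        view_type result( "result", 2 );
        atomic_view_type atomic_result = result;

        // Many indices race on the same two slots; the atomic trait makes "^=" safe.
        Kokkos::parallel_for( Kokkos::RangePolicy< Space >( 0, N ), KOKKOS_LAMBDA( const long i ) {
          atomic_result( i % 2 ) ^= i;
        });
        Kokkos::fence();
      }
      Kokkos::finalize();
    }
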
diff --git a/lib/kokkos/core/unit_test/TestCXX11.hpp b/lib/kokkos/core/unit_test/TestCXX11.hpp
index d6dde5e96..e2ad623d9 100644
--- a/lib/kokkos/core/unit_test/TestCXX11.hpp
+++ b/lib/kokkos/core/unit_test/TestCXX11.hpp
@@ -1,334 +1,345 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+
#include <Kokkos_Core.hpp>
namespace TestCXX11 {
-template<class DeviceType>
-struct FunctorAddTest{
- typedef Kokkos::View<double**,DeviceType> view_type;
- view_type a_, b_;
+template< class DeviceType >
+struct FunctorAddTest {
+ typedef Kokkos::View< double**, DeviceType > view_type;
typedef DeviceType execution_space;
- FunctorAddTest(view_type & a, view_type &b):a_(a),b_(b) {}
+ typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member;
+
+ view_type a_, b_;
+
+ FunctorAddTest( view_type & a, view_type & b ) : a_( a ), b_( b ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i) const {
- b_(i,0) = a_(i,1) + a_(i,2);
- b_(i,1) = a_(i,0) - a_(i,3);
- b_(i,2) = a_(i,4) + a_(i,0);
- b_(i,3) = a_(i,2) - a_(i,1);
- b_(i,4) = a_(i,3) + a_(i,4);
+ void operator() ( const int& i ) const {
+ b_( i, 0 ) = a_( i, 1 ) + a_( i, 2 );
+ b_( i, 1 ) = a_( i, 0 ) - a_( i, 3 );
+ b_( i, 2 ) = a_( i, 4 ) + a_( i, 0 );
+ b_( i, 3 ) = a_( i, 2 ) - a_( i, 1 );
+ b_( i, 4 ) = a_( i, 3 ) + a_( i, 4 );
}
- typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member ;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_member & dev) const {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- b_(i,0) = a_(i,1) + a_(i,2);
- b_(i,1) = a_(i,0) - a_(i,3);
- b_(i,2) = a_(i,4) + a_(i,0);
- b_(i,3) = a_(i,2) - a_(i,1);
- b_(i,4) = a_(i,3) + a_(i,4);
+ void operator() ( const team_member & dev ) const {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ b_( i, 0 ) = a_( i, 1 ) + a_( i, 2 );
+ b_( i, 1 ) = a_( i, 0 ) - a_( i, 3 );
+ b_( i, 2 ) = a_( i, 4 ) + a_( i, 0 );
+ b_( i, 3 ) = a_( i, 2 ) - a_( i, 1 );
+ b_( i, 4 ) = a_( i, 3 ) + a_( i, 4 );
}
}
};
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double AddTestFunctor() {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
-
- Kokkos::View<double**,DeviceType> a("A",100,5);
- Kokkos::View<double**,DeviceType> b("B",100,5);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_a = Kokkos::create_mirror_view(a);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_b = Kokkos::create_mirror_view(b);
+ Kokkos::View< double**, DeviceType > a( "A", 100, 5 );
+ Kokkos::View< double**, DeviceType > b( "B", 100, 5 );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_a = Kokkos::create_mirror_view( a );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_b = Kokkos::create_mirror_view( b );
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
+ Kokkos::deep_copy( a, h_a );
- if(PWRTest==false)
- Kokkos::parallel_for(100,FunctorAddTest<DeviceType>(a,b));
- else
- Kokkos::parallel_for(policy_type(25,Kokkos::AUTO),FunctorAddTest<DeviceType>(a,b));
- Kokkos::deep_copy(h_b,b);
+ if ( PWRTest == false ) {
+ Kokkos::parallel_for( 100, FunctorAddTest< DeviceType >( a, b ) );
+ }
+ else {
+ Kokkos::parallel_for( policy_type( 25, Kokkos::AUTO ), FunctorAddTest< DeviceType >( a, b ) );
+ }
+ Kokkos::deep_copy( h_b, b );
double result = 0;
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- result += h_b(i,j);
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ result += h_b( i, j );
}
+ }
return result;
}
-
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-template<class DeviceType, bool PWRTest>
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+template< class DeviceType, bool PWRTest >
double AddTestLambda() {
-
- Kokkos::View<double**,DeviceType> a("A",100,5);
- Kokkos::View<double**,DeviceType> b("B",100,5);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_a = Kokkos::create_mirror_view(a);
- typename Kokkos::View<double**,DeviceType>::HostMirror h_b = Kokkos::create_mirror_view(b);
-
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ Kokkos::View< double**, DeviceType > a( "A", 100, 5 );
+ Kokkos::View< double**, DeviceType > b( "B", 100, 5 );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_a = Kokkos::create_mirror_view( a );
+ typename Kokkos::View< double**, DeviceType >::HostMirror h_b = Kokkos::create_mirror_view( b );
+
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
-
- if(PWRTest==false) {
- Kokkos::parallel_for(100,KOKKOS_LAMBDA(const int& i) {
- b(i,0) = a(i,1) + a(i,2);
- b(i,1) = a(i,0) - a(i,3);
- b(i,2) = a(i,4) + a(i,0);
- b(i,3) = a(i,2) - a(i,1);
- b(i,4) = a(i,3) + a(i,4);
+ Kokkos::deep_copy( a, h_a );
+
+ if ( PWRTest == false ) {
+ Kokkos::parallel_for( 100, KOKKOS_LAMBDA( const int & i ) {
+ b( i, 0 ) = a( i, 1 ) + a( i, 2 );
+ b( i, 1 ) = a( i, 0 ) - a( i, 3 );
+ b( i, 2 ) = a( i, 4 ) + a( i, 0 );
+ b( i, 3 ) = a( i, 2 ) - a( i, 1 );
+ b( i, 4 ) = a( i, 3 ) + a( i, 4 );
});
- } else {
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
- typedef typename policy_type::member_type team_member ;
-
- policy_type policy(25,Kokkos::AUTO);
-
- Kokkos::parallel_for(policy,KOKKOS_LAMBDA(const team_member & dev) {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- b(i,0) = a(i,1) + a(i,2);
- b(i,1) = a(i,0) - a(i,3);
- b(i,2) = a(i,4) + a(i,0);
- b(i,3) = a(i,2) - a(i,1);
- b(i,4) = a(i,3) + a(i,4);
+ }
+ else {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
+ typedef typename policy_type::member_type team_member;
+
+ policy_type policy( 25, Kokkos::AUTO );
+
+ Kokkos::parallel_for( policy, KOKKOS_LAMBDA( const team_member & dev ) {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ b( i, 0 ) = a( i, 1 ) + a( i, 2 );
+ b( i, 1 ) = a( i, 0 ) - a( i, 3 );
+ b( i, 2 ) = a( i, 4 ) + a( i, 0 );
+ b( i, 3 ) = a( i, 2 ) - a( i, 1 );
+ b( i, 4 ) = a( i, 3 ) + a( i, 4 );
}
});
}
- Kokkos::deep_copy(h_b,b);
+ Kokkos::deep_copy( h_b, b );
double result = 0;
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- result += h_b(i,j);
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ result += h_b( i, j );
}
+ }
return result;
}
-
#else
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double AddTestLambda() {
- return AddTestFunctor<DeviceType,PWRTest>();
+ return AddTestFunctor< DeviceType, PWRTest >();
}
#endif
-
-template<class DeviceType>
-struct FunctorReduceTest{
- typedef Kokkos::View<double**,DeviceType> view_type;
- view_type a_;
+template< class DeviceType >
+struct FunctorReduceTest {
+ typedef Kokkos::View< double**, DeviceType > view_type;
typedef DeviceType execution_space;
typedef double value_type;
- FunctorReduceTest(view_type & a):a_(a) {}
+ typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member;
+
+ view_type a_;
+
+ FunctorReduceTest( view_type & a ) : a_( a ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, value_type& sum) const {
- sum += a_(i,1) + a_(i,2);
- sum += a_(i,0) - a_(i,3);
- sum += a_(i,4) + a_(i,0);
- sum += a_(i,2) - a_(i,1);
- sum += a_(i,3) + a_(i,4);
+ void operator() ( const int & i, value_type & sum ) const {
+ sum += a_( i, 1 ) + a_( i, 2 );
+ sum += a_( i, 0 ) - a_( i, 3 );
+ sum += a_( i, 4 ) + a_( i, 0 );
+ sum += a_( i, 2 ) - a_( i, 1 );
+ sum += a_( i, 3 ) + a_( i, 4 );
}
- typedef typename Kokkos::TeamPolicy< execution_space >::member_type team_member ;
-
KOKKOS_INLINE_FUNCTION
- void operator() (const team_member & dev, value_type& sum) const {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- sum += a_(i,1) + a_(i,2);
- sum += a_(i,0) - a_(i,3);
- sum += a_(i,4) + a_(i,0);
- sum += a_(i,2) - a_(i,1);
- sum += a_(i,3) + a_(i,4);
+ void operator() ( const team_member & dev, value_type & sum ) const {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ sum += a_( i, 1 ) + a_( i, 2 );
+ sum += a_( i, 0 ) - a_( i, 3 );
+ sum += a_( i, 4 ) + a_( i, 0 );
+ sum += a_( i, 2 ) - a_( i, 1 );
+ sum += a_( i, 3 ) + a_( i, 4 );
}
}
+
KOKKOS_INLINE_FUNCTION
- void init(value_type& update) const {update = 0.0;}
+ void init( value_type & update ) const { update = 0.0; }
+
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& update, volatile value_type const& input) const {update += input;}
+ void join( volatile value_type & update, volatile value_type const & input ) const { update += input; }
};
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double ReduceTestFunctor() {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
+ typedef Kokkos::View< double**, DeviceType > view_type;
+ typedef Kokkos::View< double, typename view_type::host_mirror_space, Kokkos::MemoryUnmanaged > unmanaged_result;
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
- typedef Kokkos::View<double**,DeviceType> view_type ;
- typedef Kokkos::View<double,typename view_type::host_mirror_space,Kokkos::MemoryUnmanaged> unmanaged_result ;
-
- view_type a("A",100,5);
- typename view_type::HostMirror h_a = Kokkos::create_mirror_view(a);
+ view_type a( "A", 100, 5 );
+ typename view_type::HostMirror h_a = Kokkos::create_mirror_view( a );
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
+ Kokkos::deep_copy( a, h_a );
double result = 0.0;
- if(PWRTest==false)
- Kokkos::parallel_reduce(100,FunctorReduceTest<DeviceType>(a), unmanaged_result( & result ));
- else
- Kokkos::parallel_reduce(policy_type(25,Kokkos::AUTO),FunctorReduceTest<DeviceType>(a), unmanaged_result( & result ));
+ if ( PWRTest == false ) {
+ Kokkos::parallel_reduce( 100, FunctorReduceTest< DeviceType >( a ), unmanaged_result( & result ) );
+ }
+ else {
+ Kokkos::parallel_reduce( policy_type( 25, Kokkos::AUTO ), FunctorReduceTest< DeviceType >( a ), unmanaged_result( & result ) );
+ }
return result;
}
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-template<class DeviceType, bool PWRTest>
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+template< class DeviceType, bool PWRTest >
double ReduceTestLambda() {
+ typedef Kokkos::TeamPolicy< DeviceType > policy_type;
+ typedef Kokkos::View< double**, DeviceType > view_type;
+ typedef Kokkos::View< double, typename view_type::host_mirror_space, Kokkos::MemoryUnmanaged > unmanaged_result;
- typedef Kokkos::TeamPolicy<DeviceType> policy_type ;
- typedef Kokkos::View<double**,DeviceType> view_type ;
- typedef Kokkos::View<double,typename view_type::host_mirror_space,Kokkos::MemoryUnmanaged> unmanaged_result ;
-
- view_type a("A",100,5);
- typename view_type::HostMirror h_a = Kokkos::create_mirror_view(a);
+ view_type a( "A", 100, 5 );
+ typename view_type::HostMirror h_a = Kokkos::create_mirror_view( a );
- for(int i=0;i<100;i++) {
- for(int j=0;j<5;j++)
- h_a(i,j) = 0.1*i/(1.1*j+1.0) + 0.5*j;
+ for ( int i = 0; i < 100; i++ ) {
+ for ( int j = 0; j < 5; j++ ) {
+ h_a( i, j ) = 0.1 * i / ( 1.1 * j + 1.0 ) + 0.5 * j;
+ }
}
- Kokkos::deep_copy(a,h_a);
+ Kokkos::deep_copy( a, h_a );
double result = 0.0;
- if(PWRTest==false) {
- Kokkos::parallel_reduce(100,KOKKOS_LAMBDA(const int& i, double& sum) {
- sum += a(i,1) + a(i,2);
- sum += a(i,0) - a(i,3);
- sum += a(i,4) + a(i,0);
- sum += a(i,2) - a(i,1);
- sum += a(i,3) + a(i,4);
+ if ( PWRTest == false ) {
+ Kokkos::parallel_reduce( 100, KOKKOS_LAMBDA( const int & i, double & sum ) {
+ sum += a( i, 1 ) + a( i, 2 );
+ sum += a( i, 0 ) - a( i, 3 );
+ sum += a( i, 4 ) + a( i, 0 );
+ sum += a( i, 2 ) - a( i, 1 );
+ sum += a( i, 3 ) + a( i, 4 );
}, unmanaged_result( & result ) );
- } else {
- typedef typename policy_type::member_type team_member ;
- Kokkos::parallel_reduce(policy_type(25,Kokkos::AUTO),KOKKOS_LAMBDA(const team_member & dev, double& sum) {
- const int begin = dev.league_rank() * 4 ;
- const int end = begin + 4 ;
- for ( int i = begin + dev.team_rank() ; i < end ; i += dev.team_size() ) {
- sum += a(i,1) + a(i,2);
- sum += a(i,0) - a(i,3);
- sum += a(i,4) + a(i,0);
- sum += a(i,2) - a(i,1);
- sum += a(i,3) + a(i,4);
+ }
+ else {
+ typedef typename policy_type::member_type team_member;
+ Kokkos::parallel_reduce( policy_type( 25, Kokkos::AUTO ), KOKKOS_LAMBDA( const team_member & dev, double & sum ) {
+ const int begin = dev.league_rank() * 4;
+ const int end = begin + 4;
+ for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
+ sum += a( i, 1 ) + a( i, 2 );
+ sum += a( i, 0 ) - a( i, 3 );
+ sum += a( i, 4 ) + a( i, 0 );
+ sum += a( i, 2 ) - a( i, 1 );
+ sum += a( i, 3 ) + a( i, 4 );
}
}, unmanaged_result( & result ) );
}
return result;
}
-
#else
-template<class DeviceType, bool PWRTest>
+template< class DeviceType, bool PWRTest >
double ReduceTestLambda() {
- return ReduceTestFunctor<DeviceType,PWRTest>();
+ return ReduceTestFunctor< DeviceType, PWRTest >();
}
#endif
-template<class DeviceType>
-double TestVariantLambda(int test) {
- switch (test) {
- case 1: return AddTestLambda<DeviceType,false>();
- case 2: return AddTestLambda<DeviceType,true>();
- case 3: return ReduceTestLambda<DeviceType,false>();
- case 4: return ReduceTestLambda<DeviceType,true>();
+template< class DeviceType >
+double TestVariantLambda( int test ) {
+ switch ( test ) {
+ case 1: return AddTestLambda< DeviceType, false >();
+ case 2: return AddTestLambda< DeviceType, true >();
+ case 3: return ReduceTestLambda< DeviceType, false >();
+ case 4: return ReduceTestLambda< DeviceType, true >();
}
+
return 0;
}
-
-template<class DeviceType>
-double TestVariantFunctor(int test) {
- switch (test) {
- case 1: return AddTestFunctor<DeviceType,false>();
- case 2: return AddTestFunctor<DeviceType,true>();
- case 3: return ReduceTestFunctor<DeviceType,false>();
- case 4: return ReduceTestFunctor<DeviceType,true>();
+template< class DeviceType >
+double TestVariantFunctor( int test ) {
+ switch ( test ) {
+ case 1: return AddTestFunctor< DeviceType, false >();
+ case 2: return AddTestFunctor< DeviceType, true >();
+ case 3: return ReduceTestFunctor< DeviceType, false >();
+ case 4: return ReduceTestFunctor< DeviceType, true >();
}
+
return 0;
}
-template<class DeviceType>
-bool Test(int test) {
-
+template< class DeviceType >
+bool Test( int test ) {
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- double res_functor = TestVariantFunctor<DeviceType>(test);
- double res_lambda = TestVariantLambda<DeviceType>(test);
+ double res_functor = TestVariantFunctor< DeviceType >( test );
+ double res_lambda = TestVariantLambda< DeviceType >( test );
- char testnames[5][256] = {" "
- ,"AddTest","AddTest TeamPolicy"
- ,"ReduceTest","ReduceTest TeamPolicy"
+ char testnames[5][256] = { " "
+ , "AddTest", "AddTest TeamPolicy"
+ , "ReduceTest", "ReduceTest TeamPolicy"
};
bool passed = true;
if ( res_functor != res_lambda ) {
passed = false;
std::cout << "CXX11 ( test = '"
<< testnames[test] << "' FAILED : "
<< res_functor << " != " << res_lambda
- << std::endl ;
+ << std::endl;
}
- return passed ;
+ return passed;
#else
return true;
#endif
}
-}
+} // namespace TestCXX11
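
TestCXX11 checks that a functor-based dispatch and a KOKKOS_LAMBDA-based dispatch of the same kernel agree, both over a flat index range and over a TeamPolicy whose teams each cover a block of four rows. A hedged, self-contained sketch of the two reduction shapes being compared (the constants 100 / 25 / 4 mirror the tests; a host backend with lambda dispatch is assumed):

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main( int argc, char* argv[] ) {
      Kokkos::initialize( argc, argv );
      {
        typedef Kokkos::DefaultExecutionSpace Space;
        typedef Kokkos::TeamPolicy< Space > policy_type;
        typedef policy_type::member_type team_member;

        double flat_sum = 0.0;
        double team_sum = 0.0;

        // Flat range: one index per invocation.
        Kokkos::parallel_reduce( Kokkos::RangePolicy< Space >( 0, 100 ),
          KOKKOS_LAMBDA( const int i, double & sum ) { sum += 1.0 * i; }, flat_sum );

        // Team policy: 25 teams, each team strides through its own block of 4 indices,
        // like the league_rank()/team_rank() loop in the functor above.
        Kokkos::parallel_reduce( policy_type( 25, Kokkos::AUTO ),
          KOKKOS_LAMBDA( const team_member & dev, double & sum ) {
            const int begin = dev.league_rank() * 4;
            const int end   = begin + 4;
            for ( int i = begin + dev.team_rank(); i < end; i += dev.team_size() ) {
              sum += 1.0 * i;
            }
          }, team_sum );

        printf( "flat = %g  team = %g\n", flat_sum, team_sum );  // both should print 4950
      }
      Kokkos::finalize();
    }
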
diff --git a/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp b/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp
index 359e17a44..b53b42b8e 100644
--- a/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp
+++ b/lib/kokkos/core/unit_test/TestCXX11Deduction.hpp
@@ -1,94 +1,92 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
+
#include <Kokkos_Core.hpp>
#ifndef TESTCXX11DEDUCTION_HPP
#define TESTCXX11DEDUCTION_HPP
namespace TestCXX11 {
struct TestReductionDeductionTagA {};
struct TestReductionDeductionTagB {};
template < class ExecSpace >
struct TestReductionDeductionFunctor {
-
// KOKKOS_INLINE_FUNCTION
- // void operator()( long i , long & value ) const
- // { value += i + 1 ; }
+ // void operator()( long i, long & value ) const
+ // { value += i + 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( TestReductionDeductionTagA , long i , long & value ) const
+ void operator()( TestReductionDeductionTagA, long i, long & value ) const
{ value += ( 2 * i + 1 ) + ( 2 * i + 2 ); }
KOKKOS_INLINE_FUNCTION
- void operator()( const TestReductionDeductionTagB & , const long i , long & value ) const
- { value += ( 3 * i + 1 ) + ( 3 * i + 2 ) + ( 3 * i + 3 ) ; }
-
+ void operator()( const TestReductionDeductionTagB &, const long i, long & value ) const
+ { value += ( 3 * i + 1 ) + ( 3 * i + 2 ) + ( 3 * i + 3 ); }
};
template< class ExecSpace >
void test_reduction_deduction()
{
- typedef TestReductionDeductionFunctor< ExecSpace > Functor ;
+ typedef TestReductionDeductionFunctor< ExecSpace > Functor;
- const long N = 50 ;
- // const long answer = N % 2 ? ( N * ((N+1)/2 )) : ( (N/2) * (N+1) );
- const long answerA = N % 2 ? ( (2*N) * (((2*N)+1)/2 )) : ( ((2*N)/2) * ((2*N)+1) );
- const long answerB = N % 2 ? ( (3*N) * (((3*N)+1)/2 )) : ( ((3*N)/2) * ((3*N)+1) );
- long result = 0 ;
+ const long N = 50;
+ // const long answer = N % 2 ? ( N * ( ( N + 1 ) / 2 ) ) : ( ( N / 2 ) * ( N + 1 ) );
+ const long answerA = N % 2 ? ( ( 2 * N ) * ( ( ( 2 * N ) + 1 ) / 2 ) ) : ( ( ( 2 * N ) / 2 ) * ( ( 2 * N ) + 1 ) );
+ const long answerB = N % 2 ? ( ( 3 * N ) * ( ( ( 3 * N ) + 1 ) / 2 ) ) : ( ( ( 3 * N ) / 2 ) * ( ( 3 * N ) + 1 ) );
+ long result = 0;
- // Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace>(0,N) , Functor() , result );
- // ASSERT_EQ( answer , result );
-
- Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace,TestReductionDeductionTagA>(0,N) , Functor() , result );
- ASSERT_EQ( answerA , result );
-
- Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace,TestReductionDeductionTagB>(0,N) , Functor() , result );
- ASSERT_EQ( answerB , result );
-}
+ // Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), Functor(), result );
+ // ASSERT_EQ( answer, result );
+
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TestReductionDeductionTagA >( 0, N ), Functor(), result );
+ ASSERT_EQ( answerA, result );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TestReductionDeductionTagB >( 0, N ), Functor(), result );
+ ASSERT_EQ( answerB, result );
}
-#endif
+} // namespace TestCXX11
+#endif
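
The deduction test above relies on execution-tag dispatch: the work tag in RangePolicy< ExecSpace, Tag > selects which operator() overload of the functor runs, so one functor can carry several kernels and Kokkos still deduces the reduction value type from the chosen signature. A minimal sketch of the same mechanism, with illustrative tag and functor names:

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    struct TagTwo {};    // work tags are just empty structs
    struct TagThree {};

    struct TaggedSum {
      // Chosen when the policy's work tag is TagTwo.
      KOKKOS_INLINE_FUNCTION
      void operator()( TagTwo, const long i, long & value ) const { value += 2 * i; }

      // Chosen when the policy's work tag is TagThree.
      KOKKOS_INLINE_FUNCTION
      void operator()( TagThree, const long i, long & value ) const { value += 3 * i; }
    };

    int main( int argc, char* argv[] ) {
      Kokkos::initialize( argc, argv );
      {
        typedef Kokkos::DefaultExecutionSpace Space;
        long twice = 0, thrice = 0;

        Kokkos::parallel_reduce( Kokkos::RangePolicy< Space, TagTwo   >( 0, 10 ), TaggedSum(), twice );
        Kokkos::parallel_reduce( Kokkos::RangePolicy< Space, TagThree >( 0, 10 ), TaggedSum(), thrice );

        printf( "%ld %ld\n", twice, thrice );  // 90 and 135 (2*45 and 3*45)
      }
      Kokkos::finalize();
    }
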
diff --git a/lib/kokkos/core/unit_test/TestCompilerMacros.hpp b/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
index 5add656a4..455543834 100644
--- a/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
+++ b/lib/kokkos/core/unit_test/TestCompilerMacros.hpp
@@ -1,95 +1,97 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#define KOKKOS_PRAGMA_UNROLL(a)
namespace TestCompilerMacros {
-template<class DEVICE_TYPE>
+template< class DEVICE_TYPE >
struct AddFunctor {
typedef DEVICE_TYPE execution_space;
- typedef typename Kokkos::View<int**,execution_space> type;
- type a,b;
+ typedef typename Kokkos::View< int**, execution_space > type;
+ type a, b;
int length;
- AddFunctor(type a_, type b_):a(a_),b(b_),length(a.dimension_1()) {}
+ AddFunctor( type a_, type b_ ) : a( a_ ), b( b_ ), length( a.dimension_1() ) {}
KOKKOS_INLINE_FUNCTION
- void operator()(int i) const {
+ void operator()( int i ) const {
#ifdef KOKKOS_ENABLE_PRAGMA_UNROLL
#pragma unroll
#endif
#ifdef KOKKOS_ENABLE_PRAGMA_IVDEP
#pragma ivdep
#endif
#ifdef KOKKOS_ENABLE_PRAGMA_VECTOR
#pragma vector always
#endif
#ifdef KOKKOS_ENABLE_PRAGMA_LOOPCOUNT
#pragma loop count(128)
#endif
#ifndef KOKKOS_DEBUG
#ifdef KOKKOS_ENABLE_PRAGMA_SIMD
#pragma simd
#endif
#endif
- for(int j=0;j<length;j++)
- a(i,j) += b(i,j);
+ for ( int j = 0; j < length; j++ ) {
+ a( i, j ) += b( i, j );
+ }
}
};
-template<class DeviceType>
+template< class DeviceType >
bool Test() {
- typedef typename Kokkos::View<int**,DeviceType> type;
- type a("A",1024,128);
- type b("B",1024,128);
+ typedef typename Kokkos::View< int**, DeviceType > type;
+ type a( "A", 1024, 128 );
+ type b( "B", 1024, 128 );
- AddFunctor<DeviceType> f(a,b);
- Kokkos::parallel_for(1024,f);
+ AddFunctor< DeviceType > f( a, b );
+ Kokkos::parallel_for( 1024, f );
DeviceType::fence();
+
return true;
}
-}
+} // namespace TestCompilerMacros
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
index 7e08f67e6..f85a35c09 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType.cpp
@@ -1,101 +1,99 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestAtomic.hpp>
-
#include <TestViewAPI.hpp>
-
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestTeam.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestCXX11.hpp>
#include <TestTeamVector.hpp>
#include <TestUtilities.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
TEST_F( defaultdevicetype, host_space_access )
{
- typedef Kokkos::HostSpace::execution_space host_exec_space ;
- typedef Kokkos::Device< host_exec_space , Kokkos::HostSpace > device_space ;
- typedef Kokkos::Impl::HostMirror< Kokkos::DefaultExecutionSpace >::Space mirror_space ;
+ typedef Kokkos::HostSpace::execution_space host_exec_space;
+ typedef Kokkos::Device< host_exec_space, Kokkos::HostSpace > device_space;
+ typedef Kokkos::Impl::HostMirror< Kokkos::DefaultExecutionSpace >::Space mirror_space;
static_assert(
- Kokkos::Impl::SpaceAccessibility< host_exec_space , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< host_exec_space, Kokkos::HostSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< device_space , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< device_space, Kokkos::HostSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< mirror_space , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< mirror_space, Kokkos::HostSpace >::accessible, "" );
}
-TEST_F( defaultdevicetype, view_api) {
- TestViewAPI< double , Kokkos::DefaultExecutionSpace >();
+TEST_F( defaultdevicetype, view_api )
+{
+ TestViewAPI< double, Kokkos::DefaultExecutionSpace >();
}
-} // namespace test
+} // namespace Test
#endif
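
The host_space_access test above is purely a compile-time check: Kokkos::Impl::SpaceAccessibility answers whether an execution or device space may dereference memory in a given memory space. A stripped-down sketch of the same probe outside the gtest fixture:

    #include <Kokkos_Core.hpp>

    typedef Kokkos::HostSpace::execution_space host_exec_space;

    // Fails during compilation (rather than at run time) if the host execution
    // space could not access HostSpace allocations.
    static_assert(
      Kokkos::Impl::SpaceAccessibility< host_exec_space, Kokkos::HostSpace >::accessible,
      "host execution space must be able to access HostSpace" );

    int main() { return 0; }   // nothing to run; the check happens at compile time
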
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp b/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
index 7778efde3..401da58a5 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceTypeInit.hpp
@@ -1,419 +1,468 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
+
#ifdef KOKKOS_ENABLE_OPENMP
#include <omp.h>
#endif
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
namespace Test {
namespace Impl {
- char** init_kokkos_args(bool do_threads,bool do_numa,bool do_device,bool do_other, int& nargs, Kokkos::InitArguments& init_args) {
- nargs = (do_threads?1:0) +
- (do_numa?1:0) +
- (do_device?1:0) +
- (do_other?4:0);
- char** args_kokkos = new char*[nargs];
- for(int i = 0; i < nargs; i++)
- args_kokkos[i] = new char[20];
+char** init_kokkos_args( bool do_threads, bool do_numa, bool do_device, bool do_other, int & nargs, Kokkos::InitArguments & init_args ) {
+ nargs = ( do_threads ? 1 : 0 ) +
+ ( do_numa ? 1 : 0 ) +
+ ( do_device ? 1 : 0 ) +
+ ( do_other ? 4 : 0 );
- int threads_idx = do_other?1:0;
- int numa_idx = (do_other?3:0) + (do_threads?1:0);
- int device_idx = (do_other?3:0) + (do_threads?1:0) + (do_numa?1:0);
+ char** args_kokkos = new char*[nargs];
+ for ( int i = 0; i < nargs; i++ ) {
+ args_kokkos[i] = new char[20];
+ }
+ int threads_idx = do_other ? 1 : 0;
+ int numa_idx = ( do_other ? 3 : 0 ) + ( do_threads ? 1 : 0 );
+ int device_idx = ( do_other ? 3 : 0 ) + ( do_threads ? 1 : 0 ) + ( do_numa ? 1 : 0 );
- if(do_threads) {
- int nthreads = 3;
+ if ( do_threads ) {
+ int nthreads = 3;
#ifdef KOKKOS_ENABLE_OPENMP
- if(omp_get_max_threads() < 3)
- nthreads = omp_get_max_threads();
+ if ( omp_get_max_threads() < 3 )
+ nthreads = omp_get_max_threads();
#endif
- if(Kokkos::hwloc::available()) {
- if(Kokkos::hwloc::get_available_threads_per_core()<3)
- nthreads = Kokkos::hwloc::get_available_threads_per_core()
- * Kokkos::hwloc::get_available_numa_count();
- }
-
-#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- nthreads = 1;
- }
-#endif
- init_args.num_threads = nthreads;
- sprintf(args_kokkos[threads_idx],"--threads=%i",nthreads);
+ if ( Kokkos::hwloc::available() ) {
+ if ( Kokkos::hwloc::get_available_threads_per_core() < 3 )
+ nthreads = Kokkos::hwloc::get_available_threads_per_core()
+ * Kokkos::hwloc::get_available_numa_count();
}
- if(do_numa) {
- int numa = 1;
- if(Kokkos::hwloc::available())
- numa = Kokkos::hwloc::get_available_numa_count();
#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- numa = 1;
- }
-#endif
-
- init_args.num_numa = numa;
- sprintf(args_kokkos[numa_idx],"--numa=%i",numa);
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ nthreads = 1;
}
+#endif
- if(do_device) {
+ init_args.num_threads = nthreads;
+ sprintf( args_kokkos[threads_idx], "--threads=%i", nthreads );
+ }
- init_args.device_id = 0;
- sprintf(args_kokkos[device_idx],"--device=%i",0);
+ if ( do_numa ) {
+ int numa = 1;
+ if ( Kokkos::hwloc::available() ) {
+ numa = Kokkos::hwloc::get_available_numa_count();
}
- if(do_other) {
- sprintf(args_kokkos[0],"--dummyarg=1");
- sprintf(args_kokkos[threads_idx+(do_threads?1:0)],"--dummy2arg");
- sprintf(args_kokkos[threads_idx+(do_threads?1:0)+1],"dummy3arg");
- sprintf(args_kokkos[device_idx+(do_device?1:0)],"dummy4arg=1");
+#ifdef KOKKOS_ENABLE_SERIAL
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ numa = 1;
}
+#endif
+ init_args.num_numa = numa;
+ sprintf( args_kokkos[numa_idx], "--numa=%i", numa );
+ }
- return args_kokkos;
+ if ( do_device ) {
+ init_args.device_id = 0;
+ sprintf( args_kokkos[device_idx], "--device=%i", 0 );
}
- Kokkos::InitArguments init_initstruct(bool do_threads, bool do_numa, bool do_device) {
- Kokkos::InitArguments args;
+ if ( do_other ) {
+ sprintf( args_kokkos[0], "--dummyarg=1" );
+ sprintf( args_kokkos[ threads_idx + ( do_threads ? 1 : 0 ) ], "--dummy2arg" );
+ sprintf( args_kokkos[ threads_idx + ( do_threads ? 1 : 0 ) + 1 ], "dummy3arg" );
+ sprintf( args_kokkos[ device_idx + ( do_device ? 1 : 0 ) ], "dummy4arg=1" );
+ }
+
+ return args_kokkos;
+}
+
+Kokkos::InitArguments init_initstruct( bool do_threads, bool do_numa, bool do_device ) {
+ Kokkos::InitArguments args;
- if(do_threads) {
- int nthreads = 3;
+ if ( do_threads ) {
+ int nthreads = 3;
#ifdef KOKKOS_ENABLE_OPENMP
- if(omp_get_max_threads() < 3)
- nthreads = omp_get_max_threads();
+ if ( omp_get_max_threads() < 3 ) {
+ nthreads = omp_get_max_threads();
+ }
#endif
- if(Kokkos::hwloc::available()) {
- if(Kokkos::hwloc::get_available_threads_per_core()<3)
- nthreads = Kokkos::hwloc::get_available_threads_per_core()
- * Kokkos::hwloc::get_available_numa_count();
+ if ( Kokkos::hwloc::available() ) {
+ if ( Kokkos::hwloc::get_available_threads_per_core() < 3 ) {
+ nthreads = Kokkos::hwloc::get_available_threads_per_core()
+ * Kokkos::hwloc::get_available_numa_count();
}
+ }
+
#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- nthreads = 1;
- }
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ nthreads = 1;
+ }
#endif
- args.num_threads = nthreads;
+ args.num_threads = nthreads;
+ }
+
+ if ( do_numa ) {
+ int numa = 1;
+ if ( Kokkos::hwloc::available() ) {
+ numa = Kokkos::hwloc::get_available_numa_count();
}
- if(do_numa) {
- int numa = 1;
- if(Kokkos::hwloc::available())
- numa = Kokkos::hwloc::get_available_numa_count();
#ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::Serial,Kokkos::DefaultExecutionSpace>::value ||
- std::is_same<Kokkos::Serial,Kokkos::DefaultHostExecutionSpace>::value ) {
- numa = 1;
- }
-#endif
- args.num_numa = numa;
+ if ( std::is_same< Kokkos::Serial, Kokkos::DefaultExecutionSpace >::value ||
+ std::is_same< Kokkos::Serial, Kokkos::DefaultHostExecutionSpace >::value ) {
+ numa = 1;
}
+#endif
- if(do_device) {
- args.device_id = 0;
- }
+ args.num_numa = numa;
+ }
- return args;
+ if ( do_device ) {
+ args.device_id = 0;
}
- void check_correct_initialization(const Kokkos::InitArguments& argstruct) {
- ASSERT_EQ( Kokkos::DefaultExecutionSpace::is_initialized(), 1);
- ASSERT_EQ( Kokkos::HostSpace::execution_space::is_initialized(), 1);
-
- //Figure out the number of threads the HostSpace ExecutionSpace should have initialized to
- int expected_nthreads = argstruct.num_threads;
- if(expected_nthreads<1) {
- if(Kokkos::hwloc::available()) {
- expected_nthreads = Kokkos::hwloc::get_available_numa_count()
- * Kokkos::hwloc::get_available_cores_per_numa()
- * Kokkos::hwloc::get_available_threads_per_core();
- } else {
- #ifdef KOKKOS_ENABLE_OPENMP
- if(std::is_same<Kokkos::HostSpace::execution_space,Kokkos::OpenMP>::value) {
- expected_nthreads = omp_get_max_threads();
- } else
- #endif
- expected_nthreads = 1;
+ return args;
+}
+
+void check_correct_initialization( const Kokkos::InitArguments & argstruct ) {
+ ASSERT_EQ( Kokkos::DefaultExecutionSpace::is_initialized(), 1 );
+ ASSERT_EQ( Kokkos::HostSpace::execution_space::is_initialized(), 1 );
+
+ // Figure out the number of threads the HostSpace ExecutionSpace should have initialized to.
+ int expected_nthreads = argstruct.num_threads;
+ if ( expected_nthreads < 1 ) {
+ if ( Kokkos::hwloc::available() ) {
+ expected_nthreads = Kokkos::hwloc::get_available_numa_count()
+ * Kokkos::hwloc::get_available_cores_per_numa()
+ * Kokkos::hwloc::get_available_threads_per_core();
+ }
+ else {
+#ifdef KOKKOS_ENABLE_OPENMP
+ if ( std::is_same< Kokkos::HostSpace::execution_space, Kokkos::OpenMP >::value ) {
+ expected_nthreads = omp_get_max_threads();
}
- #ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
- std::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
+ else
+#endif
expected_nthreads = 1;
- #endif
}
- int expected_numa = argstruct.num_numa;
- if(expected_numa<1) {
- if(Kokkos::hwloc::available()) {
- expected_numa = Kokkos::hwloc::get_available_numa_count();
- } else {
- expected_numa = 1;
- }
- #ifdef KOKKOS_ENABLE_SERIAL
- if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Serial>::value ||
- std::is_same<Kokkos::DefaultHostExecutionSpace,Kokkos::Serial>::value )
- expected_numa = 1;
- #endif
+#ifdef KOKKOS_ENABLE_SERIAL
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Serial >::value ||
+ std::is_same< Kokkos::DefaultHostExecutionSpace, Kokkos::Serial >::value ) {
+ expected_nthreads = 1;
}
- ASSERT_EQ(Kokkos::HostSpace::execution_space::thread_pool_size(),expected_nthreads);
+#endif
+ }
-#ifdef KOKKOS_ENABLE_CUDA
- if(std::is_same<Kokkos::DefaultExecutionSpace,Kokkos::Cuda>::value) {
- int device;
- cudaGetDevice( &device );
- int expected_device = argstruct.device_id;
- if(argstruct.device_id<0) {
- expected_device = 0;
- }
- ASSERT_EQ(expected_device,device);
+ int expected_numa = argstruct.num_numa;
+
+ if ( expected_numa < 1 ) {
+ if ( Kokkos::hwloc::available() ) {
+ expected_numa = Kokkos::hwloc::get_available_numa_count();
+ }
+ else {
+ expected_numa = 1;
}
+
+#ifdef KOKKOS_ENABLE_SERIAL
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Serial >::value ||
+ std::is_same< Kokkos::DefaultHostExecutionSpace, Kokkos::Serial >::value )
+ expected_numa = 1;
#endif
}
- //ToDo: Add check whether correct number of threads are actually started
- void test_no_arguments() {
- Kokkos::initialize();
- check_correct_initialization(Kokkos::InitArguments());
- Kokkos::finalize();
- }
+ ASSERT_EQ( Kokkos::HostSpace::execution_space::thread_pool_size(), expected_nthreads );
- void test_commandline_args(int nargs, char** args, const Kokkos::InitArguments& argstruct) {
- Kokkos::initialize(nargs,args);
- check_correct_initialization(argstruct);
- Kokkos::finalize();
- }
- void test_initstruct_args(const Kokkos::InitArguments& args) {
- Kokkos::initialize(args);
- check_correct_initialization(args);
- Kokkos::finalize();
+#ifdef KOKKOS_ENABLE_CUDA
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Cuda >::value ) {
+ int device;
+ cudaGetDevice( &device );
+
+ int expected_device = argstruct.device_id;
+ if ( argstruct.device_id < 0 ) {
+ expected_device = 0;
+ }
+
+ ASSERT_EQ( expected_device, device );
}
+#endif
+}
+
+// TODO: Add check whether correct number of threads are actually started.
+void test_no_arguments() {
+ Kokkos::initialize();
+ check_correct_initialization( Kokkos::InitArguments() );
+ Kokkos::finalize();
}
+void test_commandline_args( int nargs, char** args, const Kokkos::InitArguments & argstruct ) {
+ Kokkos::initialize( nargs, args );
+ check_correct_initialization( argstruct );
+ Kokkos::finalize();
+}
+
+void test_initstruct_args( const Kokkos::InitArguments & args ) {
+ Kokkos::initialize( args );
+ check_correct_initialization( args );
+ Kokkos::finalize();
+}
+
+} // namespace Impl
+
class defaultdevicetypeinit : public ::testing::Test {
protected:
- static void SetUpTestCase()
- {
- }
+ static void SetUpTestCase() {}
- static void TearDownTestCase()
- {
- }
+ static void TearDownTestCase() {}
};
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_01
-TEST_F( defaultdevicetypeinit, no_args) {
+TEST_F( defaultdevicetypeinit, no_args )
+{
Impl::test_no_arguments();
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_02
-TEST_F( defaultdevicetypeinit, commandline_args_empty) {
+TEST_F( defaultdevicetypeinit, commandline_args_empty )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,false,false,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, false, false, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_03
-TEST_F( defaultdevicetypeinit, commandline_args_other) {
+TEST_F( defaultdevicetypeinit, commandline_args_other )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,false,false,true,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, false, false, true, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_04
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,false,false,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, false, false, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_05
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,true,false,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, true, false, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_06
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,true,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, true, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_07
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,false,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, false, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_08
-TEST_F( defaultdevicetypeinit, commandline_args_numa_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_numa_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,true,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, true, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_09
-TEST_F( defaultdevicetypeinit, commandline_args_device) {
+TEST_F( defaultdevicetypeinit, commandline_args_device )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(false,false,true,false,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( false, false, true, false, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_10
-TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device_other) {
+TEST_F( defaultdevicetypeinit, commandline_args_nthreads_numa_device_other )
+{
Kokkos::InitArguments argstruct;
int nargs = 0;
- char** args = Impl::init_kokkos_args(true,true,true,true,nargs, argstruct);
- Impl::test_commandline_args(nargs,args,argstruct);
- for(int i = 0; i < nargs; i++)
+ char** args = Impl::init_kokkos_args( true, true, true, true, nargs, argstruct );
+ Impl::test_commandline_args( nargs, args, argstruct );
+
+ for ( int i = 0; i < nargs; i++ ) {
delete [] args[i];
+ }
delete [] args;
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_11
-TEST_F( defaultdevicetypeinit, initstruct_default) {
+TEST_F( defaultdevicetypeinit, initstruct_default )
+{
Kokkos::InitArguments args;
- Impl::test_initstruct_args(args);
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_12
-TEST_F( defaultdevicetypeinit, initstruct_nthreads) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,false,false);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, false, false );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_13
-TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,true,false);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, true, false );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_14
-TEST_F( defaultdevicetypeinit, initstruct_device) {
- Kokkos::InitArguments args = Impl::init_initstruct(false,false,true);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_device )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( false, false, true );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_15
-TEST_F( defaultdevicetypeinit, initstruct_nthreads_device) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,false,true);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads_device )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, false, true );
+ Impl::test_initstruct_args( args );
}
#endif
#ifdef KOKKOS_DEFAULTDEVICETYPE_INIT_TEST_16
-TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa_device) {
- Kokkos::InitArguments args = Impl::init_initstruct(true,true,true);
- Impl::test_initstruct_args(args);
+TEST_F( defaultdevicetypeinit, initstruct_nthreads_numa_device )
+{
+ Kokkos::InitArguments args = Impl::init_initstruct( true, true, true );
+ Impl::test_initstruct_args( args );
}
#endif
-
-} // namespace test
+} // namespace Test
#endif
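
The init tests above drive Kokkos::initialize two ways: through fake command-line flags (--threads, --numa, --device) and through a pre-filled Kokkos::InitArguments struct. A hedged sketch of the struct-based path, with illustrative values standing in for whatever the hwloc/OpenMP queries in the tests would choose:

    #include <Kokkos_Core.hpp>

    int main() {
      Kokkos::InitArguments args;
      args.num_threads = 2;   // equivalent to passing "--threads=2"
      args.num_numa    = 1;   // equivalent to passing "--numa=1"
      args.device_id   = 0;   // equivalent to passing "--device=0"; only meaningful for a CUDA build

      Kokkos::initialize( args );
      // ... launch kernels here ...
      Kokkos::finalize();

      return 0;
    }
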
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
index dd148a062..4fdfa9591 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_a.cpp
@@ -1,76 +1,74 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestReduce.hpp>
-
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, reduce_instantiation_a) {
+TEST_F( defaultdevicetype, reduce_instantiation_a )
+{
TestReduceCombinatoricalInstantiation<>::execute_a();
}
-} // namespace test
+} // namespace Test
#endif
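Each of these unit-test translation units repeats the same fixture idiom: SetUpTestCase() and TearDownTestCase() run once per test suite, so every TEST_F body executes between Kokkos::initialize() and Kokkos::finalize(). A minimal sketch of that pattern (illustrative only; the fixture and test names are placeholders):

#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>

namespace Test {

class kokkos_environment : public ::testing::Test {
protected:
  // Runs once before the first TEST_F of this fixture.
  static void SetUpTestCase()    { Kokkos::initialize(); }
  // Runs once after the last TEST_F of this fixture.
  static void TearDownTestCase() { Kokkos::finalize(); }
};

TEST_F( kokkos_environment, example )
{
  // Any Kokkos call here is legal, since the runtime is already initialized;
  // e.g. fencing the default execution space, as the tests below also do.
  Kokkos::DefaultExecutionSpace::fence();
}

} // namespace Test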
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
index c8edfdd5c..841f34e03 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_b.cpp
@@ -1,76 +1,74 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestReduce.hpp>
-
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, reduce_instantiation_b) {
+TEST_F( defaultdevicetype, reduce_instantiation_b )
+{
TestReduceCombinatoricalInstantiation<>::execute_b();
}
-} // namespace test
+} // namespace Test
#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
index 405d49a9b..602863be3 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_c.cpp
@@ -1,76 +1,74 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestReduce.hpp>
-
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-
-TEST_F( defaultdevicetype, reduce_instantiation_c) {
+TEST_F( defaultdevicetype, reduce_instantiation_c )
+{
TestReduceCombinatoricalInstantiation<>::execute_c();
}
-} // namespace test
+} // namespace Test
#endif
diff --git a/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp b/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
index 426cc4f06..5d3665b90 100644
--- a/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
+++ b/lib/kokkos/core/unit_test/TestDefaultDeviceType_d.cpp
@@ -1,237 +1,237 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-#if !defined(KOKKOS_ENABLE_CUDA) || defined(__CUDACC__)
-//----------------------------------------------------------------------------
+#if !defined( KOKKOS_ENABLE_CUDA ) || defined( __CUDACC__ )
#include <TestAtomic.hpp>
-
#include <TestViewAPI.hpp>
-
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestTeam.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestCXX11.hpp>
#include <TestTeamVector.hpp>
#include <TestUtilities.hpp>
namespace Test {
class defaultdevicetype : public ::testing::Test {
protected:
static void SetUpTestCase()
{
Kokkos::initialize();
}
static void TearDownTestCase()
{
Kokkos::finalize();
}
};
-TEST_F( defaultdevicetype, test_utilities) {
+TEST_F( defaultdevicetype, test_utilities )
+{
test_utilities();
}
-TEST_F( defaultdevicetype, long_reduce) {
- TestReduce< long , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, long_reduce )
+{
+ TestReduce< long, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, double_reduce) {
- TestReduce< double , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, double_reduce )
+{
+ TestReduce< double, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::DefaultExecutionSpace >( 100000 );
}
-TEST_F( defaultdevicetype, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::DefaultExecutionSpace >( 100000 );
+TEST_F( defaultdevicetype, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::DefaultExecutionSpace >( 100000 );
}
-
-TEST_F( defaultdevicetype , atomics )
+TEST_F( defaultdevicetype, atomics )
{
- const int loop_count = 1e4 ;
+ const int loop_count = 1e4;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::DefaultExecutionSpace>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::DefaultExecutionSpace >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::DefaultExecutionSpace >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::DefaultExecutionSpace >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::DefaultExecutionSpace>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::DefaultExecutionSpace >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::DefaultExecutionSpace >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::DefaultExecutionSpace >( 100, 3 ) ) );
}
-/*TEST_F( defaultdevicetype , view_remap )
+/*TEST_F( defaultdevicetype, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::DefaultExecutionSpace > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::DefaultExecutionSpace > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::DefaultExecutionSpace > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}*/
-
-//----------------------------------------------------------------------------
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::DefaultExecutionSpace > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::DefaultExecutionSpace > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::DefaultExecutionSpace > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+ for ( size_t i3 = 0; i3 < N3; ++i3 ) {
+ for ( size_t i2 = 0; i2 < N2; ++i2 ) {
+ for ( size_t i1 = 0; i1 < N1; ++i1 ) {
+ for ( size_t i0 = 0; i0 < N0; ++i0 ) {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+ }
+ }
+ }
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+ for ( size_t i3 = 0; i3 < N3; ++i3 ) {
+ for ( size_t i2 = 0; i2 < N2; ++i2 ) {
+ for ( size_t i1 = 0; i1 < N1; ++i1 ) {
+ for ( size_t i0 = 0; i0 < N0; ++i0 ) {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
+ }
+ }
+ }
+}*/
-TEST_F( defaultdevicetype , view_aggregate )
+TEST_F( defaultdevicetype, view_aggregate )
{
TestViewAggregate< Kokkos::DefaultExecutionSpace >();
}
-//----------------------------------------------------------------------------
-
-TEST_F( defaultdevicetype , scan )
+TEST_F( defaultdevicetype, scan )
{
- TestScan< Kokkos::DefaultExecutionSpace >::test_range( 1 , 1000 );
+ TestScan< Kokkos::DefaultExecutionSpace >::test_range( 1, 1000 );
TestScan< Kokkos::DefaultExecutionSpace >( 1000000 );
TestScan< Kokkos::DefaultExecutionSpace >( 10000000 );
Kokkos::DefaultExecutionSpace::fence();
}
-
-//----------------------------------------------------------------------------
-
-TEST_F( defaultdevicetype , compiler_macros )
+TEST_F( defaultdevicetype, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::DefaultExecutionSpace >() ) );
}
-
-//----------------------------------------------------------------------------
-TEST_F( defaultdevicetype , cxx11 )
+TEST_F( defaultdevicetype, cxx11 )
{
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >(4) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::DefaultExecutionSpace >( 4 ) ) );
}
-TEST_F( defaultdevicetype , team_vector )
+#if !defined(KOKKOS_CUDA_CLANG_WORKAROUND) && !defined(KOKKOS_ARCH_PASCAL)
+TEST_F( defaultdevicetype, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >(5) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::DefaultExecutionSpace >( 5 ) ) );
}
+#endif
-TEST_F( defaultdevicetype , malloc )
+TEST_F( defaultdevicetype, malloc )
{
- int* data = (int*) Kokkos::kokkos_malloc(100*sizeof(int));
- ASSERT_NO_THROW(data = (int*) Kokkos::kokkos_realloc(data,120*sizeof(int)));
- Kokkos::kokkos_free(data);
+ int* data = (int*) Kokkos::kokkos_malloc( 100 * sizeof( int ) );
+ ASSERT_NO_THROW( data = (int*) Kokkos::kokkos_realloc( data, 120 * sizeof( int ) ) );
+ Kokkos::kokkos_free( data );
- int* data2 = (int*) Kokkos::kokkos_malloc(0);
- ASSERT_TRUE(data2==NULL);
- Kokkos::kokkos_free(data2);
+ int* data2 = (int*) Kokkos::kokkos_malloc( 0 );
+ ASSERT_TRUE( data2 == NULL );
+ Kokkos::kokkos_free( data2 );
}
-} // namespace test
+} // namespace Test
#endif
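The "malloc" test just above exercises the raw allocation entry points in the default memory space. A short sketch of the same calls outside the test harness (illustrative only, not part of the patch):

#include <Kokkos_Core.hpp>

void allocation_example()
{
  // Allocate, grow, and release a buffer in the default memory space.
  int* data = static_cast< int* >( Kokkos::kokkos_malloc( 100 * sizeof( int ) ) );
  data      = static_cast< int* >( Kokkos::kokkos_realloc( data, 120 * sizeof( int ) ) );
  Kokkos::kokkos_free( data );

  // A zero-byte request yields NULL, which the test above asserts explicitly;
  // freeing that pointer is then harmless.
  int* empty = static_cast< int* >( Kokkos::kokkos_malloc( 0 ) );
  Kokkos::kokkos_free( empty );
}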
diff --git a/lib/kokkos/core/unit_test/TestHWLOC.cpp b/lib/kokkos/core/unit_test/TestHWLOC.cpp
index 1637dec5d..d03d9b816 100644
--- a/lib/kokkos/core/unit_test/TestHWLOC.cpp
+++ b/lib/kokkos/core/unit_test/TestHWLOC.cpp
@@ -1,69 +1,67 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <iostream>
+
#include <Kokkos_hwloc.hpp>
namespace Test {
class hwloc : public ::testing::Test {
protected:
- static void SetUpTestCase()
- {}
+ static void SetUpTestCase() {}
- static void TearDownTestCase()
- {}
+ static void TearDownTestCase() {}
};
-TEST_F( hwloc, query)
+TEST_F( hwloc, query )
{
std::cout << " NUMA[" << Kokkos::hwloc::get_available_numa_count() << "]"
<< " CORE[" << Kokkos::hwloc::get_available_cores_per_numa() << "]"
<< " PU[" << Kokkos::hwloc::get_available_threads_per_core() << "]"
- << std::endl ;
-}
-
+ << std::endl;
}
+} // namespace Test
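The hwloc query test prints the machine topology that Kokkos can bind to when hwloc support is compiled in. The same three calls can be used directly, for example to size thread pools; a minimal sketch (illustrative only):

#include <iostream>
#include <Kokkos_hwloc.hpp>

void print_topology()
{
  // NUMA domains, cores per NUMA domain, and hardware threads (PUs) per core.
  std::cout << "NUMA[" << Kokkos::hwloc::get_available_numa_count()       << "] "
            << "CORE[" << Kokkos::hwloc::get_available_cores_per_numa()   << "] "
            << "PU["   << Kokkos::hwloc::get_available_threads_per_core() << "]"
            << std::endl;
}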
diff --git a/lib/kokkos/core/unit_test/TestMDRange.hpp b/lib/kokkos/core/unit_test/TestMDRange.hpp
index 9894d1ce6..1dc349cc1 100644
--- a/lib/kokkos/core/unit_test/TestMDRange.hpp
+++ b/lib/kokkos/core/unit_test/TestMDRange.hpp
@@ -1,555 +1,1721 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
+
namespace {
template <typename ExecSpace >
struct TestMDRange_2D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType**, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
- using DataType = int ;
- using ViewType = typename Kokkos::View< DataType** , ExecSpace > ;
- using HostViewType = typename ViewType::HostMirror ;
+ ViewType input_view;
- ViewType input_view ;
+ TestMDRange_2D( const DataType N0, const DataType N1 ) : input_view( "input_view", N0, N1 ) {}
- TestMDRange_2D( const DataType N0, const DataType N1 ) : input_view("input_view", N0, N1) {}
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j ) const
+ {
+ input_view( i, j ) = 1;
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const int i , const int j ) const
+ void operator()( const int i, const int j, double &lsum ) const
{
- input_view(i,j) = 1;
+ lsum += input_view( i, j ) * 2;
}
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j ) const
+ {
+ input_view( i, j ) = 3;
+ }
- static void test_for2( const int64_t N0, const int64_t N1 )
+ static void test_reduce2( const int N0, const int N1 )
{
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 2, 6 } } );
+
+ TestMDRange_2D functor( N0, N1 );
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 );
+ }
+ } // end test_reduce2
+
+ static void test_for2( const int N0, const int N1 )
+ {
using namespace Kokkos::Experimental;
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> >;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Default Layouts + InitTag op(): Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Default >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Default Layouts + InitTag op(): Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, InitTag > range_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Default Layouts + InitTag op() + Default Tile: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "No info: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 4, 4 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "D D: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Left , Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {3,3} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "L L: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Left , Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {7,7} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 7, 7 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "L R: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {16,16} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 16, 16 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "R L: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<2, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0}, {N0,N1}, {5,16} );
- TestMDRange_2D functor(N0,N1);
+ range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 5, 16 } } );
+ TestMDRange_2D functor( N0, N1 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- if ( h_view(i,j) != 1 ) {
- ++counter;
- }
- }}
- if ( counter != 0 )
- printf(" Errors in test_for2; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ {
+ if ( h_view( i, j ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "R R: Errors in test_for2; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
-
- } //end test_for2
-}; //MDRange_2D
+ } // end test_for2
+}; // MDRange_2D
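TestMDRange_2D above repeats one pattern many times: build an MDRangePolicy from begin/end points (and optionally a tile size), then dispatch a functor with md_parallel_for and md_parallel_reduce. A condensed, stand-alone sketch of that pattern follows, using a hypothetical FillAndSum functor rather than the test fixture itself (illustrative only, not part of the patch):

#include <Kokkos_Core.hpp>

template < class ExecSpace >
struct FillAndSum {
  Kokkos::View< int**, ExecSpace > v;

  // parallel_for form: set every entry to 1.
  KOKKOS_INLINE_FUNCTION
  void operator()( const int i, const int j ) const { v( i, j ) = 1; }

  // parallel_reduce form: accumulate the entries.
  KOKKOS_INLINE_FUNCTION
  void operator()( const int i, const int j, double & lsum ) const { lsum += v( i, j ); }
};

template < class ExecSpace >
double sum_of_ones( const int N0, const int N1 )
{
  using namespace Kokkos::Experimental;

  typedef MDRangePolicy< ExecSpace, Rank<2>, Kokkos::IndexType<int> > range_type;
  typedef typename range_type::tile_type  tile_type;
  typedef typename range_type::point_type point_type;

  // Rank-2 index range [0,N0) x [0,N1), iterated in 3x3 tiles.
  range_type range( point_type{ { 0, 0 } }, point_type{ { N0, N1 } }, tile_type{ { 3, 3 } } );

  FillAndSum< ExecSpace > functor{ Kokkos::View< int**, ExecSpace >( "v", N0, N1 ) };

  md_parallel_for( range, functor );           // fill with ones

  double sum = 0.0;
  md_parallel_reduce( range, functor, sum );   // expect sum == N0 * N1

  return sum;
}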
template <typename ExecSpace >
struct TestMDRange_3D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType***, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
- using DataType = int ;
- using ViewType = typename Kokkos::View< DataType*** , ExecSpace > ;
- using HostViewType = typename ViewType::HostMirror ;
+ ViewType input_view;
- ViewType input_view ;
+ TestMDRange_3D( const DataType N0, const DataType N1, const DataType N2 ) : input_view( "input_view", N0, N1, N2 ) {}
- TestMDRange_3D( const DataType N0, const DataType N1, const DataType N2 ) : input_view("input_view", N0, N1, N2) {}
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k ) const
+ {
+ input_view( i, j, k ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, double &lsum ) const
+ {
+ lsum += input_view( i, j, k ) * 2;
+ }
+ // tagged operators
+ struct InitTag {};
KOKKOS_INLINE_FUNCTION
- void operator()( const int i , const int j , const int k ) const
+ void operator()( const InitTag &, const int i, const int j, const int k ) const
{
- input_view(i,j,k) = 1;
+ input_view( i, j, k ) = 3;
}
- static void test_for3( const int64_t N0, const int64_t N1, const int64_t N2 )
+ static void test_reduce3( const int N0, const int N1, const int N2 )
{
using namespace Kokkos::Experimental;
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Default, Iterate::Default >, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 6 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+ double sum = 0.0;
+ md_parallel_reduce( range, functor, sum );
+
+ ASSERT_EQ( sum, 2 * N0 * N1 * N2 );
+ }
+ } // end test_reduce3
+
+ static void test_for3( const int N0, const int N1, const int N2 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3> > range_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + InitTag op(): Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 3, 3 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 2 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Default, Iterate::Default >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 3, 5, 7 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Flat, Iterate::Default>, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 8, 8, 8 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Flat, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0 } }, point_type{ { N0, N1, N2 } }, tile_type{ { 2, 4, 2 } } );
+ TestMDRange_3D functor( N0, N1, N2 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ {
+ if ( h_view( i, j, k ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for3; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ } // end test_for3
+};
+
+template <typename ExecSpace >
+struct TestMDRange_4D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType****, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
+
+ ViewType input_view;
+
+ TestMDRange_4D( const DataType N0, const DataType N1, const DataType N2, const DataType N3 ) : input_view( "input_view", N0, N1, N2, N3 ) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l ) const
+ {
+ input_view( i, j, k, l ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, double &lsum ) const
+ {
+ lsum += input_view( i, j, k, l ) * 2;
+ }
+
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j, const int k, const int l ) const
+ {
+ input_view( i, j, k, l ) = 3;
+ }
+
+ static void test_for4( const int N0, const int N1, const int N2, const int N3 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4> > range_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } } );
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 3, 11, 3, 3 } } );
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf("Defaults +m_tile > m_upper dim2 InitTag op(): Errors in test_for4; mismatches = %d\n\n",counter);
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Flat >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
- range_type range( {0,0,0}, {N0,N1,N2} );
- TestMDRange_3D functor(N0,N1,N2);
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2}, {2,4,2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
+
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
- range_type range( {0,0,0}, {N0,N1,N2}, {3,5,7} );
- TestMDRange_3D functor(N0,N1,N2);
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Left >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
- range_type range( {0,0,0}, {N0,N1,N2}, {8,8,8} );
- TestMDRange_3D functor(N0,N1,N2);
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
{
- using range_type = MDRangePolicy< ExecSpace, Rank<3, Iterate::Right, Iterate::Right >, Kokkos::IndexType<int> >;
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
- range_type range( {0,0,0}, {N0,N1,N2}, {2,4,2} );
- TestMDRange_3D functor(N0,N1,N2);
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
+
+ TestMDRange_4D functor( N0, N1, N2, N3 );
md_parallel_for( range, functor );
HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
- Kokkos::deep_copy( h_view , functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
int counter = 0;
- for ( int i=0; i<N0; ++i ) {
- for ( int j=0; j<N1; ++j ) {
- for ( int k=0; k<N2; ++k ) {
- if ( h_view(i,j,k) != 1 ) {
- ++counter;
- }
- }}}
- if ( counter != 0 )
- printf(" Errors in test_for3; mismatches = %d\n\n",counter);
- ASSERT_EQ( counter , 0 );
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
}
- } //end test_for3
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<4, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3 } }, tile_type{ { 4, 4, 4, 4 } } );
+
+ TestMDRange_4D functor( N0, N1, N2, N3 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ {
+ if ( h_view( i, j, k, l ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for4; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ } // end test_for4
};
-} /* namespace */
-} /* namespace Test */
+template <typename ExecSpace >
+struct TestMDRange_5D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType*****, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
+
+ ViewType input_view;
+
+ TestMDRange_5D( const DataType N0, const DataType N1, const DataType N2, const DataType N3, const DataType N4 ) : input_view( "input_view", N0, N1, N2, N3, N4 ) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m ) const
+ {
+ input_view( i, j, k, l, m ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m, double &lsum ) const
+ {
+ lsum += input_view( i, j, k, l, m ) * 2;
+ }
+
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j, const int k, const int l, const int m ) const
+ {
+ input_view( i, j, k, l, m ) = 3;
+ }
+
+ static void test_for5( const int N0, const int N1, const int N2, const int N3, const int N4 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5> > range_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } } );
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 3, 3, 3, 3, 7 } } );
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + InitTag op(): Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<5, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4 } }, tile_type{ { 4, 4, 4, 2, 2 } } );
+
+ TestMDRange_5D functor( N0, N1, N2, N3, N4 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ {
+ if ( h_view( i, j, k, l, m ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for5; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ }
+};
+
+template <typename ExecSpace >
+struct TestMDRange_6D {
+ using DataType = int;
+ using ViewType = typename Kokkos::View< DataType******, ExecSpace >;
+ using HostViewType = typename ViewType::HostMirror;
+
+ ViewType input_view;
+
+ TestMDRange_6D( const DataType N0, const DataType N1, const DataType N2, const DataType N3, const DataType N4, const DataType N5 ) : input_view( "input_view", N0, N1, N2, N3, N4, N5 ) {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m, const int n ) const
+ {
+ input_view( i, j, k, l, m, n ) = 1;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int i, const int j, const int k, const int l, const int m, const int n, double &lsum ) const
+ {
+ lsum += input_view( i, j, k, l, m, n ) * 2;
+ }
+
+ // tagged operators
+ struct InitTag {};
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const InitTag &, const int i, const int j, const int k, const int l, const int m, const int n ) const
+ {
+ input_view( i, j, k, l, m, n ) = 3;
+ }
+
+ static void test_for6( const int N0, const int N1, const int N2, const int N3, const int N4, const int N5 )
+ {
+ using namespace Kokkos::Experimental;
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6> > range_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } } );
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + No Tile: Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6>, Kokkos::IndexType<int>, InitTag > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 3, 3, 3, 3, 2, 3 } } ); // tile dims of 3,3,3,3,3,3 would be more than CUDA can handle with debugging enabled
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 3 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( "Defaults + InitTag op(): Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Default, Iterate::Default>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Left, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Left, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Right, Iterate::Left>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+
+ {
+ typedef typename Kokkos::Experimental::MDRangePolicy< ExecSpace, Rank<6, Iterate::Right, Iterate::Right>, Kokkos::IndexType<int> > range_type;
+ typedef typename range_type::tile_type tile_type;
+ typedef typename range_type::point_type point_type;
+
+ range_type range( point_type{ { 0, 0, 0, 0, 0, 0 } }, point_type{ { N0, N1, N2, N3, N4, N5 } }, tile_type{ { 4, 4, 4, 2, 2, 2 } } );
+
+ TestMDRange_6D functor( N0, N1, N2, N3, N4, N5 );
+
+ md_parallel_for( range, functor );
+
+ HostViewType h_view = Kokkos::create_mirror_view( functor.input_view );
+ Kokkos::deep_copy( h_view, functor.input_view );
+
+ int counter = 0;
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < N2; ++k )
+ for ( int l = 0; l < N3; ++l )
+ for ( int m = 0; m < N4; ++m )
+ for ( int n = 0; n < N5; ++n )
+ {
+ if ( h_view( i, j, k, l, m, n ) != 1 ) {
+ ++counter;
+ }
+ }
+
+ if ( counter != 0 ) {
+ printf( " Errors in test_for6; mismatches = %d\n\n", counter );
+ }
+
+ ASSERT_EQ( counter, 0 );
+ }
+ }
+};
-/*--------------------------------------------------------------------------*/
+} // namespace
+} // namespace Test
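Every test block in the MDRange changes above follows the same launch pattern; as a reading aid, a minimal sketch of that pattern is shown here. It only restates calls already exercised by the tests (Kokkos 2.x experimental MDRange API); the functor, function, and view names and the extents and tile sizes are placeholders, not part of the patch.

    #include <Kokkos_Core.hpp>

    // A 3-D fill functor in the same style as the test functors above.
    template < typename ViewType >
    struct FillOnes {
      ViewType v;
      KOKKOS_INLINE_FUNCTION
      void operator()( const int i, const int j, const int k ) const { v( i, j, k ) = 1; }
    };

    template < typename ExecSpace >
    void example_md_launch( const int N0, const int N1, const int N2 )
    {
      using namespace Kokkos::Experimental;

      typedef Kokkos::View< int***, ExecSpace > view_type;
      // Rank, iteration orders, and index type are compile-time policy parameters.
      typedef MDRangePolicy< ExecSpace, Rank<3, Iterate::Left, Iterate::Right>,
                             Kokkos::IndexType<int> > range_type;
      typedef typename range_type::tile_type  tile_type;
      typedef typename range_type::point_type point_type;

      view_type data( "data", N0, N1, N2 );

      // Begin corner, end corner, and per-dimension tile sizes.
      range_type range( point_type{ { 0, 0, 0 } },
                        point_type{ { N0, N1, N2 } },
                        tile_type{ { 3, 5, 7 } } );

      FillOnes< view_type > functor{ data };
      md_parallel_for( range, functor );
      // The tests then mirror the view to the host and check every entry equals 1.
    }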
diff --git a/lib/kokkos/core/unit_test/TestMemoryPool.hpp b/lib/kokkos/core/unit_test/TestMemoryPool.hpp
index 868e64e9d..925f0e35e 100644
--- a/lib/kokkos/core/unit_test/TestMemoryPool.hpp
+++ b/lib/kokkos/core/unit_test/TestMemoryPool.hpp
@@ -1,820 +1,820 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_UNITTEST_MEMPOOL_HPP
#define KOKKOS_UNITTEST_MEMPOOL_HPP
#include <stdio.h>
#include <iostream>
#include <cmath>
#include <algorithm>
#include <impl/Kokkos_Timer.hpp>
//#define TESTMEMORYPOOL_PRINT
//#define TESTMEMORYPOOL_PRINT_STATUS
#define STRIDE 1
#ifdef KOKKOS_ENABLE_CUDA
#define STRIDE_ALLOC 32
#else
#define STRIDE_ALLOC 1
#endif
namespace TestMemoryPool {
struct pointer_obj {
uint64_t * ptr;
KOKKOS_INLINE_FUNCTION
pointer_obj() : ptr( 0 ) {}
};
struct pointer_obj2 {
void * ptr;
size_t size;
KOKKOS_INLINE_FUNCTION
pointer_obj2() : ptr( 0 ), size( 0 ) {}
};
template < typename PointerView, typename Allocator >
struct allocate_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
size_t m_chunk_size;
Allocator m_mempool;
allocate_memory( PointerView & ptrs, size_t num_ptrs,
size_t cs, Allocator & m )
: m_pointers( ptrs ), m_chunk_size( cs ), m_mempool( m )
{
// Launch the allocations in parallel; only every STRIDE_ALLOC-th index performs one.
Kokkos::parallel_for( num_ptrs * STRIDE_ALLOC, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE_ALLOC == 0 ) {
m_pointers[i / STRIDE_ALLOC].ptr =
static_cast< uint64_t * >( m_mempool.allocate( m_chunk_size ) );
}
}
};
template < typename PointerView >
struct count_invalid_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef uint64_t value_type;
PointerView m_pointers;
uint64_t & m_result;
count_invalid_memory( PointerView & ptrs, size_t num_ptrs, uint64_t & res )
: m_pointers( ptrs ), m_result( res )
{
// Count failed (NULL) allocations with a parallel reduction.
Kokkos::parallel_reduce( num_ptrs * STRIDE, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
if ( i % STRIDE == 0 ) {
r += ( m_pointers[i / STRIDE].ptr == 0 );
}
}
};
template < typename PointerView >
struct fill_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
fill_memory( PointerView & ptrs, size_t num_ptrs ) : m_pointers( ptrs )
{
// Fill each allocated chunk with its own index in parallel.
Kokkos::parallel_for( num_ptrs * STRIDE, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE == 0 ) {
- *m_pointers[i / STRIDE].ptr = i / STRIDE ;
+ *m_pointers[i / STRIDE].ptr = i / STRIDE;
}
}
};
template < typename PointerView >
struct sum_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
typedef uint64_t value_type;
PointerView m_pointers;
uint64_t & m_result;
sum_memory( PointerView & ptrs, size_t num_ptrs, uint64_t & res )
: m_pointers( ptrs ), m_result( res )
{
// Sum the values stored in the chunks with a parallel reduction.
Kokkos::parallel_reduce( num_ptrs * STRIDE, *this, m_result );
}
KOKKOS_INLINE_FUNCTION
void init( value_type & v ) const
{ v = 0; }
KOKKOS_INLINE_FUNCTION
void join( volatile value_type & dst, volatile value_type const & src ) const
{ dst += src; }
KOKKOS_INLINE_FUNCTION
void operator()( size_type i, value_type & r ) const
{
if ( i % STRIDE == 0 ) {
r += *m_pointers[i / STRIDE].ptr;
}
}
};
template < typename PointerView, typename Allocator >
struct deallocate_memory {
typedef typename PointerView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
PointerView m_pointers;
size_t m_chunk_size;
Allocator m_mempool;
deallocate_memory( PointerView & ptrs, size_t num_ptrs,
size_t cs, Allocator & m )
: m_pointers( ptrs ), m_chunk_size( cs ), m_mempool( m )
{
// Return every chunk to the memory pool in parallel.
Kokkos::parallel_for( num_ptrs * STRIDE, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE == 0 ) {
m_mempool.deallocate( m_pointers[i / STRIDE].ptr, m_chunk_size );
}
}
};
template < typename WorkView, typename PointerView, typename ScalarView,
typename Allocator >
struct allocate_deallocate_memory {
typedef typename WorkView::execution_space execution_space;
typedef typename execution_space::size_type size_type;
WorkView m_work;
PointerView m_pointers;
ScalarView m_ptrs_front;
ScalarView m_ptrs_back;
Allocator m_mempool;
allocate_deallocate_memory( WorkView & w, size_t work_size, PointerView & p,
ScalarView pf, ScalarView pb, Allocator & m )
: m_work( w ), m_pointers( p ), m_ptrs_front( pf ), m_ptrs_back( pb ),
m_mempool( m )
{
// Process the work items (interleaved allocations and deallocations) in parallel.
Kokkos::parallel_for( work_size * STRIDE_ALLOC, *this );
}
KOKKOS_INLINE_FUNCTION
void operator()( size_type i ) const
{
if ( i % STRIDE_ALLOC == 0 ) {
unsigned my_work = m_work[i / STRIDE_ALLOC];
if ( ( my_work & 1 ) == 0 ) {
// Allocation.
size_t pos = Kokkos::atomic_fetch_add( &m_ptrs_back(), 1 );
size_t alloc_size = my_work >> 1;
m_pointers[pos].ptr = m_mempool.allocate( alloc_size );
m_pointers[pos].size = alloc_size;
}
else {
// Deallocation.
size_t pos = Kokkos::atomic_fetch_add( &m_ptrs_front(), 1 );
m_mempool.deallocate( m_pointers[pos].ptr, m_pointers[pos].size );
}
}
}
};
#define PRECISION 6
#define SHIFTW 24
#define SHIFTW2 12
template < typename F >
void print_results( const std::string & text, F elapsed_time )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< std::endl;
}
template < typename F, typename T >
void print_results( const std::string & text, unsigned long long width,
F elapsed_time, T result )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< " " << std::setw( width ) << result << std::endl;
}
template < typename F >
void print_results( const std::string & text, unsigned long long width,
F elapsed_time, const std::string & result )
{
std::cout << std::setw( SHIFTW ) << text << std::setw( SHIFTW2 )
<< std::fixed << std::setprecision( PRECISION ) << elapsed_time
<< " " << std::setw( width ) << result << std::endl;
}
// This test stresses allocation and deallocation far harder than real-world
// usage in order to expose thread-safety problems: a loop in which all threads
// allocate is followed by a loop in which all threads deallocate.
// All of the allocation requests are for equal-sized chunks that are the base
// chunk size of the memory pool. It also tests initialization of the memory
// pool and breaking large chunks into smaller chunks to fulfill allocation
// requests. It verifies that MemoryPool(), allocate(), and deallocate() work
// correctly.
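// Illustrative numbers (not taken from the test driver): chunk_size = 64 and
// total_size = 65536 give num_chunks = 1024; fill_memory() writes each chunk's
// index into it, so sum_memory() must return 1024 * 1023 / 2 = 523776.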
template < class Device >
bool test_mempool( size_t chunk_size, size_t total_size )
{
typedef typename Device::execution_space execution_space;
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< pointer_obj *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
uint64_t result = 0;
size_t num_chunks = total_size / chunk_size;
bool return_val = true;
pointer_view pointers( "pointers", num_chunks );
#ifdef TESTMEMORYPOOL_PRINT
std::cout << "*** test_mempool() ***" << std::endl
<< std::setw( SHIFTW ) << "chunk_size: " << std::setw( 12 )
<< chunk_size << std::endl
<< std::setw( SHIFTW ) << "total_size: " << std::setw( 12 )
<< total_size << std::endl
<< std::setw( SHIFTW ) << "num_chunks: " << std::setw( 12 )
<< num_chunks << std::endl;
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), total_size * 1.2, 20 );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
count_invalid_memory< pointer_view > sm( pointers, num_chunks, result );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "invalid chunks: ", 16, elapsed_time, result );
timer.reset();
#endif
{
fill_memory< pointer_view > fm( pointers, num_chunks );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "fill chunks: ", elapsed_time );
timer.reset();
#endif
{
sum_memory< pointer_view > sm( pointers, num_chunks, result );
}
execution_space::fence();
#ifdef TESTMEMORYPOOL_PRINT
elapsed_time = timer.seconds();
print_results( "sum chunks: ", 16, elapsed_time, result );
#endif
if ( result != ( num_chunks * ( num_chunks - 1 ) ) / 2 ) {
std::cerr << "Invalid sum value in memory." << std::endl;
return_val = false;
}
#ifdef TESTMEMORYPOOL_PRINT
timer.reset();
#endif
{
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
count_invalid_memory< pointer_view > sm( pointers, num_chunks, result );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "invalid chunks: ", 16, elapsed_time, result );
timer.reset();
#endif
{
fill_memory< pointer_view > fm( pointers, num_chunks );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "fill chunks: ", elapsed_time );
timer.reset();
#endif
{
sum_memory< pointer_view > sm( pointers, num_chunks, result );
}
execution_space::fence();
#ifdef TESTMEMORYPOOL_PRINT
elapsed_time = timer.seconds();
print_results( "sum chunks: ", 16, elapsed_time, result );
#endif
if ( result != ( num_chunks * ( num_chunks - 1 ) ) / 2 ) {
std::cerr << "Invalid sum value in memory." << std::endl;
return_val = false;
}
#ifdef TESTMEMORYPOOL_PRINT
timer.reset();
#endif
{
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
return return_val;
}
template < typename T >
T smallest_power2_ge( T val )
{
// Find the most significant nonzero bit.
int first_nonzero_bit = Kokkos::Impl::bit_scan_reverse( val );
- // If val is an integral power of 2, ceil( log2(val) ) is equal to the
+ // If val is an integral power of 2, ceil( log2( val ) ) is equal to the
// most significant nonzero bit. Otherwise, you need to add 1.
int lg2_size = first_nonzero_bit +
!Kokkos::Impl::is_integral_power_of_two( val );
- return T(1) << T(lg2_size);
+ return T( 1 ) << T( lg2_size );
}
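// Worked example: for val = 65 the most significant nonzero bit is bit 6 and
// 65 is not a power of two, so lg2_size = 7 and the result is 1 << 7 = 128;
// for an exact power of two such as 64 the value itself is returned.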
// This test makes allocation requests for multiple sizes and interleaves
// allocation and deallocation.
//
// There are 3 phases. The first phase does only allocations to build up a
// working state for the allocator. The second phase interleaves allocations
// and deletions. The third phase does only deallocations to undo all the
// allocations from the first phase. By building first to a working state,
// allocations and deallocations can happen in any order for the second phase.
// Each phase operates on multiple chunk sizes.
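// Illustrative walk-through (hypothetical arguments): with num_chunk_sizes = 4
// and base_chunk_size = 64, phase 1 issues equal numbers of 64-, 128-, 256-,
// and 512-byte allocations, phase 2 interleaves the same mix of allocations
// with an equal number of deallocations, and phase 3 deallocates whatever is
// still outstanding.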
template < class Device >
void test_mempool2( unsigned base_chunk_size, size_t num_chunk_sizes,
size_t phase1_size, size_t phase2_size )
{
#ifdef TESTMEMORYPOOL_PRINT
typedef typename Device::execution_space execution_space;
#endif
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< unsigned *, device_type > work_view;
typedef Kokkos::View< size_t, device_type > scalar_view;
typedef Kokkos::View< pointer_obj2 *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
enum {
MIN_CHUNK_SIZE = 64,
MIN_BASE_CHUNK_SIZE = MIN_CHUNK_SIZE / 2 + 1
};
// Make sure the base chunk size is at least MIN_BASE_CHUNK_SIZE bytes, so
// all the different chunk sizes translate to different block sizes for the
// allocator.
if ( base_chunk_size < MIN_BASE_CHUNK_SIZE ) {
base_chunk_size = MIN_BASE_CHUNK_SIZE;
}
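// Illustrative effect: a requested base_chunk_size of 10 is raised to 33, so
// the doubled chunk sizes 33, 66, 132, ... round up to the distinct block
// sizes 64, 128, 256, ...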
// Get the smallest power of 2 >= the base chunk size. The size must be
// >= MIN_CHUNK_SIZE, though.
unsigned ceil_base_chunk_size = smallest_power2_ge( base_chunk_size );
if ( ceil_base_chunk_size < MIN_CHUNK_SIZE ) {
ceil_base_chunk_size = MIN_CHUNK_SIZE;
}
// Make sure the phase 1 size is a multiple of num_chunk_sizes.
phase1_size = ( ( phase1_size + num_chunk_sizes - 1 ) / num_chunk_sizes ) *
num_chunk_sizes;
- // Make sure the phase 2 size is multiples of (2 * num_chunk_sizes).
+ // Make sure the phase 2 size is a multiple of ( 2 * num_chunk_sizes ).
phase2_size =
( ( phase2_size + 2 * num_chunk_sizes - 1 ) / ( 2 * num_chunk_sizes ) ) *
2 * num_chunk_sizes;
// The phase2 size must be <= twice the phase1 size so that deallocations
// can't happen before allocations.
if ( phase2_size > 2 * phase1_size ) phase2_size = 2 * phase1_size;
size_t phase3_size = phase1_size;
size_t half_phase2_size = phase2_size / 2;
// Each entry in the work views has the following format. The least
// significant bit indicates allocation (0) vs. deallocation (1). For
// allocation, the other bits indicate the desired allocation size.
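// For example, an entry of ( 256 << 1 ) == 512 encodes "allocate 256 bytes",
// while an entry of 1 encodes "deallocate the record at the front of the queue".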
// Initialize the phase 1 work view with an equal number of allocations for
// each chunk size.
work_view phase1_work( "Phase 1 Work", phase1_size );
typename work_view::HostMirror host_phase1_work =
- create_mirror_view(phase1_work);
+ create_mirror_view( phase1_work );
size_t inner_size = phase1_size / num_chunk_sizes;
unsigned chunk_size = base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
for ( size_t j = 0; j < inner_size; ++j ) {
host_phase1_work[i * inner_size + j] = chunk_size << 1;
}
chunk_size *= 2;
}
std::random_shuffle( host_phase1_work.ptr_on_device(),
host_phase1_work.ptr_on_device() + phase1_size );
deep_copy( phase1_work, host_phase1_work );
// Initialize the phase 2 work view with half allocations and half
// deallocations with an equal number of allocations for each chunk size.
work_view phase2_work( "Phase 2 Work", phase2_size );
typename work_view::HostMirror host_phase2_work =
- create_mirror_view(phase2_work);
+ create_mirror_view( phase2_work );
inner_size = half_phase2_size / num_chunk_sizes;
chunk_size = base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
for ( size_t j = 0; j < inner_size; ++j ) {
host_phase2_work[i * inner_size + j] = chunk_size << 1;
}
chunk_size *= 2;
}
for ( size_t i = half_phase2_size; i < phase2_size; ++i ) {
host_phase2_work[i] = 1;
}
std::random_shuffle( host_phase2_work.ptr_on_device(),
host_phase2_work.ptr_on_device() + phase2_size );
deep_copy( phase2_work, host_phase2_work );
// Initialize the phase 3 work view with all deallocations.
work_view phase3_work( "Phase 3 Work", phase3_size );
typename work_view::HostMirror host_phase3_work =
- create_mirror_view(phase3_work);
+ create_mirror_view( phase3_work );
inner_size = phase3_size / num_chunk_sizes;
for ( size_t i = 0; i < phase3_size; ++i ) host_phase3_work[i] = 1;
deep_copy( phase3_work, host_phase3_work );
// Calculate the amount of memory needed for the allocator. We need to know
// the number of superblocks required for each chunk size and use that to
// calculate the amount of memory for each chunk size.
size_t lg_sb_size = 18;
size_t sb_size = 1 << lg_sb_size;
size_t total_size = 0;
size_t allocs_per_size = phase1_size / num_chunk_sizes +
half_phase2_size / num_chunk_sizes;
chunk_size = ceil_base_chunk_size;
for ( size_t i = 0; i < num_chunk_sizes; ++i ) {
size_t my_size = allocs_per_size * chunk_size;
total_size += ( my_size + sb_size - 1 ) / sb_size * sb_size;
chunk_size *= 2;
}
// Declare the queue to hold the records for allocated memory. An allocation
// adds a record to the back of the queue, and a deallocation removes a
// record from the front of the queue.
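// ( ptrs_back is atomically advanced to append a new record; ptrs_front is
// atomically advanced to retire the oldest one. )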
size_t num_allocations = phase1_size + half_phase2_size;
scalar_view ptrs_front( "Pointers front" );
scalar_view ptrs_back( "Pointers back" );
pointer_view pointers( "pointers", num_allocations );
#ifdef TESTMEMORYPOOL_PRINT
printf( "\n*** test_mempool2() ***\n" );
printf( " num_chunk_sizes: %12zu\n", num_chunk_sizes );
printf( " base_chunk_size: %12u\n", base_chunk_size );
printf( " ceil_base_chunk_size: %12u\n", ceil_base_chunk_size );
printf( " phase1_size: %12zu\n", phase1_size );
printf( " phase2_size: %12zu\n", phase2_size );
printf( " phase3_size: %12zu\n", phase3_size );
printf( " allocs_per_size: %12zu\n", allocs_per_size );
printf( " num_allocations: %12zu\n", num_allocations );
printf( " total_size: %12zu\n", total_size );
fflush( stdout );
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), total_size * 1.2, lg_sb_size );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase1_work, phase1_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase1: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase2_work, phase2_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase2: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_deallocate_memory< work_view, pointer_view, scalar_view,
pool_memory_space >
adm( phase3_work, phase3_size, pointers, ptrs_front, ptrs_back, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "phase3: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
}
// Tests for correct behavior when the allocator is out of memory.
template < class Device >
void test_memory_exhaustion()
{
#ifdef TESTMEMORYPOOL_PRINT
typedef typename Device::execution_space execution_space;
#endif
typedef typename Device::memory_space memory_space;
typedef Device device_type;
typedef Kokkos::View< pointer_obj *, device_type > pointer_view;
typedef Kokkos::Experimental::MemoryPool< device_type > pool_memory_space;
// The allocator will have a single superblock, and allocations will all be
// of the same chunk size. The allocation loop will attempt to allocate
// twice the number of chunks as are available in the allocator. The
// deallocation loop will only free the successfully allocated chunks.
size_t chunk_size = 128;
size_t num_chunks = 128;
size_t half_num_chunks = num_chunks / 2;
size_t superblock_size = chunk_size * half_num_chunks;
size_t lg_superblock_size =
Kokkos::Impl::integral_power_of_two( superblock_size );
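// With these sizes: superblock_size = 128 * 64 = 8192 bytes and
// lg_superblock_size = 13, so the single superblock holds at most 64 chunks
// and at most half of the 128 allocation attempts below can succeed.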
pointer_view pointers( "pointers", num_chunks );
#ifdef TESTMEMORYPOOL_PRINT
std::cout << "\n*** test_memory_exhaustion() ***" << std::endl;
double elapsed_time = 0;
Kokkos::Timer timer;
#endif
pool_memory_space mempool( memory_space(), superblock_size,
lg_superblock_size );
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "initialize mempool: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
allocate_memory< pointer_view, pool_memory_space >
am( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "allocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
timer.reset();
#endif
{
// In parallel, the allocations that succeeded were not put contiguously
// into the pointers View. The whole View can still be looped over and
// have deallocate called because deallocate will just do nothing for NULL
// pointers.
deallocate_memory< pointer_view, pool_memory_space >
dm( pointers, num_chunks, chunk_size, mempool );
}
#ifdef TESTMEMORYPOOL_PRINT
execution_space::fence();
elapsed_time = timer.seconds();
print_results( "deallocate chunks: ", elapsed_time );
#ifdef TESTMEMORYPOOL_PRINT_STATUS
mempool.print_status();
#endif
#endif
}
}
#undef TESTMEMORYPOOL_PRINT
#undef TESTMEMORYPOOL_PRINT_STATUS
#undef STRIDE
#undef STRIDE_ALLOC
#endif
diff --git a/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp b/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
index 1bb45481c..6f2ca6a61 100644
--- a/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
+++ b/lib/kokkos/core/unit_test/TestPolicyConstruction.hpp
@@ -1,497 +1,528 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-struct SomeTag{};
+struct SomeTag {};
template< class ExecutionSpace >
class TestRangePolicyConstruction {
public:
TestRangePolicyConstruction() {
test_compile_time_parameters();
}
+
private:
void test_compile_time_parameters() {
{
Kokkos::Impl::expand_variadic();
- Kokkos::Impl::expand_variadic(1,2,3);
+ Kokkos::Impl::expand_variadic( 1, 2, 3 );
}
+
{
typedef Kokkos::RangePolicy<> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::IndexType<long>, ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,ExecutionSpace,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, ExecutionSpace, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::RangePolicy< Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::RangePolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::RangePolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
}
};
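
As a usage note on what these assertions establish: the policy's template arguments (execution space, schedule, index type, work tag) may be listed in any order, and a tag type selects the matching tagged operator() at dispatch time. A hedged sketch, reusing SomeTag from this file; the functor name TagDemo, the helper run_tag_demo, and the bounds are illustrative, and Kokkos is assumed to be initialized before the parallel_for runs:

  struct TagDemo {
    KOKKOS_INLINE_FUNCTION
    void operator()( const SomeTag &, const long i ) const { (void) i; }
  };

  // Equivalent to listing the same arguments in any other order checked above.
  typedef Kokkos::RangePolicy< Kokkos::Schedule<Kokkos::Dynamic>,
                               Kokkos::IndexType<long>, SomeTag > demo_policy_t;

  inline void run_tag_demo()
  {
    // Dispatches to the SomeTag-tagged operator() with a long index.
    Kokkos::parallel_for( demo_policy_t( 0, 100 ), TagDemo() );
  }
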
template< class ExecutionSpace >
class TestTeamPolicyConstruction {
public:
TestTeamPolicyConstruction() {
test_compile_time_parameters();
test_run_time_parameters();
}
+
private:
void test_compile_time_parameters() {
{
typedef Kokkos::TeamPolicy<> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Static> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Static> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,ExecutionSpace,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, ExecutionSpace, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,ExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, ExecutionSpace > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, ExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,typename execution_space::size_type >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, typename execution_space::size_type >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,void >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::IndexType<long>, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, void >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,SomeTag > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, SomeTag > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
+
{
- typedef Kokkos::TeamPolicy<SomeTag,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > policy_t;
- typedef typename policy_t::execution_space execution_space;
- typedef typename policy_t::index_type index_type;
- typedef typename policy_t::schedule_type schedule_type;
- typedef typename policy_t::work_tag work_tag;
-
- ASSERT_TRUE((std::is_same<execution_space ,Kokkos::DefaultExecutionSpace >::value));
- ASSERT_TRUE((std::is_same<index_type ,long >::value));
- ASSERT_TRUE((std::is_same<schedule_type ,Kokkos::Schedule<Kokkos::Dynamic> >::value));
- ASSERT_TRUE((std::is_same<work_tag ,SomeTag >::value));
+ typedef Kokkos::TeamPolicy< SomeTag, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > policy_t;
+ typedef typename policy_t::execution_space execution_space;
+ typedef typename policy_t::index_type index_type;
+ typedef typename policy_t::schedule_type schedule_type;
+ typedef typename policy_t::work_tag work_tag;
+
+ ASSERT_TRUE( ( std::is_same< execution_space, Kokkos::DefaultExecutionSpace >::value ) );
+ ASSERT_TRUE( ( std::is_same< index_type, long >::value ) );
+ ASSERT_TRUE( ( std::is_same< schedule_type, Kokkos::Schedule<Kokkos::Dynamic> >::value ) );
+ ASSERT_TRUE( ( std::is_same< work_tag, SomeTag >::value ) );
}
}
- template<class policy_t>
+ template< class policy_t >
void test_run_time_parameters_type() {
int league_size = 131;
- int team_size = 4<policy_t::execution_space::concurrency()?4:policy_t::execution_space::concurrency();
+ int team_size = 4 < policy_t::execution_space::concurrency() ? 4 : policy_t::execution_space::concurrency();
int chunk_size = 4;
int per_team_scratch = 1024;
int per_thread_scratch = 16;
- int scratch_size = per_team_scratch + per_thread_scratch*team_size;
- policy_t p1(league_size,team_size);
- ASSERT_EQ (p1.league_size() , league_size);
- ASSERT_EQ (p1.team_size() , team_size);
- ASSERT_TRUE(p1.chunk_size() > 0);
- ASSERT_EQ (p1.scratch_size(0), 0);
-
- policy_t p2 = p1.set_chunk_size(chunk_size);
- ASSERT_EQ (p1.league_size() , league_size);
- ASSERT_EQ (p1.team_size() , team_size);
- ASSERT_TRUE(p1.chunk_size() > 0);
- ASSERT_EQ (p1.scratch_size(0), 0);
-
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
-
- policy_t p3 = p2.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p3.league_size() , league_size);
- ASSERT_EQ (p3.team_size() , team_size);
- ASSERT_EQ (p3.chunk_size() , chunk_size);
- ASSERT_EQ (p3.scratch_size(0), per_team_scratch);
-
- policy_t p4 = p2.set_scratch_size(0,Kokkos::PerThread(per_thread_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p4.league_size() , league_size);
- ASSERT_EQ (p4.team_size() , team_size);
- ASSERT_EQ (p4.chunk_size() , chunk_size);
- ASSERT_EQ (p4.scratch_size(0), per_thread_scratch*team_size);
-
- policy_t p5 = p2.set_scratch_size(0,Kokkos::PerThread(per_thread_scratch),Kokkos::PerTeam(per_team_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p5.league_size() , league_size);
- ASSERT_EQ (p5.team_size() , team_size);
- ASSERT_EQ (p5.chunk_size() , chunk_size);
- ASSERT_EQ (p5.scratch_size(0), scratch_size);
-
- policy_t p6 = p2.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch),Kokkos::PerThread(per_thread_scratch));
- ASSERT_EQ (p2.league_size() , league_size);
- ASSERT_EQ (p2.team_size() , team_size);
- ASSERT_EQ (p2.chunk_size() , chunk_size);
- ASSERT_EQ (p2.scratch_size(0), 0);
- ASSERT_EQ (p6.league_size() , league_size);
- ASSERT_EQ (p6.team_size() , team_size);
- ASSERT_EQ (p6.chunk_size() , chunk_size);
- ASSERT_EQ (p6.scratch_size(0), scratch_size);
-
- policy_t p7 = p3.set_scratch_size(0,Kokkos::PerTeam(per_team_scratch),Kokkos::PerThread(per_thread_scratch));
- ASSERT_EQ (p3.league_size() , league_size);
- ASSERT_EQ (p3.team_size() , team_size);
- ASSERT_EQ (p3.chunk_size() , chunk_size);
- ASSERT_EQ (p3.scratch_size(0), per_team_scratch);
- ASSERT_EQ (p7.league_size() , league_size);
- ASSERT_EQ (p7.team_size() , team_size);
- ASSERT_EQ (p7.chunk_size() , chunk_size);
- ASSERT_EQ (p7.scratch_size(0), scratch_size);
-}
+ int scratch_size = per_team_scratch + per_thread_scratch * team_size;
+
+ policy_t p1( league_size, team_size );
+ ASSERT_EQ ( p1.league_size(), league_size );
+ ASSERT_EQ ( p1.team_size(), team_size );
+ ASSERT_TRUE( p1.chunk_size() > 0 );
+ ASSERT_EQ ( p1.scratch_size( 0 ), 0 );
+
+ policy_t p2 = p1.set_chunk_size( chunk_size );
+ ASSERT_EQ ( p1.league_size(), league_size );
+ ASSERT_EQ ( p1.team_size(), team_size );
+ ASSERT_TRUE( p1.chunk_size() > 0 );
+ ASSERT_EQ ( p1.scratch_size( 0 ), 0 );
+
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+
+ policy_t p3 = p2.set_scratch_size( 0, Kokkos::PerTeam( per_team_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p3.league_size(), league_size );
+ ASSERT_EQ ( p3.team_size(), team_size );
+ ASSERT_EQ ( p3.chunk_size(), chunk_size );
+ ASSERT_EQ ( p3.scratch_size( 0 ), per_team_scratch );
+
+ policy_t p4 = p2.set_scratch_size( 0, Kokkos::PerThread( per_thread_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p4.league_size(), league_size );
+ ASSERT_EQ ( p4.team_size(), team_size );
+ ASSERT_EQ ( p4.chunk_size(), chunk_size );
+ ASSERT_EQ ( p4.scratch_size( 0 ), per_thread_scratch * team_size );
+
+ policy_t p5 = p2.set_scratch_size( 0, Kokkos::PerThread( per_thread_scratch ), Kokkos::PerTeam( per_team_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p5.league_size(), league_size );
+ ASSERT_EQ ( p5.team_size(), team_size );
+ ASSERT_EQ ( p5.chunk_size(), chunk_size );
+ ASSERT_EQ ( p5.scratch_size( 0 ), scratch_size );
+
+ policy_t p6 = p2.set_scratch_size( 0, Kokkos::PerTeam( per_team_scratch ), Kokkos::PerThread( per_thread_scratch ) );
+ ASSERT_EQ ( p2.league_size(), league_size );
+ ASSERT_EQ ( p2.team_size(), team_size );
+ ASSERT_EQ ( p2.chunk_size(), chunk_size );
+ ASSERT_EQ ( p2.scratch_size( 0 ), 0 );
+ ASSERT_EQ ( p6.league_size(), league_size );
+ ASSERT_EQ ( p6.team_size(), team_size );
+ ASSERT_EQ ( p6.chunk_size(), chunk_size );
+ ASSERT_EQ ( p6.scratch_size( 0 ), scratch_size );
+
+ policy_t p7 = p3.set_scratch_size( 0, Kokkos::PerTeam( per_team_scratch ), Kokkos::PerThread( per_thread_scratch ) );
+ ASSERT_EQ ( p3.league_size(), league_size );
+ ASSERT_EQ ( p3.team_size(), team_size );
+ ASSERT_EQ ( p3.chunk_size(), chunk_size );
+ ASSERT_EQ ( p3.scratch_size( 0 ), per_team_scratch );
+ ASSERT_EQ ( p7.league_size(), league_size );
+ ASSERT_EQ ( p7.team_size(), team_size );
+ ASSERT_EQ ( p7.chunk_size(), chunk_size );
+ ASSERT_EQ ( p7.scratch_size( 0 ), scratch_size );
+ }
+
void test_run_time_parameters() {
- test_run_time_parameters_type<Kokkos::TeamPolicy<ExecutionSpace> >();
- test_run_time_parameters_type<Kokkos::TeamPolicy<ExecutionSpace,Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long> > >();
- test_run_time_parameters_type<Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > >();
- test_run_time_parameters_type<Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>,Kokkos::IndexType<long>,ExecutionSpace,SomeTag > >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<ExecutionSpace> >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long> > >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<Kokkos::IndexType<long>, ExecutionSpace, Kokkos::Schedule<Kokkos::Dynamic> > >();
+ test_run_time_parameters_type< Kokkos::TeamPolicy<Kokkos::Schedule<Kokkos::Dynamic>, Kokkos::IndexType<long>, ExecutionSpace, SomeTag > >();
}
};
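
For reference, the run-time behavior that test_run_time_parameters_type() asserts can be summarized in a few lines. A hedged sketch with arbitrary sizes (100, 4, 8, 1024, 16) and a hypothetical helper name; it mirrors the calls in the patched test, assumes the default execution space is initialized, and omits the clamping of team_size to the space's concurrency that the real test performs:

  inline void team_policy_demo()
  {
    typedef Kokkos::TeamPolicy< Kokkos::DefaultExecutionSpace > policy_type;

    policy_type p1( 100, 4 );                // league_size = 100, team_size = 4
    policy_type p2 = p1.set_chunk_size( 8 );
    policy_type p3 = p2.set_scratch_size( 0, Kokkos::PerTeam( 1024 ),
                                             Kokkos::PerThread( 16 ) );

    // As asserted above: p3.league_size() == 100, p3.team_size() == 4,
    // p3.chunk_size() == 8, and p3.scratch_size( 0 ) == 1024 + 16 * 4;
    // the per-thread request is multiplied by the team size before being added.
  }
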
diff --git a/lib/kokkos/core/unit_test/TestQthread.cpp b/lib/kokkos/core/unit_test/TestQthread.cpp
deleted file mode 100644
index a465f39ca..000000000
--- a/lib/kokkos/core/unit_test/TestQthread.cpp
+++ /dev/null
@@ -1,287 +0,0 @@
-/*
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-*/
-
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
-#include <Kokkos_Qthread.hpp>
-
-//----------------------------------------------------------------------------
-
-#include <TestAtomic.hpp>
-
-#include <TestViewAPI.hpp>
-#include <TestViewOfClass.hpp>
-
-#include <TestTeam.hpp>
-#include <TestRange.hpp>
-#include <TestReduce.hpp>
-#include <TestScan.hpp>
-#include <TestAggregate.hpp>
-#include <TestCompilerMacros.hpp>
-#include <TestTaskScheduler.hpp>
-// #include <TestTeamVector.hpp>
-
-namespace Test {
-
-class qthread : public ::testing::Test {
-protected:
- static void SetUpTestCase()
- {
- const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
- const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
- const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-
- int threads_count = std::max( 1u , numa_count )
- * std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
- Kokkos::Qthread::initialize( threads_count );
- Kokkos::Qthread::print_configuration( std::cout , true );
- }
-
- static void TearDownTestCase()
- {
- Kokkos::Qthread::finalize();
- }
-};
-
-TEST_F( qthread , compiler_macros )
-{
- ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Qthread >() ) );
-}
-
-TEST_F( qthread, view_impl) {
- test_view_impl< Kokkos::Qthread >();
-}
-
-TEST_F( qthread, view_api) {
- TestViewAPI< double , Kokkos::Qthread >();
-}
-
-TEST_F( qthread , view_nested_view )
-{
- ::Test::view_nested_view< Kokkos::Qthread >();
-}
-
-TEST_F( qthread , range_tag )
-{
- TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-}
-
-TEST_F( qthread , team_tag )
-{
- TestTeamPolicy< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
- TestTeamPolicy< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
-}
-
-TEST_F( qthread, long_reduce) {
- TestReduce< long , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, double_reduce) {
- TestReduce< double , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Qthread >( 1000000 );
-}
-
-TEST_F( qthread, team_long_reduce) {
- TestReduceTeam< long , Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 1000000 );
-}
-
-TEST_F( qthread, team_double_reduce) {
- TestReduceTeam< double , Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 1000000 );
-}
-
-
-TEST_F( qthread , atomics )
-{
- const int loop_count = 1e4 ;
-
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Qthread>(loop_count,3) ) );
-
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Qthread>(100,3) ) );
-
-#if defined( KOKKOS_ENABLE_ASM )
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Qthread>(100,3) ) );
-#endif
-
-}
-
-TEST_F( qthread , view_remap )
-{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Qthread > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Qthread > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Qthread > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( qthread , view_aggregate )
-{
- TestViewAggregate< Kokkos::Qthread >();
-}
-
-//----------------------------------------------------------------------------
-
-TEST_F( qthread , scan )
-{
- TestScan< Kokkos::Qthread >::test_range( 1 , 1000 );
- TestScan< Kokkos::Qthread >( 1000000 );
- TestScan< Kokkos::Qthread >( 10000000 );
- Kokkos::Qthread::fence();
-}
-
-TEST_F( qthread, team_shared ) {
- TestSharedTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >();
-}
-
-TEST_F( qthread, shmem_size) {
- TestShmemSize< Kokkos::Qthread >();
-}
-
-TEST_F( qthread , team_scan )
-{
- TestScanTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Qthread , Kokkos::Schedule<Kokkos::Static> >( 10000 );
-}
-
-#if 0 /* disable */
-TEST_F( qthread , team_vector )
-{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthread >(4) ) );
-}
-#endif
-
-//----------------------------------------------------------------------------
-
-TEST_F( qthread , task_policy )
-{
- TestTaskScheduler::test_task_dep< Kokkos::Qthread >( 10 );
- for ( long i = 0 ; i < 25 ; ++i ) TestTaskScheduler::test_fib< Kokkos::Qthread >(i);
- for ( long i = 0 ; i < 35 ; ++i ) TestTaskScheduler::test_fib2< Kokkos::Qthread >(i);
-}
-
-TEST_F( qthread , task_team )
-{
- TestTaskScheduler::test_task_team< Kokkos::Qthread >(1000);
-}
-
-//----------------------------------------------------------------------------
-
-} // namespace test
-
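
The removed driver above shows the per-backend pattern these unit-test translation units share: a gtest fixture whose SetUpTestCase/TearDownTestCase bring the execution space up and down, with each TEST_F delegating to the shared test templates. A hedged sketch of the same pattern against the default host backend (the fixture name is a placeholder, and the plain Kokkos::initialize()/finalize() calls stand in for the Qthread- and hwloc-specific setup the deleted file performed):

  #include <gtest/gtest.h>
  #include <Kokkos_Core.hpp>
  #include <TestRange.hpp>

  class default_host : public ::testing::Test {
  protected:
    static void SetUpTestCase()    { Kokkos::initialize(); }
    static void TearDownTestCase() { Kokkos::finalize(); }
  };

  TEST_F( default_host, range_for )
  {
    Test::TestRange< Kokkos::DefaultHostExecutionSpace,
                     Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
  }
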
diff --git a/lib/kokkos/core/unit_test/TestRange.hpp b/lib/kokkos/core/unit_test/TestRange.hpp
index e342e844c..90411a57a 100644
--- a/lib/kokkos/core/unit_test/TestRange.hpp
+++ b/lib/kokkos/core/unit_test/TestRange.hpp
@@ -1,242 +1,248 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
+
namespace {
template< class ExecSpace, class ScheduleType >
struct TestRange {
+ typedef int value_type; ///< typedef required for the parallel_reduce
- typedef int value_type ; ///< typedef required for the parallel_reduce
-
- typedef Kokkos::View<int*,ExecSpace> view_type ;
+ typedef Kokkos::View< int*, ExecSpace > view_type;
- view_type m_flags ;
+ view_type m_flags;
struct VerifyInitTag {};
struct ResetTag {};
struct VerifyResetTag {};
TestRange( const size_t N )
- : m_flags( Kokkos::ViewAllocateWithoutInitializing("flags"), N )
+ : m_flags( Kokkos::ViewAllocateWithoutInitializing( "flags" ), N )
{}
static void test_for( const size_t N )
- {
- TestRange functor(N);
+ {
+ TestRange functor( N );
- typename view_type::HostMirror host_flags = Kokkos::create_mirror_view( functor.m_flags );
+ typename view_type::HostMirror host_flags = Kokkos::create_mirror_view( functor.m_flags );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType,VerifyInitTag>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType, VerifyInitTag >( 0, N ), functor );
- Kokkos::deep_copy( host_flags , functor.m_flags );
+ Kokkos::deep_copy( host_flags, functor.m_flags );
- size_t error_count = 0 ;
- for ( size_t i = 0 ; i < N ; ++i ) {
- if ( int(i) != host_flags(i) ) ++error_count ;
- }
- ASSERT_EQ( error_count , size_t(0) );
+ size_t error_count = 0;
+ for ( size_t i = 0; i < N; ++i ) {
+ if ( int( i ) != host_flags( i ) ) ++error_count;
+ }
+ ASSERT_EQ( error_count, size_t( 0 ) );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType,ResetTag>(0,N) , functor );
- Kokkos::parallel_for( std::string("TestKernelFor") , Kokkos::RangePolicy<ExecSpace,ScheduleType,VerifyResetTag>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType, ResetTag >( 0, N ), functor );
+ Kokkos::parallel_for( std::string( "TestKernelFor" ), Kokkos::RangePolicy< ExecSpace, ScheduleType, VerifyResetTag >( 0, N ), functor );
- Kokkos::deep_copy( host_flags , functor.m_flags );
+ Kokkos::deep_copy( host_flags, functor.m_flags );
- error_count = 0 ;
- for ( size_t i = 0 ; i < N ; ++i ) {
- if ( int(2*i) != host_flags(i) ) ++error_count ;
- }
- ASSERT_EQ( error_count , size_t(0) );
+ error_count = 0;
+ for ( size_t i = 0; i < N; ++i ) {
+ if ( int( 2 * i ) != host_flags( i ) ) ++error_count;
}
+ ASSERT_EQ( error_count, size_t( 0 ) );
+ }
KOKKOS_INLINE_FUNCTION
void operator()( const int i ) const
- { m_flags(i) = i ; }
+ { m_flags( i ) = i; }
KOKKOS_INLINE_FUNCTION
- void operator()( const VerifyInitTag & , const int i ) const
- { if ( i != m_flags(i) ) { printf("TestRange::test_for error at %d != %d\n",i,m_flags(i)); } }
+ void operator()( const VerifyInitTag &, const int i ) const
+ {
+ if ( i != m_flags( i ) ) {
+ printf( "TestRange::test_for error at %d != %d\n", i, m_flags( i ) );
+ }
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const ResetTag & , const int i ) const
- { m_flags(i) = 2 * m_flags(i); }
+ void operator()( const ResetTag &, const int i ) const
+ { m_flags( i ) = 2 * m_flags( i ); }
KOKKOS_INLINE_FUNCTION
- void operator()( const VerifyResetTag & , const int i ) const
- { if ( 2 * i != m_flags(i) ) { printf("TestRange::test_for error at %d != %d\n",i,m_flags(i)); } }
+ void operator()( const VerifyResetTag &, const int i ) const
+ {
+ if ( 2 * i != m_flags( i ) )
+ {
+ printf( "TestRange::test_for error at %d != %d\n", i, m_flags( i ) );
+ }
+ }
//----------------------------------------
struct OffsetTag {};
static void test_reduce( const size_t N )
- {
- TestRange functor(N);
- int total = 0 ;
+ {
+ TestRange functor( N );
+ int total = 0;
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor );
- Kokkos::parallel_reduce( "TestKernelReduce" , Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor , total );
- // sum( 0 .. N-1 )
- ASSERT_EQ( size_t((N-1)*(N)/2) , size_t(total) );
+ Kokkos::parallel_reduce( "TestKernelReduce", Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor, total );
+ // sum( 0 .. N-1 )
+ ASSERT_EQ( size_t( ( N - 1 ) * ( N ) / 2 ), size_t( total ) );
- Kokkos::parallel_reduce( Kokkos::RangePolicy<ExecSpace,ScheduleType,OffsetTag>(0,N) , functor , total );
- // sum( 1 .. N )
- ASSERT_EQ( size_t((N)*(N+1)/2) , size_t(total) );
- }
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, ScheduleType, OffsetTag>( 0, N ), functor, total );
+ // sum( 1 .. N )
+ ASSERT_EQ( size_t( ( N ) * ( N + 1 ) / 2 ), size_t( total ) );
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const int i , value_type & update ) const
- { update += m_flags(i); }
+ void operator()( const int i, value_type & update ) const
+ { update += m_flags( i ); }
KOKKOS_INLINE_FUNCTION
- void operator()( const OffsetTag & , const int i , value_type & update ) const
- { update += 1 + m_flags(i); }
+ void operator()( const OffsetTag &, const int i, value_type & update ) const
+ { update += 1 + m_flags( i ); }
//----------------------------------------
static void test_scan( const size_t N )
- {
- TestRange functor(N);
+ {
+ TestRange functor( N );
- Kokkos::parallel_for( Kokkos::RangePolicy<ExecSpace,ScheduleType>(0,N) , functor );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, ScheduleType >( 0, N ), functor );
- Kokkos::parallel_scan( "TestKernelScan" , Kokkos::RangePolicy<ExecSpace,ScheduleType,OffsetTag>(0,N) , functor );
- }
+ Kokkos::parallel_scan( "TestKernelScan", Kokkos::RangePolicy< ExecSpace, ScheduleType, OffsetTag>( 0, N ), functor );
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const OffsetTag & , const int i , value_type & update , bool final ) const
- {
- update += m_flags(i);
+ void operator()( const OffsetTag &, const int i, value_type & update, bool final ) const
+ {
+ update += m_flags( i );
- if ( final ) {
- if ( update != (i*(i+1))/2 ) {
- printf("TestRange::test_scan error %d : %d != %d\n",i,(i*(i+1))/2,m_flags(i));
- }
+ if ( final ) {
+ if ( update != ( i * ( i + 1 ) ) / 2 ) {
+ printf( "TestRange::test_scan error %d : %d != %d\n", i, ( i * ( i + 1 ) ) / 2, m_flags( i ) );
}
}
+ }
- static void test_dynamic_policy( const size_t N ) {
-
-
- typedef Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
+ static void test_dynamic_policy( const size_t N )
+ {
+ typedef Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> > policy_t;
{
- Kokkos::View<size_t*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > count("Count",ExecSpace::concurrency());
- Kokkos::View<int*,ExecSpace> a("A",N);
-
- Kokkos::parallel_for( policy_t(0,N),
- KOKKOS_LAMBDA (const typename policy_t::member_type& i) {
- for(int k=0; k<(i<N/2?1:10000); k++ )
- a(i)++;
- count(ExecSpace::hardware_thread_id())++;
+ Kokkos::View< size_t*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > count( "Count", ExecSpace::concurrency() );
+ Kokkos::View< int*, ExecSpace > a( "A", N );
+
+ Kokkos::parallel_for( policy_t( 0, N ), KOKKOS_LAMBDA ( const typename policy_t::member_type& i ) {
+ for ( int k = 0; k < ( i < N / 2 ? 1 : 10000 ); k++ ) {
+ a( i )++;
+ }
+ count( ExecSpace::hardware_thread_id() )++;
});
int error = 0;
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N), KOKKOS_LAMBDA(const typename policy_t::member_type& i, int& lsum) {
- lsum += ( a(i)!= (i<N/2?1:10000) );
- },error);
- ASSERT_EQ(error,0);
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), KOKKOS_LAMBDA( const typename policy_t::member_type & i, int & lsum ) {
+ lsum += ( a( i ) != ( i < N / 2 ? 1 : 10000 ) );
+ }, error );
+ ASSERT_EQ( error, 0 );
- if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<size_t>(4*ExecSpace::concurrency())) ) {
+ if ( ( ExecSpace::concurrency() > (int) 1 ) && ( N > static_cast<size_t>( 4 * ExecSpace::concurrency() ) ) ) {
size_t min = N;
size_t max = 0;
- for(int t=0; t<ExecSpace::concurrency(); t++) {
- if(count(t)<min) min = count(t);
- if(count(t)>max) max = count(t);
+ for ( int t = 0; t < ExecSpace::concurrency(); t++ ) {
+ if ( count( t ) < min ) min = count( t );
+ if ( count( t ) > max ) max = count( t );
}
- ASSERT_TRUE(min<max);
- //if(ExecSpace::concurrency()>2)
- // ASSERT_TRUE(2*min<max);
+ ASSERT_TRUE( min < max );
+
+ //if ( ExecSpace::concurrency() > 2 ) {
+ // ASSERT_TRUE( 2 * min < max );
+ //}
}
-
}
{
- Kokkos::View<size_t*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > count("Count",ExecSpace::concurrency());
- Kokkos::View<int*,ExecSpace> a("A",N);
+ Kokkos::View< size_t*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > count( "Count", ExecSpace::concurrency() );
+ Kokkos::View< int*, ExecSpace> a( "A", N );
int sum = 0;
- Kokkos::parallel_reduce( policy_t(0,N),
- KOKKOS_LAMBDA (const typename policy_t::member_type& i, int& lsum) {
- for(int k=0; k<(i<N/2?1:10000); k++ )
- a(i)++;
- count(ExecSpace::hardware_thread_id())++;
+ Kokkos::parallel_reduce( policy_t( 0, N ), KOKKOS_LAMBDA( const typename policy_t::member_type & i, int & lsum ) {
+ for ( int k = 0; k < ( i < N / 2 ? 1 : 10000 ); k++ ) {
+ a( i )++;
+ }
+ count( ExecSpace::hardware_thread_id() )++;
lsum++;
- },sum);
- ASSERT_EQ(sum,N);
+ }, sum );
+ ASSERT_EQ( sum, N );
int error = 0;
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N), KOKKOS_LAMBDA(const typename policy_t::member_type& i, int& lsum) {
- lsum += ( a(i)!= (i<N/2?1:10000) );
- },error);
- ASSERT_EQ(error,0);
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), KOKKOS_LAMBDA( const typename policy_t::member_type & i, int & lsum ) {
+ lsum += ( a( i ) != ( i < N / 2 ? 1 : 10000 ) );
+ }, error );
+ ASSERT_EQ( error, 0 );
- if( ( ExecSpace::concurrency()>(int)1) && (N>static_cast<size_t>(4*ExecSpace::concurrency())) ) {
+ if ( ( ExecSpace::concurrency() > (int) 1 ) && ( N > static_cast<size_t>( 4 * ExecSpace::concurrency() ) ) ) {
size_t min = N;
size_t max = 0;
- for(int t=0; t<ExecSpace::concurrency(); t++) {
- if(count(t)<min) min = count(t);
- if(count(t)>max) max = count(t);
+ for ( int t = 0; t < ExecSpace::concurrency(); t++ ) {
+ if ( count( t ) < min ) min = count( t );
+ if ( count( t ) > max ) max = count( t );
}
- ASSERT_TRUE(min<max);
- //if(ExecSpace::concurrency()>2)
- // ASSERT_TRUE(2*min<max);
+ ASSERT_TRUE( min < max );
+
+ //if ( ExecSpace::concurrency() > 2 ) {
+ // ASSERT_TRUE( 2 * min < max );
+ //}
}
}
-
}
};
-} /* namespace */
-} /* namespace Test */
-
-/*--------------------------------------------------------------------------*/
+} // namespace
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/TestReduce.hpp b/lib/kokkos/core/unit_test/TestReduce.hpp
index 645fc9e31..7e77dadf6 100644
--- a/lib/kokkos/core/unit_test/TestReduce.hpp
+++ b/lib/kokkos/core/unit_test/TestReduce.hpp
@@ -1,1907 +1,2062 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <limits>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class ReduceFunctor
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
struct value_type {
- ScalarType value[3] ;
+ ScalarType value[3];
};
- const size_type nwork ;
+ const size_type nwork;
- ReduceFunctor( const size_type & arg_nwork ) : nwork( arg_nwork ) {}
+ ReduceFunctor( const size_type & arg_nwork )
+ : nwork( arg_nwork ) {}
ReduceFunctor( const ReduceFunctor & rhs )
: nwork( rhs.nwork ) {}
/*
KOKKOS_INLINE_FUNCTION
void init( value_type & dst ) const
{
- dst.value[0] = 0 ;
- dst.value[1] = 0 ;
- dst.value[2] = 0 ;
+ dst.value[0] = 0;
+ dst.value[1] = 0;
+ dst.value[2] = 0;
}
*/
KOKKOS_INLINE_FUNCTION
- void join( volatile value_type & dst ,
+ void join( volatile value_type & dst,
const volatile value_type & src ) const
{
- dst.value[0] += src.value[0] ;
- dst.value[1] += src.value[1] ;
- dst.value[2] += src.value[2] ;
+ dst.value[0] += src.value[0];
+ dst.value[1] += src.value[1];
+ dst.value[2] += src.value[2];
}
KOKKOS_INLINE_FUNCTION
- void operator()( size_type iwork , value_type & dst ) const
+ void operator()( size_type iwork, value_type & dst ) const
{
- dst.value[0] += 1 ;
- dst.value[1] += iwork + 1 ;
- dst.value[2] += nwork - iwork ;
+ dst.value[0] += 1;
+ dst.value[1] += iwork + 1;
+ dst.value[2] += nwork - iwork;
}
};
template< class DeviceType >
-class ReduceFunctorFinal : public ReduceFunctor< long , DeviceType > {
+class ReduceFunctorFinal : public ReduceFunctor< long, DeviceType > {
public:
-
- typedef typename ReduceFunctor< long , DeviceType >::value_type value_type ;
+ typedef typename ReduceFunctor< long, DeviceType >::value_type value_type;
ReduceFunctorFinal( const size_t n )
- : ReduceFunctor<long,DeviceType>(n)
- {}
+ : ReduceFunctor< long, DeviceType >( n ) {}
KOKKOS_INLINE_FUNCTION
void final( value_type & dst ) const
{
- dst.value[0] = - dst.value[0] ;
- dst.value[1] = - dst.value[1] ;
- dst.value[2] = - dst.value[2] ;
+ dst.value[0] = -dst.value[0];
+ dst.value[1] = -dst.value[1];
+ dst.value[2] = -dst.value[2];
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class RuntimeReduceFunctor
{
public:
// Required for functor:
- typedef DeviceType execution_space ;
- typedef ScalarType value_type[] ;
- const unsigned value_count ;
-
+ typedef DeviceType execution_space;
+ typedef ScalarType value_type[];
+ const unsigned value_count;
// Unit test details:
- typedef typename execution_space::size_type size_type ;
+ typedef typename execution_space::size_type size_type;
- const size_type nwork ;
+ const size_type nwork;
- RuntimeReduceFunctor( const size_type arg_nwork ,
+ RuntimeReduceFunctor( const size_type arg_nwork,
const size_type arg_count )
: value_count( arg_count )
, nwork( arg_nwork ) {}
KOKKOS_INLINE_FUNCTION
void init( ScalarType dst[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] = 0 ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] = 0;
}
KOKKOS_INLINE_FUNCTION
- void join( volatile ScalarType dst[] ,
+ void join( volatile ScalarType dst[],
const volatile ScalarType src[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] += src[i] ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] += src[i];
}
KOKKOS_INLINE_FUNCTION
- void operator()( size_type iwork , ScalarType dst[] ) const
+ void operator()( size_type iwork, ScalarType dst[] ) const
{
- const size_type tmp[3] = { 1 , iwork + 1 , nwork - iwork };
+ const size_type tmp[3] = { 1, iwork + 1, nwork - iwork };
- for ( size_type i = 0 ; i < value_count ; ++i ) {
+ for ( size_type i = 0; i < value_count; ++i ) {
dst[i] += tmp[ i % 3 ];
}
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class RuntimeReduceMinMax
{
public:
// Required for functor:
- typedef DeviceType execution_space ;
- typedef ScalarType value_type[] ;
- const unsigned value_count ;
+ typedef DeviceType execution_space;
+ typedef ScalarType value_type[];
+ const unsigned value_count;
// Unit test details:
- typedef typename execution_space::size_type size_type ;
+ typedef typename execution_space::size_type size_type;
- const size_type nwork ;
- const ScalarType amin ;
- const ScalarType amax ;
+ const size_type nwork;
+ const ScalarType amin;
+ const ScalarType amax;
- RuntimeReduceMinMax( const size_type arg_nwork ,
+ RuntimeReduceMinMax( const size_type arg_nwork,
const size_type arg_count )
: value_count( arg_count )
, nwork( arg_nwork )
- , amin( std::numeric_limits<ScalarType>::min() )
- , amax( std::numeric_limits<ScalarType>::max() )
+ , amin( std::numeric_limits< ScalarType >::min() )
+ , amax( std::numeric_limits< ScalarType >::max() )
{}
KOKKOS_INLINE_FUNCTION
void init( ScalarType dst[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) {
- dst[i] = i % 2 ? amax : amin ;
+ for ( unsigned i = 0; i < value_count; ++i ) {
+ dst[i] = i % 2 ? amax : amin;
}
}
KOKKOS_INLINE_FUNCTION
- void join( volatile ScalarType dst[] ,
+ void join( volatile ScalarType dst[],
const volatile ScalarType src[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) {
+ for ( unsigned i = 0; i < value_count; ++i ) {
dst[i] = i % 2 ? ( dst[i] < src[i] ? dst[i] : src[i] ) // min
: ( dst[i] > src[i] ? dst[i] : src[i] ); // max
}
}
KOKKOS_INLINE_FUNCTION
- void operator()( size_type iwork , ScalarType dst[] ) const
+ void operator()( size_type iwork, ScalarType dst[] ) const
{
- const ScalarType tmp[2] = { ScalarType(iwork + 1)
- , ScalarType(nwork - iwork) };
+ const ScalarType tmp[2] = { ScalarType( iwork + 1 )
+ , ScalarType( nwork - iwork ) };
- for ( size_type i = 0 ; i < value_count ; ++i ) {
- dst[i] = i % 2 ? ( dst[i] < tmp[i%2] ? dst[i] : tmp[i%2] )
- : ( dst[i] > tmp[i%2] ? dst[i] : tmp[i%2] );
+ for ( size_type i = 0; i < value_count; ++i ) {
+ dst[i] = i % 2 ? ( dst[i] < tmp[i % 2] ? dst[i] : tmp[i % 2] )
+ : ( dst[i] > tmp[i % 2] ? dst[i] : tmp[i % 2] );
}
}
};
template< class DeviceType >
-class RuntimeReduceFunctorFinal : public RuntimeReduceFunctor< long , DeviceType > {
+class RuntimeReduceFunctorFinal : public RuntimeReduceFunctor< long, DeviceType > {
public:
+ typedef RuntimeReduceFunctor< long, DeviceType > base_type;
+ typedef typename base_type::value_type value_type;
+ typedef long scalar_type;
- typedef RuntimeReduceFunctor< long , DeviceType > base_type ;
- typedef typename base_type::value_type value_type ;
- typedef long scalar_type ;
-
- RuntimeReduceFunctorFinal( const size_t theNwork , const size_t count ) : base_type(theNwork,count) {}
+ RuntimeReduceFunctorFinal( const size_t theNwork, const size_t count )
+ : base_type( theNwork, count ) {}
KOKKOS_INLINE_FUNCTION
void final( value_type dst ) const
{
- for ( unsigned i = 0 ; i < base_type::value_count ; ++i ) {
- dst[i] = - dst[i] ;
+ for ( unsigned i = 0; i < base_type::value_count; ++i ) {
+ dst[i] = -dst[i];
}
}
};
+
} // namespace Test
namespace {
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestReduce
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
TestReduce( const size_type & nwork )
{
- run_test(nwork);
- run_test_final(nwork);
+ run_test( nwork );
+ run_test_final( nwork );
}
void run_test( const size_type & nwork )
{
- typedef Test::ReduceFunctor< ScalarType , execution_space > functor_type ;
- typedef typename functor_type::value_type value_type ;
+ typedef Test::ReduceFunctor< ScalarType, execution_space > functor_type;
+ typedef typename functor_type::value_type value_type;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- Kokkos::parallel_reduce( nwork , functor_type(nwork) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork ), result[i] );
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , result[i].value[j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, result[i].value[j] );
}
}
}
void run_test_final( const size_type & nwork )
{
- typedef Test::ReduceFunctorFinal< execution_space > functor_type ;
- typedef typename functor_type::value_type value_type ;
+ typedef Test::ReduceFunctorFinal< execution_space > functor_type;
+ typedef typename functor_type::value_type value_type;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork) , result[i] );
- else
- Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "Reduce", nwork, functor_type( nwork ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , - result[i].value[j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, -result[i].value[j] );
}
}
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestReduceDynamic
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
TestReduceDynamic( const size_type nwork )
{
- run_test_dynamic(nwork);
- run_test_dynamic_minmax(nwork);
- run_test_dynamic_final(nwork);
+ run_test_dynamic( nwork );
+ run_test_dynamic_minmax( nwork );
+ run_test_dynamic_final( nwork );
}
void run_test_dynamic( const size_type nwork )
{
- typedef Test::RuntimeReduceFunctor< ScalarType , execution_space > functor_type ;
+ typedef Test::RuntimeReduceFunctor< ScalarType, execution_space > functor_type;
enum { Count = 3 };
enum { Repeat = 100 };
- ScalarType result[ Repeat ][ Count ] ;
+ ScalarType result[ Repeat ][ Count ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
- else
- Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork,Count) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork, Count ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "Reduce", nwork, functor_type( nwork, Count ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , result[i][j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, result[i][j] );
}
}
}
void run_test_dynamic_minmax( const size_type nwork )
{
- typedef Test::RuntimeReduceMinMax< ScalarType , execution_space > functor_type ;
+ typedef Test::RuntimeReduceMinMax< ScalarType, execution_space > functor_type;
enum { Count = 2 };
enum { Repeat = 100 };
- ScalarType result[ Repeat ][ Count ] ;
+ ScalarType result[ Repeat ][ Count ];
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
- else
- Kokkos::parallel_reduce( "Reduce", nwork , functor_type(nwork,Count) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork, Count ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "Reduce", nwork, functor_type( nwork, Count ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
if ( nwork == 0 )
{
- ScalarType amin( std::numeric_limits<ScalarType>::min() );
- ScalarType amax( std::numeric_limits<ScalarType>::max() );
- const ScalarType correct = (j%2) ? amax : amin;
- ASSERT_EQ( (ScalarType) correct , result[i][j] );
- } else {
- const unsigned long correct = j % 2 ? 1 : nwork ;
- ASSERT_EQ( (ScalarType) correct , result[i][j] );
+ ScalarType amin( std::numeric_limits< ScalarType >::min() );
+ ScalarType amax( std::numeric_limits< ScalarType >::max() );
+ const ScalarType correct = ( j % 2 ) ? amax : amin;
+ ASSERT_EQ( (ScalarType) correct, result[i][j] );
+ }
+ else {
+ const unsigned long correct = j % 2 ? 1 : nwork;
+ ASSERT_EQ( (ScalarType) correct, result[i][j] );
}
}
}
}
void run_test_dynamic_final( const size_type nwork )
{
- typedef Test::RuntimeReduceFunctorFinal< execution_space > functor_type ;
+ typedef Test::RuntimeReduceFunctorFinal< execution_space > functor_type;
enum { Count = 3 };
enum { Repeat = 100 };
- typename functor_type::scalar_type result[ Repeat ][ Count ] ;
+ typename functor_type::scalar_type result[ Repeat ][ Count ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- if(i%2==0)
- Kokkos::parallel_reduce( nwork , functor_type(nwork,Count) , result[i] );
- else
- Kokkos::parallel_reduce( "TestKernelReduce" , nwork , functor_type(nwork,Count) , result[i] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ if ( i % 2 == 0 ) {
+ Kokkos::parallel_reduce( nwork, functor_type( nwork, Count ), result[i] );
+ }
+ else {
+ Kokkos::parallel_reduce( "TestKernelReduce", nwork, functor_type( nwork, Count ), result[i] );
+ }
}
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , - result[i][j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, -result[i][j] );
}
}
}
};
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestReduceDynamicView
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
TestReduceDynamicView( const size_type nwork )
{
- run_test_dynamic_view(nwork);
+ run_test_dynamic_view( nwork );
}
void run_test_dynamic_view( const size_type nwork )
{
- typedef Test::RuntimeReduceFunctor< ScalarType , execution_space > functor_type ;
+ typedef Test::RuntimeReduceFunctor< ScalarType, execution_space > functor_type;
- typedef Kokkos::View< ScalarType* , DeviceType > result_type ;
- typedef typename result_type::HostMirror result_host_type ;
+ typedef Kokkos::View< ScalarType*, DeviceType > result_type;
+ typedef typename result_type::HostMirror result_host_type;
- const unsigned CountLimit = 23 ;
+ const unsigned CountLimit = 23;
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- for ( unsigned count = 0 ; count < CountLimit ; ++count ) {
+ for ( unsigned count = 0; count < CountLimit; ++count ) {
- result_type result("result",count);
+ result_type result( "result", count );
result_host_type host_result = Kokkos::create_mirror( result );
// Test result to host pointer:
- std::string str("TestKernelReduce");
- if(count%2==0)
- Kokkos::parallel_reduce( nw , functor_type(nw,count) , host_result.ptr_on_device() );
- else
- Kokkos::parallel_reduce( str , nw , functor_type(nw,count) , host_result.ptr_on_device() );
+ std::string str( "TestKernelReduce" );
+ if ( count % 2 == 0 ) {
+ Kokkos::parallel_reduce( nw, functor_type( nw, count ), host_result.ptr_on_device() );
+ }
+ else {
+ Kokkos::parallel_reduce( str, nw, functor_type( nw, count ), host_result.ptr_on_device() );
+ }
- for ( unsigned j = 0 ; j < count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( host_result(j), (ScalarType) correct );
- host_result(j) = 0 ;
+ for ( unsigned j = 0; j < count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( host_result( j ), (ScalarType) correct );
+ host_result( j ) = 0;
}
}
}
};
-}
+
+} // namespace
// Computes y^T*A*x
-// (modified from kokkos-tutorials/GTC2016/Exercises/ThreeLevelPar )
+// ( modified from kokkos-tutorials/GTC2016/Exercises/ThreeLevelPar )
#if ( ! defined( KOKKOS_ENABLE_CUDA ) ) || defined( KOKKOS_ENABLE_CUDA_LAMBDA )
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestTripleNestedReduce
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
- //------------------------------------
-
- TestTripleNestedReduce( const size_type & nrows , const size_type & ncols
- , const size_type & team_size , const size_type & vector_length )
+ TestTripleNestedReduce( const size_type & nrows, const size_type & ncols
+ , const size_type & team_size, const size_type & vector_length )
{
- run_test( nrows , ncols , team_size, vector_length );
+ run_test( nrows, ncols, team_size, vector_length );
}
- void run_test( const size_type & nrows , const size_type & ncols
+ void run_test( const size_type & nrows, const size_type & ncols
, const size_type & team_size, const size_type & vector_length )
{
//typedef Kokkos::LayoutLeft Layout;
typedef Kokkos::LayoutRight Layout;
- typedef Kokkos::View<ScalarType* , DeviceType> ViewVector;
- typedef Kokkos::View<ScalarType** , Layout , DeviceType> ViewMatrix;
- ViewVector y( "y" , nrows );
- ViewVector x( "x" , ncols );
- ViewMatrix A( "A" , nrows , ncols );
+ typedef Kokkos::View< ScalarType*, DeviceType > ViewVector;
+ typedef Kokkos::View< ScalarType**, Layout, DeviceType > ViewMatrix;
+
+ ViewVector y( "y", nrows );
+ ViewVector x( "x", ncols );
+ ViewMatrix A( "A", nrows, ncols );
typedef Kokkos::RangePolicy<DeviceType> range_policy;
- // Initialize y vector
- Kokkos::parallel_for( range_policy( 0 , nrows ) , KOKKOS_LAMBDA( const int i ) { y( i ) = 1; } );
+ // Initialize y vector.
+ Kokkos::parallel_for( range_policy( 0, nrows ), KOKKOS_LAMBDA ( const int i ) { y( i ) = 1; } );
- // Initialize x vector
- Kokkos::parallel_for( range_policy( 0 , ncols ) , KOKKOS_LAMBDA( const int i ) { x( i ) = 1; } );
+ // Initialize x vector.
+ Kokkos::parallel_for( range_policy( 0, ncols ), KOKKOS_LAMBDA ( const int i ) { x( i ) = 1; } );
- typedef Kokkos::TeamPolicy<DeviceType> team_policy;
- typedef typename Kokkos::TeamPolicy<DeviceType>::member_type member_type;
+ typedef Kokkos::TeamPolicy< DeviceType > team_policy;
+ typedef typename Kokkos::TeamPolicy< DeviceType >::member_type member_type;
- // Initialize A matrix, note 2D indexing computation
- Kokkos::parallel_for( team_policy( nrows , Kokkos::AUTO ) , KOKKOS_LAMBDA( const member_type& teamMember ) {
+ // Initialize A matrix, note 2D indexing computation.
+ Kokkos::parallel_for( team_policy( nrows, Kokkos::AUTO ), KOKKOS_LAMBDA ( const member_type & teamMember ) {
const int j = teamMember.league_rank();
- Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember , ncols ) , [&] ( const int i ) {
- A( j , i ) = 1;
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember, ncols ), [&] ( const int i ) {
+ A( j, i ) = 1;
} );
} );
- // Three level parallelism kernel to force caching of vector x
+ // Three level parallelism kernel to force caching of vector x.
ScalarType result = 0.0;
int chunk_size = 128;
- Kokkos::parallel_reduce( team_policy( nrows/chunk_size , team_size , vector_length ) , KOKKOS_LAMBDA ( const member_type& teamMember , double &update ) {
+ Kokkos::parallel_reduce( team_policy( nrows / chunk_size, team_size, vector_length ),
+ KOKKOS_LAMBDA ( const member_type & teamMember, double & update ) {
const int row_start = teamMember.league_rank() * chunk_size;
const int row_end = row_start + chunk_size;
- Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember , row_start , row_end ) , [&] ( const int i ) {
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( teamMember, row_start, row_end ), [&] ( const int i ) {
ScalarType sum_i = 0.0;
- Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( teamMember , ncols ) , [&] ( const int j , ScalarType &innerUpdate ) {
- innerUpdate += A( i , j ) * x( j );
- } , sum_i );
- Kokkos::single( Kokkos::PerThread( teamMember ) , [&] () {
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( teamMember, ncols ), [&] ( const int j, ScalarType &innerUpdate ) {
+ innerUpdate += A( i, j ) * x( j );
+ }, sum_i );
+ Kokkos::single( Kokkos::PerThread( teamMember ), [&] () {
update += y( i ) * sum_i;
} );
} );
- } , result );
+ }, result );
- const ScalarType solution= ( ScalarType ) nrows * ( ScalarType ) ncols;
- ASSERT_EQ( solution , result );
+ const ScalarType solution = (ScalarType) nrows * (ScalarType) ncols;
+ ASSERT_EQ( solution, result );
}
};
-#else /* #if ( ! defined( KOKKOS_ENABLE_CUDA ) ) || defined( KOKKOS_ENABLE_CUDA_LAMBDA ) */
+#else // #if ( ! defined( KOKKOS_ENABLE_CUDA ) ) || defined( KOKKOS_ENABLE_CUDA_LAMBDA )
-template< typename ScalarType , class DeviceType >
+template< typename ScalarType, class DeviceType >
class TestTripleNestedReduce
{
public:
- typedef DeviceType execution_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef typename execution_space::size_type size_type;
- TestTripleNestedReduce( const size_type & , const size_type
- , const size_type & , const size_type )
- { }
+ TestTripleNestedReduce( const size_type &, const size_type
+ , const size_type &, const size_type )
+ {}
};
#endif
//--------------------------------------------------------------------------
namespace Test {
+
namespace ReduceCombinatorical {
-template<class Scalar,class Space = Kokkos::HostSpace>
+template< class Scalar, class Space = Kokkos::HostSpace >
struct AddPlus {
public:
- //Required
+ // Required.
typedef AddPlus reducer_type;
typedef Scalar value_type;
- typedef Kokkos::View<value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
+ typedef Kokkos::View< value_type, Space, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_type;
private:
result_view_type result;
public:
+ AddPlus( value_type & result_ ) : result( &result_ ) {}
- AddPlus(value_type& result_):result(&result_) {}
-
- //Required
+ // Required.
KOKKOS_INLINE_FUNCTION
- void join(value_type& dest, const value_type& src) const {
+ void join( value_type & dest, const value_type & src ) const {
dest += src + 1;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& dest, const volatile value_type& src) const {
+ void join( volatile value_type & dest, const volatile value_type & src ) const {
dest += src + 1;
}
- //Optional
+ // Optional.
KOKKOS_INLINE_FUNCTION
- void init( value_type& val) const {
+ void init( value_type & val ) const {
val = value_type();
}
result_view_type result_view() const {
return result;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalar;
template<>
-struct FunctorScalar<0>{
- FunctorScalar(Kokkos::View<double> r):result(r) {}
- Kokkos::View<double> result;
+struct FunctorScalar< 0 > {
+ Kokkos::View< double > result;
+
+ FunctorScalar( Kokkos::View< double > r ) : result( r ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,double& update) const {
- update+=i;
+ void operator()( const int & i, double & update ) const {
+ update += i;
}
};
template<>
-struct FunctorScalar<1>{
- FunctorScalar(Kokkos::View<double> r):result(r) {}
- Kokkos::View<double> result;
-
+struct FunctorScalar< 1 > {
typedef Kokkos::TeamPolicy<>::member_type team_type;
+
+ Kokkos::View< double > result;
+
+ FunctorScalar( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarInit;
template<>
-struct FunctorScalarInit<0> {
- FunctorScalarInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarInit< 0 > {
+ Kokkos::View< double > result;
- Kokkos::View<double> result;
+ FunctorScalarInit( Kokkos::View< double > r ) : result( r ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
template<>
-struct FunctorScalarInit<1> {
- FunctorScalarInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarInit< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarInit( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarFinal;
-
template<>
-struct FunctorScalarFinal<0> {
- FunctorScalarFinal(Kokkos::View<double> r):result(r) {}
-
+struct FunctorScalarFinal< 0 > {
Kokkos::View<double> result;
+
+ FunctorScalarFinal( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
template<>
-struct FunctorScalarFinal<1> {
- FunctorScalarFinal(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarFinal< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
- typedef Kokkos::TeamPolicy<>::member_type team_type;
+ FunctorScalarFinal( Kokkos::View< double > r ) : result( r ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team, double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
+
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoin;
template<>
-struct FunctorScalarJoin<0> {
- FunctorScalarJoin(Kokkos::View<double> r):result(r) {}
-
+struct FunctorScalarJoin< 0 > {
Kokkos::View<double> result;
+
+ FunctorScalarJoin( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
};
template<>
-struct FunctorScalarJoin<1> {
- FunctorScalarJoin(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoin< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoin( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoinFinal;
template<>
-struct FunctorScalarJoinFinal<0> {
- FunctorScalarJoinFinal(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinFinal< 0 > {
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinFinal( Kokkos::View< double > r ) : result( r ) {}
- Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
template<>
-struct FunctorScalarJoinFinal<1> {
- FunctorScalarJoinFinal(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinFinal< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinFinal( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoinInit;
template<>
-struct FunctorScalarJoinInit<0> {
- FunctorScalarJoinInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinInit< 0 > {
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinInit( Kokkos::View< double > r ) : result( r ) {}
- Kokkos::View<double> result;
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
template<>
-struct FunctorScalarJoinInit<1> {
- FunctorScalarJoinInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinInit< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinInit( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
-template<int ISTEAM>
+template< int ISTEAM >
struct FunctorScalarJoinFinalInit;
template<>
-struct FunctorScalarJoinFinalInit<0> {
- FunctorScalarJoinFinalInit(Kokkos::View<double> r):result(r) {}
-
+struct FunctorScalarJoinFinalInit< 0 > {
Kokkos::View<double> result;
+ FunctorScalarJoinFinalInit( Kokkos::View< double > r ) : result( r ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, double& update) const {
+ void operator()( const int & i, double & update ) const {
update += i;
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
template<>
-struct FunctorScalarJoinFinalInit<1> {
- FunctorScalarJoinFinalInit(Kokkos::View<double> r):result(r) {}
+struct FunctorScalarJoinFinalInit< 1 > {
+ typedef Kokkos::TeamPolicy<>::member_type team_type;
- Kokkos::View<double> result;
+ Kokkos::View< double > result;
+
+ FunctorScalarJoinFinalInit( Kokkos::View< double > r ) : result( r ) {}
- typedef Kokkos::TeamPolicy<>::member_type team_type;
KOKKOS_INLINE_FUNCTION
- void operator() (const team_type& team,double& update) const {
- update+=1.0/team.team_size()*team.league_rank();
+ void operator()( const team_type & team, double & update ) const {
+ update += 1.0 / team.team_size() * team.league_rank();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile double& dst, const volatile double& update) const {
+ void join( volatile double & dst, const volatile double & update ) const {
dst += update;
}
KOKKOS_INLINE_FUNCTION
- void final(double& update) const {
+ void final( double & update ) const {
result() = update;
}
KOKKOS_INLINE_FUNCTION
- void init(double& update) const {
+ void init( double & update ) const {
update = 0.0;
}
};
+
struct Functor1 {
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,double& update) const {
- update+=i;
+ void operator()( const int & i, double & update ) const {
+ update += i;
}
};
struct Functor2 {
typedef double value_type[];
+
const unsigned value_count;
- Functor2(unsigned n):value_count(n){}
+ Functor2( unsigned n ) : value_count( n ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (const unsigned& i,double update[]) const {
- for(unsigned j=0;j<value_count;j++)
- update[j]+=i;
+ void operator()( const unsigned & i, double update[] ) const {
+ for ( unsigned j = 0; j < value_count; j++ ) {
+ update[j] += i;
+ }
}
KOKKOS_INLINE_FUNCTION
void init( double dst[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] = 0 ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] = 0;
}
KOKKOS_INLINE_FUNCTION
- void join( volatile double dst[] ,
+ void join( volatile double dst[],
const volatile double src[] ) const
{
- for ( unsigned i = 0 ; i < value_count ; ++i ) dst[i] += src[i] ;
+ for ( unsigned i = 0; i < value_count; ++i ) dst[i] += src[i];
}
};
-}
-}
+} // namespace ReduceCombinatorical
+
+} // namespace Test
namespace Test {
-template<class ExecSpace = Kokkos::DefaultExecutionSpace>
+template< class ExecSpace = Kokkos::DefaultExecutionSpace >
struct TestReduceCombinatoricalInstantiation {
- template<class ... Args>
- static void CallParallelReduce(Args... args) {
- Kokkos::parallel_reduce(args...);
+ template< class ... Args >
+ static void CallParallelReduce( Args... args ) {
+ Kokkos::parallel_reduce( args... );
}
- template<class ... Args>
- static void AddReturnArgument(Args... args) {
- Kokkos::View<double,Kokkos::HostSpace> result_view("ResultView");
- double expected_result = 1000.0*999.0/2.0;
+ template< class ... Args >
+ static void AddReturnArgument( Args... args ) {
+ Kokkos::View< double, Kokkos::HostSpace > result_view( "ResultView" );
+ double expected_result = 1000.0 * 999.0 / 2.0;
double value = 0;
- Kokkos::parallel_reduce(args...,value);
- ASSERT_EQ(expected_result,value);
+ Kokkos::parallel_reduce( args..., value );
+ ASSERT_EQ( expected_result, value );
result_view() = 0;
- CallParallelReduce(args...,result_view);
- ASSERT_EQ(expected_result,result_view());
+ CallParallelReduce( args..., result_view );
+ ASSERT_EQ( expected_result, result_view() );
value = 0;
- CallParallelReduce(args...,Kokkos::View<double,Kokkos::HostSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>(&value));
- ASSERT_EQ(expected_result,value);
+ CallParallelReduce( args..., Kokkos::View< double, Kokkos::HostSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >( &value ) );
+ ASSERT_EQ( expected_result, value );
result_view() = 0;
- const Kokkos::View<double,Kokkos::HostSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> result_view_const_um = result_view;
- CallParallelReduce(args...,result_view_const_um);
- ASSERT_EQ(expected_result,result_view_const_um());
+ const Kokkos::View< double, Kokkos::HostSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > result_view_const_um = result_view;
+ CallParallelReduce( args..., result_view_const_um );
+ ASSERT_EQ( expected_result, result_view_const_um() );
value = 0;
- CallParallelReduce(args...,Test::ReduceCombinatorical::AddPlus<double>(value));
- if((Kokkos::DefaultExecutionSpace::concurrency() > 1) && (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<value);
- else if((Kokkos::DefaultExecutionSpace::concurrency() > 1) || (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<=value);
- else
- ASSERT_EQ(expected_result,value);
+ CallParallelReduce( args..., Test::ReduceCombinatorical::AddPlus< double >( value ) );
+ if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) && ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result < value );
+ }
+ else if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) || ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result <= value );
+ }
+ else {
+ ASSERT_EQ( expected_result, value );
+ }
value = 0;
- Test::ReduceCombinatorical::AddPlus<double> add(value);
- CallParallelReduce(args...,add);
- if((Kokkos::DefaultExecutionSpace::concurrency() > 1) && (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<value);
- else if((Kokkos::DefaultExecutionSpace::concurrency() > 1) || (ExecSpace::concurrency()>1))
- ASSERT_TRUE(expected_result<=value);
- else
- ASSERT_EQ(expected_result,value);
+ Test::ReduceCombinatorical::AddPlus< double > add( value );
+ CallParallelReduce( args..., add );
+ if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) && ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result < value );
+ }
+ else if ( ( Kokkos::DefaultExecutionSpace::concurrency() > 1 ) || ( ExecSpace::concurrency() > 1 ) ) {
+ ASSERT_TRUE( expected_result <= value );
+ }
+ else {
+ ASSERT_EQ( expected_result, value );
+ }
}
-
- template<class ... Args>
- static void AddLambdaRange(void*,Args... args) {
- AddReturnArgument(args..., KOKKOS_LAMBDA (const int&i , double& lsum) {
+ template< class ... Args >
+ static void AddLambdaRange( void*, Args... args ) {
+ AddReturnArgument( args..., KOKKOS_LAMBDA ( const int & i, double & lsum ) {
lsum += i;
});
}
- template<class ... Args>
- static void AddLambdaTeam(void*,Args... args) {
- AddReturnArgument(args..., KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& team, double& update) {
- update+=1.0/team.team_size()*team.league_rank();
+ template< class ... Args >
+ static void AddLambdaTeam( void*, Args... args ) {
+ AddReturnArgument( args..., KOKKOS_LAMBDA ( const Kokkos::TeamPolicy<>::member_type & team, double & update ) {
+ update += 1.0 / team.team_size() * team.league_rank();
});
}
- template<class ... Args>
- static void AddLambdaRange(Kokkos::InvalidType,Args... args) {
- }
+ template< class ... Args >
+ static void AddLambdaRange( Kokkos::InvalidType, Args... args ) {}
- template<class ... Args>
- static void AddLambdaTeam(Kokkos::InvalidType,Args... args) {
- }
+ template< class ... Args >
+ static void AddLambdaTeam( Kokkos::InvalidType, Args... args ) {}
- template<int ISTEAM, class ... Args>
- static void AddFunctor(Args... args) {
- Kokkos::View<double> result_view("FunctorView");
- auto h_r = Kokkos::create_mirror_view(result_view);
- Test::ReduceCombinatorical::FunctorScalar<ISTEAM> functor(result_view);
- double expected_result = 1000.0*999.0/2.0;
+ template< int ISTEAM, class ... Args >
+ static void AddFunctor( Args... args ) {
+ Kokkos::View< double > result_view( "FunctorView" );
+ auto h_r = Kokkos::create_mirror_view( result_view );
+ Test::ReduceCombinatorical::FunctorScalar< ISTEAM > functor( result_view );
+ double expected_result = 1000.0 * 999.0 / 2.0;
- AddReturnArgument(args..., functor);
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalar<ISTEAM>(result_view));
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarInit<ISTEAM>(result_view));
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarJoin<ISTEAM>(result_view));
- AddReturnArgument(args..., Test::ReduceCombinatorical::FunctorScalarJoinInit<ISTEAM>(result_view));
+ AddReturnArgument( args..., functor );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalar< ISTEAM >( result_view ) );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalarInit< ISTEAM >( result_view ) );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalarJoin< ISTEAM >( result_view ) );
+ AddReturnArgument( args..., Test::ReduceCombinatorical::FunctorScalarJoinInit< ISTEAM >( result_view ) );
h_r() = 0;
- Kokkos::deep_copy(result_view,h_r);
- CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarFinal<ISTEAM>(result_view));
- Kokkos::deep_copy(h_r,result_view);
- ASSERT_EQ(expected_result,h_r());
+ Kokkos::deep_copy( result_view, h_r );
+ CallParallelReduce( args..., Test::ReduceCombinatorical::FunctorScalarFinal< ISTEAM >( result_view ) );
+ Kokkos::deep_copy( h_r, result_view );
+ ASSERT_EQ( expected_result, h_r() );
h_r() = 0;
- Kokkos::deep_copy(result_view,h_r);
- CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarJoinFinal<ISTEAM>(result_view));
- Kokkos::deep_copy(h_r,result_view);
- ASSERT_EQ(expected_result,h_r());
+ Kokkos::deep_copy( result_view, h_r );
+ CallParallelReduce( args..., Test::ReduceCombinatorical::FunctorScalarJoinFinal< ISTEAM >( result_view ) );
+ Kokkos::deep_copy( h_r, result_view );
+ ASSERT_EQ( expected_result, h_r() );
h_r() = 0;
- Kokkos::deep_copy(result_view,h_r);
- CallParallelReduce(args..., Test::ReduceCombinatorical::FunctorScalarJoinFinalInit<ISTEAM>(result_view));
- Kokkos::deep_copy(h_r,result_view);
- ASSERT_EQ(expected_result,h_r());
+ Kokkos::deep_copy( result_view, h_r );
+ CallParallelReduce( args..., Test::ReduceCombinatorical::FunctorScalarJoinFinalInit< ISTEAM >( result_view ) );
+ Kokkos::deep_copy( h_r, result_view );
+ ASSERT_EQ( expected_result, h_r() );
}
- template<class ... Args>
- static void AddFunctorLambdaRange(Args... args) {
- AddFunctor<0,Args...>(args...);
- #ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- AddLambdaRange(typename std::conditional<std::is_same<ExecSpace,Kokkos::DefaultExecutionSpace>::value,void*,Kokkos::InvalidType>::type(), args...);
- #endif
+ template< class ... Args >
+ static void AddFunctorLambdaRange( Args... args ) {
+ AddFunctor< 0, Args... >( args... );
+#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
+ AddLambdaRange( typename std::conditional< std::is_same<ExecSpace, Kokkos::DefaultExecutionSpace>::value, void*, Kokkos::InvalidType >::type(), args... );
+#endif
}
- template<class ... Args>
- static void AddFunctorLambdaTeam(Args... args) {
- AddFunctor<1,Args...>(args...);
- #ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- AddLambdaTeam(typename std::conditional<std::is_same<ExecSpace,Kokkos::DefaultExecutionSpace>::value,void*,Kokkos::InvalidType>::type(), args...);
- #endif
+ template< class ... Args >
+ static void AddFunctorLambdaTeam( Args... args ) {
+ AddFunctor< 1, Args... >( args... );
+#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
+ AddLambdaTeam( typename std::conditional< std::is_same<ExecSpace, Kokkos::DefaultExecutionSpace>::value, void*, Kokkos::InvalidType >::type(), args... );
+#endif
}
- template<class ... Args>
- static void AddPolicy(Args... args) {
+ template< class ... Args >
+ static void AddPolicy( Args... args ) {
int N = 1000;
- Kokkos::RangePolicy<ExecSpace> policy(0,N);
+ Kokkos::RangePolicy< ExecSpace > policy( 0, N );
- AddFunctorLambdaRange(args...,1000);
- AddFunctorLambdaRange(args...,N);
- AddFunctorLambdaRange(args...,policy);
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace>(0,N));
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(0,N));
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Static> >(0,N).set_chunk_size(10));
- AddFunctorLambdaRange(args...,Kokkos::RangePolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(0,N).set_chunk_size(10));
+ AddFunctorLambdaRange( args..., 1000 );
+ AddFunctorLambdaRange( args..., N );
+ AddFunctorLambdaRange( args..., policy );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace >( 0, N ) );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( 0, N ) );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Static> >( 0, N ).set_chunk_size( 10 ) );
+ AddFunctorLambdaRange( args..., Kokkos::RangePolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( 0, N ).set_chunk_size( 10 ) );
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace>(N,Kokkos::AUTO));
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(N,Kokkos::AUTO));
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Static> >(N,Kokkos::AUTO).set_chunk_size(10));
- AddFunctorLambdaTeam(args...,Kokkos::TeamPolicy<ExecSpace,Kokkos::Schedule<Kokkos::Dynamic> >(N,Kokkos::AUTO).set_chunk_size(10));
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace >( N, Kokkos::AUTO ) );
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( N, Kokkos::AUTO ) );
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace, Kokkos::Schedule<Kokkos::Static> >( N, Kokkos::AUTO ).set_chunk_size( 10 ) );
+ AddFunctorLambdaTeam( args..., Kokkos::TeamPolicy< ExecSpace, Kokkos::Schedule<Kokkos::Dynamic> >( N, Kokkos::AUTO ).set_chunk_size( 10 ) );
}
-
static void execute_a() {
AddPolicy();
}
static void execute_b() {
- std::string s("Std::String");
- AddPolicy(s.c_str());
- AddPolicy("Char Constant");
+ std::string s( "Std::String" );
+ AddPolicy( s.c_str() );
+ AddPolicy( "Char Constant" );
}
static void execute_c() {
- std::string s("Std::String");
- AddPolicy(s);
+ std::string s( "Std::String" );
+ AddPolicy( s );
}
};
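For orientation only (this note and sketch are not part of the patch): the combinatorial test above assembles every supported combination of an optional label, a policy (a plain element count, a RangePolicy, or a TeamPolicy), a functor or lambda body, and a result argument. A minimal hedged illustration of two equivalent calls, assuming lambda dispatch is available and using placeholder names, is:

  #include <Kokkos_Core.hpp>

  // Sketch only: sums 0..N-1 twice, once with a count as the policy and once
  // with an explicit RangePolicy; both leave the same value in 'result'.
  void combinatorial_reduce_sketch() {
    const int N = 1000;
    double result = 0.0;

    Kokkos::parallel_reduce( "ReduceSketch", N,
      KOKKOS_LAMBDA( const int i, double & update ) { update += i; },
      result );

    Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultExecutionSpace >( 0, N ),
      KOKKOS_LAMBDA( const int i, double & update ) { update += i; },
      result );

    // Either call yields result == 1000.0 * 999.0 / 2.0, the expected_result checked above.
  }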
-template<class Scalar, class ExecSpace = Kokkos::DefaultExecutionSpace>
+template< class Scalar, class ExecSpace = Kokkos::DefaultExecutionSpace >
struct TestReducers {
-
struct SumFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value += values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value += values( i );
}
};
struct ProdFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value *= values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value *= values( i );
}
};
struct MinFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- if(values(i) < value)
- value = values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ if ( values( i ) < value ) value = values( i );
}
};
struct MaxFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- if(values(i) > value)
- value = values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ if ( values( i ) > value ) value = values( i );
}
};
struct MinLocFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,
- typename Kokkos::Experimental::MinLoc<Scalar,int>::value_type& value) const {
- if(values(i) < value.val) {
- value.val = values(i);
+ void operator()( const int & i, typename Kokkos::Experimental::MinLoc< Scalar, int >::value_type & value ) const {
+ if ( values( i ) < value.val ) {
+ value.val = values( i );
value.loc = i;
}
}
};
struct MaxLocFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,
- typename Kokkos::Experimental::MaxLoc<Scalar,int>::value_type& value) const {
- if(values(i) > value.val) {
- value.val = values(i);
+ void operator()( const int & i, typename Kokkos::Experimental::MaxLoc< Scalar, int >::value_type & value ) const {
+ if ( values( i ) > value.val ) {
+ value.val = values( i );
value.loc = i;
}
}
};
struct MinMaxLocFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i,
- typename Kokkos::Experimental::MinMaxLoc<Scalar,int>::value_type& value) const {
- if(values(i) > value.max_val) {
- value.max_val = values(i);
+ void operator()( const int & i, typename Kokkos::Experimental::MinMaxLoc< Scalar, int >::value_type & value ) const {
+ if ( values( i ) > value.max_val ) {
+ value.max_val = values( i );
value.max_loc = i;
}
- if(values(i) < value.min_val) {
- value.min_val = values(i);
+
+ if ( values( i ) < value.min_val ) {
+ value.min_val = values( i );
value.min_loc = i;
}
}
};
struct BAndFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value & values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value & values( i );
}
};
struct BOrFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value | values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value | values( i );
}
};
struct BXorFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value ^ values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value ^ values( i );
}
};
struct LAndFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value && values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value && values( i );
}
};
struct LOrFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value || values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value || values( i );
}
};
struct LXorFunctor {
- Kokkos::View<const Scalar*,ExecSpace> values;
+ Kokkos::View< const Scalar*, ExecSpace > values;
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int& i, Scalar& value) const {
- value = value ? (!values(i)) : values(i);
+ void operator()( const int & i, Scalar & value ) const {
+ value = value ? ( !values( i ) ) : values( i );
}
};
- static void test_sum(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_sum( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_sum = 0;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100);
- reference_sum += h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100 );
+ reference_sum += h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
SumFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar sum_scalar = init;
- Kokkos::Experimental::Sum<Scalar> reducer_scalar(sum_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(sum_scalar,reference_sum);
+ Kokkos::Experimental::Sum< Scalar > reducer_scalar( sum_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( sum_scalar, reference_sum );
+
Scalar sum_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(sum_scalar_view,reference_sum);
+ ASSERT_EQ( sum_scalar_view, reference_sum );
}
+
{
Scalar sum_scalar_init = init;
- Kokkos::Experimental::Sum<Scalar> reducer_scalar_init(sum_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(sum_scalar_init,reference_sum);
+ Kokkos::Experimental::Sum< Scalar > reducer_scalar_init( sum_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( sum_scalar_init, reference_sum );
+
Scalar sum_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(sum_scalar_init_view,reference_sum);
+ ASSERT_EQ( sum_scalar_init_view, reference_sum );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> sum_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > sum_view( "View" );
sum_view() = init;
- Kokkos::Experimental::Sum<Scalar> reducer_view(sum_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Sum< Scalar > reducer_view( sum_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar sum_view_scalar = sum_view();
- ASSERT_EQ(sum_view_scalar,reference_sum);
+ ASSERT_EQ( sum_view_scalar, reference_sum );
+
Scalar sum_view_view = reducer_view.result_view()();
- ASSERT_EQ(sum_view_view,reference_sum);
+ ASSERT_EQ( sum_view_view, reference_sum );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> sum_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > sum_view_init( "View" );
sum_view_init() = init;
- Kokkos::Experimental::Sum<Scalar> reducer_view_init(sum_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Sum< Scalar > reducer_view_init( sum_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar sum_view_init_scalar = sum_view_init();
- ASSERT_EQ(sum_view_init_scalar,reference_sum);
+ ASSERT_EQ( sum_view_init_scalar, reference_sum );
+
Scalar sum_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(sum_view_init_view,reference_sum);
+ ASSERT_EQ( sum_view_init_view, reference_sum );
}
}
- static void test_prod(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_prod( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_prod = 1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%4+1);
- reference_prod *= h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 4 + 1 );
+ reference_prod *= h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
ProdFunctor f;
f.values = values;
Scalar init = 1;
- if(std::is_arithmetic<Scalar>::value)
+ if ( std::is_arithmetic< Scalar >::value )
{
Scalar prod_scalar = init;
- Kokkos::Experimental::Prod<Scalar> reducer_scalar(prod_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(prod_scalar,reference_prod);
+ Kokkos::Experimental::Prod< Scalar > reducer_scalar( prod_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( prod_scalar, reference_prod );
+
Scalar prod_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(prod_scalar_view,reference_prod);
+ ASSERT_EQ( prod_scalar_view, reference_prod );
}
+
{
Scalar prod_scalar_init = init;
- Kokkos::Experimental::Prod<Scalar> reducer_scalar_init(prod_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(prod_scalar_init,reference_prod);
+ Kokkos::Experimental::Prod< Scalar > reducer_scalar_init( prod_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( prod_scalar_init, reference_prod );
+
Scalar prod_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(prod_scalar_init_view,reference_prod);
+ ASSERT_EQ( prod_scalar_init_view, reference_prod );
}
- if(std::is_arithmetic<Scalar>::value)
+ if ( std::is_arithmetic< Scalar >::value )
{
- Kokkos::View<Scalar,Kokkos::HostSpace> prod_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > prod_view( "View" );
prod_view() = init;
- Kokkos::Experimental::Prod<Scalar> reducer_view(prod_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Prod< Scalar > reducer_view( prod_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar prod_view_scalar = prod_view();
- ASSERT_EQ(prod_view_scalar,reference_prod);
+ ASSERT_EQ( prod_view_scalar, reference_prod );
+
Scalar prod_view_view = reducer_view.result_view()();
- ASSERT_EQ(prod_view_view,reference_prod);
+ ASSERT_EQ( prod_view_view, reference_prod );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> prod_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > prod_view_init( "View" );
prod_view_init() = init;
- Kokkos::Experimental::Prod<Scalar> reducer_view_init(prod_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Prod< Scalar > reducer_view_init( prod_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar prod_view_init_scalar = prod_view_init();
- ASSERT_EQ(prod_view_init_scalar,reference_prod);
+ ASSERT_EQ( prod_view_init_scalar, reference_prod );
+
Scalar prod_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(prod_view_init_view,reference_prod);
+ ASSERT_EQ( prod_view_init_view, reference_prod );
}
}
- static void test_min(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_min = std::numeric_limits<Scalar>::max();
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
- if(h_values(i)<reference_min)
- reference_min = h_values(i);
+ static void test_min( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_min = std::numeric_limits< Scalar >::max();
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
+
+ if ( h_values( i ) < reference_min ) reference_min = h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MinFunctor f;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::max();
+ Scalar init = std::numeric_limits< Scalar >::max();
{
Scalar min_scalar = init;
- Kokkos::Experimental::Min<Scalar> reducer_scalar(min_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(min_scalar,reference_min);
+ Kokkos::Experimental::Min< Scalar > reducer_scalar( min_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( min_scalar, reference_min );
+
Scalar min_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(min_scalar_view,reference_min);
+ ASSERT_EQ( min_scalar_view, reference_min );
}
+
{
Scalar min_scalar_init = init;
- Kokkos::Experimental::Min<Scalar> reducer_scalar_init(min_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(min_scalar_init,reference_min);
+ Kokkos::Experimental::Min< Scalar > reducer_scalar_init( min_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( min_scalar_init, reference_min );
+
Scalar min_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(min_scalar_init_view,reference_min);
+ ASSERT_EQ( min_scalar_init_view, reference_min );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> min_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > min_view( "View" );
min_view() = init;
- Kokkos::Experimental::Min<Scalar> reducer_view(min_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Min< Scalar > reducer_view( min_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar min_view_scalar = min_view();
- ASSERT_EQ(min_view_scalar,reference_min);
+ ASSERT_EQ( min_view_scalar, reference_min );
+
Scalar min_view_view = reducer_view.result_view()();
- ASSERT_EQ(min_view_view,reference_min);
+ ASSERT_EQ( min_view_view, reference_min );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> min_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > min_view_init( "View" );
min_view_init() = init;
- Kokkos::Experimental::Min<Scalar> reducer_view_init(min_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Min< Scalar > reducer_view_init( min_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar min_view_init_scalar = min_view_init();
- ASSERT_EQ(min_view_init_scalar,reference_min);
+ ASSERT_EQ( min_view_init_scalar, reference_min );
+
Scalar min_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(min_view_init_view,reference_min);
+ ASSERT_EQ( min_view_init_view, reference_min );
}
}
- static void test_max(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_max = std::numeric_limits<Scalar>::min();
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000+1);
- if(h_values(i)>reference_max)
- reference_max = h_values(i);
+ static void test_max( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_max = std::numeric_limits< Scalar >::min();
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 + 1 );
+
+ if ( h_values( i ) > reference_max ) reference_max = h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MaxFunctor f;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::min();
+ Scalar init = std::numeric_limits< Scalar >::min();
{
Scalar max_scalar = init;
- Kokkos::Experimental::Max<Scalar> reducer_scalar(max_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(max_scalar,reference_max);
+ Kokkos::Experimental::Max< Scalar > reducer_scalar( max_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( max_scalar, reference_max );
+
Scalar max_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(max_scalar_view,reference_max);
+ ASSERT_EQ( max_scalar_view, reference_max );
}
+
{
Scalar max_scalar_init = init;
- Kokkos::Experimental::Max<Scalar> reducer_scalar_init(max_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(max_scalar_init,reference_max);
+ Kokkos::Experimental::Max< Scalar > reducer_scalar_init( max_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( max_scalar_init, reference_max );
+
Scalar max_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(max_scalar_init_view,reference_max);
+ ASSERT_EQ( max_scalar_init_view, reference_max );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> max_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > max_view( "View" );
max_view() = init;
- Kokkos::Experimental::Max<Scalar> reducer_view(max_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::Max< Scalar > reducer_view( max_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar max_view_scalar = max_view();
- ASSERT_EQ(max_view_scalar,reference_max);
+ ASSERT_EQ( max_view_scalar, reference_max );
+
Scalar max_view_view = reducer_view.result_view()();
- ASSERT_EQ(max_view_view,reference_max);
+ ASSERT_EQ( max_view_view, reference_max );
}
+
{
- Kokkos::View<Scalar,Kokkos::HostSpace> max_view_init("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > max_view_init( "View" );
max_view_init() = init;
- Kokkos::Experimental::Max<Scalar> reducer_view_init(max_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::Experimental::Max< Scalar > reducer_view_init( max_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
Scalar max_view_init_scalar = max_view_init();
- ASSERT_EQ(max_view_init_scalar,reference_max);
+ ASSERT_EQ( max_view_init_scalar, reference_max );
+
Scalar max_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(max_view_init_view,reference_max);
+ ASSERT_EQ( max_view_init_view, reference_max );
}
}
- static void test_minloc(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_min = std::numeric_limits<Scalar>::max();
+ static void test_minloc( int N ) {
+ typedef typename Kokkos::Experimental::MinLoc< Scalar, int >::value_type value_type;
+
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_min = std::numeric_limits< Scalar >::max();
int reference_loc = -1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
- if(h_values(i)<reference_min) {
- reference_min = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
+
+ if ( h_values( i ) < reference_min ) {
+ reference_min = h_values( i );
reference_loc = i;
- } else if (h_values(i) == reference_min) {
- // make min unique
- h_values(i) += std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_min ) {
+ // Make min unique.
+ h_values( i ) += std::numeric_limits< Scalar >::epsilon();
}
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MinLocFunctor f;
- typedef typename Kokkos::Experimental::MinLoc<Scalar,int>::value_type value_type;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::max();
-
+ Scalar init = std::numeric_limits< Scalar >::max();
{
value_type min_scalar;
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_scalar(min_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(min_scalar.val,reference_min);
- ASSERT_EQ(min_scalar.loc,reference_loc);
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_scalar( min_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( min_scalar.val, reference_min );
+ ASSERT_EQ( min_scalar.loc, reference_loc );
+
value_type min_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(min_scalar_view.val,reference_min);
- ASSERT_EQ(min_scalar_view.loc,reference_loc);
+ ASSERT_EQ( min_scalar_view.val, reference_min );
+ ASSERT_EQ( min_scalar_view.loc, reference_loc );
}
+
{
value_type min_scalar_init;
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_scalar_init(min_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(min_scalar_init.val,reference_min);
- ASSERT_EQ(min_scalar_init.loc,reference_loc);
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_scalar_init( min_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( min_scalar_init.val, reference_min );
+ ASSERT_EQ( min_scalar_init.loc, reference_loc );
+
value_type min_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(min_scalar_init_view.val,reference_min);
- ASSERT_EQ(min_scalar_init_view.loc,reference_loc);
+ ASSERT_EQ( min_scalar_init_view.val, reference_min );
+ ASSERT_EQ( min_scalar_init_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> min_view("View");
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_view(min_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::View< value_type, Kokkos::HostSpace > min_view( "View" );
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_view( min_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
value_type min_view_scalar = min_view();
- ASSERT_EQ(min_view_scalar.val,reference_min);
- ASSERT_EQ(min_view_scalar.loc,reference_loc);
+ ASSERT_EQ( min_view_scalar.val, reference_min );
+ ASSERT_EQ( min_view_scalar.loc, reference_loc );
+
value_type min_view_view = reducer_view.result_view()();
- ASSERT_EQ(min_view_view.val,reference_min);
- ASSERT_EQ(min_view_view.loc,reference_loc);
+ ASSERT_EQ( min_view_view.val, reference_min );
+ ASSERT_EQ( min_view_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> min_view_init("View");
- Kokkos::Experimental::MinLoc<Scalar,int> reducer_view_init(min_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::View< value_type, Kokkos::HostSpace > min_view_init( "View" );
+ Kokkos::Experimental::MinLoc< Scalar, int > reducer_view_init( min_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
value_type min_view_init_scalar = min_view_init();
- ASSERT_EQ(min_view_init_scalar.val,reference_min);
- ASSERT_EQ(min_view_init_scalar.loc,reference_loc);
+ ASSERT_EQ( min_view_init_scalar.val, reference_min );
+ ASSERT_EQ( min_view_init_scalar.loc, reference_loc );
+
value_type min_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(min_view_init_view.val,reference_min);
- ASSERT_EQ(min_view_init_view.loc,reference_loc);
+ ASSERT_EQ( min_view_init_view.val, reference_min );
+ ASSERT_EQ( min_view_init_view.loc, reference_loc );
}
}
- static void test_maxloc(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_max = std::numeric_limits<Scalar>::min();
+ static void test_maxloc( int N ) {
+ typedef typename Kokkos::Experimental::MaxLoc< Scalar, int >::value_type value_type;
+
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_max = std::numeric_limits< Scalar >::min();
int reference_loc = -1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
- if(h_values(i)>reference_max) {
- reference_max = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
+
+ if ( h_values( i ) > reference_max ) {
+ reference_max = h_values( i );
reference_loc = i;
- } else if (h_values(i) == reference_max) {
- // make max unique
- h_values(i) -= std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_max ) {
+ // Make max unique.
+ h_values( i ) -= std::numeric_limits< Scalar >::epsilon();
}
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
MaxLocFunctor f;
- typedef typename Kokkos::Experimental::MaxLoc<Scalar,int>::value_type value_type;
f.values = values;
- Scalar init = std::numeric_limits<Scalar>::min();
-
+ Scalar init = std::numeric_limits< Scalar >::min();
{
value_type max_scalar;
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_scalar(max_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(max_scalar.val,reference_max);
- ASSERT_EQ(max_scalar.loc,reference_loc);
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_scalar( max_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( max_scalar.val, reference_max );
+ ASSERT_EQ( max_scalar.loc, reference_loc );
+
value_type max_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(max_scalar_view.val,reference_max);
- ASSERT_EQ(max_scalar_view.loc,reference_loc);
+ ASSERT_EQ( max_scalar_view.val, reference_max );
+ ASSERT_EQ( max_scalar_view.loc, reference_loc );
}
+
{
value_type max_scalar_init;
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_scalar_init(max_scalar_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(max_scalar_init.val,reference_max);
- ASSERT_EQ(max_scalar_init.loc,reference_loc);
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_scalar_init( max_scalar_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( max_scalar_init.val, reference_max );
+ ASSERT_EQ( max_scalar_init.loc, reference_loc );
+
value_type max_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(max_scalar_init_view.val,reference_max);
- ASSERT_EQ(max_scalar_init_view.loc,reference_loc);
+ ASSERT_EQ( max_scalar_init_view.val, reference_max );
+ ASSERT_EQ( max_scalar_init_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> max_view("View");
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_view(max_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::View< value_type, Kokkos::HostSpace > max_view( "View" );
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_view( max_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
value_type max_view_scalar = max_view();
- ASSERT_EQ(max_view_scalar.val,reference_max);
- ASSERT_EQ(max_view_scalar.loc,reference_loc);
+ ASSERT_EQ( max_view_scalar.val, reference_max );
+ ASSERT_EQ( max_view_scalar.loc, reference_loc );
+
value_type max_view_view = reducer_view.result_view()();
- ASSERT_EQ(max_view_view.val,reference_max);
- ASSERT_EQ(max_view_view.loc,reference_loc);
+ ASSERT_EQ( max_view_view.val, reference_max );
+ ASSERT_EQ( max_view_view.loc, reference_loc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> max_view_init("View");
- Kokkos::Experimental::MaxLoc<Scalar,int> reducer_view_init(max_view_init,init);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::View< value_type, Kokkos::HostSpace > max_view_init( "View" );
+ Kokkos::Experimental::MaxLoc< Scalar, int > reducer_view_init( max_view_init, init );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
value_type max_view_init_scalar = max_view_init();
- ASSERT_EQ(max_view_init_scalar.val,reference_max);
- ASSERT_EQ(max_view_init_scalar.loc,reference_loc);
+ ASSERT_EQ( max_view_init_scalar.val, reference_max );
+ ASSERT_EQ( max_view_init_scalar.loc, reference_loc );
+
value_type max_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(max_view_init_view.val,reference_max);
- ASSERT_EQ(max_view_init_view.loc,reference_loc);
+ ASSERT_EQ( max_view_init_view.val, reference_max );
+ ASSERT_EQ( max_view_init_view.loc, reference_loc );
}
}
- static void test_minmaxloc(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_max = std::numeric_limits<Scalar>::min();
- Scalar reference_min = std::numeric_limits<Scalar>::max();
+ static void test_minmaxloc( int N ) {
+ typedef typename Kokkos::Experimental::MinMaxLoc< Scalar, int >::value_type value_type;
+
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_max = std::numeric_limits< Scalar >::min();
+ Scalar reference_min = std::numeric_limits< Scalar >::max();
int reference_minloc = -1;
int reference_maxloc = -1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 );
}
- for(int i=0; i<N; i++) {
- if(h_values(i)>reference_max) {
- reference_max = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( h_values( i ) > reference_max ) {
+ reference_max = h_values( i );
reference_maxloc = i;
- } else if (h_values(i) == reference_max) {
- // make max unique
- h_values(i) -= std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_max ) {
+ // Make max unique.
+ h_values( i ) -= std::numeric_limits< Scalar >::epsilon();
}
}
- for(int i=0; i<N; i++) {
- if(h_values(i)<reference_min) {
- reference_min = h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( h_values( i ) < reference_min ) {
+ reference_min = h_values( i );
reference_minloc = i;
- } else if (h_values(i) == reference_min) {
- // make min unique
- h_values(i) += std::numeric_limits<Scalar>::epsilon();
+ }
+ else if ( h_values( i ) == reference_min ) {
+ // Make min unique.
+ h_values( i ) += std::numeric_limits< Scalar >::epsilon();
}
}
- Kokkos::deep_copy(values,h_values);
+
+ Kokkos::deep_copy( values, h_values );
MinMaxLocFunctor f;
- typedef typename Kokkos::Experimental::MinMaxLoc<Scalar,int>::value_type value_type;
f.values = values;
- Scalar init_min = std::numeric_limits<Scalar>::max();
- Scalar init_max = std::numeric_limits<Scalar>::min();
-
+ Scalar init_min = std::numeric_limits< Scalar >::max();
+ Scalar init_max = std::numeric_limits< Scalar >::min();
{
value_type minmax_scalar;
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_scalar(minmax_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(minmax_scalar.min_val,reference_min);
- for(int i=0; i<N; i++) {
- if((i == minmax_scalar.min_loc) && (h_values(i)==reference_min))
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_scalar( minmax_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( minmax_scalar.min_val, reference_min );
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( ( i == minmax_scalar.min_loc ) && ( h_values( i ) == reference_min ) ) {
reference_minloc = i;
+ }
}
- ASSERT_EQ(minmax_scalar.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar.max_val,reference_max);
- for(int i=0; i<N; i++) {
- if((i == minmax_scalar.max_loc) && (h_values(i)==reference_max))
+
+ ASSERT_EQ( minmax_scalar.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar.max_val, reference_max );
+
+ for ( int i = 0; i < N; i++ ) {
+ if ( ( i == minmax_scalar.max_loc ) && ( h_values( i ) == reference_max ) ) {
reference_maxloc = i;
+ }
}
- ASSERT_EQ(minmax_scalar.max_loc,reference_maxloc);
+
+ ASSERT_EQ( minmax_scalar.max_loc, reference_maxloc );
+
value_type minmax_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(minmax_scalar_view.min_val,reference_min);
- ASSERT_EQ(minmax_scalar_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar_view.max_val,reference_max);
- ASSERT_EQ(minmax_scalar_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_scalar_view.min_val, reference_min );
+ ASSERT_EQ( minmax_scalar_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar_view.max_val, reference_max );
+ ASSERT_EQ( minmax_scalar_view.max_loc, reference_maxloc );
}
+
{
value_type minmax_scalar_init;
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_scalar_init(minmax_scalar_init,init_min,init_max);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar_init);
- ASSERT_EQ(minmax_scalar_init.min_val,reference_min);
- ASSERT_EQ(minmax_scalar_init.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar_init.max_val,reference_max);
- ASSERT_EQ(minmax_scalar_init.max_loc,reference_maxloc);
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_scalar_init( minmax_scalar_init, init_min, init_max );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar_init );
+
+ ASSERT_EQ( minmax_scalar_init.min_val, reference_min );
+ ASSERT_EQ( minmax_scalar_init.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar_init.max_val, reference_max );
+ ASSERT_EQ( minmax_scalar_init.max_loc, reference_maxloc );
+
value_type minmax_scalar_init_view = reducer_scalar_init.result_view()();
- ASSERT_EQ(minmax_scalar_init_view.min_val,reference_min);
- ASSERT_EQ(minmax_scalar_init_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_scalar_init_view.max_val,reference_max);
- ASSERT_EQ(minmax_scalar_init_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_scalar_init_view.min_val, reference_min );
+ ASSERT_EQ( minmax_scalar_init_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_scalar_init_view.max_val, reference_max );
+ ASSERT_EQ( minmax_scalar_init_view.max_loc, reference_maxloc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> minmax_view("View");
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_view(minmax_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::View< value_type, Kokkos::HostSpace > minmax_view( "View" );
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_view( minmax_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
value_type minmax_view_scalar = minmax_view();
- ASSERT_EQ(minmax_view_scalar.min_val,reference_min);
- ASSERT_EQ(minmax_view_scalar.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_scalar.max_val,reference_max);
- ASSERT_EQ(minmax_view_scalar.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_scalar.min_val, reference_min );
+ ASSERT_EQ( minmax_view_scalar.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_scalar.max_val, reference_max );
+ ASSERT_EQ( minmax_view_scalar.max_loc, reference_maxloc );
+
value_type minmax_view_view = reducer_view.result_view()();
- ASSERT_EQ(minmax_view_view.min_val,reference_min);
- ASSERT_EQ(minmax_view_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_view.max_val,reference_max);
- ASSERT_EQ(minmax_view_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_view.min_val, reference_min );
+ ASSERT_EQ( minmax_view_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_view.max_val, reference_max );
+ ASSERT_EQ( minmax_view_view.max_loc, reference_maxloc );
}
+
{
- Kokkos::View<value_type,Kokkos::HostSpace> minmax_view_init("View");
- Kokkos::Experimental::MinMaxLoc<Scalar,int> reducer_view_init(minmax_view_init,init_min,init_max);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view_init);
+ Kokkos::View< value_type, Kokkos::HostSpace > minmax_view_init( "View" );
+ Kokkos::Experimental::MinMaxLoc< Scalar, int > reducer_view_init( minmax_view_init, init_min, init_max );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view_init );
+
value_type minmax_view_init_scalar = minmax_view_init();
- ASSERT_EQ(minmax_view_init_scalar.min_val,reference_min);
- ASSERT_EQ(minmax_view_init_scalar.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_init_scalar.max_val,reference_max);
- ASSERT_EQ(minmax_view_init_scalar.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_init_scalar.min_val, reference_min );
+ ASSERT_EQ( minmax_view_init_scalar.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_init_scalar.max_val, reference_max );
+ ASSERT_EQ( minmax_view_init_scalar.max_loc, reference_maxloc );
+
value_type minmax_view_init_view = reducer_view_init.result_view()();
- ASSERT_EQ(minmax_view_init_view.min_val,reference_min);
- ASSERT_EQ(minmax_view_init_view.min_loc,reference_minloc);
- ASSERT_EQ(minmax_view_init_view.max_val,reference_max);
- ASSERT_EQ(minmax_view_init_view.max_loc,reference_maxloc);
+ ASSERT_EQ( minmax_view_init_view.min_val, reference_min );
+ ASSERT_EQ( minmax_view_init_view.min_loc, reference_minloc );
+ ASSERT_EQ( minmax_view_init_view.max_val, reference_max );
+ ASSERT_EQ( minmax_view_init_view.max_loc, reference_maxloc );
}
}
- static void test_BAnd(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_band = Scalar() | (~Scalar());
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%100000+1);
- reference_band = reference_band & h_values(i);
+ static void test_BAnd( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_band = Scalar() | ( ~Scalar() );
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 100000 + 1 );
+ reference_band = reference_band & h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
BAndFunctor f;
f.values = values;
- Scalar init = Scalar() | (~Scalar());
+ Scalar init = Scalar() | ( ~Scalar() );
{
Scalar band_scalar = init;
- Kokkos::Experimental::BAnd<Scalar> reducer_scalar(band_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(band_scalar,reference_band);
+ Kokkos::Experimental::BAnd< Scalar > reducer_scalar( band_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( band_scalar, reference_band );
Scalar band_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(band_scalar_view,reference_band);
+
+ ASSERT_EQ( band_scalar_view, reference_band );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> band_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > band_view( "View" );
band_view() = init;
- Kokkos::Experimental::BAnd<Scalar> reducer_view(band_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::BAnd< Scalar > reducer_view( band_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar band_view_scalar = band_view();
- ASSERT_EQ(band_view_scalar,reference_band);
+ ASSERT_EQ( band_view_scalar, reference_band );
+
Scalar band_view_view = reducer_view.result_view()();
- ASSERT_EQ(band_view_view,reference_band);
+ ASSERT_EQ( band_view_view, reference_band );
}
}
- static void test_BOr(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_bor = Scalar() & (~Scalar());
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)((rand()%100000+1)*2);
- reference_bor = reference_bor | h_values(i);
+ static void test_BOr( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_bor = Scalar() & ( ~Scalar() );
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( ( rand() % 100000 + 1 ) * 2 );
+ reference_bor = reference_bor | h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
BOrFunctor f;
f.values = values;
- Scalar init = Scalar() & (~Scalar());
+ Scalar init = Scalar() & ( ~Scalar() );
{
Scalar bor_scalar = init;
- Kokkos::Experimental::BOr<Scalar> reducer_scalar(bor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(bor_scalar,reference_bor);
+ Kokkos::Experimental::BOr< Scalar > reducer_scalar( bor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( bor_scalar, reference_bor );
+
Scalar bor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(bor_scalar_view,reference_bor);
+ ASSERT_EQ( bor_scalar_view, reference_bor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> bor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > bor_view( "View" );
bor_view() = init;
- Kokkos::Experimental::BOr<Scalar> reducer_view(bor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::BOr< Scalar > reducer_view( bor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar bor_view_scalar = bor_view();
- ASSERT_EQ(bor_view_scalar,reference_bor);
+ ASSERT_EQ( bor_view_scalar, reference_bor );
+
Scalar bor_view_view = reducer_view.result_view()();
- ASSERT_EQ(bor_view_view,reference_bor);
+ ASSERT_EQ( bor_view_view, reference_bor );
}
}
- static void test_BXor(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
- Scalar reference_bxor = Scalar() & (~Scalar());
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)((rand()%100000+1)*2);
- reference_bxor = reference_bxor ^ h_values(i);
+ static void test_BXor( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
+ Scalar reference_bxor = Scalar() & ( ~Scalar() );
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( ( rand() % 100000 + 1 ) * 2 );
+ reference_bxor = reference_bxor ^ h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
BXorFunctor f;
f.values = values;
- Scalar init = Scalar() & (~Scalar());
+ Scalar init = Scalar() & ( ~Scalar() );
{
Scalar bxor_scalar = init;
- Kokkos::Experimental::BXor<Scalar> reducer_scalar(bxor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(bxor_scalar,reference_bxor);
+ Kokkos::Experimental::BXor< Scalar > reducer_scalar( bxor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( bxor_scalar, reference_bxor );
+
Scalar bxor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(bxor_scalar_view,reference_bxor);
+ ASSERT_EQ( bxor_scalar_view, reference_bxor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> bxor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > bxor_view( "View" );
bxor_view() = init;
- Kokkos::Experimental::BXor<Scalar> reducer_view(bxor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::BXor< Scalar > reducer_view( bxor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar bxor_view_scalar = bxor_view();
- ASSERT_EQ(bxor_view_scalar,reference_bxor);
+ ASSERT_EQ( bxor_view_scalar, reference_bxor );
+
Scalar bxor_view_view = reducer_view.result_view()();
- ASSERT_EQ(bxor_view_view,reference_bxor);
+ ASSERT_EQ( bxor_view_view, reference_bxor );
}
}
- static void test_LAnd(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_LAnd( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_land = 1;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%2);
- reference_land = reference_land && h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 2 );
+ reference_land = reference_land && h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
LAndFunctor f;
f.values = values;
Scalar init = 1;
{
Scalar land_scalar = init;
- Kokkos::Experimental::LAnd<Scalar> reducer_scalar(land_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(land_scalar,reference_land);
+ Kokkos::Experimental::LAnd< Scalar > reducer_scalar( land_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( land_scalar, reference_land );
+
Scalar land_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(land_scalar_view,reference_land);
+ ASSERT_EQ( land_scalar_view, reference_land );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> land_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > land_view( "View" );
land_view() = init;
- Kokkos::Experimental::LAnd<Scalar> reducer_view(land_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::LAnd< Scalar > reducer_view( land_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar land_view_scalar = land_view();
- ASSERT_EQ(land_view_scalar,reference_land);
+ ASSERT_EQ( land_view_scalar, reference_land );
+
Scalar land_view_view = reducer_view.result_view()();
- ASSERT_EQ(land_view_view,reference_land);
+ ASSERT_EQ( land_view_view, reference_land );
}
}
- static void test_LOr(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_LOr( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_lor = 0;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%2);
- reference_lor = reference_lor || h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 2 );
+ reference_lor = reference_lor || h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
LOrFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar lor_scalar = init;
- Kokkos::Experimental::LOr<Scalar> reducer_scalar(lor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(lor_scalar,reference_lor);
+ Kokkos::Experimental::LOr< Scalar > reducer_scalar( lor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( lor_scalar, reference_lor );
+
Scalar lor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(lor_scalar_view,reference_lor);
+ ASSERT_EQ( lor_scalar_view, reference_lor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> lor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > lor_view( "View" );
lor_view() = init;
- Kokkos::Experimental::LOr<Scalar> reducer_view(lor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::LOr< Scalar > reducer_view( lor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar lor_view_scalar = lor_view();
- ASSERT_EQ(lor_view_scalar,reference_lor);
+ ASSERT_EQ( lor_view_scalar, reference_lor );
+
Scalar lor_view_view = reducer_view.result_view()();
- ASSERT_EQ(lor_view_view,reference_lor);
+ ASSERT_EQ( lor_view_view, reference_lor );
}
}
- static void test_LXor(int N) {
- Kokkos::View<Scalar*,ExecSpace> values("Values",N);
- auto h_values = Kokkos::create_mirror_view(values);
+ static void test_LXor( int N ) {
+ Kokkos::View< Scalar*, ExecSpace > values( "Values", N );
+ auto h_values = Kokkos::create_mirror_view( values );
Scalar reference_lxor = 0;
- for(int i=0; i<N; i++) {
- h_values(i) = (Scalar)(rand()%2);
- reference_lxor = reference_lxor ? (!h_values(i)) : h_values(i);
+
+ for ( int i = 0; i < N; i++ ) {
+ h_values( i ) = (Scalar) ( rand() % 2 );
+ reference_lxor = reference_lxor ? ( !h_values( i ) ) : h_values( i );
}
- Kokkos::deep_copy(values,h_values);
+ Kokkos::deep_copy( values, h_values );
LXorFunctor f;
f.values = values;
Scalar init = 0;
{
Scalar lxor_scalar = init;
- Kokkos::Experimental::LXor<Scalar> reducer_scalar(lxor_scalar);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_scalar);
- ASSERT_EQ(lxor_scalar,reference_lxor);
+ Kokkos::Experimental::LXor< Scalar > reducer_scalar( lxor_scalar );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_scalar );
+
+ ASSERT_EQ( lxor_scalar, reference_lxor );
+
Scalar lxor_scalar_view = reducer_scalar.result_view()();
- ASSERT_EQ(lxor_scalar_view,reference_lxor);
+ ASSERT_EQ( lxor_scalar_view, reference_lxor );
}
{
- Kokkos::View<Scalar,Kokkos::HostSpace> lxor_view("View");
+ Kokkos::View< Scalar, Kokkos::HostSpace > lxor_view( "View" );
lxor_view() = init;
- Kokkos::Experimental::LXor<Scalar> reducer_view(lxor_view);
- Kokkos::parallel_reduce(Kokkos::RangePolicy<ExecSpace>(0,N),f,reducer_view);
+ Kokkos::Experimental::LXor< Scalar > reducer_view( lxor_view );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, N ), f, reducer_view );
+
Scalar lxor_view_scalar = lxor_view();
- ASSERT_EQ(lxor_view_scalar,reference_lxor);
+ ASSERT_EQ( lxor_view_scalar, reference_lxor );
+
Scalar lxor_view_view = reducer_view.result_view()();
- ASSERT_EQ(lxor_view_view,reference_lxor);
+ ASSERT_EQ( lxor_view_view, reference_lxor );
}
}
static void execute_float() {
- test_sum(10001);
- test_prod(35);
- test_min(10003);
- test_minloc(10003);
- test_max(10007);
- test_maxloc(10007);
- test_minmaxloc(10007);
+ test_sum( 10001 );
+ test_prod( 35 );
+ test_min( 10003 );
+ test_minloc( 10003 );
+ test_max( 10007 );
+ test_maxloc( 10007 );
+ test_minmaxloc( 10007 );
}
static void execute_integer() {
- test_sum(10001);
- test_prod(35);
- test_min(10003);
- test_minloc(10003);
- test_max(10007);
- test_maxloc(10007);
- test_minmaxloc(10007);
- test_BAnd(35);
- test_BOr(35);
- test_BXor(35);
- test_LAnd(35);
- test_LOr(35);
- test_LXor(35);
+ test_sum( 10001 );
+ test_prod( 35 );
+ test_min( 10003 );
+ test_minloc( 10003 );
+ test_max( 10007 );
+ test_maxloc( 10007 );
+ test_minmaxloc( 10007 );
+ test_BAnd( 35 );
+ test_BOr( 35 );
+ test_BXor( 35 );
+ test_LAnd( 35 );
+ test_LOr( 35 );
+ test_LXor( 35 );
}
static void execute_basic() {
- test_sum(10001);
- test_prod(35);
+ test_sum( 10001 );
+ test_prod( 35 );
}
};
-}
-
-/*--------------------------------------------------------------------------*/
+} // namespace Test
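As an aside (not part of the patch): every TestReducers case above follows the same pattern, a functor that accumulates into a thread-local value plus a built-in reducer object that owns the result. A minimal sketch of that pattern with the Sum reducer, assuming Kokkos is already initialized and using placeholder names, is:

  #include <Kokkos_Core.hpp>

  // Sketch only: functor form of the reduction body used by the tests above.
  struct SumSketch {
    Kokkos::View< const double*, Kokkos::DefaultExecutionSpace > values;

    KOKKOS_INLINE_FUNCTION
    void operator()( const int & i, double & value ) const {
      value += values( i );   // per-thread partial sum
    }
  };

  void sum_reducer_sketch( int N ) {
    Kokkos::View< double*, Kokkos::DefaultExecutionSpace > values( "Values", N );
    Kokkos::deep_copy( values, 1.0 );   // fill with ones so the expected sum is N

    SumSketch f;
    f.values = values;

    double sum = 0;
    Kokkos::Experimental::Sum< double > reducer( sum );   // reducer wraps the scalar result
    Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultExecutionSpace >( 0, N ),
                             f, reducer );
    // 'sum' now holds the reduction result; reducer.result_view()() returns the same value.
  }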
diff --git a/lib/kokkos/core/unit_test/TestScan.hpp b/lib/kokkos/core/unit_test/TestScan.hpp
index 1a9811a85..547e03497 100644
--- a/lib/kokkos/core/unit_test/TestScan.hpp
+++ b/lib/kokkos/core/unit_test/TestScan.hpp
@@ -1,117 +1,116 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-/*--------------------------------------------------------------------------*/
-
#include <stdio.h>
namespace Test {
-template< class Device , class WorkSpec = size_t >
+template< class Device, class WorkSpec = size_t >
struct TestScan {
+ typedef Device execution_space;
+ typedef long int value_type;
- typedef Device execution_space ;
- typedef long int value_type ;
-
- Kokkos::View<int,Device,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ Kokkos::View< int, Device, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
- void operator()( const int iwork , value_type & update , const bool final_pass ) const
+ void operator()( const int iwork, value_type & update, const bool final_pass ) const
{
- const value_type n = iwork + 1 ;
- const value_type imbalance = ( (1000 <= n) && (0 == n % 1000) ) ? 1000 : 0 ;
+ const value_type n = iwork + 1;
+ const value_type imbalance = ( ( 1000 <= n ) && ( 0 == n % 1000 ) ) ? 1000 : 0;
// Insert an artificial load imbalance
- for ( value_type i = 0 ; i < imbalance ; ++i ) { ++update ; }
+ for ( value_type i = 0; i < imbalance; ++i ) { ++update; }
- update += n - imbalance ;
+ update += n - imbalance;
if ( final_pass ) {
const value_type answer = n & 1 ? ( n * ( ( n + 1 ) / 2 ) ) : ( ( n / 2 ) * ( n + 1 ) );
if ( answer != update ) {
errors()++;
- if(errors()<20)
- printf("TestScan(%d,%ld) != %ld\n",iwork,update,answer);
+
+ if ( errors() < 20 ) {
+ printf( "TestScan(%d,%ld) != %ld\n", iwork, update, answer );
+ }
}
}
}
KOKKOS_INLINE_FUNCTION
- void init( value_type & update ) const { update = 0 ; }
+ void init( value_type & update ) const { update = 0; }
KOKKOS_INLINE_FUNCTION
- void join( volatile value_type & update ,
+ void join( volatile value_type & update,
volatile const value_type & input ) const
- { update += input ; }
+ { update += input; }
TestScan( const WorkSpec & N )
- {
- Kokkos::View<int,Device > errors_a("Errors");
- Kokkos::deep_copy(errors_a,0);
- errors = errors_a;
- parallel_scan( N , *this );
- }
+ {
+ Kokkos::View< int, Device > errors_a( "Errors" );
+ Kokkos::deep_copy( errors_a, 0 );
+ errors = errors_a;
+
+ parallel_scan( N , *this );
+ }
TestScan( const WorkSpec & Start , const WorkSpec & N )
- {
- typedef Kokkos::RangePolicy<execution_space> exec_policy ;
+ {
+ typedef Kokkos::RangePolicy< execution_space > exec_policy ;
- Kokkos::View<int,Device > errors_a("Errors");
- Kokkos::deep_copy(errors_a,0);
- errors = errors_a;
+ Kokkos::View< int, Device > errors_a( "Errors" );
+ Kokkos::deep_copy( errors_a, 0 );
+ errors = errors_a;
- parallel_scan( exec_policy( Start , N ) , *this );
- }
+ parallel_scan( exec_policy( Start , N ) , *this );
+ }
- static void test_range( const WorkSpec & begin , const WorkSpec & end )
- {
- for ( WorkSpec i = begin ; i < end ; ++i ) {
- (void) TestScan( i );
- }
+ static void test_range( const WorkSpec & begin, const WorkSpec & end )
+ {
+ for ( WorkSpec i = begin; i < end; ++i ) {
+ (void) TestScan( i );
}
+ }
};
-}
-
+} // namespace Test
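
The TestScan changes above are likewise formatting only; the functor still exercises the Kokkos::parallel_scan contract, in which the running value is carried in 'update' and may only be written to the output when the trailing bool argument is true. A minimal lambda-based sketch of that contract, restricted to the public API (names below are illustrative):

#include <Kokkos_Core.hpp>

// Exclusive prefix sum: write the running total before adding this element,
// and only when 'final' is true.
void scan_sketch( const int N )
{
  Kokkos::View< long* > in( "in", N ), out( "out", N );
  Kokkos::deep_copy( in, 1 );
  Kokkos::parallel_scan( "scan_sketch", N,
    KOKKOS_LAMBDA( const int i, long & update, const bool final ) {
      if ( final ) { out( i ) = update; }   // write before adding => exclusive scan
      update += in( i );
    } );
}
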
diff --git a/lib/kokkos/core/unit_test/TestSharedAlloc.hpp b/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
index 291f9f60e..6eca6bb38 100644
--- a/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
+++ b/lib/kokkos/core/unit_test/TestSharedAlloc.hpp
@@ -1,215 +1,210 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
/*--------------------------------------------------------------------------*/
namespace Test {
struct SharedAllocDestroy {
+ volatile int * count;
- volatile int * count ;
-
- SharedAllocDestroy() = default ;
+ SharedAllocDestroy() = default;
SharedAllocDestroy( int * arg ) : count( arg ) {}
void destroy_shared_allocation()
- {
- Kokkos::atomic_increment( count );
- }
-
+ {
+ Kokkos::atomic_increment( count );
+ }
};
-template< class MemorySpace , class ExecutionSpace >
+template< class MemorySpace, class ExecutionSpace >
void test_shared_alloc()
{
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
+ typedef const Kokkos::Impl::SharedAllocationHeader Header;
+ typedef Kokkos::Impl::SharedAllocationTracker Tracker;
+ typedef Kokkos::Impl::SharedAllocationRecord< void, void > RecordBase;
+ typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace, void > RecordMemS;
+ typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace, SharedAllocDestroy > RecordFull;
- typedef const Kokkos::Impl::SharedAllocationHeader Header ;
- typedef Kokkos::Impl::SharedAllocationTracker Tracker ;
- typedef Kokkos::Impl::SharedAllocationRecord< void , void > RecordBase ;
- typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace , void > RecordMemS ;
- typedef Kokkos::Impl::SharedAllocationRecord< MemorySpace , SharedAllocDestroy > RecordFull ;
-
- static_assert( sizeof(Tracker) == sizeof(int*), "SharedAllocationTracker has wrong size!" );
+ static_assert( sizeof( Tracker ) == sizeof( int* ), "SharedAllocationTracker has wrong size!" );
- MemorySpace s ;
+ MemorySpace s;
- const size_t N = 1200 ;
- const size_t size = 8 ;
+ const size_t N = 1200;
+ const size_t size = 8;
RecordMemS * rarray[ N ];
Header * harray[ N ];
- RecordMemS ** const r = rarray ;
- Header ** const h = harray ;
+ RecordMemS ** const r = rarray;
+ Header ** const h = harray;
+
+ Kokkos::RangePolicy< ExecutionSpace > range( 0, N );
- Kokkos::RangePolicy< ExecutionSpace > range(0,N);
-
- //----------------------------------------
{
- // Since always executed on host space, leave [=]
- Kokkos::parallel_for( range , [=]( size_t i ){
- char name[64] ;
- sprintf(name,"test_%.2d",int(i));
+ // Since always executed on host space, leave [=]
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ char name[64];
+ sprintf( name, "test_%.2d", int( i ) );
- r[i] = RecordMemS::allocate( s , name , size * ( i + 1 ) );
+ r[i] = RecordMemS::allocate( s, name, size * ( i + 1 ) );
h[i] = Header::get_header( r[i]->data() );
- ASSERT_EQ( r[i]->use_count() , 0 );
+ ASSERT_EQ( r[i]->use_count(), 0 );
- for ( size_t j = 0 ; j < ( i / 10 ) + 1 ; ++j ) RecordBase::increment( r[i] );
+ for ( size_t j = 0; j < ( i / 10 ) + 1; ++j ) RecordBase::increment( r[i] );
- ASSERT_EQ( r[i]->use_count() , ( i / 10 ) + 1 );
- ASSERT_EQ( r[i] , RecordMemS::get_record( r[i]->data() ) );
+ ASSERT_EQ( r[i]->use_count(), ( i / 10 ) + 1 );
+ ASSERT_EQ( r[i], RecordMemS::get_record( r[i]->data() ) );
});
// Sanity check for the whole set of allocation records to which this record belongs.
RecordBase::is_sane( r[0] );
- // RecordMemS::print_records( std::cout , s , true );
+ // RecordMemS::print_records( std::cout, s, true );
- Kokkos::parallel_for( range , [=]( size_t i ){
- while ( 0 != ( r[i] = static_cast< RecordMemS *>( RecordBase::decrement( r[i] ) ) ) ) {
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ while ( 0 != ( r[i] = static_cast< RecordMemS * >( RecordBase::decrement( r[i] ) ) ) ) {
if ( r[i]->use_count() == 1 ) RecordBase::is_sane( r[i] );
}
});
}
- //----------------------------------------
+
{
- int destroy_count = 0 ;
- SharedAllocDestroy counter( & destroy_count );
+ int destroy_count = 0;
+ SharedAllocDestroy counter( &destroy_count );
- Kokkos::parallel_for( range , [=]( size_t i ){
- char name[64] ;
- sprintf(name,"test_%.2d",int(i));
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ char name[64];
+ sprintf( name, "test_%.2d", int( i ) );
- RecordFull * rec = RecordFull::allocate( s , name , size * ( i + 1 ) );
+ RecordFull * rec = RecordFull::allocate( s, name, size * ( i + 1 ) );
- rec->m_destroy = counter ;
+ rec->m_destroy = counter;
- r[i] = rec ;
+ r[i] = rec;
h[i] = Header::get_header( r[i]->data() );
- ASSERT_EQ( r[i]->use_count() , 0 );
+ ASSERT_EQ( r[i]->use_count(), 0 );
- for ( size_t j = 0 ; j < ( i / 10 ) + 1 ; ++j ) RecordBase::increment( r[i] );
+ for ( size_t j = 0; j < ( i / 10 ) + 1; ++j ) RecordBase::increment( r[i] );
- ASSERT_EQ( r[i]->use_count() , ( i / 10 ) + 1 );
- ASSERT_EQ( r[i] , RecordMemS::get_record( r[i]->data() ) );
+ ASSERT_EQ( r[i]->use_count(), ( i / 10 ) + 1 );
+ ASSERT_EQ( r[i], RecordMemS::get_record( r[i]->data() ) );
});
RecordBase::is_sane( r[0] );
- Kokkos::parallel_for( range , [=]( size_t i ){
- while ( 0 != ( r[i] = static_cast< RecordMemS *>( RecordBase::decrement( r[i] ) ) ) ) {
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ while ( 0 != ( r[i] = static_cast< RecordMemS * >( RecordBase::decrement( r[i] ) ) ) ) {
if ( r[i]->use_count() == 1 ) RecordBase::is_sane( r[i] );
}
});
- ASSERT_EQ( destroy_count , int(N) );
+ ASSERT_EQ( destroy_count, int( N ) );
}
- //----------------------------------------
{
- int destroy_count = 0 ;
+ int destroy_count = 0;
{
- RecordFull * rec = RecordFull::allocate( s , "test" , size );
+ RecordFull * rec = RecordFull::allocate( s, "test", size );
- // ... Construction of the allocated { rec->data() , rec->size() }
+ // ... Construction of the allocated { rec->data(), rec->size() }
- // Copy destruction function object into the allocation record
+ // Copy destruction function object into the allocation record.
rec->m_destroy = SharedAllocDestroy( & destroy_count );
- ASSERT_EQ( rec->use_count() , 0 );
+ ASSERT_EQ( rec->use_count(), 0 );
- // Start tracking, increments the use count from 0 to 1
- Tracker track ;
+ // Start tracking, increments the use count from 0 to 1.
+ Tracker track;
track.assign_allocated_record_to_uninitialized( rec );
- ASSERT_EQ( rec->use_count() , 1 );
- ASSERT_EQ( track.use_count() , 1 );
+ ASSERT_EQ( rec->use_count(), 1 );
+ ASSERT_EQ( track.use_count(), 1 );
+
+ // Verify construction / destruction increment.
+ for ( size_t i = 0; i < N; ++i ) {
+ ASSERT_EQ( rec->use_count(), 1 );
- // Verify construction / destruction increment
- for ( size_t i = 0 ; i < N ; ++i ) {
- ASSERT_EQ( rec->use_count() , 1 );
{
- Tracker local_tracker ;
+ Tracker local_tracker;
local_tracker.assign_allocated_record_to_uninitialized( rec );
- ASSERT_EQ( rec->use_count() , 2 );
- ASSERT_EQ( local_tracker.use_count() , 2 );
+ ASSERT_EQ( rec->use_count(), 2 );
+ ASSERT_EQ( local_tracker.use_count(), 2 );
}
- ASSERT_EQ( rec->use_count() , 1 );
- ASSERT_EQ( track.use_count() , 1 );
+
+ ASSERT_EQ( rec->use_count(), 1 );
+ ASSERT_EQ( track.use_count(), 1 );
}
- Kokkos::parallel_for( range , [=]( size_t i ){
- Tracker local_tracker ;
+ Kokkos::parallel_for( range, [=] ( size_t i ) {
+ Tracker local_tracker;
local_tracker.assign_allocated_record_to_uninitialized( rec );
- ASSERT_GT( rec->use_count() , 1 );
+ ASSERT_GT( rec->use_count(), 1 );
});
- ASSERT_EQ( rec->use_count() , 1 );
- ASSERT_EQ( track.use_count() , 1 );
+ ASSERT_EQ( rec->use_count(), 1 );
+ ASSERT_EQ( track.use_count(), 1 );
// Destruction of 'track' object deallocates the 'rec' and invokes the destroy function object.
}
- ASSERT_EQ( destroy_count , 1 );
+ ASSERT_EQ( destroy_count, 1 );
}
#endif /* #if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST ) */
}
-
-}
-
+} // namespace Test
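
TestSharedAlloc drives the internal SharedAllocationRecord / SharedAllocationTracker machinery directly, incrementing and decrementing record use counts by hand. The same reference counting is visible through the public interface as Kokkos::View::use_count(); a minimal host-side sketch, assuming nothing beyond that method:

#include <Kokkos_Core.hpp>
#include <cassert>

void use_count_sketch()
{
  Kokkos::View< double* > a( "a", 100 );
  assert( a.use_count() == 1 );
  {
    Kokkos::View< double* > b = a;   // shallow copy: both track the same allocation record
    assert( a.use_count() == 2 );
  }                                  // 'b' destroyed: the count drops back
  assert( a.use_count() == 1 );      // the allocation is freed with the last tracker
}
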
diff --git a/lib/kokkos/core/unit_test/TestSynchronic.cpp b/lib/kokkos/core/unit_test/TestSynchronic.cpp
deleted file mode 100644
index dc1abbd8b..000000000
--- a/lib/kokkos/core/unit_test/TestSynchronic.cpp
+++ /dev/null
@@ -1,449 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-//#undef _WIN32_WINNT
-//#define _WIN32_WINNT 0x0602
-
-#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__) || \
- defined(__APPLE__) || defined(__ARM_ARCH_8A) || defined(_CRAYC)
-
-// Skip for now
-
-#else
-
-#include <gtest/gtest.h>
-
-#ifdef USEOMP
-#include <omp.h>
-#endif
-
-#include <iostream>
-#include <sstream>
-#include <algorithm>
-#include <string>
-#include <vector>
-#include <map>
-#include <cstring>
-#include <ctime>
-
-//#include <details/config>
-//#undef __SYNCHRONIC_COMPATIBLE
-
-#include <impl/Kokkos_Synchronic.hpp>
-#include <impl/Kokkos_Synchronic_n3998.hpp>
-
-#include "TestSynchronic.hpp"
-
-// Uncomment to allow test to dump output
-//#define VERBOSE_TEST
-
-namespace Test {
-
-unsigned next_table[] =
- {
- 0, 1, 2, 3, //0-3
- 4, 4, 6, 6, //4-7
- 8, 8, 8, 8, //8-11
- 12, 12, 12, 12, //12-15
- 16, 16, 16, 16, //16-19
- 16, 16, 16, 16, //20-23
- 24, 24, 24, 24, //24-27
- 24, 24, 24, 24, //28-31
- 32, 32, 32, 32, //32-35
- 32, 32, 32, 32, //36-39
- 40, 40, 40, 40, //40-43
- 40, 40, 40, 40, //44-47
- 48, 48, 48, 48, //48-51
- 48, 48, 48, 48, //52-55
- 56, 56, 56, 56, //56-59
- 56, 56, 56, 56, //60-63
- };
-
-//change this if you want to allow oversubscription of the system, by default only the range {1-(system size)} is tested
-#define FOR_GAUNTLET(x) for(unsigned x = (std::min)(std::thread::hardware_concurrency()*8,unsigned(sizeof(next_table)/sizeof(unsigned))); x; x = next_table[x-1])
-
-//set this to override the benchmark of barriers to use OMP barriers instead of n3998 std::barrier
-//#define USEOMP
-
-#if defined(__SYNCHRONIC_COMPATIBLE)
- #define PREFIX "futex-"
-#else
- #define PREFIX "backoff-"
-#endif
-
-//this test uses a custom Mersenne twister to eliminate implementation variation
-MersenneTwister mt;
-
-int dummya = 1, dummyb =1;
-
-int dummy1 = 1;
-std::atomic<int> dummy2(1);
-std::atomic<int> dummy3(1);
-
-double time_item(int const count = (int)1E8) {
-
- clock_t const start = clock();
-
- for(int i = 0;i < count; ++i)
- mt.integer();
-
- clock_t const end = clock();
- double elapsed_seconds = (end - start) / double(CLOCKS_PER_SEC);
-
- return elapsed_seconds / count;
-}
-double time_nil(int const count = (int)1E08) {
-
- clock_t const start = clock();
-
- dummy3 = count;
- for(int i = 0;i < (int)1E6; ++i) {
- if(dummy1) {
- // Do some work while holding the lock
- int workunits = dummy3;//(int) (mtc.poissonInterval((float)num_items_critical) + 0.5f);
- for (int j = 1; j < workunits; j++)
- dummy1 &= j; // Do one work unit
- dummy2.fetch_add(dummy1,std::memory_order_relaxed);
- }
- }
-
- clock_t const end = clock();
- double elapsed_seconds = (end - start) / double(CLOCKS_PER_SEC);
-
- return elapsed_seconds / count;
-}
-
-
-template <class mutex_type>
-void testmutex_inner(mutex_type& m, std::atomic<int>& t,std::atomic<int>& wc,std::atomic<int>& wnc, int const num_iterations,
- int const num_items_critical, int const num_items_noncritical, MersenneTwister& mtc, MersenneTwister& mtnc, bool skip) {
-
- for(int k = 0; k < num_iterations; ++k) {
-
- if(num_items_noncritical) {
- // Do some work without holding the lock
- int workunits = num_items_noncritical;//(int) (mtnc.poissonInterval((float)num_items_noncritical) + 0.5f);
- for (int i = 1; i < workunits; i++)
- mtnc.integer(); // Do one work unit
- wnc.fetch_add(workunits,std::memory_order_relaxed);
- }
-
- t.fetch_add(1,std::memory_order_relaxed);
-
- if(!skip) {
- std::unique_lock<mutex_type> l(m);
- if(num_items_critical) {
- // Do some work while holding the lock
- int workunits = num_items_critical;//(int) (mtc.poissonInterval((float)num_items_critical) + 0.5f);
- for (int i = 1; i < workunits; i++)
- mtc.integer(); // Do one work unit
- wc.fetch_add(workunits,std::memory_order_relaxed);
- }
- }
- }
-}
-template <class mutex_type>
-void testmutex_outer(std::map<std::string,std::vector<double>>& results, std::string const& name, double critical_fraction, double critical_duration) {
-
- std::ostringstream truename;
- truename << name << " (f=" << critical_fraction << ",d=" << critical_duration << ")";
-
- std::vector<double>& data = results[truename.str()];
-
- double const workItemTime = time_item() ,
- nilTime = time_nil();
-
- int const num_items_critical = (critical_duration <= 0 ? 0 : (std::max)( int(critical_duration / workItemTime + 0.5), int(100 * nilTime / workItemTime + 0.5))),
- num_items_noncritical = (num_items_critical <= 0 ? 0 : int( ( 1 - critical_fraction ) * num_items_critical / critical_fraction + 0.5 ));
-
- FOR_GAUNTLET(num_threads) {
-
- //Kokkos::Impl::portable_sleep(std::chrono::microseconds(2000000));
-
- int const num_iterations = (num_items_critical + num_items_noncritical != 0) ?
-#ifdef __SYNCHRONIC_JUST_YIELD
- int( 1 / ( 8 * workItemTime ) / (num_items_critical + num_items_noncritical) / num_threads + 0.5 ) :
-#else
- int( 1 / ( 8 * workItemTime ) / (num_items_critical + num_items_noncritical) / num_threads + 0.5 ) :
-#endif
-#ifdef WIN32
- int( 1 / workItemTime / (20 * num_threads * num_threads) );
-#else
- int( 1 / workItemTime / (200 * num_threads * num_threads) );
-#endif
-
-#ifdef VERBOSE_TEST
- std::cerr << "running " << truename.str() << " #" << num_threads << ", " << num_iterations << " * " << num_items_noncritical << "\n" << std::flush;
-#endif
-
-
- std::atomic<int> t[2], wc[2], wnc[2];
-
- clock_t start[2], end[2];
- for(int pass = 0; pass < 2; ++pass) {
-
- t[pass] = 0;
- wc[pass] = 0;
- wnc[pass] = 0;
-
- srand(num_threads);
- std::vector<MersenneTwister> randomsnc(num_threads),
- randomsc(num_threads);
-
- mutex_type m;
-
- start[pass] = clock();
-#ifdef USEOMP
- omp_set_num_threads(num_threads);
- std::atomic<int> _j(0);
- #pragma omp parallel
- {
- int const j = _j.fetch_add(1,std::memory_order_relaxed);
- testmutex_inner(m, t[pass], wc[pass], wnc[pass], num_iterations, num_items_critical, num_items_noncritical, randomsc[j], randomsnc[j], pass==0);
- num_threads = omp_get_num_threads();
- }
-#else
- std::vector<std::thread*> threads(num_threads);
- for(unsigned j = 0; j < num_threads; ++j)
- threads[j] = new std::thread([&,j](){
- testmutex_inner(m, t[pass], wc[pass], wnc[pass], num_iterations, num_items_critical, num_items_noncritical, randomsc[j], randomsnc[j], pass==0);
- }
- );
- for(unsigned j = 0; j < num_threads; ++j) {
- threads[j]->join();
- delete threads[j];
- }
-#endif
- end[pass] = clock();
- }
- if(t[0] != t[1]) throw std::string("mismatched iteration counts");
- if(wnc[0] != wnc[1]) throw std::string("mismatched work item counts");
-
- double elapsed_seconds_0 = (end[0] - start[0]) / double(CLOCKS_PER_SEC),
- elapsed_seconds_1 = (end[1] - start[1]) / double(CLOCKS_PER_SEC);
- double time = (elapsed_seconds_1 - elapsed_seconds_0 - wc[1]*workItemTime) / num_iterations;
-
- data.push_back(time);
-#ifdef VERBOSE_TEST
- std::cerr << truename.str() << " : " << num_threads << "," << elapsed_seconds_1 / num_iterations << " - " << elapsed_seconds_0 / num_iterations << " - " << wc[1]*workItemTime/num_iterations << " = " << time << " \n";
-#endif
- }
-}
-
-template <class barrier_type>
-void testbarrier_inner(barrier_type& b, int const num_threads, int const j, std::atomic<int>& t,std::atomic<int>& w,
- int const num_iterations_odd, int const num_iterations_even,
- int const num_items_noncritical, MersenneTwister& arg_mt, bool skip) {
-
- for(int k = 0; k < (std::max)(num_iterations_even,num_iterations_odd); ++k) {
-
- if(k >= (~j & 0x1 ? num_iterations_odd : num_iterations_even )) {
- if(!skip)
- b.arrive_and_drop();
- break;
- }
-
- if(num_items_noncritical) {
- // Do some work without holding the lock
- int workunits = (int) (arg_mt.poissonInterval((float)num_items_noncritical) + 0.5f);
- for (int i = 1; i < workunits; i++)
- arg_mt.integer(); // Do one work unit
- w.fetch_add(workunits,std::memory_order_relaxed);
- }
-
- t.fetch_add(1,std::memory_order_relaxed);
-
- if(!skip) {
- int const thiscount = (std::min)(k+1,num_iterations_odd)*((num_threads>>1)+(num_threads&1)) + (std::min)(k+1,num_iterations_even)*(num_threads>>1);
- if(t.load(std::memory_order_relaxed) > thiscount) {
- std::cerr << "FAILURE: some threads have run ahead of the barrier (" << t.load(std::memory_order_relaxed) << ">" << thiscount << ").\n";
- EXPECT_TRUE(false);
- }
-#ifdef USEOMP
- #pragma omp barrier
-#else
- b.arrive_and_wait();
-#endif
- if(t.load(std::memory_order_relaxed) < thiscount) {
- std::cerr << "FAILURE: some threads have fallen behind the barrier (" << t.load(std::memory_order_relaxed) << "<" << thiscount << ").\n";
- EXPECT_TRUE(false);
- }
- }
- }
-}
-template <class barrier_type>
-void testbarrier_outer(std::map<std::string,std::vector<double>>& results, std::string const& name, double barrier_frequency, double phase_duration, bool randomIterations = false) {
-
- std::vector<double>& data = results[name];
-
- double const workItemTime = time_item();
- int const num_items_noncritical = int( phase_duration / workItemTime + 0.5 );
-
- FOR_GAUNTLET(num_threads) {
-
- int const num_iterations = int( barrier_frequency );
-#ifdef VERBOSE_TEST
- std::cerr << "running " << name << " #" << num_threads << ", " << num_iterations << " * " << num_items_noncritical << "\r" << std::flush;
-#endif
-
- srand(num_threads);
-
- MersenneTwister local_mt;
- int const num_iterations_odd = randomIterations ? int(local_mt.poissonInterval((float)num_iterations)+0.5f) : num_iterations,
- num_iterations_even = randomIterations ? int(local_mt.poissonInterval((float)num_iterations)+0.5f) : num_iterations;
-
- std::atomic<int> t[2], w[2];
- std::chrono::time_point<std::chrono::high_resolution_clock> start[2], end[2];
- for(int pass = 0; pass < 2; ++pass) {
-
- t[pass] = 0;
- w[pass] = 0;
-
- srand(num_threads);
- std::vector<MersenneTwister> randoms(num_threads);
-
- barrier_type b(num_threads);
-
- start[pass] = std::chrono::high_resolution_clock::now();
-#ifdef USEOMP
- omp_set_num_threads(num_threads);
- std::atomic<int> _j(0);
- #pragma omp parallel
- {
- int const j = _j.fetch_add(1,std::memory_order_relaxed);
- testbarrier_inner(b, num_threads, j, t[pass], w[pass], num_iterations_odd, num_iterations_even, num_items_noncritical, randoms[j], pass==0);
- num_threads = omp_get_num_threads();
- }
-#else
- std::vector<std::thread*> threads(num_threads);
- for(unsigned j = 0; j < num_threads; ++j)
- threads[j] = new std::thread([&,j](){
- testbarrier_inner(b, num_threads, j, t[pass], w[pass], num_iterations_odd, num_iterations_even, num_items_noncritical, randoms[j], pass==0);
- });
- for(unsigned j = 0; j < num_threads; ++j) {
- threads[j]->join();
- delete threads[j];
- }
-#endif
- end[pass] = std::chrono::high_resolution_clock::now();
- }
-
- if(t[0] != t[1]) throw std::string("mismatched iteration counts");
- if(w[0] != w[1]) throw std::string("mismatched work item counts");
-
- int const phases = (std::max)(num_iterations_odd, num_iterations_even);
-
- std::chrono::duration<double> elapsed_seconds_0 = end[0]-start[0],
- elapsed_seconds_1 = end[1]-start[1];
- double const time = (elapsed_seconds_1.count() - elapsed_seconds_0.count()) / phases;
-
- data.push_back(time);
-#ifdef VERBOSE_TEST
- std::cerr << name << " : " << num_threads << "," << elapsed_seconds_1.count() / phases << " - " << elapsed_seconds_0.count() / phases << " = " << time << " \n";
-#endif
- }
-}
-
-template <class... T>
-struct mutex_tester;
-template <class F>
-struct mutex_tester<F> {
- static void run(std::map<std::string,std::vector<double>>& results, std::string const name[], double critical_fraction, double critical_duration) {
- testmutex_outer<F>(results, *name, critical_fraction, critical_duration);
- }
-};
-template <class F, class... T>
-struct mutex_tester<F,T...> {
- static void run(std::map<std::string,std::vector<double>>& results, std::string const name[], double critical_fraction, double critical_duration) {
- mutex_tester<F>::run(results, name, critical_fraction, critical_duration);
- mutex_tester<T...>::run(results, ++name, critical_fraction, critical_duration);
- }
-};
-
-TEST( synchronic, main )
-{
- //warm up
- time_item();
-
- //measure up
-#ifdef VERBOSE_TEST
- std::cerr << "measuring work item speed...\r";
- std::cerr << "work item speed is " << time_item() << " per item, nil is " << time_nil() << "\n";
-#endif
- try {
-
- std::pair<double,double> testpoints[] = { {1, 0}, /*{1E-1, 10E-3}, {5E-1, 2E-6}, {3E-1, 50E-9},*/ };
- for(auto x : testpoints ) {
-
- std::map<std::string,std::vector<double>> results;
-
- //testbarrier_outer<std::barrier>(results, PREFIX"bar 1khz 100us", 1E3, x.second);
-
- std::string const names[] = {
- PREFIX"tkt", PREFIX"mcs", PREFIX"ttas", PREFIX"std"
-#ifdef WIN32
- ,PREFIX"srw"
-#endif
- };
-
- //run -->
-
- mutex_tester<
- ticket_mutex, mcs_mutex, ttas_mutex, std::mutex
-#ifdef WIN32
- ,srw_mutex
-#endif
- >::run(results, names, x.first, x.second);
-
- //<-- run
-
-#ifdef VERBOSE_TEST
- std::cout << "threads";
- for(auto & i : results)
- std::cout << ",\"" << i.first << '\"';
- std::cout << std::endl;
- int j = 0;
- FOR_GAUNTLET(num_threads) {
- std::cout << num_threads;
- for(auto & i : results)
- std::cout << ',' << i.second[j];
- std::cout << std::endl;
- ++j;
- }
-#endif
- }
- }
- catch(std::string & e) {
- std::cerr << "EXCEPTION : " << e << std::endl;
- EXPECT_TRUE( false );
- }
-}
-
-} // namespace Test
-
-#endif
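
The deleted TestSynchronic.cpp benchmarked mutex and barrier implementations by running each workload twice, once with synchronization skipped and once for real, then subtracting the timings. A stripped-down sketch of that two-pass idea using only the standard library (the names below are illustrative, not taken from the removed file):

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Pass 0 runs the loop without the lock, pass 1 with it; the difference
// approximates the per-iteration locking overhead.
double lock_overhead_sketch( const unsigned num_threads, const int iterations )
{
  std::mutex m;
  double elapsed[2];
  for ( int pass = 0; pass < 2; ++pass ) {
    std::atomic< long > work( 0 );
    auto body = [&]() {
      for ( int k = 0; k < iterations; ++k ) {
        if ( pass == 1 ) { std::lock_guard< std::mutex > l( m ); work.fetch_add( 1 ); }
        else             { work.fetch_add( 1 ); }
      }
    };
    const auto start = std::chrono::high_resolution_clock::now();
    std::vector< std::thread > threads;
    for ( unsigned j = 0; j < num_threads; ++j ) threads.emplace_back( body );
    for ( auto & t : threads ) t.join();
    elapsed[ pass ] = std::chrono::duration< double >(
      std::chrono::high_resolution_clock::now() - start ).count();
  }
  return ( elapsed[1] - elapsed[0] ) / iterations;
}
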
diff --git a/lib/kokkos/core/unit_test/TestSynchronic.hpp b/lib/kokkos/core/unit_test/TestSynchronic.hpp
deleted file mode 100644
index f4341b978..000000000
--- a/lib/kokkos/core/unit_test/TestSynchronic.hpp
+++ /dev/null
@@ -1,241 +0,0 @@
-/*
-
-Copyright (c) 2014, NVIDIA Corporation
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification,
-are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-this list of conditions and the following disclaimer in the documentation
-and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
-IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
-BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
-OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
-OF THE POSSIBILITY OF SUCH DAMAGE.
-
-*/
-
-#ifndef TEST_SYNCHRONIC_HPP
-#define TEST_SYNCHRONIC_HPP
-
-#include <impl/Kokkos_Synchronic.hpp>
-#include <mutex>
-#include <cmath>
-
-namespace Test {
-
-template <bool truly>
-struct dumb_mutex {
-
- dumb_mutex () : locked(0) {
- }
-
- void lock() {
- while(1) {
- bool state = false;
- if (locked.compare_exchange_weak(state,true,std::memory_order_acquire)) {
- break;
- }
- while (locked.load(std::memory_order_relaxed)) {
- if (!truly) {
- Kokkos::Impl::portable_yield();
- }
- }
- }
- }
-
- void unlock() {
- locked.store(false,std::memory_order_release);
- }
-
-private :
- std::atomic<bool> locked;
-};
-
-#ifdef WIN32
-#include <winsock2.h>
-#include <windows.h>
-#include <synchapi.h>
-struct srw_mutex {
-
- srw_mutex () {
- InitializeSRWLock(&_lock);
- }
-
- void lock() {
- AcquireSRWLockExclusive(&_lock);
- }
- void unlock() {
- ReleaseSRWLockExclusive(&_lock);
- }
-
-private :
- SRWLOCK _lock;
-};
-#endif
-
-struct ttas_mutex {
-
- ttas_mutex() : locked(false) {
- }
-
- ttas_mutex(const ttas_mutex&) = delete;
- ttas_mutex& operator=(const ttas_mutex&) = delete;
-
- void lock() {
- for(int i = 0;; ++i) {
- bool state = false;
- if(locked.compare_exchange_weak(state,true,std::memory_order_relaxed,Kokkos::Impl::notify_none))
- break;
- locked.expect_update(true);
- }
- std::atomic_thread_fence(std::memory_order_acquire);
- }
- void unlock() {
- locked.store(false,std::memory_order_release);
- }
-
-private :
- Kokkos::Impl::synchronic<bool> locked;
-};
-
-struct ticket_mutex {
-
- ticket_mutex() : active(0), queue(0) {
- }
-
- ticket_mutex(const ticket_mutex&) = delete;
- ticket_mutex& operator=(const ticket_mutex&) = delete;
-
- void lock() {
- int const me = queue.fetch_add(1, std::memory_order_relaxed);
- while(me != active.load_when_equal(me, std::memory_order_acquire))
- ;
- }
-
- void unlock() {
- active.fetch_add(1,std::memory_order_release);
- }
-private :
- Kokkos::Impl::synchronic<int> active;
- std::atomic<int> queue;
-};
-
-struct mcs_mutex {
-
- mcs_mutex() : head(nullptr) {
- }
-
- mcs_mutex(const mcs_mutex&) = delete;
- mcs_mutex& operator=(const mcs_mutex&) = delete;
-
- struct unique_lock {
-
- unique_lock(mcs_mutex & arg_m) : m(arg_m), next(nullptr), ready(false) {
-
- unique_lock * const h = m.head.exchange(this,std::memory_order_acquire);
- if(__builtin_expect(h != nullptr,0)) {
- h->next.store(this,std::memory_order_seq_cst,Kokkos::Impl::notify_one);
- while(!ready.load_when_not_equal(false,std::memory_order_acquire))
- ;
- }
- }
-
- unique_lock(const unique_lock&) = delete;
- unique_lock& operator=(const unique_lock&) = delete;
-
- ~unique_lock() {
- unique_lock * h = this;
- if(__builtin_expect(!m.head.compare_exchange_strong(h,nullptr,std::memory_order_release, std::memory_order_relaxed),0)) {
- unique_lock * n = next.load(std::memory_order_relaxed);
- while(!n)
- n = next.load_when_not_equal(n,std::memory_order_relaxed);
- n->ready.store(true,std::memory_order_release,Kokkos::Impl::notify_one);
- }
- }
-
- private:
- mcs_mutex & m;
- Kokkos::Impl::synchronic<unique_lock*> next;
- Kokkos::Impl::synchronic<bool> ready;
- };
-
-private :
- std::atomic<unique_lock*> head;
-};
-
-}
-
-namespace std {
-template<>
-struct unique_lock<Test::mcs_mutex> : Test::mcs_mutex::unique_lock {
- unique_lock(Test::mcs_mutex & arg_m) : Test::mcs_mutex::unique_lock(arg_m) {
- }
- unique_lock(const unique_lock&) = delete;
- unique_lock& operator=(const unique_lock&) = delete;
-};
-
-}
-
-/* #include <cmath> */
-#include <stdlib.h>
-
-namespace Test {
-
-//-------------------------------------
-// MersenneTwister
-//-------------------------------------
-#define MT_IA 397
-#define MT_LEN 624
-
-class MersenneTwister
-{
- volatile unsigned long m_buffer[MT_LEN][64/sizeof(unsigned long)];
- volatile int m_index;
-
-public:
- MersenneTwister() {
- for (int i = 0; i < MT_LEN; i++)
- m_buffer[i][0] = rand();
- m_index = 0;
- for (int i = 0; i < MT_LEN * 100; i++)
- integer();
- }
- unsigned long integer() {
- // Indices
- int i = m_index;
- int i2 = m_index + 1; if (i2 >= MT_LEN) i2 = 0; // wrap-around
- int j = m_index + MT_IA; if (j >= MT_LEN) j -= MT_LEN; // wrap-around
-
- // Twist
- unsigned long s = (m_buffer[i][0] & 0x80000000) | (m_buffer[i2][0] & 0x7fffffff);
- unsigned long r = m_buffer[j][0] ^ (s >> 1) ^ ((s & 1) * 0x9908B0DF);
- m_buffer[m_index][0] = r;
- m_index = i2;
-
- // Swizzle
- r ^= (r >> 11);
- r ^= (r << 7) & 0x9d2c5680UL;
- r ^= (r << 15) & 0xefc60000UL;
- r ^= (r >> 18);
- return r;
- }
- float poissonInterval(float ooLambda) {
- return -logf(1.0f - integer() * 2.3283e-10f) * ooLambda;
- }
-};
-
-} // namespace Test
-
-#endif //TEST_HPP
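
The removed TestSynchronic.hpp defined several lock flavors (test-and-test-and-set, ticket, MCS) on top of the Kokkos::Impl::synchronic primitives that are retired along with it. The ticket-lock idea itself survives on plain std::atomic; a minimal sketch, not tied to any Kokkos internals:

#include <atomic>
#include <thread>

// Ticket lock: threads take a ticket and spin until the 'active' counter
// reaches their number, which serves lock requests in FIFO order.
struct ticket_mutex_sketch {
  std::atomic< int > active{ 0 };   // ticket currently being served
  std::atomic< int > queue{ 0 };    // next ticket to hand out

  void lock() {
    const int me = queue.fetch_add( 1, std::memory_order_relaxed );
    while ( active.load( std::memory_order_acquire ) != me ) {
      std::this_thread::yield();    // back off instead of futex-style waiting
    }
  }

  void unlock() {
    active.fetch_add( 1, std::memory_order_release );
  }
};
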
diff --git a/lib/kokkos/core/unit_test/TestTaskScheduler.hpp b/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
index 113455398..57e47d4ba 100644
--- a/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
+++ b/lib/kokkos/core/unit_test/TestTaskScheduler.hpp
@@ -1,551 +1,561 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-
#ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP
#define KOKKOS_UNITTEST_TASKSCHEDULER_HPP
#include <stdio.h>
#include <iostream>
#include <cmath>
#if defined( KOKKOS_ENABLE_TASKDAG )
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-
namespace TestTaskScheduler {
namespace {
inline
long eval_fib( long n )
{
- constexpr long mask = 0x03 ;
+ constexpr long mask = 0x03;
- long fib[4] = { 0 , 1 , 1 , 2 };
+ long fib[4] = { 0, 1, 1, 2 };
- for ( long i = 2 ; i <= n ; ++i ) {
+ for ( long i = 2; i <= n; ++i ) {
fib[ i & mask ] = fib[ ( i - 1 ) & mask ] + fib[ ( i - 2 ) & mask ];
}
-
+
return fib[ n & mask ];
}
}
template< typename Space >
struct TestFib
{
- typedef Kokkos::TaskScheduler<Space> policy_type ;
- typedef Kokkos::Future<long,Space> future_type ;
- typedef long value_type ;
+ typedef Kokkos::TaskScheduler< Space > sched_type;
+ typedef Kokkos::Future< long, Space > future_type;
+ typedef long value_type;
- policy_type policy ;
- future_type fib_m1 ;
- future_type fib_m2 ;
- const value_type n ;
+ sched_type sched;
+ future_type fib_m1;
+ future_type fib_m2;
+ const value_type n;
KOKKOS_INLINE_FUNCTION
- TestFib( const policy_type & arg_policy , const value_type arg_n )
- : policy(arg_policy)
- , fib_m1() , fib_m2()
- , n( arg_n )
- {}
+ TestFib( const sched_type & arg_sched, const value_type arg_n )
+ : sched( arg_sched ), fib_m1(), fib_m2(), n( arg_n ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & , value_type & result )
- {
+ void operator()( typename sched_type::member_type &, value_type & result )
+ {
#if 0
- printf( "\nTestFib(%ld) %d %d\n"
- , n
- , int( ! fib_m1.is_null() )
- , int( ! fib_m2.is_null() )
- );
+ printf( "\nTestFib(%ld) %d %d\n", n, int( !fib_m1.is_null() ), int( !fib_m2.is_null() ) );
#endif
- if ( n < 2 ) {
- result = n ;
- }
- else if ( ! fib_m2.is_null() && ! fib_m1.is_null() ) {
- result = fib_m1.get() + fib_m2.get();
- }
- else {
-
- // Spawn new children and respawn myself to sum their results:
- // Spawn lower value at higher priority as it has a shorter
- // path to completion.
-
- fib_m2 = policy.task_spawn( TestFib(policy,n-2)
- , Kokkos::TaskSingle
- , Kokkos::TaskHighPriority );
+ if ( n < 2 ) {
+ result = n;
+ }
+ else if ( !fib_m2.is_null() && !fib_m1.is_null() ) {
+ result = fib_m1.get() + fib_m2.get();
+ }
+ else {
+ // Spawn new children and respawn myself to sum their results.
+ // Spawn lower value at higher priority as it has a shorter
+ // path to completion.
- fib_m1 = policy.task_spawn( TestFib(policy,n-1)
- , Kokkos::TaskSingle );
+ fib_m2 = Kokkos::task_spawn( Kokkos::TaskSingle( sched, Kokkos::TaskPriority::High )
+ , TestFib( sched, n - 2 ) );
- Kokkos::Future<Space> dep[] = { fib_m1 , fib_m2 };
+ fib_m1 = Kokkos::task_spawn( Kokkos::TaskSingle( sched )
+ , TestFib( sched, n - 1 ) );
- Kokkos::Future<Space> fib_all = policy.when_all( 2 , dep );
+ Kokkos::Future< Space > dep[] = { fib_m1, fib_m2 };
+ Kokkos::Future< Space > fib_all = Kokkos::when_all( dep, 2 );
- if ( ! fib_m2.is_null() && ! fib_m1.is_null() && ! fib_all.is_null() ) {
- // High priority to retire this branch
- policy.respawn( this , Kokkos::TaskHighPriority , fib_all );
- }
- else {
+ if ( !fib_m2.is_null() && !fib_m1.is_null() && !fib_all.is_null() ) {
+ // High priority to retire this branch.
+ Kokkos::respawn( this, fib_all, Kokkos::TaskPriority::High );
+ }
+ else {
#if 1
- printf( "TestFib(%ld) insufficient memory alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
- , n
- , policy.allocation_capacity()
- , policy.allocated_task_count_max()
- , policy.allocated_task_count_accum()
- );
+ printf( "TestFib(%ld) insufficient memory alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
+ , n
+ , sched.allocation_capacity()
+ , sched.allocated_task_count_max()
+ , sched.allocated_task_count_accum()
+ );
#endif
- Kokkos::abort("TestFib insufficient memory");
- }
+ Kokkos::abort( "TestFib insufficient memory" );
+
}
}
+ }
- static void run( int i , size_t MemoryCapacity = 16000 )
- {
- typedef typename policy_type::memory_space memory_space ;
+ static void run( int i, size_t MemoryCapacity = 16000 )
+ {
+ typedef typename sched_type::memory_space memory_space;
- enum { Log2_SuperBlockSize = 12 };
+ enum { Log2_SuperBlockSize = 12 };
- policy_type root_policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
+ sched_type root_sched( memory_space(), MemoryCapacity, Log2_SuperBlockSize );
- future_type f = root_policy.host_spawn( TestFib(root_policy,i) , Kokkos::TaskSingle );
- Kokkos::wait( root_policy );
- ASSERT_EQ( eval_fib(i) , f.get() );
+ future_type f = Kokkos::host_spawn( Kokkos::TaskSingle( root_sched )
+ , TestFib( root_sched, i ) );
+
+ Kokkos::wait( root_sched );
+
+ ASSERT_EQ( eval_fib( i ), f.get() );
#if 0
- fprintf( stdout , "\nTestFib::run(%d) spawn_size(%d) when_all_size(%d) alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
- , i
- , int(root_policy.template spawn_allocation_size<TestFib>())
- , int(root_policy.when_all_allocation_size(2))
- , root_policy.allocation_capacity()
- , root_policy.allocated_task_count_max()
- , root_policy.allocated_task_count_accum()
- );
- fflush( stdout );
+ fprintf( stdout, "\nTestFib::run(%d) spawn_size(%d) when_all_size(%d) alloc_capacity(%d) task_max(%d) task_accum(%ld)\n"
+ , i
+ , int(root_sched.template spawn_allocation_size<TestFib>())
+ , int(root_sched.when_all_allocation_size(2))
+ , root_sched.allocation_capacity()
+ , root_sched.allocated_task_count_max()
+ , root_sched.allocated_task_count_accum()
+ );
+ fflush( stdout );
#endif
- }
-
+ }
};
} // namespace TestTaskScheduler
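// Editorial sketch (not part of the patch): the TestFib hunk above replaces the
// old member-spawn calls ( policy.task_spawn( functor, Kokkos::TaskSingle, ... ) )
// with the free-function interface shown in the '+' lines. Assuming the
// signatures used in the patch, driving the Fibonacci task from the host looks
// roughly like:
//
//   sched_type sched( typename sched_type::memory_space(), MemoryCapacity );
//   future_type f = Kokkos::host_spawn( Kokkos::TaskSingle( sched ),
//                                       TestFib( sched, n ) );
//   Kokkos::wait( sched );
//   const long fib_n = f.get();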
//----------------------------------------------------------------------------
namespace TestTaskScheduler {
template< class Space >
struct TestTaskDependence {
+ typedef Kokkos::TaskScheduler< Space > sched_type;
+ typedef Kokkos::Future< Space > future_type;
+ typedef Kokkos::View< long, Space > accum_type;
+ typedef void value_type;
- typedef Kokkos::TaskScheduler<Space> policy_type ;
- typedef Kokkos::Future<Space> future_type ;
- typedef Kokkos::View<long,Space> accum_type ;
- typedef void value_type ;
-
- policy_type m_policy ;
- accum_type m_accum ;
- long m_count ;
+ sched_type m_sched;
+ accum_type m_accum;
+ long m_count;
KOKKOS_INLINE_FUNCTION
TestTaskDependence( long n
- , const policy_type & arg_policy
- , const accum_type & arg_accum )
- : m_policy( arg_policy )
+ , const sched_type & arg_sched
+ , const accum_type & arg_accum )
+ : m_sched( arg_sched )
, m_accum( arg_accum )
- , m_count( n )
- {}
+ , m_count( n ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & )
- {
- enum { CHUNK = 8 };
- const int n = CHUNK < m_count ? CHUNK : m_count ;
+ void operator()( typename sched_type::member_type & )
+ {
+ enum { CHUNK = 8 };
+ const int n = CHUNK < m_count ? CHUNK : m_count;
- if ( 1 < m_count ) {
- future_type f[ CHUNK ] ;
+ if ( 1 < m_count ) {
+ future_type f[ CHUNK ];
- const int inc = ( m_count + n - 1 ) / n ;
+ const int inc = ( m_count + n - 1 ) / n;
- for ( int i = 0 ; i < n ; ++i ) {
- long begin = i * inc ;
- long count = begin + inc < m_count ? inc : m_count - begin ;
- f[i] = m_policy.task_spawn( TestTaskDependence(count,m_policy,m_accum) , Kokkos::TaskSingle );
- }
+ for ( int i = 0; i < n; ++i ) {
+ long begin = i * inc;
+ long count = begin + inc < m_count ? inc : m_count - begin;
+ f[i] = Kokkos::task_spawn( Kokkos::TaskSingle( m_sched )
+ , TestTaskDependence( count, m_sched, m_accum ) );
+ }
- m_count = 0 ;
+ m_count = 0;
- m_policy.respawn( this , m_policy.when_all( n , f ) );
- }
- else if ( 1 == m_count ) {
- Kokkos::atomic_increment( & m_accum() );
- }
+ Kokkos::respawn( this, Kokkos::when_all( f, n ) );
+ }
+ else if ( 1 == m_count ) {
+ Kokkos::atomic_increment( & m_accum() );
}
+ }
static void run( int n )
- {
- typedef typename policy_type::memory_space memory_space ;
+ {
+ typedef typename sched_type::memory_space memory_space;
- // enum { MemoryCapacity = 4000 }; // Triggers infinite loop in memory pool
- enum { MemoryCapacity = 16000 };
- enum { Log2_SuperBlockSize = 12 };
- policy_type policy( memory_space() , MemoryCapacity , Log2_SuperBlockSize );
+ // enum { MemoryCapacity = 4000 }; // Triggers infinite loop in memory pool.
+ enum { MemoryCapacity = 16000 };
+ enum { Log2_SuperBlockSize = 12 };
+ sched_type sched( memory_space(), MemoryCapacity, Log2_SuperBlockSize );
- accum_type accum("accum");
+ accum_type accum( "accum" );
- typename accum_type::HostMirror host_accum =
- Kokkos::create_mirror_view( accum );
+ typename accum_type::HostMirror host_accum = Kokkos::create_mirror_view( accum );
- policy.host_spawn( TestTaskDependence(n,policy,accum) , Kokkos::TaskSingle );
+ Kokkos::host_spawn( Kokkos::TaskSingle( sched ), TestTaskDependence( n, sched, accum ) );
- Kokkos::wait( policy );
+ Kokkos::wait( sched );
- Kokkos::deep_copy( host_accum , accum );
+ Kokkos::deep_copy( host_accum, accum );
- ASSERT_EQ( host_accum() , n );
- }
+ ASSERT_EQ( host_accum(), n );
+ }
};
} // namespace TestTaskScheduler
//----------------------------------------------------------------------------
namespace TestTaskScheduler {
template< class ExecSpace >
struct TestTaskTeam {
-
//enum { SPAN = 8 };
enum { SPAN = 33 };
//enum { SPAN = 1 };
- typedef void value_type ;
- typedef Kokkos::TaskScheduler<ExecSpace> policy_type ;
- typedef Kokkos::Future<ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
+ typedef void value_type;
+ typedef Kokkos::TaskScheduler< ExecSpace > sched_type;
+ typedef Kokkos::Future< ExecSpace > future_type;
+ typedef Kokkos::View< long*, ExecSpace > view_type;
- policy_type policy ;
- future_type future ;
+ sched_type sched;
+ future_type future;
- view_type parfor_result ;
- view_type parreduce_check ;
- view_type parscan_result ;
- view_type parscan_check ;
- const long nvalue ;
+ view_type parfor_result;
+ view_type parreduce_check;
+ view_type parscan_result;
+ view_type parscan_check;
+ const long nvalue;
KOKKOS_INLINE_FUNCTION
- TestTaskTeam( const policy_type & arg_policy
- , const view_type & arg_parfor_result
- , const view_type & arg_parreduce_check
- , const view_type & arg_parscan_result
- , const view_type & arg_parscan_check
- , const long arg_nvalue )
- : policy(arg_policy)
+ TestTaskTeam( const sched_type & arg_sched
+ , const view_type & arg_parfor_result
+ , const view_type & arg_parreduce_check
+ , const view_type & arg_parscan_result
+ , const view_type & arg_parscan_check
+ , const long arg_nvalue )
+ : sched( arg_sched )
, future()
, parfor_result( arg_parfor_result )
, parreduce_check( arg_parreduce_check )
, parscan_result( arg_parscan_result )
, parscan_check( arg_parscan_check )
- , nvalue( arg_nvalue )
- {}
+ , nvalue( arg_nvalue ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type & member )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
-
- if ( 0 < begin && future.is_null() ) {
- if ( member.team_rank() == 0 ) {
- future = policy.task_spawn
- ( TestTaskTeam( policy ,
- parfor_result ,
- parreduce_check,
- parscan_result,
- parscan_check,
- begin - 1 )
- , Kokkos::TaskTeam );
-
- assert( ! future.is_null() );
-
- policy.respawn( this , future );
- }
- return ;
- }
+ void operator()( typename sched_type::member_type & member )
+ {
+ const long end = nvalue + 1;
+ const long begin = 0 < end - SPAN ? end - SPAN : 0;
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parfor_result[i] = i ; }
- );
-
- // test parallel_reduce without join
-
- long tot = 0;
- long expected = (begin+end-1)*(end-begin)*0.5;
-
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &res) { res += parfor_result[i]; }
- , tot);
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parreduce_check[i] = expected-tot ; }
- );
-
- // test parallel_reduce with join
-
- tot = 0;
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &res) { res += parfor_result[i]; }
- , [&]( long& val1, const long& val2) { val1 += val2; }
- , tot);
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { parreduce_check[i] += expected-tot ; }
- );
-
- // test parallel_scan
-
- // Exclusive scan
- Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &val , const bool final ) {
- if ( final ) { parscan_result[i] = val; }
- val += i;
- }
- );
+ if ( 0 < begin && future.is_null() ) {
if ( member.team_rank() == 0 ) {
- for ( long i = begin ; i < end ; ++i ) {
- parscan_check[i] = (i*(i-1)-begin*(begin-1))*0.5-parscan_result[i];
- }
+ future = Kokkos::task_spawn( Kokkos::TaskTeam( sched )
+ , TestTaskTeam( sched
+ , parfor_result
+ , parreduce_check
+ , parscan_result
+ , parscan_check
+ , begin - 1 )
+ );
+
+ assert( !future.is_null() );
+
+ Kokkos::respawn( this, future );
}
- // Inclusive scan
- Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i, long &val , const bool final ) {
- val += i;
- if ( final ) { parscan_result[i] = val; }
- }
- );
- if ( member.team_rank() == 0 ) {
- for ( long i = begin ; i < end ; ++i ) {
- parscan_check[i] += (i*(i+1)-begin*(begin-1))*0.5-parscan_result[i];
- }
+ return;
+ }
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { parfor_result[i] = i; }
+ );
+
+ // Test parallel_reduce without join.
+
+ long tot = 0;
+ long expected = ( begin + end - 1 ) * ( end - begin ) * 0.5;
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & res ) { res += parfor_result[i]; }
+ , tot
+ );
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { parreduce_check[i] = expected - tot; }
+ );
+
+ // Test parallel_reduce with join.
+
+ tot = 0;
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & res ) { res += parfor_result[i]; }
+#if 0
+ , Kokkos::Sum( tot )
+#else
+ , [] ( long & dst, const long & src ) { dst += src; }
+ , tot
+#endif
+ );
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { parreduce_check[i] += expected - tot; }
+ );
+
+ // Test parallel_scan.
+
+ // Exclusive scan.
+ Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & val, const bool final )
+ {
+ if ( final ) { parscan_result[i] = val; }
+
+ val += i;
+ });
+
+ // Wait for 'parscan_result' before testing it.
+ member.team_barrier();
+
+ if ( member.team_rank() == 0 ) {
+ for ( long i = begin; i < end; ++i ) {
+ parscan_check[i] = ( i * ( i - 1 ) - begin * ( begin - 1 ) ) * 0.5 - parscan_result[i];
}
- // ThreadVectorRange check
- /*
- long result = 0;
- expected = (begin+end-1)*(end-begin)*0.5;
- Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member , 0 , 1 )
- , [&] ( const int i , long & outerUpdate ) {
- long sum_j = 0.0;
- Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( member , end - begin )
- , [&] ( const int j , long &innerUpdate ) {
- innerUpdate += begin+j;
- } , sum_j );
- outerUpdate += sum_j ;
- } , result );
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) {
- parreduce_check[i] += result-expected ;
- }
- );
- */
}
- static void run( long n )
+ // Don't overwrite 'parscan_result' until it has been tested.
+ member.team_barrier();
+
+ // Inclusive scan.
+ Kokkos::parallel_scan<long>( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i, long & val, const bool final )
{
- // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
- // const unsigned memory_capacity = 100000 ; // fails with SPAN=1 for serial and OMP
- const unsigned memory_capacity = 400000 ;
-
- policy_type root_policy( typename policy_type::memory_space()
- , memory_capacity );
-
- view_type root_parfor_result("parfor_result",n+1);
- view_type root_parreduce_check("parreduce_check",n+1);
- view_type root_parscan_result("parscan_result",n+1);
- view_type root_parscan_check("parscan_check",n+1);
-
- typename view_type::HostMirror
- host_parfor_result = Kokkos::create_mirror_view( root_parfor_result );
- typename view_type::HostMirror
- host_parreduce_check = Kokkos::create_mirror_view( root_parreduce_check );
- typename view_type::HostMirror
- host_parscan_result = Kokkos::create_mirror_view( root_parscan_result );
- typename view_type::HostMirror
- host_parscan_check = Kokkos::create_mirror_view( root_parscan_check );
-
- future_type f = root_policy.host_spawn(
- TestTaskTeam( root_policy ,
- root_parfor_result ,
- root_parreduce_check ,
- root_parscan_result,
- root_parscan_check,
- n ) ,
- Kokkos::TaskTeam );
-
- Kokkos::wait( root_policy );
-
- Kokkos::deep_copy( host_parfor_result , root_parfor_result );
- Kokkos::deep_copy( host_parreduce_check , root_parreduce_check );
- Kokkos::deep_copy( host_parscan_result , root_parscan_result );
- Kokkos::deep_copy( host_parscan_check , root_parscan_check );
-
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i ;
- if ( host_parfor_result(i) != answer ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_for result(" << i << ") = "
- << host_parfor_result(i) << " != " << answer << std::endl ;
- }
- if ( host_parreduce_check(i) != 0 ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_reduce check(" << i << ") = "
- << host_parreduce_check(i) << " != 0" << std::endl ;
- }
- if ( host_parscan_check(i) != 0 ) {
- std::cerr << "TestTaskTeam::run ERROR parallel_scan check(" << i << ") = "
- << host_parscan_check(i) << " != 0" << std::endl ;
- }
+ val += i;
+
+ if ( final ) { parscan_result[i] = val; }
+ });
+
+ // Wait for 'parscan_result' before testing it.
+ member.team_barrier();
+
+ if ( member.team_rank() == 0 ) {
+ for ( long i = begin; i < end; ++i ) {
+ parscan_check[i] += ( i * ( i + 1 ) - begin * ( begin - 1 ) ) * 0.5 - parscan_result[i];
}
}
+
+ // ThreadVectorRange check.
+/*
+ long result = 0;
+ expected = ( begin + end - 1 ) * ( end - begin ) * 0.5;
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( member, 0, 1 )
+ , [&] ( const int i, long & outerUpdate )
+ {
+ long sum_j = 0.0;
+
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( member, end - begin )
+ , [&] ( const int j, long & innerUpdate )
+ {
+ innerUpdate += begin + j;
+ }, sum_j );
+
+ outerUpdate += sum_j;
+ }, result );
+
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i )
+ {
+ parreduce_check[i] += result - expected;
+ });
+*/
+ }
+
+ static void run( long n )
+ {
+ //const unsigned memory_capacity = 10000; // Causes memory pool infinite loop.
+ //const unsigned memory_capacity = 100000; // Fails with SPAN=1 for serial and OMP.
+ const unsigned memory_capacity = 400000;
+
+ sched_type root_sched( typename sched_type::memory_space(), memory_capacity );
+
+ view_type root_parfor_result( "parfor_result", n + 1 );
+ view_type root_parreduce_check( "parreduce_check", n + 1 );
+ view_type root_parscan_result( "parscan_result", n + 1 );
+ view_type root_parscan_check( "parscan_check", n + 1 );
+
+ typename view_type::HostMirror
+ host_parfor_result = Kokkos::create_mirror_view( root_parfor_result );
+ typename view_type::HostMirror
+ host_parreduce_check = Kokkos::create_mirror_view( root_parreduce_check );
+ typename view_type::HostMirror
+ host_parscan_result = Kokkos::create_mirror_view( root_parscan_result );
+ typename view_type::HostMirror
+ host_parscan_check = Kokkos::create_mirror_view( root_parscan_check );
+
+ future_type f = Kokkos::host_spawn( Kokkos::TaskTeam( root_sched )
+ , TestTaskTeam( root_sched
+ , root_parfor_result
+ , root_parreduce_check
+ , root_parscan_result
+ , root_parscan_check
+ , n )
+ );
+
+ Kokkos::wait( root_sched );
+
+ Kokkos::deep_copy( host_parfor_result, root_parfor_result );
+ Kokkos::deep_copy( host_parreduce_check, root_parreduce_check );
+ Kokkos::deep_copy( host_parscan_result, root_parscan_result );
+ Kokkos::deep_copy( host_parscan_check, root_parscan_check );
+
+ for ( long i = 0; i <= n; ++i ) {
+ const long answer = i;
+
+ if ( host_parfor_result( i ) != answer ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_for result(" << i << ") = "
+ << host_parfor_result( i ) << " != " << answer << std::endl;
+ }
+
+ if ( host_parreduce_check( i ) != 0 ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_reduce check(" << i << ") = "
+ << host_parreduce_check( i ) << " != 0" << std::endl;
+ }
+
+ if ( host_parscan_check( i ) != 0 ) {
+ std::cerr << "TestTaskTeam::run ERROR parallel_scan check(" << i << ") = "
+ << host_parscan_check( i ) << " != 0" << std::endl;
+ }
+ }
+ }
};
template< class ExecSpace >
struct TestTaskTeamValue {
-
enum { SPAN = 8 };
- typedef long value_type ;
- typedef Kokkos::TaskScheduler<ExecSpace> policy_type ;
- typedef Kokkos::Future<value_type,ExecSpace> future_type ;
- typedef Kokkos::View<long*,ExecSpace> view_type ;
+ typedef long value_type;
+ typedef Kokkos::TaskScheduler< ExecSpace > sched_type;
+ typedef Kokkos::Future< value_type, ExecSpace > future_type;
+ typedef Kokkos::View< long*, ExecSpace > view_type;
- policy_type policy ;
- future_type future ;
+ sched_type sched;
+ future_type future;
- view_type result ;
- const long nvalue ;
+ view_type result;
+ const long nvalue;
KOKKOS_INLINE_FUNCTION
- TestTaskTeamValue( const policy_type & arg_policy
- , const view_type & arg_result
- , const long arg_nvalue )
- : policy(arg_policy)
+ TestTaskTeamValue( const sched_type & arg_sched
+ , const view_type & arg_result
+ , const long arg_nvalue )
+ : sched( arg_sched )
, future()
, result( arg_result )
- , nvalue( arg_nvalue )
- {}
+ , nvalue( arg_nvalue ) {}
KOKKOS_INLINE_FUNCTION
- void operator()( typename policy_type::member_type const & member
+ void operator()( typename sched_type::member_type const & member
, value_type & final )
- {
- const long end = nvalue + 1 ;
- const long begin = 0 < end - SPAN ? end - SPAN : 0 ;
+ {
+ const long end = nvalue + 1;
+ const long begin = 0 < end - SPAN ? end - SPAN : 0;
- if ( 0 < begin && future.is_null() ) {
- if ( member.team_rank() == 0 ) {
-
- future = policy.task_spawn
- ( TestTaskTeamValue( policy , result , begin - 1 )
- , Kokkos::TaskTeam );
+ if ( 0 < begin && future.is_null() ) {
+ if ( member.team_rank() == 0 ) {
+ future = sched.task_spawn( TestTaskTeamValue( sched, result, begin - 1 )
+ , Kokkos::TaskTeam );
- assert( ! future.is_null() );
+ assert( !future.is_null() );
- policy.respawn( this , future );
- }
- return ;
+ sched.respawn( this , future );
}
- Kokkos::parallel_for( Kokkos::TeamThreadRange(member,begin,end)
- , [&]( int i ) { result[i] = i + 1 ; }
- );
+ return;
+ }
- if ( member.team_rank() == 0 ) {
- final = result[nvalue] ;
- }
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( member, begin, end )
+ , [&] ( int i ) { result[i] = i + 1; }
+ );
- Kokkos::memory_fence();
+ if ( member.team_rank() == 0 ) {
+ final = result[nvalue];
}
+ Kokkos::memory_fence();
+ }
+
static void run( long n )
- {
- // const unsigned memory_capacity = 10000 ; // causes memory pool infinite loop
- const unsigned memory_capacity = 100000 ;
+ {
+ //const unsigned memory_capacity = 10000; // Causes memory pool infinite loop.
+ const unsigned memory_capacity = 100000;
- policy_type root_policy( typename policy_type::memory_space()
- , memory_capacity );
+ sched_type root_sched( typename sched_type::memory_space()
+ , memory_capacity );
- view_type root_result("result",n+1);
+ view_type root_result( "result", n + 1 );
- typename view_type::HostMirror
- host_result = Kokkos::create_mirror_view( root_result );
+ typename view_type::HostMirror host_result = Kokkos::create_mirror_view( root_result );
- future_type fv = root_policy.host_spawn
- ( TestTaskTeamValue( root_policy, root_result, n ) , Kokkos::TaskTeam );
+ future_type fv = root_sched.host_spawn( TestTaskTeamValue( root_sched, root_result, n )
+ , Kokkos::TaskTeam );
- Kokkos::wait( root_policy );
+ Kokkos::wait( root_sched );
- Kokkos::deep_copy( host_result , root_result );
+ Kokkos::deep_copy( host_result, root_result );
- if ( fv.get() != n + 1 ) {
- std::cerr << "TestTaskTeamValue ERROR future = "
- << fv.get() << " != " << n + 1 << std::endl ;
- }
- for ( long i = 0 ; i <= n ; ++i ) {
- const long answer = i + 1 ;
- if ( host_result(i) != answer ) {
- std::cerr << "TestTaskTeamValue ERROR result(" << i << ") = "
- << host_result(i) << " != " << answer << std::endl ;
- }
+ if ( fv.get() != n + 1 ) {
+ std::cerr << "TestTaskTeamValue ERROR future = "
+ << fv.get() << " != " << n + 1 << std::endl;
+ }
+
+ for ( long i = 0; i <= n; ++i ) {
+ const long answer = i + 1;
+
+ if ( host_result( i ) != answer ) {
+ std::cerr << "TestTaskTeamValue ERROR result(" << i << ") = "
+ << host_result( i ) << " != " << answer << std::endl;
}
}
+ }
};
-} // namespace TestTaskScheduler
-
-//----------------------------------------------------------------------------
-//----------------------------------------------------------------------------
-#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
-#endif /* #ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP */
+} // namespace TestTaskScheduler
+#endif // #if defined( KOKKOS_ENABLE_TASKDAG )
+#endif // #ifndef KOKKOS_UNITTEST_TASKSCHEDULER_HPP
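For reference, the task-team pattern exercised by TestTaskTeam / TestTaskTeamValue above reduces to the minimal standalone sketch below. It is not part of the patch: the functor name, memory-pool capacity, and problem size are illustrative, and a Kokkos build with KOKKOS_ENABLE_TASKDAG is assumed. The calls themselves (TaskScheduler, host_spawn with TaskTeam, wait, create_mirror_view, deep_copy) are the same ones used by the tests above.

#include <Kokkos_Core.hpp>
#include <iostream>

struct FillTask {
  typedef void value_type;
  typedef Kokkos::TaskScheduler< Kokkos::DefaultExecutionSpace > sched_type;
  typedef Kokkos::View< long*, Kokkos::DefaultExecutionSpace > view_type;

  view_type result;
  long n;

  KOKKOS_INLINE_FUNCTION
  FillTask( const view_type & arg_result, long arg_n )
    : result( arg_result ), n( arg_n ) {}

  // Every thread of the spawned team executes this; TeamThreadRange splits the work.
  KOKKOS_INLINE_FUNCTION
  void operator()( sched_type::member_type const & member )
  {
    Kokkos::parallel_for( Kokkos::TeamThreadRange( member, n ),
                          [&] ( const long i ) { result( i ) = i; } );
  }
};

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    typedef FillTask::sched_type sched_type;
    typedef FillTask::view_type  view_type;

    const long n = 100;

    // Scheduler backed by an explicit memory-pool capacity, as in the test above.
    sched_type sched( sched_type::memory_space(), 400000 );
    view_type result( "result", n );

    // Spawn a single task that runs as a whole team, then block until it is done.
    Kokkos::host_spawn( Kokkos::TaskTeam( sched ), FillTask( result, n ) );
    Kokkos::wait( sched );

    // Mirror the device data back to the host for checking, exactly as the test does.
    view_type::HostMirror host_result = Kokkos::create_mirror_view( result );
    Kokkos::deep_copy( host_result, result );

    std::cout << "result(10) = " << host_result( 10 ) << std::endl;
  }
  Kokkos::finalize();
  return 0;
}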
diff --git a/lib/kokkos/core/unit_test/TestTeam.hpp b/lib/kokkos/core/unit_test/TestTeam.hpp
index bcf4d3a17..11a523921 100644
--- a/lib/kokkos/core/unit_test/TestTeam.hpp
+++ b/lib/kokkos/core/unit_test/TestTeam.hpp
@@ -1,923 +1,947 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <stdio.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
+
namespace {
template< class ExecSpace, class ScheduleType >
struct TestTeamPolicy {
+ typedef typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type team_member;
+ typedef Kokkos::View< int**, ExecSpace > view_type;
- typedef typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type team_member ;
- typedef Kokkos::View<int**,ExecSpace> view_type ;
-
- view_type m_flags ;
+ view_type m_flags;
TestTeamPolicy( const size_t league_size )
- : m_flags( Kokkos::ViewAllocateWithoutInitializing("flags")
- , Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( *this )
- , league_size )
- {}
+ : m_flags( Kokkos::ViewAllocateWithoutInitializing( "flags" ),
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( *this ),
+ league_size ) {}
struct VerifyInitTag {};
KOKKOS_INLINE_FUNCTION
void operator()( const team_member & member ) const
- {
- const int tid = member.team_rank() + member.team_size() * member.league_rank();
+ {
+ const int tid = member.team_rank() + member.team_size() * member.league_rank();
- m_flags( member.team_rank() , member.league_rank() ) = tid ;
- }
+ m_flags( member.team_rank(), member.league_rank() ) = tid;
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const VerifyInitTag & , const team_member & member ) const
- {
- const int tid = member.team_rank() + member.team_size() * member.league_rank();
+ void operator()( const VerifyInitTag &, const team_member & member ) const
+ {
+ const int tid = member.team_rank() + member.team_size() * member.league_rank();
- if ( tid != m_flags( member.team_rank() , member.league_rank() ) ) {
- printf("TestTeamPolicy member(%d,%d) error %d != %d\n"
- , member.league_rank() , member.team_rank()
- , tid , m_flags( member.team_rank() , member.league_rank() ) );
- }
+ if ( tid != m_flags( member.team_rank(), member.league_rank() ) ) {
+ printf( "TestTeamPolicy member(%d,%d) error %d != %d\n",
+ member.league_rank(), member.team_rank(),
+ tid, m_flags( member.team_rank(), member.league_rank() ) );
}
+ }
- // included for test_small_league_size
- TestTeamPolicy()
- : m_flags()
- {}
+ // Included for test_small_league_size.
+ TestTeamPolicy() : m_flags() {}
+
+ // Included for test_small_league_size.
+ struct NoOpTag {};
- // included for test_small_league_size
- struct NoOpTag {} ;
KOKKOS_INLINE_FUNCTION
- void operator()( const NoOpTag & , const team_member & member ) const
- {}
+ void operator()( const NoOpTag &, const team_member & member ) const {}
static void test_small_league_size() {
-
int bs = 8; // batch size (number of elements per batch)
int ns = 16; // total number of "problems" to process
- // calculate total scratch memory space size
+ // Calculate total scratch memory space size.
const int level = 0;
int mem_size = 960;
- const int num_teams = ns/bs;
- const Kokkos::TeamPolicy< ExecSpace, NoOpTag > policy(num_teams, Kokkos::AUTO());
+ const int num_teams = ns / bs;
+ const Kokkos::TeamPolicy< ExecSpace, NoOpTag > policy( num_teams, Kokkos::AUTO() );
- Kokkos::parallel_for ( policy.set_scratch_size(level, Kokkos::PerTeam(mem_size), Kokkos::PerThread(0))
- , TestTeamPolicy()
- );
+ Kokkos::parallel_for( policy.set_scratch_size( level, Kokkos::PerTeam( mem_size ), Kokkos::PerThread( 0 ) ),
+ TestTeamPolicy() );
}
static void test_for( const size_t league_size )
- {
- TestTeamPolicy functor( league_size );
+ {
+ TestTeamPolicy functor( league_size );
- const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
+ const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
- Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size , team_size ) , functor );
- Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace , VerifyInitTag >( league_size , team_size ) , functor );
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size, team_size ), functor );
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ScheduleType, ExecSpace, VerifyInitTag >( league_size, team_size ), functor );
- test_small_league_size();
- }
+ test_small_league_size();
+ }
struct ReduceTag {};
- typedef long value_type ;
+ typedef long value_type;
KOKKOS_INLINE_FUNCTION
- void operator()( const team_member & member , value_type & update ) const
- {
- update += member.team_rank() + member.team_size() * member.league_rank();
- }
+ void operator()( const team_member & member, value_type & update ) const
+ {
+ update += member.team_rank() + member.team_size() * member.league_rank();
+ }
KOKKOS_INLINE_FUNCTION
- void operator()( const ReduceTag & , const team_member & member , value_type & update ) const
- {
- update += 1 + member.team_rank() + member.team_size() * member.league_rank();
- }
+ void operator()( const ReduceTag &, const team_member & member, value_type & update ) const
+ {
+ update += 1 + member.team_rank() + member.team_size() * member.league_rank();
+ }
static void test_reduce( const size_t league_size )
- {
- TestTeamPolicy functor( league_size );
+ {
+ TestTeamPolicy functor( league_size );
- const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
- const long N = team_size * league_size ;
+ const int team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( functor );
+ const long N = team_size * league_size;
- long total = 0 ;
+ long total = 0;
- Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size , team_size ) , functor , total );
- ASSERT_EQ( size_t((N-1)*(N))/2 , size_t(total) );
+ Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace >( league_size, team_size ), functor, total );
+ ASSERT_EQ( size_t( ( N - 1 ) * ( N ) ) / 2, size_t( total ) );
- Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace , ReduceTag >( league_size , team_size ) , functor , total );
- ASSERT_EQ( (size_t(N)*size_t(N+1))/2 , size_t(total) );
- }
+ Kokkos::parallel_reduce( Kokkos::TeamPolicy< ScheduleType, ExecSpace, ReduceTag >( league_size, team_size ), functor, total );
+ ASSERT_EQ( ( size_t( N ) * size_t( N + 1 ) ) / 2, size_t( total ) );
+ }
};
-}
-}
+} // namespace
+
+} // namespace Test
/*--------------------------------------------------------------------------*/
namespace Test {
-template< typename ScalarType , class DeviceType, class ScheduleType >
+template< typename ScalarType, class DeviceType, class ScheduleType >
class ReduceTeamFunctor
{
public:
- typedef DeviceType execution_space ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
- typedef typename execution_space::size_type size_type ;
+ typedef DeviceType execution_space;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef typename execution_space::size_type size_type;
struct value_type {
- ScalarType value[3] ;
+ ScalarType value[3];
};
- const size_type nwork ;
+ const size_type nwork;
ReduceTeamFunctor( const size_type & arg_nwork ) : nwork( arg_nwork ) {}
- ReduceTeamFunctor( const ReduceTeamFunctor & rhs )
- : nwork( rhs.nwork ) {}
+ ReduceTeamFunctor( const ReduceTeamFunctor & rhs ) : nwork( rhs.nwork ) {}
KOKKOS_INLINE_FUNCTION
void init( value_type & dst ) const
{
- dst.value[0] = 0 ;
- dst.value[1] = 0 ;
- dst.value[2] = 0 ;
+ dst.value[0] = 0;
+ dst.value[1] = 0;
+ dst.value[2] = 0;
}
KOKKOS_INLINE_FUNCTION
- void join( volatile value_type & dst ,
- const volatile value_type & src ) const
+ void join( volatile value_type & dst, const volatile value_type & src ) const
{
- dst.value[0] += src.value[0] ;
- dst.value[1] += src.value[1] ;
- dst.value[2] += src.value[2] ;
+ dst.value[0] += src.value[0];
+ dst.value[1] += src.value[1];
+ dst.value[2] += src.value[2];
}
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type ind , value_type & dst ) const
+ void operator()( const typename policy_type::member_type ind, value_type & dst ) const
{
const int thread_rank = ind.team_rank() + ind.team_size() * ind.league_rank();
const int thread_size = ind.team_size() * ind.league_size();
- const int chunk = ( nwork + thread_size - 1 ) / thread_size ;
+ const int chunk = ( nwork + thread_size - 1 ) / thread_size;
- size_type iwork = chunk * thread_rank ;
- const size_type iwork_end = iwork + chunk < nwork ? iwork + chunk : nwork ;
+ size_type iwork = chunk * thread_rank;
+ const size_type iwork_end = iwork + chunk < nwork ? iwork + chunk : nwork;
- for ( ; iwork < iwork_end ; ++iwork ) {
- dst.value[0] += 1 ;
- dst.value[1] += iwork + 1 ;
- dst.value[2] += nwork - iwork ;
+ for ( ; iwork < iwork_end; ++iwork ) {
+ dst.value[0] += 1;
+ dst.value[1] += iwork + 1;
+ dst.value[2] += nwork - iwork;
}
}
};
} // namespace Test
namespace {
-template< typename ScalarType , class DeviceType, class ScheduleType >
+template< typename ScalarType, class DeviceType, class ScheduleType >
class TestReduceTeam
{
public:
- typedef DeviceType execution_space ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
- typedef typename execution_space::size_type size_type ;
-
- //------------------------------------
+ typedef DeviceType execution_space;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef typename execution_space::size_type size_type;
- TestReduceTeam( const size_type & nwork )
- {
- run_test(nwork);
- }
+ TestReduceTeam( const size_type & nwork ) { run_test( nwork ); }
void run_test( const size_type & nwork )
{
- typedef Test::ReduceTeamFunctor< ScalarType , execution_space , ScheduleType> functor_type ;
- typedef typename functor_type::value_type value_type ;
- typedef Kokkos::View< value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::ReduceTeamFunctor< ScalarType, execution_space, ScheduleType> functor_type;
+ typedef typename functor_type::value_type value_type;
+ typedef Kokkos::View< value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
enum { Count = 3 };
enum { Repeat = 100 };
value_type result[ Repeat ];
- const unsigned long nw = nwork ;
- const unsigned long nsum = nw % 2 ? nw * (( nw + 1 )/2 )
- : (nw/2) * ( nw + 1 );
+ const unsigned long nw = nwork;
+ const unsigned long nsum = nw % 2 ? nw * ( ( nw + 1 ) / 2 )
+ : ( nw / 2 ) * ( nw + 1 );
- const unsigned team_size = policy_type::team_size_recommended( functor_type(nwork) );
- const unsigned league_size = ( nwork + team_size - 1 ) / team_size ;
+ const unsigned team_size = policy_type::team_size_recommended( functor_type( nwork ) );
+ const unsigned league_size = ( nwork + team_size - 1 ) / team_size;
- policy_type team_exec( league_size , team_size );
+ policy_type team_exec( league_size, team_size );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
+ for ( unsigned i = 0; i < Repeat; ++i ) {
result_type tmp( & result[i] );
- Kokkos::parallel_reduce( team_exec , functor_type(nwork) , tmp );
+ Kokkos::parallel_reduce( team_exec, functor_type( nwork ), tmp );
}
execution_space::fence();
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- for ( unsigned j = 0 ; j < Count ; ++j ) {
- const unsigned long correct = 0 == j % 3 ? nw : nsum ;
- ASSERT_EQ( (ScalarType) correct , result[i].value[j] );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ for ( unsigned j = 0; j < Count; ++j ) {
+ const unsigned long correct = 0 == j % 3 ? nw : nsum;
+ ASSERT_EQ( (ScalarType) correct, result[i].value[j] );
}
}
}
};
-}
+} // namespace
/*--------------------------------------------------------------------------*/
namespace Test {
template< class DeviceType, class ScheduleType >
class ScanTeamFunctor
{
public:
- typedef DeviceType execution_space ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
+ typedef DeviceType execution_space;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef long int value_type;
- typedef long int value_type ;
- Kokkos::View< value_type , execution_space > accum ;
- Kokkos::View< value_type , execution_space > total ;
+ Kokkos::View< value_type, execution_space > accum;
+ Kokkos::View< value_type, execution_space > total;
- ScanTeamFunctor() : accum("accum"), total("total") {}
+ ScanTeamFunctor() : accum( "accum" ), total( "total" ) {}
KOKKOS_INLINE_FUNCTION
- void init( value_type & error ) const { error = 0 ; }
+ void init( value_type & error ) const { error = 0; }
KOKKOS_INLINE_FUNCTION
- void join( value_type volatile & error ,
- value_type volatile const & input ) const
- { if ( input ) error = 1 ; }
+ void join( value_type volatile & error, value_type volatile const & input ) const
+ { if ( input ) error = 1; }
struct JoinMax {
- typedef long int value_type ;
+ typedef long int value_type;
+
KOKKOS_INLINE_FUNCTION
- void join( value_type volatile & dst
- , value_type volatile const & input ) const
- { if ( dst < input ) dst = input ; }
+ void join( value_type volatile & dst, value_type volatile const & input ) const
+ { if ( dst < input ) dst = input; }
};
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type ind , value_type & error ) const
+ void operator()( const typename policy_type::member_type ind, value_type & error ) const
{
if ( 0 == ind.league_rank() && 0 == ind.team_rank() ) {
const long int thread_count = ind.league_size() * ind.team_size();
- total() = ( thread_count * ( thread_count + 1 ) ) / 2 ;
+ total() = ( thread_count * ( thread_count + 1 ) ) / 2;
}
// Team max:
- const int long m = ind.team_reduce( (long int) ( ind.league_rank() + ind.team_rank() ) , JoinMax() );
+ const int long m = ind.team_reduce( (long int) ( ind.league_rank() + ind.team_rank() ), JoinMax() );
if ( m != ind.league_rank() + ( ind.team_size() - 1 ) ) {
- printf("ScanTeamFunctor[%d.%d of %d.%d] reduce_max_answer(%ld) != reduce_max(%ld)\n"
- , ind.league_rank(), ind.team_rank()
- , ind.league_size(), ind.team_size()
- , (long int)(ind.league_rank() + ( ind.team_size() - 1 )) , m );
+ printf( "ScanTeamFunctor[%d.%d of %d.%d] reduce_max_answer(%ld) != reduce_max(%ld)\n",
+ ind.league_rank(), ind.team_rank(),
+ ind.league_size(), ind.team_size(),
+ (long int) ( ind.league_rank() + ( ind.team_size() - 1 ) ), m );
}
// Scan:
const long int answer =
- ( ind.league_rank() + 1 ) * ind.team_rank() +
- ( ind.team_rank() * ( ind.team_rank() + 1 ) ) / 2 ;
+ ( ind.league_rank() + 1 ) * ind.team_rank() + ( ind.team_rank() * ( ind.team_rank() + 1 ) ) / 2;
const long int result =
ind.team_scan( ind.league_rank() + 1 + ind.team_rank() + 1 );
const long int result2 =
ind.team_scan( ind.league_rank() + 1 + ind.team_rank() + 1 );
if ( answer != result || answer != result2 ) {
- printf("ScanTeamFunctor[%d.%d of %d.%d] answer(%ld) != scan_first(%ld) or scan_second(%ld)\n",
- ind.league_rank(), ind.team_rank(),
- ind.league_size(), ind.team_size(),
- answer,result,result2);
- error = 1 ;
+ printf( "ScanTeamFunctor[%d.%d of %d.%d] answer(%ld) != scan_first(%ld) or scan_second(%ld)\n",
+ ind.league_rank(), ind.team_rank(),
+ ind.league_size(), ind.team_size(),
+ answer, result, result2 );
+
+ error = 1;
}
const long int thread_rank = ind.team_rank() +
ind.team_size() * ind.league_rank();
- ind.team_scan( 1 + thread_rank , accum.ptr_on_device() );
+ ind.team_scan( 1 + thread_rank, accum.ptr_on_device() );
}
};
template< class DeviceType, class ScheduleType >
class TestScanTeam
{
public:
- typedef DeviceType execution_space ;
- typedef long int value_type ;
-
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
- typedef Test::ScanTeamFunctor<DeviceType, ScheduleType> functor_type ;
+ typedef DeviceType execution_space;
+ typedef long int value_type;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
+ typedef Test::ScanTeamFunctor<DeviceType, ScheduleType> functor_type;
- //------------------------------------
-
- TestScanTeam( const size_t nteam )
- {
- run_test(nteam);
- }
+ TestScanTeam( const size_t nteam ) { run_test( nteam ); }
void run_test( const size_t nteam )
{
- typedef Kokkos::View< long int , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
- const unsigned REPEAT = 100000 ;
+ typedef Kokkos::View< long int, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
+
+ const unsigned REPEAT = 100000;
unsigned Repeat;
- if ( nteam == 0 )
- {
+
+ if ( nteam == 0 ) {
Repeat = 1;
- } else {
- Repeat = ( REPEAT + nteam - 1 ) / nteam ; //error here
}
+ else {
+ Repeat = ( REPEAT + nteam - 1 ) / nteam; // Error here.
+ }
+
+ functor_type functor;
- functor_type functor ;
+ policy_type team_exec( nteam, policy_type::team_size_max( functor ) );
- policy_type team_exec( nteam , policy_type::team_size_max( functor ) );
+ for ( unsigned i = 0; i < Repeat; ++i ) {
+ long int accum = 0;
+ long int total = 0;
+ long int error = 0;
+ Kokkos::deep_copy( functor.accum, total );
- for ( unsigned i = 0 ; i < Repeat ; ++i ) {
- long int accum = 0 ;
- long int total = 0 ;
- long int error = 0 ;
- Kokkos::deep_copy( functor.accum , total );
- Kokkos::parallel_reduce( team_exec , functor , result_type( & error ) );
+ Kokkos::parallel_reduce( team_exec, functor, result_type( & error ) );
DeviceType::fence();
- Kokkos::deep_copy( accum , functor.accum );
- Kokkos::deep_copy( total , functor.total );
- ASSERT_EQ( error , 0 );
- ASSERT_EQ( total , accum );
+ Kokkos::deep_copy( accum, functor.accum );
+ Kokkos::deep_copy( total, functor.total );
+
+ ASSERT_EQ( error, 0 );
+ ASSERT_EQ( total, accum );
}
execution_space::fence();
}
};
} // namespace Test
/*--------------------------------------------------------------------------*/
namespace Test {
template< class ExecSpace, class ScheduleType >
struct SharedTeamFunctor {
- typedef ExecSpace execution_space ;
- typedef int value_type ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
+ typedef ExecSpace execution_space;
+ typedef int value_type;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
enum { SHARED_COUNT = 1000 };
- typedef typename ExecSpace::scratch_memory_space shmem_space ;
+ typedef typename ExecSpace::scratch_memory_space shmem_space;
- // tbd: MemoryUnmanaged should be the default for shared memory space
- typedef Kokkos::View<int*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
+ // TBD: MemoryUnmanaged should be the default for shared memory space.
+ typedef Kokkos::View< int*, shmem_space, Kokkos::MemoryUnmanaged > shared_int_array_type;
- // Tell how much shared memory will be required by this functor:
+ // Tell how much shared memory will be required by this functor.
inline
unsigned team_shmem_size( int team_size ) const
{
return shared_int_array_type::shmem_size( SHARED_COUNT ) +
shared_int_array_type::shmem_size( SHARED_COUNT );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type & ind , value_type & update ) const
+ void operator()( const typename policy_type::member_type & ind, value_type & update ) const
{
- const shared_int_array_type shared_A( ind.team_shmem() , SHARED_COUNT );
- const shared_int_array_type shared_B( ind.team_shmem() , SHARED_COUNT );
-
- if ((shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0) ||
- (shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0)) {
- printf ("Failed to allocate shared memory of size %lu\n",
- static_cast<unsigned long> (SHARED_COUNT));
- ++update; // failure to allocate is an error
+ const shared_int_array_type shared_A( ind.team_shmem(), SHARED_COUNT );
+ const shared_int_array_type shared_B( ind.team_shmem(), SHARED_COUNT );
+
+ if ( ( shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0 ) ||
+ ( shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0 ) )
+ {
+ printf ("member( %d/%d , %d/%d ) Failed to allocate shared memory of size %lu\n"
+ , ind.league_rank()
+ , ind.league_size()
+ , ind.team_rank()
+ , ind.team_size()
+ , static_cast<unsigned long>( SHARED_COUNT )
+ );
+
+ ++update; // Failure to allocate is an error.
}
else {
- for ( int i = ind.team_rank() ; i < SHARED_COUNT ; i += ind.team_size() ) {
+ for ( int i = ind.team_rank(); i < SHARED_COUNT; i += ind.team_size() ) {
shared_A[i] = i + ind.league_rank();
shared_B[i] = 2 * i + ind.league_rank();
}
ind.team_barrier();
if ( ind.team_rank() + 1 == ind.team_size() ) {
- for ( int i = 0 ; i < SHARED_COUNT ; ++i ) {
+ for ( int i = 0; i < SHARED_COUNT; ++i ) {
if ( shared_A[i] != i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
+
if ( shared_B[i] != 2 * i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
}
}
}
}
};
-}
+} // namespace Test
namespace {
template< class ExecSpace, class ScheduleType >
struct TestSharedTeam {
-
- TestSharedTeam()
- { run(); }
+ TestSharedTeam() { run(); }
void run()
{
- typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor ;
- typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor;
+ typedef Kokkos::View< typename Functor::value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
- const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
+ const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
- Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size );
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size, team_size );
- typename Functor::value_type error_count = 0 ;
+ typename Functor::value_type error_count = 0;
- Kokkos::parallel_reduce( team_exec , Functor() , result_type( & error_count ) );
+ Kokkos::parallel_reduce( team_exec, Functor(), result_type( & error_count ) );
- ASSERT_EQ( error_count , 0 );
+ ASSERT_EQ( error_count, 0 );
}
};
-}
+
+} // namespace
namespace Test {
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
template< class MemorySpace, class ExecSpace, class ScheduleType >
struct TestLambdaSharedTeam {
-
- TestLambdaSharedTeam()
- { run(); }
+ TestLambdaSharedTeam() { run(); }
void run()
{
- typedef Test::SharedTeamFunctor<ExecSpace, ScheduleType> Functor ;
- //typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
- typedef Kokkos::View< typename Functor::value_type , MemorySpace, Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::SharedTeamFunctor< ExecSpace, ScheduleType > Functor;
+ //typedef Kokkos::View< typename Functor::value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
+ typedef Kokkos::View< typename Functor::value_type, MemorySpace, Kokkos::MemoryUnmanaged > result_type;
- typedef typename ExecSpace::scratch_memory_space shmem_space ;
+ typedef typename ExecSpace::scratch_memory_space shmem_space;
- // tbd: MemoryUnmanaged should be the default for shared memory space
- typedef Kokkos::View<int*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
+ // TBD: MemoryUnmanaged should be the default for shared memory space.
+ typedef Kokkos::View< int*, shmem_space, Kokkos::MemoryUnmanaged > shared_int_array_type;
const int SHARED_COUNT = 1000;
int team_size = 1;
+
#ifdef KOKKOS_ENABLE_CUDA
- if(std::is_same<ExecSpace,Kokkos::Cuda>::value)
- team_size = 128;
+ if ( std::is_same< ExecSpace, Kokkos::Cuda >::value ) team_size = 128;
#endif
- Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size);
- team_exec = team_exec.set_scratch_size(0,Kokkos::PerTeam(SHARED_COUNT*2*sizeof(int)));
- typename Functor::value_type error_count = 0 ;
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size, team_size );
+ team_exec = team_exec.set_scratch_size( 0, Kokkos::PerTeam( SHARED_COUNT * 2 * sizeof( int ) ) );
+
+ typename Functor::value_type error_count = 0;
- Kokkos::parallel_reduce( team_exec , KOKKOS_LAMBDA
- ( const typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type & ind , int & update ) {
+ Kokkos::parallel_reduce( team_exec, KOKKOS_LAMBDA
+ ( const typename Kokkos::TeamPolicy< ScheduleType, ExecSpace >::member_type & ind, int & update )
+ {
+ const shared_int_array_type shared_A( ind.team_shmem(), SHARED_COUNT );
+ const shared_int_array_type shared_B( ind.team_shmem(), SHARED_COUNT );
- const shared_int_array_type shared_A( ind.team_shmem() , SHARED_COUNT );
- const shared_int_array_type shared_B( ind.team_shmem() , SHARED_COUNT );
+ if ( ( shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0 ) ||
+ ( shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0 ) )
+ {
+ printf( "Failed to allocate shared memory of size %lu\n",
+ static_cast<unsigned long>( SHARED_COUNT ) );
- if ((shared_A.ptr_on_device () == NULL && SHARED_COUNT > 0) ||
- (shared_B.ptr_on_device () == NULL && SHARED_COUNT > 0)) {
- printf ("Failed to allocate shared memory of size %lu\n",
- static_cast<unsigned long> (SHARED_COUNT));
- ++update; // failure to allocate is an error
- } else {
- for ( int i = ind.team_rank() ; i < SHARED_COUNT ; i += ind.team_size() ) {
+ ++update; // Failure to allocate is an error.
+ }
+ else {
+ for ( int i = ind.team_rank(); i < SHARED_COUNT; i += ind.team_size() ) {
shared_A[i] = i + ind.league_rank();
shared_B[i] = 2 * i + ind.league_rank();
}
ind.team_barrier();
if ( ind.team_rank() + 1 == ind.team_size() ) {
- for ( int i = 0 ; i < SHARED_COUNT ; ++i ) {
+ for ( int i = 0; i < SHARED_COUNT; ++i ) {
if ( shared_A[i] != i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
+
if ( shared_B[i] != 2 * i + ind.league_rank() ) {
- ++update ;
+ ++update;
}
}
}
}
}, result_type( & error_count ) );
- ASSERT_EQ( error_count , 0 );
+ ASSERT_EQ( error_count, 0 );
}
};
#endif
-}
+
+} // namespace Test
namespace Test {
template< class ExecSpace, class ScheduleType >
struct ScratchTeamFunctor {
- typedef ExecSpace execution_space ;
- typedef int value_type ;
- typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type ;
+ typedef ExecSpace execution_space;
+ typedef int value_type;
+ typedef Kokkos::TeamPolicy< ScheduleType, execution_space > policy_type;
enum { SHARED_TEAM_COUNT = 100 };
enum { SHARED_THREAD_COUNT = 10 };
- typedef typename ExecSpace::scratch_memory_space shmem_space ;
+ typedef typename ExecSpace::scratch_memory_space shmem_space;
- // tbd: MemoryUnmanaged should be the default for shared memory space
- typedef Kokkos::View<size_t*,shmem_space,Kokkos::MemoryUnmanaged> shared_int_array_type ;
+ // TBD: MemoryUnmanaged should be the default for shared memory space.
+ typedef Kokkos::View< size_t*, shmem_space, Kokkos::MemoryUnmanaged > shared_int_array_type;
KOKKOS_INLINE_FUNCTION
- void operator()( const typename policy_type::member_type & ind , value_type & update ) const
+ void operator()( const typename policy_type::member_type & ind, value_type & update ) const
{
- const shared_int_array_type scratch_ptr( ind.team_scratch(1) , 3*ind.team_size() );
- const shared_int_array_type scratch_A( ind.team_scratch(1) , SHARED_TEAM_COUNT );
- const shared_int_array_type scratch_B( ind.thread_scratch(1) , SHARED_THREAD_COUNT );
-
- if ((scratch_ptr.ptr_on_device () == NULL ) ||
- (scratch_A. ptr_on_device () == NULL && SHARED_TEAM_COUNT > 0) ||
- (scratch_B. ptr_on_device () == NULL && SHARED_THREAD_COUNT > 0)) {
- printf ("Failed to allocate shared memory of size %lu\n",
- static_cast<unsigned long> (SHARED_TEAM_COUNT));
- ++update; // failure to allocate is an error
+ const shared_int_array_type scratch_ptr( ind.team_scratch( 1 ), 3 * ind.team_size() );
+ const shared_int_array_type scratch_A( ind.team_scratch( 1 ), SHARED_TEAM_COUNT );
+ const shared_int_array_type scratch_B( ind.thread_scratch( 1 ), SHARED_THREAD_COUNT );
+
+ if ( ( scratch_ptr.ptr_on_device () == NULL ) ||
+ ( scratch_A. ptr_on_device () == NULL && SHARED_TEAM_COUNT > 0 ) ||
+ ( scratch_B. ptr_on_device () == NULL && SHARED_THREAD_COUNT > 0 ) )
+ {
+ printf( "Failed to allocate shared memory of size %lu\n",
+ static_cast<unsigned long>( SHARED_TEAM_COUNT ) );
+
+ ++update; // Failure to allocate is an error.
}
else {
- Kokkos::parallel_for(Kokkos::TeamThreadRange(ind,0,(int)SHARED_TEAM_COUNT),[&] (const int &i) {
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( ind, 0, (int) SHARED_TEAM_COUNT ), [&] ( const int & i ) {
scratch_A[i] = i + ind.league_rank();
});
- for(int i=0; i<SHARED_THREAD_COUNT; i++)
- scratch_B[i] = 10000*ind.league_rank() + 100*ind.team_rank() + i;
+
+ for ( int i = 0; i < SHARED_THREAD_COUNT; i++ ) {
+ scratch_B[i] = 10000 * ind.league_rank() + 100 * ind.team_rank() + i;
+ }
scratch_ptr[ind.team_rank()] = (size_t) scratch_A.ptr_on_device();
scratch_ptr[ind.team_rank() + ind.team_size()] = (size_t) scratch_B.ptr_on_device();
ind.team_barrier();
- for( int i = 0; i<SHARED_TEAM_COUNT; i++) {
- if(scratch_A[i] != size_t(i + ind.league_rank()))
- ++update;
+ for ( int i = 0; i < SHARED_TEAM_COUNT; i++ ) {
+ if ( scratch_A[i] != size_t( i + ind.league_rank() ) ) ++update;
}
- for( int i = 0; i < ind.team_size(); i++) {
- if(scratch_ptr[0]!=scratch_ptr[i]) ++update;
+
+ for ( int i = 0; i < ind.team_size(); i++ ) {
+ if ( scratch_ptr[0] != scratch_ptr[i] ) ++update;
}
- if(scratch_ptr[1+ind.team_size()] - scratch_ptr[0 + ind.team_size()] <
- SHARED_THREAD_COUNT*sizeof(size_t))
+
+ if ( scratch_ptr[1 + ind.team_size()] - scratch_ptr[0 + ind.team_size()] < SHARED_THREAD_COUNT * sizeof( size_t ) ) {
++update;
- for( int i = 1; i < ind.team_size(); i++) {
- if((scratch_ptr[i+ind.team_size()] - scratch_ptr[i-1+ind.team_size()]) !=
- (scratch_ptr[1+ind.team_size()] - scratch_ptr[0 + ind.team_size()])) ++update;
+ }
+ for ( int i = 1; i < ind.team_size(); i++ ) {
+ if ( ( scratch_ptr[i + ind.team_size()] - scratch_ptr[i - 1 + ind.team_size()] ) !=
+ ( scratch_ptr[1 + ind.team_size()] - scratch_ptr[0 + ind.team_size()] ) )
+ {
+ ++update;
+ }
}
}
}
};
-}
+} // namespace Test
namespace {
template< class ExecSpace, class ScheduleType >
struct TestScratchTeam {
-
- TestScratchTeam()
- { run(); }
+ TestScratchTeam() { run(); }
void run()
{
- typedef Test::ScratchTeamFunctor<ExecSpace, ScheduleType> Functor ;
- typedef Kokkos::View< typename Functor::value_type , Kokkos::HostSpace , Kokkos::MemoryUnmanaged > result_type ;
+ typedef Test::ScratchTeamFunctor<ExecSpace, ScheduleType> Functor;
+ typedef Kokkos::View< typename Functor::value_type, Kokkos::HostSpace, Kokkos::MemoryUnmanaged > result_type;
const size_t team_size = Kokkos::TeamPolicy< ScheduleType, ExecSpace >::team_size_max( Functor() );
- Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size , team_size );
+ Kokkos::TeamPolicy< ScheduleType, ExecSpace > team_exec( 8192 / team_size, team_size );
+
+ typename Functor::value_type error_count = 0;
+
+ int team_scratch_size = Functor::shared_int_array_type::shmem_size( Functor::SHARED_TEAM_COUNT ) +
+ Functor::shared_int_array_type::shmem_size( 3 * team_size );
- typename Functor::value_type error_count = 0 ;
+ int thread_scratch_size = Functor::shared_int_array_type::shmem_size( Functor::SHARED_THREAD_COUNT );
- int team_scratch_size = Functor::shared_int_array_type::shmem_size(Functor::SHARED_TEAM_COUNT) +
- Functor::shared_int_array_type::shmem_size(3*team_size);
- int thread_scratch_size = Functor::shared_int_array_type::shmem_size(Functor::SHARED_THREAD_COUNT);
- Kokkos::parallel_reduce( team_exec.set_scratch_size(0,Kokkos::PerTeam(team_scratch_size),
- Kokkos::PerThread(thread_scratch_size)) ,
- Functor() , result_type( & error_count ) );
+ Kokkos::parallel_reduce( team_exec.set_scratch_size( 0, Kokkos::PerTeam( team_scratch_size ),
+ Kokkos::PerThread( thread_scratch_size ) ),
+ Functor(), result_type( & error_count ) );
- ASSERT_EQ( error_count , 0 );
+ ASSERT_EQ( error_count, 0 );
}
};
-}
+
+} // namespace
namespace Test {
-template< class ExecSpace>
+
+template< class ExecSpace >
KOKKOS_INLINE_FUNCTION
-int test_team_mulit_level_scratch_loop_body(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team) {
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team1(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread1(team.thread_scratch(0),16);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team2(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread2(team.thread_scratch(0),16);
-
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team1(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread1(team.thread_scratch(1),16000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team2(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread2(team.thread_scratch(1),16000);
-
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_team3(team.team_scratch(0),128);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> a_thread3(team.thread_scratch(0),16);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_team3(team.team_scratch(1),128000);
- Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>> b_thread3(team.thread_scratch(1),16000);
+int test_team_mulit_level_scratch_loop_body( const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team ) {
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_team1( team.team_scratch( 0 ), 128 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_thread1( team.thread_scratch( 0 ), 16 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_team2( team.team_scratch( 0 ), 128 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_thread2( team.thread_scratch( 0 ), 16 );
+
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_team1( team.team_scratch( 1 ), 128000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_thread1( team.thread_scratch( 1 ), 16000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_team2( team.team_scratch( 1 ), 128000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_thread2( team.thread_scratch( 1 ), 16000 );
+
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_team3( team.team_scratch( 0 ), 128 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > a_thread3( team.thread_scratch( 0 ), 16 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_team3( team.team_scratch( 1 ), 128000 );
+ Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> > b_thread3( team.thread_scratch( 1 ), 16000 );
// The explicit types for 0 and 128 are here to test TeamThreadRange accepting different
// types for begin and end.
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,int(0),unsigned(128)), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, int( 0 ), unsigned( 128 ) ), [&] ( const int & i )
{
- a_team1(i) = 1000000 + i;
- a_team2(i) = 2000000 + i;
- a_team3(i) = 3000000 + i;
+ a_team1( i ) = 1000000 + i + team.league_rank() * 100000;
+ a_team2( i ) = 2000000 + i + team.league_rank() * 100000;
+ a_team3( i ) = 3000000 + i + team.league_rank() * 100000;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16 ), [&] ( const int & i )
{
- a_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
- a_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
- a_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
+ a_thread1( i ) = 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ a_thread2( i ) = 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ a_thread3( i ) = 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
});
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 0, 128000 ), [&] ( const int & i )
{
- b_team1(i) = 1000000 + i;
- b_team2(i) = 2000000 + i;
- b_team3(i) = 3000000 + i;
+ b_team1( i ) = 1000000 + i + team.league_rank() * 100000;
+ b_team2( i ) = 2000000 + i + team.league_rank() * 100000;
+ b_team3( i ) = 3000000 + i + team.league_rank() * 100000;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16000 ), [&] ( const int & i )
{
- b_thread1(i) = 1000000 + 100000*team.team_rank() + 16-i;
- b_thread2(i) = 2000000 + 100000*team.team_rank() + 16-i;
- b_thread3(i) = 3000000 + 100000*team.team_rank() + 16-i;
+ b_thread1( i ) = 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ b_thread2( i ) = 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
+ b_thread3( i ) = 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000;
});
team.team_barrier();
+
int error = 0;
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 0, 128 ), [&] ( const int & i )
{
- if(a_team1(i) != 1000000 + i) error++;
- if(a_team2(i) != 2000000 + i) error++;
- if(a_team3(i) != 3000000 + i) error++;
+ if ( a_team1( i ) != 1000000 + i + team.league_rank() * 100000 ) error++;
+ if ( a_team2( i ) != 2000000 + i + team.league_rank() * 100000 ) error++;
+ if ( a_team3( i ) != 3000000 + i + team.league_rank() * 100000 ) error++;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16 ), [&] ( const int & i )
{
- if(a_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
- if(a_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
- if(a_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
+ if ( a_thread1( i ) != 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( a_thread2( i ) != 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( a_thread3( i ) != 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
});
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,0,128000), [&] (const int& i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 0, 128000 ), [&] ( const int & i )
{
- if(b_team1(i) != 1000000 + i) error++;
- if(b_team2(i) != 2000000 + i) error++;
- if(b_team3(i) != 3000000 + i) error++;
+ if ( b_team1( i ) != 1000000 + i + team.league_rank() * 100000 ) error++;
+ if ( b_team2( i ) != 2000000 + i + team.league_rank() * 100000 ) error++;
+ if ( b_team3( i ) != 3000000 + i + team.league_rank() * 100000 ) error++;
});
team.team_barrier();
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,16000), [&] (const int& i)
+
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 16000 ), [&] ( const int & i )
{
- if(b_thread1(i) != 1000000 + 100000*team.team_rank() + 16-i) error++;
- if(b_thread2(i) != 2000000 + 100000*team.team_rank() + 16-i) error++;
- if( b_thread3(i) != 3000000 + 100000*team.team_rank() + 16-i) error++;
+ if ( b_thread1( i ) != 1000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( b_thread2( i ) != 2000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
+ if ( b_thread3( i ) != 3000000 + 100000 * team.team_rank() + 16 - i + team.league_rank() * 100000 ) error++;
});
return error;
}
struct TagReduce {};
struct TagFor {};
template< class ExecSpace, class ScheduleType >
struct ClassNoShmemSizeFunction {
- Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ typedef typename Kokkos::TeamPolicy< ExecSpace, ScheduleType >::member_type member_type;
+
+ Kokkos::View< int, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
- void operator() (const TagFor&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team) const {
- int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator()( const TagFor &, const member_type & team ) const {
+ int error = test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
errors() += error;
}
KOKKOS_INLINE_FUNCTION
- void operator() (const TagReduce&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team, int& error) const {
- error += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator() ( const TagReduce &, const member_type & team, int & error ) const {
+ error += test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
}
void run() {
- Kokkos::View<int,ExecSpace> d_errors = Kokkos::View<int,ExecSpace>("Errors");
+ Kokkos::View< int, ExecSpace > d_errors = Kokkos::View< int, ExecSpace >( "Errors" );
errors = d_errors;
- const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
- const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
+ const int per_team0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128 );
+ const int per_thread0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16 );
+
+ const int per_team1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128000 );
+ const int per_thread1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16000 );
- const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
- const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
{
- Kokkos::TeamPolicy<TagFor,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_for(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this);
- Kokkos::fence();
- typename Kokkos::View<int,ExecSpace>::HostMirror h_errors = Kokkos::create_mirror_view(d_errors);
- Kokkos::deep_copy(h_errors,d_errors);
- ASSERT_EQ(h_errors(),0);
+ Kokkos::TeamPolicy< TagFor, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_for( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ), *this );
+ Kokkos::fence();
+
+ typename Kokkos::View< int, ExecSpace >::HostMirror h_errors = Kokkos::create_mirror_view( d_errors );
+ Kokkos::deep_copy( h_errors, d_errors );
+ ASSERT_EQ( h_errors(), 0 );
}
{
- int error = 0;
- Kokkos::TeamPolicy<TagReduce,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_reduce(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this,error);
- Kokkos::fence();
- ASSERT_EQ(error,0);
+ int error = 0;
+ Kokkos::TeamPolicy< TagReduce, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_reduce( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ), *this, error );
+ Kokkos::fence();
+
+ ASSERT_EQ( error, 0 );
}
};
};
template< class ExecSpace, class ScheduleType >
struct ClassWithShmemSizeFunction {
- Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ typedef typename Kokkos::TeamPolicy< ExecSpace, ScheduleType >::member_type member_type;
+
+ Kokkos::View< int, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
KOKKOS_INLINE_FUNCTION
- void operator() (const TagFor&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team) const {
- int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator()( const TagFor &, const member_type & team ) const {
+ int error = test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
errors() += error;
}
KOKKOS_INLINE_FUNCTION
- void operator() (const TagReduce&, const typename Kokkos::TeamPolicy<ExecSpace,ScheduleType>::member_type& team, int& error) const {
- error += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ void operator() ( const TagReduce &, const member_type & team, int & error ) const {
+ error += test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
}
void run() {
- Kokkos::View<int,ExecSpace> d_errors = Kokkos::View<int,ExecSpace>("Errors");
+ Kokkos::View< int, ExecSpace > d_errors = Kokkos::View< int, ExecSpace >( "Errors" );
errors = d_errors;
- const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
- const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
+ const int per_team1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128000 );
+ const int per_thread1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16000 );
+
{
- Kokkos::TeamPolicy<TagFor,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_for(policy.set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this);
- Kokkos::fence();
- typename Kokkos::View<int,ExecSpace>::HostMirror h_errors= Kokkos::create_mirror_view(d_errors);
- Kokkos::deep_copy(h_errors,d_errors);
- ASSERT_EQ(h_errors(),0);
+ Kokkos::TeamPolicy< TagFor, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_for( policy.set_scratch_size( 1, Kokkos::PerTeam( per_team1 ),
+ Kokkos::PerThread( per_thread1 ) ),
+ *this );
+ Kokkos::fence();
+
+ typename Kokkos::View< int, ExecSpace >::HostMirror h_errors = Kokkos::create_mirror_view( d_errors );
+ Kokkos::deep_copy( h_errors, d_errors );
+ ASSERT_EQ( h_errors(), 0 );
}
{
- int error = 0;
- Kokkos::TeamPolicy<TagReduce,ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_reduce(policy.set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- *this,error);
- Kokkos::fence();
- ASSERT_EQ(error,0);
+ int error = 0;
+ Kokkos::TeamPolicy< TagReduce, ExecSpace, ScheduleType > policy( 10, 8, 16 );
+
+ Kokkos::parallel_reduce( policy.set_scratch_size( 1, Kokkos::PerTeam( per_team1 ),
+ Kokkos::PerThread( per_thread1 ) ),
+ *this, error );
+ Kokkos::fence();
+
+ ASSERT_EQ( error, 0 );
}
};
- unsigned team_shmem_size(int team_size) const {
- const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
- const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
+ unsigned team_shmem_size( int team_size ) const {
+ const int per_team0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128 );
+ const int per_thread0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16 );
return per_team0 + team_size * per_thread0;
}
};
template< class ExecSpace, class ScheduleType >
void test_team_mulit_level_scratch_test_lambda() {
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- Kokkos::View<int,ExecSpace,Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
- Kokkos::View<int,ExecSpace> d_errors("Errors");
+ Kokkos::View< int, ExecSpace, Kokkos::MemoryTraits<Kokkos::Atomic> > errors;
+ Kokkos::View< int, ExecSpace > d_errors( "Errors" );
errors = d_errors;
- const int per_team0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128);
- const int per_thread0 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16);
+ const int per_team0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128 );
+ const int per_thread0 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16 );
+
+ const int per_team1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 128000 );
+ const int per_thread1 = 3 * Kokkos::View< double*, ExecSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged> >::shmem_size( 16000 );
- const int per_team1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(128000);
- const int per_thread1 = 3*Kokkos::View<double*,ExecSpace,Kokkos::MemoryTraits<Kokkos::Unmanaged>>::shmem_size(16000);
+ Kokkos::TeamPolicy< ExecSpace, ScheduleType > policy( 10, 8, 16 );
- Kokkos::TeamPolicy<ExecSpace,ScheduleType> policy(10,8,16);
- Kokkos::parallel_for(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team) {
- int error = test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
+ Kokkos::parallel_for( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ),
+ KOKKOS_LAMBDA ( const typename Kokkos::TeamPolicy< ExecSpace >::member_type & team )
+ {
+ int error = test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
errors() += error;
});
Kokkos::fence();
- typename Kokkos::View<int,ExecSpace>::HostMirror h_errors= Kokkos::create_mirror_view(errors);
- Kokkos::deep_copy(h_errors,d_errors);
- ASSERT_EQ(h_errors(),0);
+
+ typename Kokkos::View< int, ExecSpace >::HostMirror h_errors = Kokkos::create_mirror_view( errors );
+ Kokkos::deep_copy( h_errors, d_errors );
+ ASSERT_EQ( h_errors(), 0 );
int error = 0;
- Kokkos::parallel_reduce(policy.set_scratch_size(0,Kokkos::PerTeam(per_team0),Kokkos::PerThread(per_thread0)).set_scratch_size(1,Kokkos::PerTeam(per_team1),Kokkos::PerThread(per_thread1)),
- KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<ExecSpace>::member_type& team, int& count) {
- count += test_team_mulit_level_scratch_loop_body<ExecSpace>(team);
- },error);
- ASSERT_EQ(error,0);
+ Kokkos::parallel_reduce( policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) ).set_scratch_size( 1, Kokkos::PerTeam( per_team1 ), Kokkos::PerThread( per_thread1 ) ),
+ KOKKOS_LAMBDA ( const typename Kokkos::TeamPolicy< ExecSpace >::member_type & team, int & count )
+ {
+ count += test_team_mulit_level_scratch_loop_body< ExecSpace >( team );
+ }, error );
+ ASSERT_EQ( error, 0 );
Kokkos::fence();
#endif
}
-
-}
+} // namespace Test
namespace {
+
template< class ExecSpace, class ScheduleType >
struct TestMultiLevelScratchTeam {
-
- TestMultiLevelScratchTeam()
- { run(); }
+ TestMultiLevelScratchTeam() { run(); }
void run()
{
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
- Test::test_team_mulit_level_scratch_test_lambda<ExecSpace, ScheduleType>();
+ Test::test_team_mulit_level_scratch_test_lambda< ExecSpace, ScheduleType >();
#endif
- Test::ClassNoShmemSizeFunction<ExecSpace, ScheduleType> c1;
+ Test::ClassNoShmemSizeFunction< ExecSpace, ScheduleType > c1;
c1.run();
- Test::ClassWithShmemSizeFunction<ExecSpace, ScheduleType> c2;
+ Test::ClassWithShmemSizeFunction< ExecSpace, ScheduleType > c2;
c2.run();
-
}
};
-}
+
+} // namespace
namespace Test {
template< class ExecSpace >
struct TestShmemSize {
-
TestShmemSize() { run(); }
void run()
{
typedef Kokkos::View< long***, ExecSpace > view_type;
size_t d1 = 5;
size_t d2 = 6;
size_t d3 = 7;
size_t size = view_type::shmem_size( d1, d2, d3 );
- ASSERT_EQ( size, d1 * d2 * d3 * sizeof(long) );
+ ASSERT_EQ( size, d1 * d2 * d3 * sizeof( long ) );
}
};
-}
-/*--------------------------------------------------------------------------*/
+} // namespace Test
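
The multi-level scratch tests above combine three pieces of the team API: View::shmem_size() to compute the bytes an unmanaged scratch view needs, TeamPolicy::set_scratch_size() to request per-team and per-thread scratch at levels 0 and 1, and unmanaged views constructed on the team's scratch handles inside the kernel. The following is a minimal sketch of that pattern, not part of the diff; the function name scratch_example is hypothetical, and it assumes lambda dispatch (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA) is available, as in the lambda test above.

// Illustrative sketch only; scratch_example is a hypothetical name.
#include <Kokkos_Core.hpp>

using ExecSpace   = Kokkos::DefaultExecutionSpace;
using ScratchView = Kokkos::View< double*, ExecSpace::scratch_memory_space,
                                  Kokkos::MemoryUnmanaged >;

void scratch_example() {
  // Bytes needed for the unmanaged scratch views, per level.
  const int per_team0   = ScratchView::shmem_size( 128 );     // level 0, per team
  const int per_thread0 = ScratchView::shmem_size( 16 );      // level 0, per thread
  const int per_team1   = ScratchView::shmem_size( 128000 );  // level 1, per team

  Kokkos::TeamPolicy< ExecSpace > policy( 10, Kokkos::AUTO );

  Kokkos::parallel_for(
    policy.set_scratch_size( 0, Kokkos::PerTeam( per_team0 ), Kokkos::PerThread( per_thread0 ) )
          .set_scratch_size( 1, Kokkos::PerTeam( per_team1 ) ),
    KOKKOS_LAMBDA ( const Kokkos::TeamPolicy< ExecSpace >::member_type & team )
  {
    // Unmanaged views placed in the requested scratch areas.
    ScratchView thread_small( team.thread_scratch( 0 ), 16 );
    ScratchView team_small  ( team.team_scratch( 0 ),   128 );
    ScratchView team_large  ( team.team_scratch( 1 ),   128000 );

    thread_small( 0 ) = team.team_rank();            // private to this thread
    Kokkos::single( Kokkos::PerTeam( team ), [&] ()
    {
      team_small( 0 ) = team.league_rank();          // shared by the whole team
      team_large( 0 ) = team.league_size();
    } );
    team.team_barrier();
  } );
  Kokkos::fence();
}
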
diff --git a/lib/kokkos/core/unit_test/TestTeamVector.hpp b/lib/kokkos/core/unit_test/TestTeamVector.hpp
index d9b06c29e..8d16ac66d 100644
--- a/lib/kokkos/core/unit_test/TestTeamVector.hpp
+++ b/lib/kokkos/core/unit_test/TestTeamVector.hpp
@@ -1,673 +1,745 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Timer.hpp>
#include <iostream>
#include <cstdlib>
namespace TestTeamVector {
struct my_complex {
- double re,im;
+ double re, im;
int dummy;
+
KOKKOS_INLINE_FUNCTION
my_complex() {
re = 0.0;
im = 0.0;
dummy = 0;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex(const my_complex& src) {
+ my_complex( const my_complex & src ) {
re = src.re;
im = src.im;
dummy = src.dummy;
}
KOKKOS_INLINE_FUNCTION
- my_complex(const volatile my_complex& src) {
+ my_complex & operator=( const my_complex & src ) {
re = src.re;
im = src.im;
dummy = src.dummy;
+ return *this ;
}
KOKKOS_INLINE_FUNCTION
- my_complex(const double& val) {
+ my_complex( const volatile my_complex & src ) {
+ re = src.re;
+ im = src.im;
+ dummy = src.dummy;
+ }
+
+ KOKKOS_INLINE_FUNCTION
+ my_complex( const double & val ) {
re = val;
im = 0.0;
dummy = 0;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator += (const my_complex& src) {
+ my_complex & operator+=( const my_complex & src ) {
re += src.re;
im += src.im;
dummy += src.dummy;
return *this;
}
KOKKOS_INLINE_FUNCTION
- void operator += (const volatile my_complex& src) volatile {
+ void operator+=( const volatile my_complex & src ) volatile {
re += src.re;
im += src.im;
dummy += src.dummy;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator *= (const my_complex& src) {
- double re_tmp = re*src.re - im*src.im;
+ my_complex & operator*=( const my_complex & src ) {
+ double re_tmp = re * src.re - im * src.im;
double im_tmp = re * src.im + im * src.re;
re = re_tmp;
im = im_tmp;
dummy *= src.dummy;
return *this;
}
+
KOKKOS_INLINE_FUNCTION
- void operator *= (const volatile my_complex& src) volatile {
- double re_tmp = re*src.re - im*src.im;
+ void operator*=( const volatile my_complex & src ) volatile {
+ double re_tmp = re * src.re - im * src.im;
double im_tmp = re * src.im + im * src.re;
re = re_tmp;
im = im_tmp;
dummy *= src.dummy;
}
+
KOKKOS_INLINE_FUNCTION
- bool operator == (const my_complex& src) {
- return (re == src.re) && (im == src.im) && ( dummy == src.dummy );
+ bool operator==( const my_complex & src ) {
+ return ( re == src.re ) && ( im == src.im ) && ( dummy == src.dummy );
}
+
KOKKOS_INLINE_FUNCTION
- bool operator != (const my_complex& src) {
- return (re != src.re) || (im != src.im) || ( dummy != src.dummy );
+ bool operator!=( const my_complex & src ) {
+ return ( re != src.re ) || ( im != src.im ) || ( dummy != src.dummy );
}
+
KOKKOS_INLINE_FUNCTION
- bool operator != (const double& val) {
- return (re != val) ||
- (im != 0) || (dummy != 0);
+ bool operator!=( const double & val ) {
+ return ( re != val ) || ( im != 0 ) || ( dummy != 0 );
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator= (const int& val) {
+ my_complex & operator=( const int & val ) {
re = val;
im = 0.0;
dummy = 0;
return *this;
}
+
KOKKOS_INLINE_FUNCTION
- my_complex& operator= (const double& val) {
+ my_complex & operator=( const double & val ) {
re = val;
im = 0.0;
dummy = 0;
return *this;
}
+
KOKKOS_INLINE_FUNCTION
operator double() {
return re;
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_for {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_for( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
- typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
- typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
+ typedef typename ExecutionSpace::scratch_memory_space shmem_space;
+ typedef Kokkos::View< Scalar*, shmem_space, Kokkos::MemoryUnmanaged > shared_int;
typedef typename shared_int::size_type size_type;
- const size_type shmemSize = team.team_size () * 13;
- shared_int values = shared_int (team.team_shmem (), shmemSize);
+ const size_type shmemSize = team.team_size() * 13;
+ shared_int values = shared_int( team.team_shmem(), shmemSize );
- if (values.ptr_on_device () == NULL || values.dimension_0 () < shmemSize) {
- printf ("FAILED to allocate shared memory of size %u\n",
- static_cast<unsigned int> (shmemSize));
+ if ( values.ptr_on_device() == NULL || values.dimension_0() < shmemSize ) {
+ printf( "FAILED to allocate shared memory of size %u\n",
+ static_cast<unsigned int>( shmemSize ) );
}
else {
+ // Initialize shared memory.
+ values( team.team_rank() ) = 0;
- // Initialize shared memory
- values(team.team_rank ()) = 0;
-
- // Accumulate value into per thread shared memory
- // This is non blocking
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i)
+ // Accumulate value into per thread shared memory.
+ // This is non blocking.
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i )
{
- values(team.team_rank ()) += i - team.league_rank () + team.league_size () + team.team_size ();
+ values( team.team_rank() ) += i - team.league_rank() + team.league_size() + team.team_size();
});
- // Wait for all memory to be written
- team.team_barrier ();
- // One thread per team executes the comparison
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+
+ // Wait for all memory to be written.
+ team.team_barrier();
+
+ // One thread per team executes the comparison.
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
- Scalar test = 0;
- Scalar value = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- for (int i = 0; i < team.team_size (); ++i) {
- value += values(i);
- }
- if (test != value) {
- printf ("FAILED team_parallel_for %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
- flag() = 1;
- }
+ Scalar test = 0;
+ Scalar value = 0;
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
+ }
+
+ for ( int i = 0; i < team.team_size(); ++i ) {
+ value += values( i );
+ }
+
+ if ( test != value ) {
+ printf ( "FAILED team_parallel_for %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+ flag() = 1;
+ }
});
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_reduce {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_reduce( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = Scalar();
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val)
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
{
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- },value);
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ }, value );
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
- {
- Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- if (test != value) {
- if(team.league_rank() == 0)
- printf ("FAILED team_parallel_reduce %i %i %f %f %lu\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value),sizeof(Scalar));
- flag() = 1;
- }
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
+ {
+ Scalar test = 0;
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
+ }
+
+ if ( test != value ) {
+ if ( team.league_rank() == 0 ) {
+ printf( "FAILED team_parallel_reduce %i %i %f %f %lu\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ), sizeof( Scalar ) );
+ }
+
+ flag() = 1;
+ }
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_reduce_join {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_reduce_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_reduce_join( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = 0;
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131)
- , [&] (int i, Scalar& val)
- {
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- , [&] (volatile Scalar& val, const volatile Scalar& src)
- {val+=src;}
- , value
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
+ {
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ },
+ [] ( volatile Scalar & val, const volatile Scalar & src ) { val += src; },
+ value
);
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
- Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- if (test != value) {
- printf ("FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
- flag() = 1;
- }
+ Scalar test = 0;
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
+ }
+
+ if ( test != value ) {
+ printf( "FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
+ flag() = 1;
+ }
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_vector_for {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_vector_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_vector_for( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
- typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
- typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
+ typedef typename ExecutionSpace::scratch_memory_space shmem_space;
+ typedef Kokkos::View< Scalar*, shmem_space, Kokkos::MemoryUnmanaged > shared_int;
typedef typename shared_int::size_type size_type;
- const size_type shmemSize = team.team_size () * 13;
- shared_int values = shared_int (team.team_shmem (), shmemSize);
+ const size_type shmemSize = team.team_size() * 13;
+ shared_int values = shared_int( team.team_shmem(), shmemSize );
- if (values.ptr_on_device () == NULL || values.dimension_0 () < shmemSize) {
- printf ("FAILED to allocate shared memory of size %u\n",
- static_cast<unsigned int> (shmemSize));
+ if ( values.ptr_on_device() == NULL || values.dimension_0() < shmemSize ) {
+ printf( "FAILED to allocate shared memory of size %u\n",
+ static_cast<unsigned int>( shmemSize ) );
}
else {
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
- values(team.team_rank ()) = 0;
+ values( team.team_rank() ) = 0;
});
- Kokkos::parallel_for(Kokkos::TeamThreadRange(team,131),[&] (int i)
+ Kokkos::parallel_for( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i )
{
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
- values(team.team_rank ()) += i - team.league_rank () + team.league_size () + team.team_size ();
+ values( team.team_rank() ) += i - team.league_rank() + team.league_size() + team.team_size();
});
});
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
Scalar test = 0;
Scalar value = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
}
- for (int i = 0; i < team.team_size (); ++i) {
- value += values(i);
+
+ for ( int i = 0; i < team.team_size(); ++i ) {
+ value += values( i );
}
- if (test != value) {
- printf ("FAILED team_vector_parallel_for %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
+
+ if ( test != value ) {
+ printf( "FAILED team_vector_parallel_for %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
flag() = 1;
}
});
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_vector_reduce {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_vector_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_team_vector_reduce( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
-
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = Scalar();
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131),[&] (int i, Scalar& val)
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
{
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- },value);
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ }, value );
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
}
- if (test != value) {
- if(team.league_rank() == 0)
- printf ("FAILED team_vector_parallel_reduce %i %i %f %f %lu\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value),sizeof(Scalar));
- flag() = 1;
+
+ if ( test != value ) {
+ if ( team.league_rank() == 0 ) {
+ printf( "FAILED team_vector_parallel_reduce %i %i %f %f %lu\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ), sizeof( Scalar ) );
+ }
+
+ flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_team_vector_reduce_join {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_team_vector_reduce_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ functor_team_vector_reduce_join( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
+ KOKKOS_INLINE_FUNCTION
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = 0;
- Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,131)
- , [&] (int i, Scalar& val)
- {
- val += i - team.league_rank () + team.league_size () + team.team_size ();
- }
- , [&] (volatile Scalar& val, const volatile Scalar& src)
- {val+=src;}
- , value
+
+ Kokkos::parallel_reduce( Kokkos::TeamThreadRange( team, 131 ), [&] ( int i, Scalar & val )
+ {
+ val += i - team.league_rank() + team.league_size() + team.team_size();
+ },
+ [] ( volatile Scalar & val, const volatile Scalar & src ) { val += src; },
+ value
);
- team.team_barrier ();
- Kokkos::single(Kokkos::PerTeam(team),[&]()
+ team.team_barrier();
+
+ Kokkos::single( Kokkos::PerTeam( team ), [&] ()
{
Scalar test = 0;
- for (int i = 0; i < 131; ++i) {
- test += i - team.league_rank () + team.league_size () + team.team_size ();
+
+ for ( int i = 0; i < 131; ++i ) {
+ test += i - team.league_rank() + team.league_size() + team.team_size();
}
- if (test != value) {
- printf ("FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
+
+ if ( test != value ) {
+ printf( "FAILED team_vector_parallel_reduce_join %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_single {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_single(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_vec_single( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
-
- // Warning: this test case intentionally violates permissable semantics
+ void operator()( typename policy_type::member_type team ) const {
+    // Warning: this test case intentionally violates permissible semantics.
// It is not valid to get references to members of the enclosing region
// inside a parallel_for and write to it.
Scalar value = 0;
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13),[&] (int i)
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i )
{
- value = i; // This write is violating Kokkos semantics for nested parallelism
+ value = i; // This write is violating Kokkos semantics for nested parallelism.
});
- Kokkos::single(Kokkos::PerThread(team),[&] (Scalar& val)
+ Kokkos::single( Kokkos::PerThread( team ), [&] ( Scalar & val )
{
val = 1;
- },value);
+ }, value );
Scalar value2 = 0;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13), [&] (int i, Scalar& val)
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val )
{
val += value;
- },value2);
+ }, value2 );
+
+ if ( value2 != ( value * 13 ) ) {
+ printf( "FAILED vector_single broadcast %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) value2, (double) value );
- if(value2!=(value*13)) {
- printf("FAILED vector_single broadcast %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) value2,(double) value);
- flag()=1;
+ flag() = 1;
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_for {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_for(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+
+ functor_vec_for( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
- unsigned team_shmem_size(int team_size) const {return team_size*13*sizeof(Scalar)+8;}
+ unsigned team_shmem_size( int team_size ) const { return team_size * 13 * sizeof( Scalar ) + 8; }
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ void operator()( typename policy_type::member_type team ) const {
+ typedef typename ExecutionSpace::scratch_memory_space shmem_space;
+ typedef Kokkos::View< Scalar*, shmem_space, Kokkos::MemoryUnmanaged > shared_int;
- typedef typename ExecutionSpace::scratch_memory_space shmem_space ;
- typedef Kokkos::View<Scalar*,shmem_space,Kokkos::MemoryUnmanaged> shared_int;
- shared_int values = shared_int(team.team_shmem(),team.team_size()*13);
+ shared_int values = shared_int( team.team_shmem(), team.team_size() * 13 );
- if (values.ptr_on_device () == NULL ||
- values.dimension_0() < (unsigned) team.team_size() * 13) {
- printf ("FAILED to allocate memory of size %i\n",
- static_cast<int> (team.team_size () * 13));
+ if ( values.ptr_on_device() == NULL || values.dimension_0() < (unsigned) team.team_size() * 13 ) {
+ printf( "FAILED to allocate memory of size %i\n", static_cast<int>( team.team_size() * 13 ) );
flag() = 1;
}
else {
- Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,13), [&] (int i)
+ Kokkos::parallel_for( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i )
{
- values(13*team.team_rank() + i) = i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
+ values( 13 * team.team_rank() + i ) =
+ i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
});
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
Scalar test = 0;
Scalar value = 0;
- for (int i = 0; i < 13; ++i) {
+
+ for ( int i = 0; i < 13; ++i ) {
test += i - team.team_rank() - team.league_rank() + team.league_size() + team.team_size();
- value += values(13*team.team_rank() + i);
+ value += values( 13 * team.team_rank() + i );
}
- if (test != value) {
- printf ("FAILED vector_par_for %i %i %f %f\n",
- team.league_rank (), team.team_rank (),
- static_cast<double> (test), static_cast<double> (value));
+
+ if ( test != value ) {
+ printf( "FAILED vector_par_for %i %i %f %f\n",
+ team.league_rank(), team.team_rank(),
+ static_cast<double>( test ), static_cast<double>( value ) );
+
flag() = 1;
}
});
}
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_red {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_red(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+
+ functor_vec_red( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ void operator()( typename policy_type::member_type team ) const {
Scalar value = 0;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val)
+ // When no reducer is given the default is summation.
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val )
{
val += i;
- }, value);
+ }, value );
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
Scalar test = 0;
- for(int i = 0; i < 13; i++) {
- test+=i;
- }
- if(test!=value) {
- printf("FAILED vector_par_reduce %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) value);
- flag()=1;
+
+ for ( int i = 0; i < 13; i++ ) test += i;
+
+ if ( test != value ) {
+ printf( "FAILED vector_par_reduce %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) test, (double) value );
+
+ flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_red_join {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_red_join(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+
+ functor_vec_red_join( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
+ void operator()( typename policy_type::member_type team ) const {
+ // Must initialize to the identity value for the reduce operation
+ // for this test:
+ // ( identity, operation ) = ( 1 , *= )
Scalar value = 1;
- Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,13)
- , [&] (int i, Scalar& val)
- { val *= i; }
- , [&] (Scalar& val, const Scalar& src)
- {val*=src;}
- , value
+ Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val )
+ {
+ val *= ( i % 5 + 1 );
+ },
+ [&] ( Scalar & val, const Scalar & src ) { val *= src; },
+ value
);
- Kokkos::single(Kokkos::PerThread(team),[&] ()
+ Kokkos::single( Kokkos::PerThread( team ), [&] ()
{
Scalar test = 1;
- for(int i = 0; i < 13; i++) {
- test*=i;
- }
- if(test!=value) {
- printf("FAILED vector_par_reduce_join %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) value);
- flag()=1;
+
+ for ( int i = 0; i < 13; i++ ) test *= ( i % 5 + 1 );
+
+ if ( test != value ) {
+ printf( "FAILED vector_par_reduce_join %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) test, (double) value );
+
+ flag() = 1;
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_vec_scan {
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_vec_scan(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_vec_scan( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team) const {
- Kokkos::parallel_scan(Kokkos::ThreadVectorRange(team,13),[&] (int i, Scalar& val, bool final)
+ void operator()( typename policy_type::member_type team ) const {
+ Kokkos::parallel_scan( Kokkos::ThreadVectorRange( team, 13 ), [&] ( int i, Scalar & val, bool final )
{
val += i;
- if(final) {
+
+ if ( final ) {
Scalar test = 0;
- for(int k = 0; k <= i; k++) {
- test+=k;
- }
- if(test!=val) {
- printf("FAILED vector_par_scan %i %i %f %f\n",team.league_rank(),team.team_rank(),(double) test,(double) val);
- flag()=1;
+ for ( int k = 0; k <= i; k++ ) test += k;
+
+ if ( test != val ) {
+ printf( "FAILED vector_par_scan %i %i %f %f\n",
+ team.league_rank(), team.team_rank(), (double) test, (double) val );
+
+ flag() = 1;
}
}
});
}
};
-template<typename Scalar, class ExecutionSpace>
+template< typename Scalar, class ExecutionSpace >
struct functor_reduce {
typedef double value_type;
- typedef Kokkos::TeamPolicy<ExecutionSpace> policy_type;
+ typedef Kokkos::TeamPolicy< ExecutionSpace > policy_type;
typedef ExecutionSpace execution_space;
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag;
- functor_reduce(Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> flag_):flag(flag_) {}
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag;
+ functor_reduce( Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > flag_ ) : flag( flag_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (typename policy_type::member_type team, double& sum) const {
+ void operator()( typename policy_type::member_type team, double & sum ) const {
sum += team.league_rank() * 100 + team.thread_rank();
}
};
-template<typename Scalar,class ExecutionSpace>
-bool test_scalar(int nteams, int team_size, int test) {
- Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace> d_flag("flag");
- typename Kokkos::View<int,Kokkos::LayoutLeft,ExecutionSpace>::HostMirror h_flag("h_flag");
- h_flag() = 0 ;
- Kokkos::deep_copy(d_flag,h_flag);
-
- if(test==0)
- Kokkos::parallel_for( std::string("A") , Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_red<Scalar, ExecutionSpace>(d_flag));
- if(test==1)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_red_join<Scalar, ExecutionSpace>(d_flag));
- if(test==2)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_scan<Scalar, ExecutionSpace>(d_flag));
- if(test==3)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_for<Scalar, ExecutionSpace>(d_flag));
- if(test==4)
- Kokkos::parallel_for( "B" , Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_vec_single<Scalar, ExecutionSpace>(d_flag));
- if(test==5)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
- functor_team_for<Scalar, ExecutionSpace>(d_flag));
- if(test==6)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
- functor_team_reduce<Scalar, ExecutionSpace>(d_flag));
- if(test==7)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size),
- functor_team_reduce_join<Scalar, ExecutionSpace>(d_flag));
- if(test==8)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_team_vector_for<Scalar, ExecutionSpace>(d_flag));
- if(test==9)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_team_vector_reduce<Scalar, ExecutionSpace>(d_flag));
- if(test==10)
- Kokkos::parallel_for( Kokkos::TeamPolicy<ExecutionSpace>(nteams,team_size,8),
- functor_team_vector_reduce_join<Scalar, ExecutionSpace>(d_flag));
-
- Kokkos::deep_copy(h_flag,d_flag);
-
- return (h_flag() == 0);
+template< typename Scalar, class ExecutionSpace >
+bool test_scalar( int nteams, int team_size, int test ) {
+ Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace > d_flag( "flag" );
+ typename Kokkos::View< int, Kokkos::LayoutLeft, ExecutionSpace >::HostMirror h_flag( "h_flag" );
+ h_flag() = 0;
+ Kokkos::deep_copy( d_flag, h_flag );
+
+ if ( test == 0 ) {
+ Kokkos::parallel_for( std::string( "A" ), Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_red< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 1 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_red_join< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 2 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_scan< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 3 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_for< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 4 ) {
+ Kokkos::parallel_for( "B", Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_vec_single< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 5 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size ),
+ functor_team_for< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 6 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size ),
+ functor_team_reduce< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 7 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size ),
+ functor_team_reduce_join< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 8 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_team_vector_for< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 9 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_team_vector_reduce< Scalar, ExecutionSpace >( d_flag ) );
+ }
+ else if ( test == 10 ) {
+ Kokkos::parallel_for( Kokkos::TeamPolicy< ExecutionSpace >( nteams, team_size, 8 ),
+ functor_team_vector_reduce_join< Scalar, ExecutionSpace >( d_flag ) );
+ }
+
+ Kokkos::deep_copy( h_flag, d_flag );
+
+ return ( h_flag() == 0 );
}
-template<class ExecutionSpace>
-bool Test(int test) {
+template< class ExecutionSpace >
+bool Test( int test ) {
bool passed = true;
- passed = passed && test_scalar<int, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<long long int, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<float, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<double, ExecutionSpace>(317,33,test);
- passed = passed && test_scalar<my_complex, ExecutionSpace>(317,33,test);
- return passed;
-}
+ passed = passed && test_scalar< int, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< long long int, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< float, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< double, ExecutionSpace >( 317, 33, test );
+ passed = passed && test_scalar< my_complex, ExecutionSpace >( 317, 33, test );
+ return passed;
}
+} // namespace TestTeamVector
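
The TestTeamVector.hpp changes above all follow one structure: a nested parallel_for or parallel_reduce over a TeamThreadRange or ThreadVectorRange, followed by a Kokkos::single() block in which one thread or one vector lane verifies the result; non-sum reductions pass an explicit join callable whose identity must match the initial value (1 for a product, as the comment in functor_vec_red_join notes). Below is a minimal sketch of the product-with-join case, not part of the diff; the functor name vec_product is hypothetical, while the Kokkos calls mirror those used in the tests.

// Illustrative sketch only; vec_product is a hypothetical name.
#include <Kokkos_Core.hpp>

using ExecSpace = Kokkos::DefaultExecutionSpace;
using Policy    = Kokkos::TeamPolicy< ExecSpace >;

struct vec_product {
  KOKKOS_INLINE_FUNCTION
  void operator()( const Policy::member_type & team ) const {
    // The initial value must be the identity of the join operation: 1 for *=.
    double value = 1;

    Kokkos::parallel_reduce( Kokkos::ThreadVectorRange( team, 13 ),
      [&] ( int i, double & val ) { val *= ( i % 5 + 1 ); },
      [&] ( double & val, const double & src ) { val *= src; },
      value );

    // One vector lane per thread checks the reduced value.
    Kokkos::single( Kokkos::PerThread( team ), [&] ()
    {
      double test = 1;
      for ( int i = 0; i < 13; i++ ) test *= ( i % 5 + 1 );

      if ( test != value ) {
        printf( "FAILED vec_product %i %i\n", team.league_rank(), team.team_rank() );
      }
    } );
  }
};

// Usage, mirroring the tests: Kokkos::parallel_for( Policy( 317, 33, 8 ), vec_product() );
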
diff --git a/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp b/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp
index 203c95267..7bcf3f8a3 100644
--- a/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp
+++ b/lib/kokkos/core/unit_test/TestTemplateMetaFunctions.hpp
@@ -1,198 +1,208 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#define KOKKOS_PRAGMA_UNROLL(a)
namespace {
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumPlain {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
+
type view;
- SumPlain(type view_):view(view_) {}
+
+ SumPlain( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, Scalar& val) {
+ void operator() ( int i, Scalar & val ) {
val += Scalar();
}
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumInitJoinFinalValueType {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type;
- SumInitJoinFinalValueType(type view_):view(view_) {}
+
+ type view;
+
+ SumInitJoinFinalValueType( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(value_type& val) const {
+ void init( value_type & val ) const {
val = value_type();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& val, volatile value_type& src) const {
+ void join( volatile value_type & val, volatile value_type & src ) const {
val += src;
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type& val) const {
+ void operator()( int i, value_type & val ) const {
val += value_type();
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumInitJoinFinalValueType2 {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type;
- SumInitJoinFinalValueType2(type view_):view(view_) {}
+
+ type view;
+
+ SumInitJoinFinalValueType2( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(volatile value_type& val) const {
+ void init( volatile value_type & val ) const {
val = value_type();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& val, const volatile value_type& src) const {
+ void join( volatile value_type & val, const volatile value_type & src ) const {
val += src;
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type& val) const {
+ void operator()( int i, value_type & val ) const {
val += value_type();
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumInitJoinFinalValueTypeArray {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type[];
+
+ type view;
int n;
- SumInitJoinFinalValueTypeArray(type view_, int n_):view(view_),n(n_) {}
+
+ SumInitJoinFinalValueTypeArray( type view_, int n_ ) : view( view_ ), n( n_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(value_type val) const {
- for(int k=0;k<n;k++)
+ void init( value_type val ) const {
+ for ( int k = 0; k < n; k++ ) {
val[k] = 0;
+ }
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type val, const volatile value_type src) const {
- for(int k=0;k<n;k++)
+ void join( volatile value_type val, const volatile value_type src ) const {
+ for ( int k = 0; k < n; k++ ) {
val[k] += src[k];
+ }
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type val) const {
- for(int k=0;k<n;k++)
- val[k] += k*i;
+ void operator()( int i, value_type val ) const {
+ for ( int k = 0; k < n; k++ ) {
+ val[k] += k * i;
+ }
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
struct SumWrongInitJoinFinalValueType {
typedef ExecutionSpace execution_space;
- typedef typename Kokkos::View<Scalar*,execution_space> type;
- type view;
+ typedef typename Kokkos::View< Scalar*, execution_space > type;
typedef Scalar value_type;
- SumWrongInitJoinFinalValueType(type view_):view(view_) {}
+
+ type view;
+
+ SumWrongInitJoinFinalValueType( type view_ ) : view( view_ ) {}
KOKKOS_INLINE_FUNCTION
- void init(double& val) const {
+ void init( double & val ) const {
val = double();
}
KOKKOS_INLINE_FUNCTION
- void join(volatile value_type& val, const value_type& src) const {
+ void join( volatile value_type & val, const value_type & src ) const {
val += src;
}
KOKKOS_INLINE_FUNCTION
- void operator() (int i, value_type& val) const {
+ void operator()( int i, value_type & val ) const {
val += value_type();
}
-
};
-template<class Scalar, class ExecutionSpace>
+template< class Scalar, class ExecutionSpace >
void TestTemplateMetaFunctions() {
- typedef typename Kokkos::View<Scalar*,ExecutionSpace> type;
- type a("A",100);
+ typedef typename Kokkos::View< Scalar*, ExecutionSpace > type;
+ type a( "A", 100 );
/*
- int sum_plain_has_init_arg = Kokkos::Impl::FunctorHasInit<SumPlain<Scalar,ExecutionSpace>, Scalar& >::value;
- ASSERT_EQ(sum_plain_has_init_arg,0);
- int sum_initjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit<SumInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_init_arg,1);
- int sum_initjoinfinalvaluetype_has_init_arg2 = Kokkos::Impl::FunctorHasInit<SumInitJoinFinalValueType2<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_init_arg2,1);
- int sum_wronginitjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit<SumWrongInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_wronginitjoinfinalvaluetype_has_init_arg,0);
-
- //int sum_initjoinfinalvaluetypearray_has_init_arg = Kokkos::Impl::FunctorHasInit<SumInitJoinFinalValueTypeArray<Scalar,ExecutionSpace>, Scalar[] >::value;
- //ASSERT_EQ(sum_initjoinfinalvaluetypearray_has_init_arg,1);
-
- //printf("Values Init: %i %i %i\n",sum_plain_has_init_arg,sum_initjoinfinalvaluetype_has_init_arg,sum_wronginitjoinfinalvaluetype_has_init_arg);
-
- int sum_plain_has_join_arg = Kokkos::Impl::FunctorHasJoin<SumPlain<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_plain_has_join_arg,0);
- int sum_initjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin<SumInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_join_arg,1);
- int sum_initjoinfinalvaluetype_has_join_arg2 = Kokkos::Impl::FunctorHasJoin<SumInitJoinFinalValueType2<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_initjoinfinalvaluetype_has_join_arg2,1);
- int sum_wronginitjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin<SumWrongInitJoinFinalValueType<Scalar,ExecutionSpace>, Scalar >::value;
- ASSERT_EQ(sum_wronginitjoinfinalvaluetype_has_join_arg,0);
+ int sum_plain_has_init_arg = Kokkos::Impl::FunctorHasInit< SumPlain<Scalar, ExecutionSpace>, Scalar & >::value;
+ ASSERT_EQ( sum_plain_has_init_arg, 0 );
+ int sum_initjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit< SumInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_init_arg, 1 );
+ int sum_initjoinfinalvaluetype_has_init_arg2 = Kokkos::Impl::FunctorHasInit< SumInitJoinFinalValueType2<Scalar,ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_init_arg2, 1 );
+ int sum_wronginitjoinfinalvaluetype_has_init_arg = Kokkos::Impl::FunctorHasInit< SumWrongInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_wronginitjoinfinalvaluetype_has_init_arg, 0 );
+
+ //int sum_initjoinfinalvaluetypearray_has_init_arg = Kokkos::Impl::FunctorHasInit< SumInitJoinFinalValueTypeArray<Scalar, ExecutionSpace>, Scalar[] >::value;
+ //ASSERT_EQ( sum_initjoinfinalvaluetypearray_has_init_arg, 1 );
+
+ //printf( "Values Init: %i %i %i\n", sum_plain_has_init_arg, sum_initjoinfinalvaluetype_has_init_arg, sum_wronginitjoinfinalvaluetype_has_init_arg );
+
+ int sum_plain_has_join_arg = Kokkos::Impl::FunctorHasJoin< SumPlain<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_plain_has_join_arg, 0 );
+ int sum_initjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin< SumInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_join_arg, 1 );
+ int sum_initjoinfinalvaluetype_has_join_arg2 = Kokkos::Impl::FunctorHasJoin< SumInitJoinFinalValueType2<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_initjoinfinalvaluetype_has_join_arg2, 1 );
+ int sum_wronginitjoinfinalvaluetype_has_join_arg = Kokkos::Impl::FunctorHasJoin< SumWrongInitJoinFinalValueType<Scalar, ExecutionSpace>, Scalar >::value;
+ ASSERT_EQ( sum_wronginitjoinfinalvaluetype_has_join_arg, 0 );
+
+ //printf( "Values Join: %i %i %i\n", sum_plain_has_join_arg, sum_initjoinfinalvaluetype_has_join_arg, sum_wronginitjoinfinalvaluetype_has_join_arg );
*/
- //printf("Values Join: %i %i %i\n",sum_plain_has_join_arg,sum_initjoinfinalvaluetype_has_join_arg,sum_wronginitjoinfinalvaluetype_has_join_arg);
}
-}
+} // namespace
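
TestTemplateMetaFunctions.hpp probes whether a reduction functor declares the optional value_type, init() and join() hooks (FunctorHasInit / FunctorHasJoin), and whether a mismatched signature is correctly rejected. A minimal sketch of a functor that carries all three hooks follows; it is not part of the diff, and the name MaxFunctor is hypothetical (a max reduction is used here because, unlike a sum, it genuinely needs a custom init and join).

// Illustrative sketch only; MaxFunctor is a hypothetical name.
#include <Kokkos_Core.hpp>
#include <cfloat>

struct MaxFunctor {
  typedef double value_type;                        // detected by the metafunctions
  Kokkos::View< const double* > x;

  MaxFunctor( Kokkos::View< const double* > x_ ) : x( x_ ) {}

  KOKKOS_INLINE_FUNCTION
  void init( value_type & val ) const { val = -DBL_MAX; }   // FunctorHasInit

  KOKKOS_INLINE_FUNCTION
  void join( volatile value_type & val,
             const volatile value_type & src ) const {      // FunctorHasJoin
    if ( src > val ) val = src;
  }

  KOKKOS_INLINE_FUNCTION
  void operator()( int i, value_type & val ) const {
    if ( x( i ) > val ) val = x( i );
  }
};

// Usage: double m = 0; Kokkos::parallel_reduce( x.dimension_0(), MaxFunctor( x ), m );
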
diff --git a/lib/kokkos/core/unit_test/TestTile.hpp b/lib/kokkos/core/unit_test/TestTile.hpp
index 842131deb..7d096c24c 100644
--- a/lib/kokkos/core/unit_test/TestTile.hpp
+++ b/lib/kokkos/core/unit_test/TestTile.hpp
@@ -1,154 +1,142 @@
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
#ifndef TEST_TILE_HPP
#define TEST_TILE_HPP
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_ViewTile.hpp>
namespace TestTile {
-template < typename Device , typename TileLayout>
+template < typename Device, typename TileLayout >
struct ReduceTileErrors
{
- typedef Device execution_space ;
-
- typedef Kokkos::View< ptrdiff_t**, TileLayout, Device> array_type;
- typedef Kokkos::View< ptrdiff_t[ TileLayout::N0 ][ TileLayout::N1 ], Kokkos::LayoutLeft , Device > tile_type ;
-
- array_type m_array ;
-
+ typedef Device execution_space;
+ typedef Kokkos::View< ptrdiff_t**, TileLayout, Device > array_type;
+ typedef Kokkos::View< ptrdiff_t[ TileLayout::N0 ][ TileLayout::N1 ], Kokkos::LayoutLeft, Device > tile_type;
typedef ptrdiff_t value_type;
- ReduceTileErrors( array_type a )
- : m_array(a)
- {}
+ array_type m_array;
+ ReduceTileErrors( array_type a ) : m_array( a ) {}
KOKKOS_INLINE_FUNCTION
- static void init( value_type & errors )
- {
- errors = 0;
- }
+ static void init( value_type & errors ) { errors = 0; }
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & errors ,
+ static void join( volatile value_type & errors,
const volatile value_type & src_errors )
{
errors += src_errors;
}
- // Initialize
+ // Initialize.
KOKKOS_INLINE_FUNCTION
void operator()( size_t iwork ) const
{
const size_t i = iwork % m_array.dimension_0();
const size_t j = iwork / m_array.dimension_0();
- if ( j < m_array.dimension_1() ) {
- m_array(i,j) = & m_array(i,j) - & m_array(0,0);
-// printf("m_array(%d,%d) = %d\n",int(i),int(j),int(m_array(i,j)));
+ if ( j < m_array.dimension_1() ) {
+ m_array( i, j ) = &m_array( i, j ) - &m_array( 0, 0 );
+ //printf( "m_array(%d, %d) = %d\n", int( i ), int( j ), int( m_array( i, j ) ) );
}
}
// Verify:
KOKKOS_INLINE_FUNCTION
- void operator()( size_t iwork , value_type & errors ) const
+ void operator()( size_t iwork, value_type & errors ) const
{
- const size_t tile_dim0 = ( m_array.dimension_0() + TileLayout::N0 - 1 ) / TileLayout::N0 ;
- const size_t tile_dim1 = ( m_array.dimension_1() + TileLayout::N1 - 1 ) / TileLayout::N1 ;
+ const size_t tile_dim0 = ( m_array.dimension_0() + TileLayout::N0 - 1 ) / TileLayout::N0;
+ const size_t tile_dim1 = ( m_array.dimension_1() + TileLayout::N1 - 1 ) / TileLayout::N1;
- const size_t itile = iwork % tile_dim0 ;
- const size_t jtile = iwork / tile_dim0 ;
+ const size_t itile = iwork % tile_dim0;
+ const size_t jtile = iwork / tile_dim0;
if ( jtile < tile_dim1 ) {
+ tile_type tile = Kokkos::Experimental::tile_subview( m_array, itile, jtile );
- tile_type tile = Kokkos::Experimental::tile_subview( m_array , itile , jtile );
-
- if ( tile(0,0) != ptrdiff_t(( itile + jtile * tile_dim0 ) * TileLayout::N0 * TileLayout::N1 ) ) {
- ++errors ;
+ if ( tile( 0, 0 ) != ptrdiff_t( ( itile + jtile * tile_dim0 ) * TileLayout::N0 * TileLayout::N1 ) ) {
+ ++errors;
}
else {
+ for ( size_t j = 0; j < size_t( TileLayout::N1 ); ++j ) {
+ for ( size_t i = 0; i < size_t( TileLayout::N0 ); ++i ) {
+ const size_t iglobal = i + itile * TileLayout::N0;
+ const size_t jglobal = j + jtile * TileLayout::N1;
- for ( size_t j = 0 ; j < size_t(TileLayout::N1) ; ++j ) {
- for ( size_t i = 0 ; i < size_t(TileLayout::N0) ; ++i ) {
- const size_t iglobal = i + itile * TileLayout::N0 ;
- const size_t jglobal = j + jtile * TileLayout::N1 ;
-
- if ( iglobal < m_array.dimension_0() && jglobal < m_array.dimension_1() ) {
- if ( tile(i,j) != ptrdiff_t( tile(0,0) + i + j * TileLayout::N0 ) ) ++errors ;
-
-// printf("tile(%d,%d)(%d,%d) = %d\n",int(itile),int(jtile),int(i),int(j),int(tile(i,j)));
+ if ( iglobal < m_array.dimension_0() && jglobal < m_array.dimension_1() ) {
+ if ( tile( i, j ) != ptrdiff_t( tile( 0, 0 ) + i + j * TileLayout::N0 ) ) ++errors;
+ //printf( "tile(%d, %d)(%d, %d) = %d\n", int( itile ), int( jtile ), int( i ), int( j ), int( tile( i, j ) ) );
+ }
}
}
- }
}
}
}
};
-template< class Space , unsigned N0 , unsigned N1 >
-void test( const size_t dim0 , const size_t dim1 )
+template< class Space, unsigned N0, unsigned N1 >
+void test( const size_t dim0, const size_t dim1 )
{
- typedef Kokkos::LayoutTileLeft<N0,N1> array_layout ;
- typedef ReduceTileErrors< Space , array_layout > functor_type ;
+ typedef Kokkos::LayoutTileLeft< N0, N1 > array_layout;
+ typedef ReduceTileErrors< Space, array_layout > functor_type;
- const size_t tile_dim0 = ( dim0 + N0 - 1 ) / N0 ;
- const size_t tile_dim1 = ( dim1 + N1 - 1 ) / N1 ;
-
- typename functor_type::array_type array("",dim0,dim1);
+ const size_t tile_dim0 = ( dim0 + N0 - 1 ) / N0;
+ const size_t tile_dim1 = ( dim1 + N1 - 1 ) / N1;
- Kokkos::parallel_for( Kokkos::RangePolicy<Space,size_t>(0,dim0*dim1) , functor_type( array ) );
+ typename functor_type::array_type array( "", dim0, dim1 );
- ptrdiff_t error = 0 ;
+ Kokkos::parallel_for( Kokkos::RangePolicy< Space, size_t >( 0, dim0 * dim1 ), functor_type( array ) );
- Kokkos::parallel_reduce( Kokkos::RangePolicy<Space,size_t>(0,tile_dim0*tile_dim1) , functor_type( array ) , error );
+ ptrdiff_t error = 0;
- EXPECT_EQ( error , ptrdiff_t(0) );
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< Space, size_t >( 0, tile_dim0 * tile_dim1 ), functor_type( array ), error );
+
+ EXPECT_EQ( error, ptrdiff_t( 0 ) );
}
-} /* namespace TestTile */
+} // namespace TestTile
#endif //TEST_TILE_HPP
-
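For context, the TestTile.hpp harness above first fills a LayoutTileLeft view with each element's offset from the origin (the parallel_for pass) and then walks the view tile by tile to verify that those offsets follow the expected tile-contiguous ordering (the parallel_reduce pass, which counts mismatches). A minimal sketch of one way TestTile::test<>() could be driven from a gtest body follows; the Serial execution space, the test name, and the tile and array shapes are illustrative assumptions only, and the execution space is assumed to have been initialized by the test's main().

#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include "TestTile.hpp"

// Illustrative only: exercises a few tile shapes, including array extents
// that are not exact multiples of the tile dimensions, so the partial-tile
// bounds checks in ReduceTileErrors::operator() are reached.
// Assumes Kokkos::Serial was enabled and initialized by the test main().
TEST( serial, view_tile_layout )
{
  TestTile::test< Kokkos::Serial, 1, 1 >( 9, 10 );
  TestTile::test< Kokkos::Serial, 2, 2 >( 9, 10 );
  TestTile::test< Kokkos::Serial, 4, 4 >( 9, 11 );
}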
diff --git a/lib/kokkos/core/unit_test/TestUtilities.hpp b/lib/kokkos/core/unit_test/TestUtilities.hpp
index 947be03e3..be4a93b89 100644
--- a/lib/kokkos/core/unit_test/TestUtilities.hpp
+++ b/lib/kokkos/core/unit_test/TestUtilities.hpp
@@ -1,306 +1,301 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
inline
void test_utilities()
{
using namespace Kokkos::Impl;
+
{
- using i = integer_sequence<int>;
- using j = make_integer_sequence<int,0>;
+ using i = integer_sequence< int >;
+ using j = make_integer_sequence< int, 0 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 0u, "Error: integer_sequence.size()" );
}
-
{
- using i = integer_sequence<int,0>;
- using j = make_integer_sequence<int,1>;
+ using i = integer_sequence< int, 0 >;
+ using j = make_integer_sequence< int, 1 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 1u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
}
-
{
- using i = integer_sequence<int,0,1>;
- using j = make_integer_sequence<int,2>;
+ using i = integer_sequence< int, 0, 1 >;
+ using j = make_integer_sequence< int, 2 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 2u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2>;
- using j = make_integer_sequence<int,3>;
+ using i = integer_sequence< int, 0, 1, 2 >;
+ using j = make_integer_sequence< int, 3 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 3u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3>;
- using j = make_integer_sequence<int,4>;
+ using i = integer_sequence< int, 0, 1, 2, 3 >;
+ using j = make_integer_sequence< int, 4 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 4u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4>;
- using j = make_integer_sequence<int,5>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4 >;
+ using j = make_integer_sequence< int, 5 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 5u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5>;
- using j = make_integer_sequence<int,6>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5 >;
+ using j = make_integer_sequence< int, 6 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 6u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6>;
- using j = make_integer_sequence<int,7>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6 >;
+ using j = make_integer_sequence< int, 7 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 7u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6,7>;
- using j = make_integer_sequence<int,8>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7 >;
+ using j = make_integer_sequence< int, 8 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 8u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 7, i >::value == 7, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 7, i{} ) == 7, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6,7,8>;
- using j = make_integer_sequence<int,9>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7, 8 >;
+ using j = make_integer_sequence< int, 9 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 9u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<8, i>::value == 8, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(8, i{}) == 8, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 7, i >::value == 7, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 8, i >::value == 8, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 7, i{} ) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 8, i{} ) == 8, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = integer_sequence<int,0,1,2,3,4,5,6,7,8,9>;
- using j = make_integer_sequence<int,10>;
+ using i = integer_sequence< int, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 >;
+ using j = make_integer_sequence< int, 10 >;
- static_assert( std::is_same<i,j>::value, "Error: make_integer_sequence" );
+ static_assert( std::is_same< i, j >::value, "Error: make_integer_sequence" );
static_assert( i::size() == 10u, "Error: integer_sequence.size()" );
- static_assert( integer_sequence_at<0, i>::value == 0, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<1, i>::value == 1, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<2, i>::value == 2, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<3, i>::value == 3, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<4, i>::value == 4, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<5, i>::value == 5, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<6, i>::value == 6, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<7, i>::value == 7, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<8, i>::value == 8, "Error: integer_sequence_at" );
- static_assert( integer_sequence_at<9, i>::value == 9, "Error: integer_sequence_at" );
-
- static_assert( at(0, i{}) == 0, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(1, i{}) == 1, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(2, i{}) == 2, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(3, i{}) == 3, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(4, i{}) == 4, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(5, i{}) == 5, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(6, i{}) == 6, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(7, i{}) == 7, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(8, i{}) == 8, "Error: at(unsigned, integer_sequence)" );
- static_assert( at(9, i{}) == 9, "Error: at(unsigned, integer_sequence)" );
+ static_assert( integer_sequence_at< 0, i >::value == 0, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 1, i >::value == 1, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 2, i >::value == 2, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 3, i >::value == 3, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 4, i >::value == 4, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 5, i >::value == 5, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 6, i >::value == 6, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 7, i >::value == 7, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 8, i >::value == 8, "Error: integer_sequence_at" );
+ static_assert( integer_sequence_at< 9, i >::value == 9, "Error: integer_sequence_at" );
+
+ static_assert( at( 0, i{} ) == 0, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 1, i{} ) == 1, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 2, i{} ) == 2, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 3, i{} ) == 3, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 4, i{} ) == 4, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 5, i{} ) == 5, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 6, i{} ) == 6, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 7, i{} ) == 7, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 8, i{} ) == 8, "Error: at(unsigned, integer_sequence)" );
+ static_assert( at( 9, i{} ) == 9, "Error: at(unsigned, integer_sequence)" );
}
{
- using i = make_integer_sequence<int, 5>;
- using r = reverse_integer_sequence<i>;
- using gr = integer_sequence<int, 4, 3, 2, 1, 0>;
+ using i = make_integer_sequence< int, 5 >;
+ using r = reverse_integer_sequence< i >;
+ using gr = integer_sequence< int, 4, 3, 2, 1, 0 >;
- static_assert( std::is_same<r,gr>::value, "Error: reverse_integer_sequence" );
+ static_assert( std::is_same< r, gr >::value, "Error: reverse_integer_sequence" );
}
{
- using s = make_integer_sequence<int,10>;
- using e = exclusive_scan_integer_sequence<s>;
- using i = inclusive_scan_integer_sequence<s>;
+ using s = make_integer_sequence< int, 10 >;
+ using e = exclusive_scan_integer_sequence< s >;
+ using i = inclusive_scan_integer_sequence< s >;
- using ge = integer_sequence<int, 0, 0, 1, 3, 6, 10, 15, 21, 28, 36>;
- using gi = integer_sequence<int, 0, 1, 3, 6, 10, 15, 21, 28, 36, 45>;
+ using ge = integer_sequence< int, 0, 0, 1, 3, 6, 10, 15, 21, 28, 36 >;
+ using gi = integer_sequence< int, 0, 1, 3, 6, 10, 15, 21, 28, 36, 45 >;
- static_assert( e::value == 45, "Error: scan value");
- static_assert( i::value == 45, "Error: scan value");
+ static_assert( e::value == 45, "Error: scan value" );
+ static_assert( i::value == 45, "Error: scan value" );
- static_assert( std::is_same< e::type, ge >::value, "Error: exclusive_scan");
- static_assert( std::is_same< i::type, gi >::value, "Error: inclusive_scan");
+ static_assert( std::is_same< e::type, ge >::value, "Error: exclusive_scan" );
+ static_assert( std::is_same< i::type, gi >::value, "Error: inclusive_scan" );
}
-
-
}
} // namespace Test
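The assertions in test_utilities() above pin down the compile-time behavior of the Kokkos::Impl integer-sequence helpers. A minimal compile-time sketch summarizing the same facilities follows; it uses only names exercised in the test, but writes them with explicit Kokkos::Impl qualification (the test instead relies on "using namespace Kokkos::Impl;"), and the length-5 sequence is an arbitrary illustrative choice.

#include <type_traits>
#include <Kokkos_Core.hpp>

// Compile-time summary of the helpers checked above.
using seq  = Kokkos::Impl::make_integer_sequence< int, 5 >;          // integer_sequence< int, 0, 1, 2, 3, 4 >
using rev  = Kokkos::Impl::reverse_integer_sequence< seq >;          // integer_sequence< int, 4, 3, 2, 1, 0 >
using scan = Kokkos::Impl::exclusive_scan_integer_sequence< seq >;   // scan::type = < 0, 0, 1, 3, 6 >, scan::value = 10

static_assert( seq::size() == 5u, "five generated values" );
static_assert( Kokkos::Impl::integer_sequence_at< 3, seq >::value == 3, "element access via a template index" );
static_assert( Kokkos::Impl::at( 3, seq{} ) == 3, "element access via a constexpr function" );
static_assert( std::is_same< rev, Kokkos::Impl::integer_sequence< int, 4, 3, 2, 1, 0 > >::value, "reversed order" );
static_assert( scan::value == 10, "sum of all values in the sequence" );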
diff --git a/lib/kokkos/core/unit_test/TestViewAPI.hpp b/lib/kokkos/core/unit_test/TestViewAPI.hpp
index a96f31cc1..cbf86dc58 100644
--- a/lib/kokkos/core/unit_test/TestViewAPI.hpp
+++ b/lib/kokkos/core/unit_test/TestViewAPI.hpp
@@ -1,1361 +1,1322 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
-
-/*--------------------------------------------------------------------------*/
-
namespace Test {
-template< class T , class ... P >
-size_t allocation_count( const Kokkos::View<T,P...> & view )
+template< class T, class ... P >
+size_t allocation_count( const Kokkos::View< T, P... > & view )
{
const size_t card = view.size();
const size_t alloc = view.span();
- const int memory_span = Kokkos::View<int*>::required_allocation_size(100);
+ const int memory_span = Kokkos::View< int* >::required_allocation_size( 100 );
- return (card <= alloc && memory_span == 400) ? alloc : 0 ;
+ return ( card <= alloc && memory_span == 400 ) ? alloc : 0;
}
/*--------------------------------------------------------------------------*/
-template< typename T, class DeviceType>
+template< typename T, class DeviceType >
struct TestViewOperator
{
- typedef typename DeviceType::execution_space execution_space ;
+ typedef typename DeviceType::execution_space execution_space;
- static const unsigned N = 100 ;
- static const unsigned D = 3 ;
+ static const unsigned N = 100;
+ static const unsigned D = 3;
- typedef Kokkos::View< T*[D] , execution_space > view_type ;
+ typedef Kokkos::View< T*[D], execution_space > view_type;
- const view_type v1 ;
- const view_type v2 ;
+ const view_type v1;
+ const view_type v2;
TestViewOperator()
- : v1( "v1" , N )
- , v2( "v2" , N )
+ : v1( "v1", N )
+ , v2( "v2", N )
{}
static void testit()
{
- Kokkos::parallel_for( N , TestViewOperator() );
+ Kokkos::parallel_for( N, TestViewOperator() );
}
KOKKOS_INLINE_FUNCTION
void operator()( const unsigned i ) const
{
- const unsigned X = 0 ;
- const unsigned Y = 1 ;
- const unsigned Z = 2 ;
+ const unsigned X = 0;
+ const unsigned Y = 1;
+ const unsigned Z = 2;
- v2(i,X) = v1(i,X);
- v2(i,Y) = v1(i,Y);
- v2(i,Z) = v1(i,Z);
+ v2( i, X ) = v1( i, X );
+ v2( i, Y ) = v1( i, Y );
+ v2( i, Z ) = v1( i, Z );
}
};
/*--------------------------------------------------------------------------*/
-template< class DataType ,
- class DeviceType ,
+template< class DataType,
+ class DeviceType,
unsigned Rank = Kokkos::ViewTraits< DataType >::rank >
-struct TestViewOperator_LeftAndRight ;
+struct TestViewOperator_LeftAndRight;
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 8 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 8 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i7 = 0 ; i7 < unsigned(left.dimension_7()) ; ++i7 )
- for ( unsigned i6 = 0 ; i6 < unsigned(left.dimension_6()) ; ++i6 )
- for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i7 = 0; i7 < unsigned( left.dimension_7() ); ++i7 )
+ for ( unsigned i6 = 0; i6 < unsigned( left.dimension_6() ); ++i6 )
+ for ( unsigned i5 = 0; i5 < unsigned( left.dimension_5() ); ++i5 )
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5, i6, i7 ) -
& left( 0, 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
- if ( & left(i0,i1,i2,i3,i4,i5,i6,i7) !=
- & left_stride(i0,i1,i2,i3,i4,i5,i6,i7) ) {
- update |= 4 ;
+ if ( & left( i0, i1, i2, i3, i4, i5, i6, i7 ) !=
+ & left_stride( i0, i1, i2, i3, i4, i5, i6, i7 ) ) {
+ update |= 4;
}
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
- for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
- for ( unsigned i6 = 0 ; i6 < unsigned(right.dimension_6()) ; ++i6 )
- for ( unsigned i7 = 0 ; i7 < unsigned(right.dimension_7()) ; ++i7 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
+ for ( unsigned i5 = 0; i5 < unsigned( right.dimension_5() ); ++i5 )
+ for ( unsigned i6 = 0; i6 < unsigned( right.dimension_6() ); ++i6 )
+ for ( unsigned i7 = 0; i7 < unsigned( right.dimension_7() ); ++i7 )
{
const long j = & right( i0, i1, i2, i3, i4, i5, i6, i7 ) -
& right( 0, 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
- if ( & right(i0,i1,i2,i3,i4,i5,i6,i7) !=
- & right_stride(i0,i1,i2,i3,i4,i5,i6,i7) ) {
- update |= 8 ;
+ if ( & right( i0, i1, i2, i3, i4, i5, i6, i7 ) !=
+ & right_stride( i0, i1, i2, i3, i4, i5, i6, i7 ) ) {
+ update |= 8;
}
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 7 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 7 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i6 = 0 ; i6 < unsigned(left.dimension_6()) ; ++i6 )
- for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i6 = 0; i6 < unsigned( left.dimension_6() ); ++i6 )
+ for ( unsigned i5 = 0; i5 < unsigned( left.dimension_5() ); ++i5 )
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5, i6 ) -
& left( 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
- for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
- for ( unsigned i6 = 0 ; i6 < unsigned(right.dimension_6()) ; ++i6 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
+ for ( unsigned i5 = 0; i5 < unsigned( right.dimension_5() ); ++i5 )
+ for ( unsigned i6 = 0; i6 < unsigned( right.dimension_6() ); ++i6 )
{
const long j = & right( i0, i1, i2, i3, i4, i5, i6 ) -
& right( 0, 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 6 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 6 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i5 = 0 ; i5 < unsigned(left.dimension_5()) ; ++i5 )
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i5 = 0; i5 < unsigned( left.dimension_5() ); ++i5 )
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4, i5 ) -
& left( 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
- for ( unsigned i5 = 0 ; i5 < unsigned(right.dimension_5()) ; ++i5 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
+ for ( unsigned i5 = 0; i5 < unsigned( right.dimension_5() ); ++i5 )
{
const long j = & right( i0, i1, i2, i3, i4, i5 ) -
& right( 0, 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 5 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 5 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
-
- offset = -1 ;
- for ( unsigned i4 = 0 ; i4 < unsigned(left.dimension_4()) ; ++i4 )
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ long offset = -1;
+
+ for ( unsigned i4 = 0; i4 < unsigned( left.dimension_4() ); ++i4 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3, i4 ) -
& left( 0, 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
if ( & left( i0, i1, i2, i3, i4 ) !=
- & left_stride( i0, i1, i2, i3, i4 ) ) { update |= 4 ; }
+ & left_stride( i0, i1, i2, i3, i4 ) ) { update |= 4; }
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
- for ( unsigned i4 = 0 ; i4 < unsigned(right.dimension_4()) ; ++i4 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
+ for ( unsigned i4 = 0; i4 < unsigned( right.dimension_4() ); ++i4 )
{
const long j = & right( i0, i1, i2, i3, i4 ) -
& right( 0, 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
if ( & right( i0, i1, i2, i3, i4 ) !=
- & right_stride( i0, i1, i2, i3, i4 ) ) { update |= 8 ; }
+ & right_stride( i0, i1, i2, i3, i4 ) ) { update |= 8; }
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 4 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 4 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
+ long offset = -1;
- offset = -1 ;
- for ( unsigned i3 = 0 ; i3 < unsigned(left.dimension_3()) ; ++i3 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i3 = 0; i3 < unsigned( left.dimension_3() ); ++i3 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2, i3 ) -
& left( 0, 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
- for ( unsigned i3 = 0 ; i3 < unsigned(right.dimension_3()) ; ++i3 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
+ for ( unsigned i3 = 0; i3 < unsigned( right.dimension_3() ); ++i3 )
{
const long j = & right( i0, i1, i2, i3 ) -
& right( 0, 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 3 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 3 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
- : left( std::string("left") )
- , right( std::string("right") )
+ : left( std::string( "left" ) )
+ , right( std::string( "right" ) )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
+ long offset = -1;
- offset = -1 ;
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1, i2 ) -
& left( 0, 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
- if ( & left(i0,i1,i2) != & left_stride(i0,i1,i2) ) { update |= 4 ; }
+ if ( & left( i0, i1, i2 ) != & left_stride( i0, i1, i2 ) ) { update |= 4; }
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(right.dimension_2()) ; ++i2 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( right.dimension_2() ); ++i2 )
{
const long j = & right( i0, i1, i2 ) -
& right( 0, 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
- if ( & right(i0,i1,i2) != & right_stride(i0,i1,i2) ) { update |= 8 ; }
+ if ( & right( i0, i1, i2 ) != & right_stride( i0, i1, i2 ) ) { update |= 8; }
}
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i2 = 0 ; i2 < unsigned(left.dimension_2()) ; ++i2 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i2 = 0; i2 < unsigned( left.dimension_2() ); ++i2 )
{
- if ( & left(i0,i1,i2) != & left(i0,i1,i2,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0,i1,i2) != & right(i0,i1,i2,0,0,0,0,0) ) { update |= 3 ; }
+ if ( & left( i0, i1, i2 ) != & left( i0, i1, i2, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & right( i0, i1, i2 ) != & right( i0, i1, i2, 0, 0, 0, 0, 0 ) ) { update |= 3; }
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 2 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 2 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
- left_view left ;
- right_view right ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- long offset ;
+ long offset = -1;
- offset = -1 ;
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
const long j = & left( i0, i1 ) -
& left( 0, 0 );
- if ( j <= offset || left_alloc <= j ) { update |= 1 ; }
- offset = j ;
+ if ( j <= offset || left_alloc <= j ) { update |= 1; }
+ offset = j;
}
- offset = -1 ;
- for ( unsigned i0 = 0 ; i0 < unsigned(right.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(right.dimension_1()) ; ++i1 )
+ offset = -1;
+
+ for ( unsigned i0 = 0; i0 < unsigned( right.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( right.dimension_1() ); ++i1 )
{
const long j = & right( i0, i1 ) -
& right( 0, 0 );
- if ( j <= offset || right_alloc <= j ) { update |= 2 ; }
- offset = j ;
+ if ( j <= offset || right_alloc <= j ) { update |= 2; }
+ offset = j;
}
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
- for ( unsigned i1 = 0 ; i1 < unsigned(left.dimension_1()) ; ++i1 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
+ for ( unsigned i1 = 0; i1 < unsigned( left.dimension_1() ); ++i1 )
{
- if ( & left(i0,i1) != & left(i0,i1,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0,i1) != & right(i0,i1,0,0,0,0,0,0) ) { update |= 3 ; }
+ if ( & left( i0, i1 ) != & left( i0, i1, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & right( i0, i1 ) != & right( i0, i1, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
}
}
};
-template< class DataType , class DeviceType >
-struct TestViewOperator_LeftAndRight< DataType , DeviceType , 1 >
+template< class DataType, class DeviceType >
+struct TestViewOperator_LeftAndRight< DataType, DeviceType, 1 >
{
- typedef typename DeviceType::execution_space execution_space ;
- typedef typename DeviceType::memory_space memory_space ;
- typedef typename execution_space::size_type size_type ;
+ typedef typename DeviceType::execution_space execution_space;
+ typedef typename DeviceType::memory_space memory_space;
+ typedef typename execution_space::size_type size_type;
- typedef int value_type ;
+ typedef int value_type;
KOKKOS_INLINE_FUNCTION
- static void join( volatile value_type & update ,
+ static void join( volatile value_type & update,
const volatile value_type & input )
- { update |= input ; }
+ { update |= input; }
KOKKOS_INLINE_FUNCTION
static void init( value_type & update )
- { update = 0 ; }
-
+ { update = 0; }
- typedef Kokkos::
- View< DataType, Kokkos::LayoutLeft, execution_space > left_view ;
+ typedef Kokkos::View< DataType, Kokkos::LayoutLeft, execution_space > left_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutRight, execution_space > right_view;
+ typedef Kokkos::View< DataType, Kokkos::LayoutStride, execution_space > stride_view;
- typedef Kokkos::
- View< DataType, Kokkos::LayoutRight, execution_space > right_view ;
-
- typedef Kokkos::
- View< DataType, Kokkos::LayoutStride, execution_space > stride_view ;
-
- left_view left ;
- right_view right ;
- stride_view left_stride ;
- stride_view right_stride ;
- long left_alloc ;
- long right_alloc ;
+ left_view left;
+ right_view right;
+ stride_view left_stride;
+ stride_view right_stride;
+ long left_alloc;
+ long right_alloc;
TestViewOperator_LeftAndRight()
: left( "left" )
, right( "right" )
, left_stride( left )
, right_stride( right )
, left_alloc( allocation_count( left ) )
, right_alloc( allocation_count( right ) )
{}
static void testit()
{
- TestViewOperator_LeftAndRight driver ;
+ TestViewOperator_LeftAndRight driver;
- int error_flag = 0 ;
+ int error_flag = 0;
- Kokkos::parallel_reduce( 1 , driver , error_flag );
+ Kokkos::parallel_reduce( 1, driver, error_flag );
- ASSERT_EQ( error_flag , 0 );
+ ASSERT_EQ( error_flag, 0 );
}
KOKKOS_INLINE_FUNCTION
- void operator()( const size_type , value_type & update ) const
+ void operator()( const size_type, value_type & update ) const
{
- for ( unsigned i0 = 0 ; i0 < unsigned(left.dimension_0()) ; ++i0 )
+ for ( unsigned i0 = 0; i0 < unsigned( left.dimension_0() ); ++i0 )
{
- if ( & left(i0) != & left(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & right(i0) != & right(i0,0,0,0,0,0,0,0) ) { update |= 3 ; }
- if ( & left(i0) != & left_stride(i0) ) { update |= 4 ; }
- if ( & right(i0) != & right_stride(i0) ) { update |= 8 ; }
+ if ( & left( i0 ) != & left( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & right( i0 ) != & right( i0, 0, 0, 0, 0, 0, 0, 0 ) ) { update |= 3; }
+ if ( & left( i0 ) != & left_stride( i0 ) ) { update |= 4; }
+ if ( & right( i0 ) != & right_stride( i0 ) ) { update |= 8; }
}
}
};
-template<class Layout, class DeviceType>
-struct TestViewMirror {
-
- template<class MemoryTraits>
+template< class Layout, class DeviceType >
+struct TestViewMirror
+{
+ template< class MemoryTraits >
void static test_mirror() {
- Kokkos::View<double*, Layout, Kokkos::HostSpace> a_org("A",1000);
- Kokkos::View<double*, Layout, Kokkos::HostSpace, MemoryTraits> a_h = a_org;
- auto a_h2 = Kokkos::create_mirror(Kokkos::HostSpace(),a_h);
- auto a_d = Kokkos::create_mirror(DeviceType(),a_h);
-
- int equal_ptr_h_h2 = (a_h.data() ==a_h2.data())?1:0;
- int equal_ptr_h_d = (a_h.data() ==a_d. data())?1:0;
- int equal_ptr_h2_d = (a_h2.data()==a_d. data())?1:0;
-
- ASSERT_EQ(equal_ptr_h_h2,0);
- ASSERT_EQ(equal_ptr_h_d ,0);
- ASSERT_EQ(equal_ptr_h2_d,0);
-
-
- ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
- ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
- }
+ Kokkos::View< double*, Layout, Kokkos::HostSpace > a_org( "A", 1000 );
+ Kokkos::View< double*, Layout, Kokkos::HostSpace, MemoryTraits > a_h = a_org;
+ auto a_h2 = Kokkos::create_mirror( Kokkos::HostSpace(), a_h );
+ auto a_d = Kokkos::create_mirror( DeviceType(), a_h );
+ int equal_ptr_h_h2 = ( a_h.data() == a_h2.data() ) ? 1 : 0;
+ int equal_ptr_h_d = ( a_h.data() == a_d.data() ) ? 1 : 0;
+ int equal_ptr_h2_d = ( a_h2.data() == a_d.data() ) ? 1 : 0;
- template<class MemoryTraits>
- void static test_mirror_view() {
- Kokkos::View<double*, Layout, Kokkos::HostSpace> a_org("A",1000);
- Kokkos::View<double*, Layout, Kokkos::HostSpace, MemoryTraits> a_h = a_org;
- auto a_h2 = Kokkos::create_mirror_view(Kokkos::HostSpace(),a_h);
- auto a_d = Kokkos::create_mirror_view(DeviceType(),a_h);
-
- int equal_ptr_h_h2 = a_h.data() ==a_h2.data()?1:0;
- int equal_ptr_h_d = a_h.data() ==a_d. data()?1:0;
- int equal_ptr_h2_d = a_h2.data()==a_d. data()?1:0;
-
- int is_same_memspace = std::is_same<Kokkos::HostSpace,typename DeviceType::memory_space>::value?1:0;
- ASSERT_EQ(equal_ptr_h_h2,1);
- ASSERT_EQ(equal_ptr_h_d ,is_same_memspace);
- ASSERT_EQ(equal_ptr_h2_d ,is_same_memspace);
+ ASSERT_EQ( equal_ptr_h_h2, 0 );
+ ASSERT_EQ( equal_ptr_h_d, 0 );
+ ASSERT_EQ( equal_ptr_h2_d, 0 );
+ ASSERT_EQ( a_h.dimension_0(), a_h2.dimension_0() );
+ ASSERT_EQ( a_h.dimension_0(), a_d .dimension_0() );
+ }
- ASSERT_EQ(a_h.dimension_0(),a_h2.dimension_0());
- ASSERT_EQ(a_h.dimension_0(),a_d .dimension_0());
- }
+ template< class MemoryTraits >
+ void static test_mirror_view() {
+ Kokkos::View< double*, Layout, Kokkos::HostSpace > a_org( "A", 1000 );
+ Kokkos::View< double*, Layout, Kokkos::HostSpace, MemoryTraits > a_h = a_org;
+ auto a_h2 = Kokkos::create_mirror_view( Kokkos::HostSpace(), a_h );
+ auto a_d = Kokkos::create_mirror_view( DeviceType(), a_h );
+
+ int equal_ptr_h_h2 = a_h.data() == a_h2.data() ? 1 : 0;
+ int equal_ptr_h_d = a_h.data() == a_d.data() ? 1 : 0;
+ int equal_ptr_h2_d = a_h2.data() == a_d.data() ? 1 : 0;
+
+ int is_same_memspace = std::is_same< Kokkos::HostSpace, typename DeviceType::memory_space >::value ? 1 : 0;
+ ASSERT_EQ( equal_ptr_h_h2, 1 );
+ ASSERT_EQ( equal_ptr_h_d, is_same_memspace );
+ ASSERT_EQ( equal_ptr_h2_d, is_same_memspace );
+
+ ASSERT_EQ( a_h.dimension_0(), a_h2.dimension_0() );
+ ASSERT_EQ( a_h.dimension_0(), a_d .dimension_0() );
+ }
void static testit() {
- test_mirror<Kokkos::MemoryTraits<0>>();
- test_mirror<Kokkos::MemoryTraits<Kokkos::Unmanaged>>();
- test_mirror_view<Kokkos::MemoryTraits<0>>();
- test_mirror_view<Kokkos::MemoryTraits<Kokkos::Unmanaged>>();
+ test_mirror< Kokkos::MemoryTraits<0> >();
+ test_mirror< Kokkos::MemoryTraits<Kokkos::Unmanaged> >();
+ test_mirror_view< Kokkos::MemoryTraits<0> >();
+ test_mirror_view< Kokkos::MemoryTraits<Kokkos::Unmanaged> >();
}
};
/*--------------------------------------------------------------------------*/
template< typename T, class DeviceType >
class TestViewAPI
{
public:
- typedef DeviceType device ;
+ typedef DeviceType device;
- enum { N0 = 1000 ,
- N1 = 3 ,
- N2 = 5 ,
+ enum { N0 = 1000,
+ N1 = 3,
+ N2 = 5,
N3 = 7 };
- typedef Kokkos::View< T , device > dView0 ;
- typedef Kokkos::View< T* , device > dView1 ;
- typedef Kokkos::View< T*[N1] , device > dView2 ;
- typedef Kokkos::View< T*[N1][N2] , device > dView3 ;
- typedef Kokkos::View< T*[N1][N2][N3] , device > dView4 ;
- typedef Kokkos::View< const T*[N1][N2][N3] , device > const_dView4 ;
-
- typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged ;
-
- typedef typename dView0::host_mirror_space host ;
+ typedef Kokkos::View< T, device > dView0;
+ typedef Kokkos::View< T*, device > dView1;
+ typedef Kokkos::View< T*[N1], device > dView2;
+ typedef Kokkos::View< T*[N1][N2], device > dView3;
+ typedef Kokkos::View< T*[N1][N2][N3], device > dView4;
+ typedef Kokkos::View< const T*[N1][N2][N3], device > const_dView4;
+ typedef Kokkos::View< T****, device, Kokkos::MemoryUnmanaged > dView4_unmanaged;
+ typedef typename dView0::host_mirror_space host;
TestViewAPI()
{
run_test_mirror();
run_test();
run_test_scalar();
run_test_const();
run_test_subview();
run_test_subview_strided();
run_test_vector();
- TestViewOperator< T , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2][3] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3][4] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2][3] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4][2] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3][4] , device >::testit();
- TestViewOperator_LeftAndRight< int[2][3] , device >::testit();
- TestViewOperator_LeftAndRight< int[2] , device >::testit();
- TestViewMirror<Kokkos::LayoutLeft, device >::testit();
- TestViewMirror<Kokkos::LayoutRight, device >::testit();
-
+ TestViewOperator< T, device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2][3], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3][4][2], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3][4], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2][3], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4][2], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3][4], device >::testit();
+ TestViewOperator_LeftAndRight< int[2][3], device >::testit();
+ TestViewOperator_LeftAndRight< int[2], device >::testit();
+ TestViewMirror< Kokkos::LayoutLeft, device >::testit();
+ TestViewMirror< Kokkos::LayoutRight, device >::testit();
}
static void run_test_mirror()
{
- typedef Kokkos::View< int , host > view_type ;
- typedef typename view_type::HostMirror mirror_type ;
+ typedef Kokkos::View< int, host > view_type;
+ typedef typename view_type::HostMirror mirror_type;
- static_assert( std::is_same< typename view_type::memory_space
- , typename mirror_type::memory_space
- >::value , "" );
+ static_assert( std::is_same< typename view_type::memory_space, typename mirror_type::memory_space >::value, "" );
- view_type a("a");
- mirror_type am = Kokkos::create_mirror_view(a);
- mirror_type ax = Kokkos::create_mirror(a);
- ASSERT_EQ( & a() , & am() );
+ view_type a( "a" );
+ mirror_type am = Kokkos::create_mirror_view( a );
+ mirror_type ax = Kokkos::create_mirror( a );
+ ASSERT_EQ( & a(), & am() );
}
static void run_test_scalar()
{
- typedef typename dView0::HostMirror hView0 ;
+ typedef typename dView0::HostMirror hView0;
- dView0 dx , dy ;
- hView0 hx , hy ;
+ dView0 dx, dy;
+ hView0 hx, hy;
dx = dView0( "dx" );
dy = dView0( "dy" );
hx = Kokkos::create_mirror( dx );
hy = Kokkos::create_mirror( dy );
- hx() = 1 ;
+ hx() = 1;
- Kokkos::deep_copy( dx , hx );
- Kokkos::deep_copy( dy , dx );
- Kokkos::deep_copy( hy , dy );
+ Kokkos::deep_copy( dx, hx );
+ Kokkos::deep_copy( dy, dx );
+ Kokkos::deep_copy( hy, dy );
ASSERT_EQ( hx(), hy() );
}
static void run_test()
{
// mfh 14 Feb 2014: This test doesn't actually create instances of
// these types. In order to avoid "declared but unused typedef"
// warnings, we declare empty instances of these types, with the
// usual "(void)" marker to avoid compiler warnings for unused
// variables.
- typedef typename dView0::HostMirror hView0 ;
- typedef typename dView1::HostMirror hView1 ;
- typedef typename dView2::HostMirror hView2 ;
- typedef typename dView3::HostMirror hView3 ;
- typedef typename dView4::HostMirror hView4 ;
+ typedef typename dView0::HostMirror hView0;
+ typedef typename dView1::HostMirror hView1;
+ typedef typename dView2::HostMirror hView2;
+ typedef typename dView3::HostMirror hView3;
+ typedef typename dView4::HostMirror hView4;
{
hView0 thing;
(void) thing;
}
{
hView1 thing;
(void) thing;
}
{
hView2 thing;
(void) thing;
}
{
hView3 thing;
(void) thing;
}
{
hView4 thing;
(void) thing;
}
- dView4 dx , dy , dz ;
- hView4 hx , hy , hz ;
+ dView4 dx, dy, dz;
+ hView4 hx, hy, hz;
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_TRUE( dz.ptr_on_device() == 0 );
ASSERT_TRUE( hx.ptr_on_device() == 0 );
ASSERT_TRUE( hy.ptr_on_device() == 0 );
ASSERT_TRUE( hz.ptr_on_device() == 0 );
- ASSERT_EQ( dx.dimension_0() , 0u );
- ASSERT_EQ( dy.dimension_0() , 0u );
- ASSERT_EQ( dz.dimension_0() , 0u );
- ASSERT_EQ( hx.dimension_0() , 0u );
- ASSERT_EQ( hy.dimension_0() , 0u );
- ASSERT_EQ( hz.dimension_0() , 0u );
- ASSERT_EQ( dx.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dy.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dz.dimension_1() , unsigned(N1) );
- ASSERT_EQ( hx.dimension_1() , unsigned(N1) );
- ASSERT_EQ( hy.dimension_1() , unsigned(N1) );
- ASSERT_EQ( hz.dimension_1() , unsigned(N1) );
-
- dx = dView4( "dx" , N0 );
- dy = dView4( "dy" , N0 );
-
- ASSERT_EQ( dx.use_count() , size_t(1) );
+ ASSERT_EQ( dx.dimension_0(), 0u );
+ ASSERT_EQ( dy.dimension_0(), 0u );
+ ASSERT_EQ( dz.dimension_0(), 0u );
+ ASSERT_EQ( hx.dimension_0(), 0u );
+ ASSERT_EQ( hy.dimension_0(), 0u );
+ ASSERT_EQ( hz.dimension_0(), 0u );
+ ASSERT_EQ( dx.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dy.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dz.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( hx.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( hy.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( hz.dimension_1(), unsigned( N1 ) );
+
+ dx = dView4( "dx", N0 );
+ dy = dView4( "dy", N0 );
+
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
dView4_unmanaged unmanaged_dx = dx;
- ASSERT_EQ( dx.use_count() , size_t(1) );
+ ASSERT_EQ( dx.use_count(), size_t( 1 ) );
- dView4_unmanaged unmanaged_from_ptr_dx = dView4_unmanaged(dx.ptr_on_device(),
- dx.dimension_0(),
- dx.dimension_1(),
- dx.dimension_2(),
- dx.dimension_3());
+ dView4_unmanaged unmanaged_from_ptr_dx = dView4_unmanaged( dx.ptr_on_device(),
+ dx.dimension_0(),
+ dx.dimension_1(),
+ dx.dimension_2(),
+ dx.dimension_3() );
{
- // Destruction of this view should be harmless
- const_dView4 unmanaged_from_ptr_const_dx( dx.ptr_on_device() ,
- dx.dimension_0() ,
- dx.dimension_1() ,
- dx.dimension_2() ,
+ // Destruction of this view should be harmless.
+ const_dView4 unmanaged_from_ptr_const_dx( dx.ptr_on_device(),
+ dx.dimension_0(),
+ dx.dimension_1(),
+ dx.dimension_2(),
dx.dimension_3() );
}
- const_dView4 const_dx = dx ;
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ const_dView4 const_dx = dx;
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
{
const_dView4 const_dx2;
const_dx2 = const_dx;
- ASSERT_EQ( dx.use_count() , size_t(3) );
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
const_dx2 = dy;
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
- const_dView4 const_dx3(dx);
- ASSERT_EQ( dx.use_count() , size_t(3) );
-
- dView4_unmanaged dx4_unmanaged(dx);
- ASSERT_EQ( dx.use_count() , size_t(3) );
- }
+ const_dView4 const_dx3( dx );
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
- ASSERT_EQ( dx.use_count() , size_t(2) );
+ dView4_unmanaged dx4_unmanaged( dx );
+ ASSERT_EQ( dx.use_count(), size_t( 3 ) );
+ }
+ ASSERT_EQ( dx.use_count(), size_t( 2 ) );
ASSERT_FALSE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( const_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_dx.ptr_on_device() == 0 );
ASSERT_FALSE( unmanaged_from_ptr_dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
- ASSERT_NE( dx , dy );
+ ASSERT_NE( dx, dy );
- ASSERT_EQ( dx.dimension_0() , unsigned(N0) );
- ASSERT_EQ( dx.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dx.dimension_2() , unsigned(N2) );
- ASSERT_EQ( dx.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( dx.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( dx.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dx.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( dx.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( dy.dimension_0() , unsigned(N0) );
- ASSERT_EQ( dy.dimension_1() , unsigned(N1) );
- ASSERT_EQ( dy.dimension_2() , unsigned(N2) );
- ASSERT_EQ( dy.dimension_3() , unsigned(N3) );
+ ASSERT_EQ( dy.dimension_0(), unsigned( N0 ) );
+ ASSERT_EQ( dy.dimension_1(), unsigned( N1 ) );
+ ASSERT_EQ( dy.dimension_2(), unsigned( N2 ) );
+ ASSERT_EQ( dy.dimension_3(), unsigned( N3 ) );
- ASSERT_EQ( unmanaged_from_ptr_dx.capacity(),unsigned(N0)*unsigned(N1)*unsigned(N2)*unsigned(N3) );
+ ASSERT_EQ( unmanaged_from_ptr_dx.capacity(), unsigned( N0 ) * unsigned( N1 ) * unsigned( N2 ) * unsigned( N3 ) );
hx = Kokkos::create_mirror( dx );
hy = Kokkos::create_mirror( dy );
- // T v1 = hx() ; // Generates compile error as intended
- // T v2 = hx(0,0) ; // Generates compile error as intended
- // hx(0,0) = v2 ; // Generates compile error as intended
+ // T v1 = hx(); // Generates compile error as intended.
+ // T v2 = hx( 0, 0 ); // Generates compile error as intended.
+ // hx( 0, 0 ) = v2; // Generates compile error as intended.
 // Testing with asynchronous deep copy with respect to device.
{
- size_t count = 0 ;
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
- hx(ip,i1,i2,i3) = ++count ;
- }}}}
-
-
- Kokkos::deep_copy(typename hView4::execution_space(), dx , hx );
- Kokkos::deep_copy(typename hView4::execution_space(), dy , dx );
- Kokkos::deep_copy(typename hView4::execution_space(), hy , dy );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
- }}}}
-
- Kokkos::deep_copy(typename hView4::execution_space(), dx , T(0) );
- Kokkos::deep_copy(typename hView4::execution_space(), hx , dx );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
- }}}}
+ size_t count = 0;
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < hx.dimension_1(); ++i1 )
+ for ( size_t i2 = 0; i2 < hx.dimension_2(); ++i2 )
+ for ( size_t i3 = 0; i3 < hx.dimension_3(); ++i3 )
+ {
+ hx( ip, i1, i2, i3 ) = ++count;
+ }
+
+ Kokkos::deep_copy( typename hView4::execution_space(), dx, hx );
+ Kokkos::deep_copy( typename hView4::execution_space(), dy, dx );
+ Kokkos::deep_copy( typename hView4::execution_space(), hy, dy );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), hy( ip, i1, i2, i3 ) );
+ }
+
+ Kokkos::deep_copy( typename hView4::execution_space(), dx, T( 0 ) );
+ Kokkos::deep_copy( typename hView4::execution_space(), hx, dx );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), T( 0 ) );
+ }
}
- // Testing with asynchronous deep copy with respect to host
+ // Testing with asynchronous deep copy with respect to host.
{
- size_t count = 0 ;
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
- hx(ip,i1,i2,i3) = ++count ;
- }}}}
-
- Kokkos::deep_copy(typename dView4::execution_space(), dx , hx );
- Kokkos::deep_copy(typename dView4::execution_space(), dy , dx );
- Kokkos::deep_copy(typename dView4::execution_space(), hy , dy );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
- }}}}
-
- Kokkos::deep_copy(typename dView4::execution_space(), dx , T(0) );
- Kokkos::deep_copy(typename dView4::execution_space(), hx , dx );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
- }}}}
+ size_t count = 0;
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < hx.dimension_1(); ++i1 )
+ for ( size_t i2 = 0; i2 < hx.dimension_2(); ++i2 )
+ for ( size_t i3 = 0; i3 < hx.dimension_3(); ++i3 )
+ {
+ hx( ip, i1, i2, i3 ) = ++count;
+ }
+
+ Kokkos::deep_copy( typename dView4::execution_space(), dx, hx );
+ Kokkos::deep_copy( typename dView4::execution_space(), dy, dx );
+ Kokkos::deep_copy( typename dView4::execution_space(), hy, dy );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), hy( ip, i1, i2, i3 ) );
+ }
+
+ Kokkos::deep_copy( typename dView4::execution_space(), dx, T( 0 ) );
+ Kokkos::deep_copy( typename dView4::execution_space(), hx, dx );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), T( 0 ) );
+ }
}
- // Testing with synchronous deep copy
+ // Testing with synchronous deep copy.
{
- size_t count = 0 ;
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < hx.dimension_1() ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < hx.dimension_2() ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < hx.dimension_3() ; ++i3 ) {
- hx(ip,i1,i2,i3) = ++count ;
- }}}}
-
- Kokkos::deep_copy( dx , hx );
- Kokkos::deep_copy( dy , dx );
- Kokkos::deep_copy( hy , dy );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , hy(ip,i1,i2,i3) ); }
- }}}}
-
- Kokkos::deep_copy( dx , T(0) );
- Kokkos::deep_copy( hx , dx );
-
- for ( size_t ip = 0 ; ip < N0 ; ++ip ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- { ASSERT_EQ( hx(ip,i1,i2,i3) , T(0) ); }
- }}}}
+ size_t count = 0;
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < hx.dimension_1(); ++i1 )
+ for ( size_t i2 = 0; i2 < hx.dimension_2(); ++i2 )
+ for ( size_t i3 = 0; i3 < hx.dimension_3(); ++i3 )
+ {
+ hx( ip, i1, i2, i3 ) = ++count;
+ }
+
+ Kokkos::deep_copy( dx, hx );
+ Kokkos::deep_copy( dy, dx );
+ Kokkos::deep_copy( hy, dy );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), hy( ip, i1, i2, i3 ) );
+ }
+
+ Kokkos::deep_copy( dx, T( 0 ) );
+ Kokkos::deep_copy( hx, dx );
+
+ for ( size_t ip = 0; ip < N0; ++ip )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ {
+ ASSERT_EQ( hx( ip, i1, i2, i3 ), T( 0 ) );
+ }
}
- dz = dx ; ASSERT_EQ( dx, dz); ASSERT_NE( dy, dz);
- dz = dy ; ASSERT_EQ( dy, dz); ASSERT_NE( dx, dz);
+
+ dz = dx;
+ ASSERT_EQ( dx, dz );
+ ASSERT_NE( dy, dz );
+
+ dz = dy;
+ ASSERT_EQ( dy, dz );
+ ASSERT_NE( dx, dz );
dx = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_FALSE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
+
dy = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_FALSE( dz.ptr_on_device() == 0 );
+
dz = dView4();
ASSERT_TRUE( dx.ptr_on_device() == 0 );
ASSERT_TRUE( dy.ptr_on_device() == 0 );
ASSERT_TRUE( dz.ptr_on_device() == 0 );
}
- typedef T DataType[2] ;
+ typedef T DataType[2];
static void
check_auto_conversion_to_const(
- const Kokkos::View< const DataType , device > & arg_const ,
- const Kokkos::View< DataType , device > & arg )
+ const Kokkos::View< const DataType, device > & arg_const,
+ const Kokkos::View< DataType, device > & arg )
{
ASSERT_TRUE( arg_const == arg );
}
static void run_test_const()
{
- typedef Kokkos::View< DataType , device > typeX ;
- typedef Kokkos::View< const DataType , device > const_typeX ;
- typedef Kokkos::View< const DataType , device , Kokkos::MemoryRandomAccess > const_typeR ;
+ typedef Kokkos::View< DataType, device > typeX;
+ typedef Kokkos::View< const DataType, device > const_typeX;
+ typedef Kokkos::View< const DataType, device, Kokkos::MemoryRandomAccess > const_typeR;
+
typeX x( "X" );
- const_typeX xc = x ;
- const_typeR xr = x ;
+ const_typeX xc = x;
+ const_typeR xr = x;
ASSERT_TRUE( xc == x );
ASSERT_TRUE( x == xc );
 // For CUDA, the constant random-access View does not return
 // an lvalue reference because it is retrieved through the texture cache,
 // so querying the underlying pointer is not allowed.
#if defined( KOKKOS_ENABLE_CUDA )
- if ( ! std::is_same< typename device::execution_space , Kokkos::Cuda >::value )
+ if ( !std::is_same< typename device::execution_space, Kokkos::Cuda >::value )
#endif
{
ASSERT_TRUE( x.ptr_on_device() == xr.ptr_on_device() );
}
- // typeX xf = xc ; // setting non-const from const must not compile
+ // typeX xf = xc; // Setting non-const from const must not compile.
- check_auto_conversion_to_const( x , x );
+ check_auto_conversion_to_const( x, x );
}
static void run_test_subview()
{
- typedef Kokkos::View< const T , device > sView ;
+ typedef Kokkos::View< const T, device > sView;
dView0 d0( "d0" );
- dView1 d1( "d1" , N0 );
- dView2 d2( "d2" , N0 );
- dView3 d3( "d3" , N0 );
- dView4 d4( "d4" , N0 );
-
- sView s0 = d0 ;
- sView s1 = Kokkos::subview( d1 , 1 );
- sView s2 = Kokkos::subview( d2 , 1 , 1 );
- sView s3 = Kokkos::subview( d3 , 1 , 1 , 1 );
- sView s4 = Kokkos::subview( d4 , 1 , 1 , 1 , 1 );
+ dView1 d1( "d1", N0 );
+ dView2 d2( "d2", N0 );
+ dView3 d3( "d3", N0 );
+ dView4 d4( "d4", N0 );
+
+ sView s0 = d0;
+ sView s1 = Kokkos::subview( d1, 1 );
+ sView s2 = Kokkos::subview( d2, 1, 1 );
+ sView s3 = Kokkos::subview( d3, 1, 1, 1 );
+ sView s4 = Kokkos::subview( d4, 1, 1, 1, 1 );
}
static void run_test_subview_strided()
{
- typedef Kokkos::View< int **** , Kokkos::LayoutLeft , host > view_left_4 ;
- typedef Kokkos::View< int **** , Kokkos::LayoutRight , host > view_right_4 ;
- typedef Kokkos::View< int ** , Kokkos::LayoutLeft , host > view_left_2 ;
- typedef Kokkos::View< int ** , Kokkos::LayoutRight , host > view_right_2 ;
-
- typedef Kokkos::View< int * , Kokkos::LayoutStride , host > view_stride_1 ;
- typedef Kokkos::View< int ** , Kokkos::LayoutStride , host > view_stride_2 ;
-
- view_left_2 xl2("xl2", 100 , 200 );
- view_right_2 xr2("xr2", 100 , 200 );
- view_stride_1 yl1 = Kokkos::subview( xl2 , 0 , Kokkos::ALL() );
- view_stride_1 yl2 = Kokkos::subview( xl2 , 1 , Kokkos::ALL() );
- view_stride_1 yr1 = Kokkos::subview( xr2 , 0 , Kokkos::ALL() );
- view_stride_1 yr2 = Kokkos::subview( xr2 , 1 , Kokkos::ALL() );
-
- ASSERT_EQ( yl1.dimension_0() , xl2.dimension_1() );
- ASSERT_EQ( yl2.dimension_0() , xl2.dimension_1() );
- ASSERT_EQ( yr1.dimension_0() , xr2.dimension_1() );
- ASSERT_EQ( yr2.dimension_0() , xr2.dimension_1() );
-
- ASSERT_EQ( & yl1(0) - & xl2(0,0) , 0 );
- ASSERT_EQ( & yl2(0) - & xl2(1,0) , 0 );
- ASSERT_EQ( & yr1(0) - & xr2(0,0) , 0 );
- ASSERT_EQ( & yr2(0) - & xr2(1,0) , 0 );
-
- view_left_4 xl4( "xl4" , 10 , 20 , 30 , 40 );
- view_right_4 xr4( "xr4" , 10 , 20 , 30 , 40 );
-
- view_stride_2 yl4 = Kokkos::subview( xl4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
- view_stride_2 yr4 = Kokkos::subview( xr4 , 1 , Kokkos::ALL() , 2 , Kokkos::ALL() );
-
- ASSERT_EQ( yl4.dimension_0() , xl4.dimension_1() );
- ASSERT_EQ( yl4.dimension_1() , xl4.dimension_3() );
- ASSERT_EQ( yr4.dimension_0() , xr4.dimension_1() );
- ASSERT_EQ( yr4.dimension_1() , xr4.dimension_3() );
-
- ASSERT_EQ( & yl4(4,4) - & xl4(1,4,2,4) , 0 );
- ASSERT_EQ( & yr4(4,4) - & xr4(1,4,2,4) , 0 );
+ typedef Kokkos::View< int ****, Kokkos::LayoutLeft , host > view_left_4;
+ typedef Kokkos::View< int ****, Kokkos::LayoutRight, host > view_right_4;
+ typedef Kokkos::View< int ** , Kokkos::LayoutLeft , host > view_left_2;
+ typedef Kokkos::View< int ** , Kokkos::LayoutRight, host > view_right_2;
+
+ typedef Kokkos::View< int * , Kokkos::LayoutStride, host > view_stride_1;
+ typedef Kokkos::View< int **, Kokkos::LayoutStride, host > view_stride_2;
+
+ view_left_2 xl2( "xl2", 100, 200 );
+ view_right_2 xr2( "xr2", 100, 200 );
+ view_stride_1 yl1 = Kokkos::subview( xl2, 0, Kokkos::ALL() );
+ view_stride_1 yl2 = Kokkos::subview( xl2, 1, Kokkos::ALL() );
+ view_stride_1 yr1 = Kokkos::subview( xr2, 0, Kokkos::ALL() );
+ view_stride_1 yr2 = Kokkos::subview( xr2, 1, Kokkos::ALL() );
+
+ ASSERT_EQ( yl1.dimension_0(), xl2.dimension_1() );
+ ASSERT_EQ( yl2.dimension_0(), xl2.dimension_1() );
+ ASSERT_EQ( yr1.dimension_0(), xr2.dimension_1() );
+ ASSERT_EQ( yr2.dimension_0(), xr2.dimension_1() );
+
+ ASSERT_EQ( & yl1( 0 ) - & xl2( 0, 0 ), 0 );
+ ASSERT_EQ( & yl2( 0 ) - & xl2( 1, 0 ), 0 );
+ ASSERT_EQ( & yr1( 0 ) - & xr2( 0, 0 ), 0 );
+ ASSERT_EQ( & yr2( 0 ) - & xr2( 1, 0 ), 0 );
+
+ view_left_4 xl4( "xl4", 10, 20, 30, 40 );
+ view_right_4 xr4( "xr4", 10, 20, 30, 40 );
+
+ view_stride_2 yl4 = Kokkos::subview( xl4, 1, Kokkos::ALL(), 2, Kokkos::ALL() );
+ view_stride_2 yr4 = Kokkos::subview( xr4, 1, Kokkos::ALL(), 2, Kokkos::ALL() );
+
+ ASSERT_EQ( yl4.dimension_0(), xl4.dimension_1() );
+ ASSERT_EQ( yl4.dimension_1(), xl4.dimension_3() );
+ ASSERT_EQ( yr4.dimension_0(), xr4.dimension_1() );
+ ASSERT_EQ( yr4.dimension_1(), xr4.dimension_3() );
+
+ ASSERT_EQ( & yl4( 4, 4 ) - & xl4( 1, 4, 2, 4 ), 0 );
+ ASSERT_EQ( & yr4( 4, 4 ) - & xr4( 1, 4, 2, 4 ), 0 );
}
static void run_test_vector()
{
- static const unsigned Length = 1000 , Count = 8 ;
+ static const unsigned Length = 1000, Count = 8;
- typedef Kokkos::View< T* , Kokkos::LayoutLeft , host > vector_type ;
- typedef Kokkos::View< T** , Kokkos::LayoutLeft , host > multivector_type ;
+ typedef Kokkos::View< T*, Kokkos::LayoutLeft, host > vector_type;
+ typedef Kokkos::View< T**, Kokkos::LayoutLeft, host > multivector_type;
- typedef Kokkos::View< T* , Kokkos::LayoutRight , host > vector_right_type ;
- typedef Kokkos::View< T** , Kokkos::LayoutRight , host > multivector_right_type ;
+ typedef Kokkos::View< T*, Kokkos::LayoutRight, host > vector_right_type;
+ typedef Kokkos::View< T**, Kokkos::LayoutRight, host > multivector_right_type;
- typedef Kokkos::View< const T* , Kokkos::LayoutRight, host > const_vector_right_type ;
- typedef Kokkos::View< const T* , Kokkos::LayoutLeft , host > const_vector_type ;
- typedef Kokkos::View< const T** , Kokkos::LayoutLeft , host > const_multivector_type ;
+ typedef Kokkos::View< const T*, Kokkos::LayoutRight, host > const_vector_right_type;
+ typedef Kokkos::View< const T*, Kokkos::LayoutLeft, host > const_vector_type;
+ typedef Kokkos::View< const T**, Kokkos::LayoutLeft, host > const_multivector_type;
- multivector_type mv = multivector_type( "mv" , Length , Count );
- multivector_right_type mv_right = multivector_right_type( "mv" , Length , Count );
+ multivector_type mv = multivector_type( "mv", Length, Count );
+ multivector_right_type mv_right = multivector_right_type( "mv", Length, Count );
- vector_type v1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
- vector_type v2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
- vector_type v3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
+ vector_type v1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ vector_type v2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ vector_type v3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- vector_type rv1 = Kokkos::subview( mv_right , 0 , Kokkos::ALL() );
- vector_type rv2 = Kokkos::subview( mv_right , 1 , Kokkos::ALL() );
- vector_type rv3 = Kokkos::subview( mv_right , 2 , Kokkos::ALL() );
+ vector_type rv1 = Kokkos::subview( mv_right, 0, Kokkos::ALL() );
+ vector_type rv2 = Kokkos::subview( mv_right, 1, Kokkos::ALL() );
+ vector_type rv3 = Kokkos::subview( mv_right, 2, Kokkos::ALL() );
- multivector_type mv1 = Kokkos::subview( mv , std::make_pair( 1 , 998 ) ,
- std::make_pair( 2 , 5 ) );
+ multivector_type mv1 = Kokkos::subview( mv, std::make_pair( 1, 998 ),
+ std::make_pair( 2, 5 ) );
- multivector_right_type mvr1 =
- Kokkos::subview( mv_right ,
- std::make_pair( 1 , 998 ) ,
- std::make_pair( 2 , 5 ) );
+ multivector_right_type mvr1 = Kokkos::subview( mv_right, std::make_pair( 1, 998 ),
+ std::make_pair( 2, 5 ) );
- const_vector_type cv1 = Kokkos::subview( mv , Kokkos::ALL(), 0 );
- const_vector_type cv2 = Kokkos::subview( mv , Kokkos::ALL(), 1 );
- const_vector_type cv3 = Kokkos::subview( mv , Kokkos::ALL(), 2 );
+ const_vector_type cv1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ const_vector_type cv2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ const_vector_type cv3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- vector_right_type vr1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
- vector_right_type vr2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
- vector_right_type vr3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
+ vector_right_type vr1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ vector_right_type vr2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ vector_right_type vr3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- const_vector_right_type cvr1 = Kokkos::subview( mv , Kokkos::ALL() , 0 );
- const_vector_right_type cvr2 = Kokkos::subview( mv , Kokkos::ALL() , 1 );
- const_vector_right_type cvr3 = Kokkos::subview( mv , Kokkos::ALL() , 2 );
+ const_vector_right_type cvr1 = Kokkos::subview( mv, Kokkos::ALL(), 0 );
+ const_vector_right_type cvr2 = Kokkos::subview( mv, Kokkos::ALL(), 1 );
+ const_vector_right_type cvr3 = Kokkos::subview( mv, Kokkos::ALL(), 2 );
- ASSERT_TRUE( & v1[0] == & v1(0) );
- ASSERT_TRUE( & v1[0] == & mv(0,0) );
- ASSERT_TRUE( & v2[0] == & mv(0,1) );
- ASSERT_TRUE( & v3[0] == & mv(0,2) );
+ ASSERT_TRUE( & v1[0] == & v1( 0 ) );
+ ASSERT_TRUE( & v1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & v2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & v3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & cv1[0] == & mv(0,0) );
- ASSERT_TRUE( & cv2[0] == & mv(0,1) );
- ASSERT_TRUE( & cv3[0] == & mv(0,2) );
+ ASSERT_TRUE( & cv1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & cv2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & cv3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & vr1[0] == & mv(0,0) );
- ASSERT_TRUE( & vr2[0] == & mv(0,1) );
- ASSERT_TRUE( & vr3[0] == & mv(0,2) );
+ ASSERT_TRUE( & vr1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & vr2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & vr3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & cvr1[0] == & mv(0,0) );
- ASSERT_TRUE( & cvr2[0] == & mv(0,1) );
- ASSERT_TRUE( & cvr3[0] == & mv(0,2) );
+ ASSERT_TRUE( & cvr1[0] == & mv( 0, 0 ) );
+ ASSERT_TRUE( & cvr2[0] == & mv( 0, 1 ) );
+ ASSERT_TRUE( & cvr3[0] == & mv( 0, 2 ) );
- ASSERT_TRUE( & mv1(0,0) == & mv( 1 , 2 ) );
- ASSERT_TRUE( & mv1(1,1) == & mv( 2 , 3 ) );
- ASSERT_TRUE( & mv1(3,2) == & mv( 4 , 4 ) );
- ASSERT_TRUE( & mvr1(0,0) == & mv_right( 1 , 2 ) );
- ASSERT_TRUE( & mvr1(1,1) == & mv_right( 2 , 3 ) );
- ASSERT_TRUE( & mvr1(3,2) == & mv_right( 4 , 4 ) );
+ ASSERT_TRUE( & mv1( 0, 0 ) == & mv( 1, 2 ) );
+ ASSERT_TRUE( & mv1( 1, 1 ) == & mv( 2, 3 ) );
+ ASSERT_TRUE( & mv1( 3, 2 ) == & mv( 4, 4 ) );
+ ASSERT_TRUE( & mvr1( 0, 0 ) == & mv_right( 1, 2 ) );
+ ASSERT_TRUE( & mvr1( 1, 1 ) == & mv_right( 2, 3 ) );
+ ASSERT_TRUE( & mvr1( 3, 2 ) == & mv_right( 4, 4 ) );
const_vector_type c_cv1( v1 );
typename vector_type::const_type c_cv2( v2 );
typename const_vector_type::const_type c_ccv2( v2 );
const_multivector_type cmv( mv );
typename multivector_type::const_type cmvX( cmv );
typename const_multivector_type::const_type ccmvX( cmv );
}
};
} // namespace Test
-
-/*--------------------------------------------------------------------------*/
-
diff --git a/lib/kokkos/core/unit_test/TestViewMapping.hpp b/lib/kokkos/core/unit_test/TestViewMapping.hpp
index 324f02e94..71604bed5 100644
--- a/lib/kokkos/core/unit_test/TestViewMapping.hpp
+++ b/lib/kokkos/core/unit_test/TestViewMapping.hpp
@@ -1,1437 +1,1463 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <stdexcept>
#include <sstream>
#include <iostream>
#include <Kokkos_Core.hpp>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
template< class Space >
void test_view_mapping()
{
- typedef typename Space::execution_space ExecSpace ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<> dim_0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<2> dim_s2 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<2,3> dim_s2_s3 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<2,3,4> dim_s2_s3_s4 ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<0> dim_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,3> dim_s0_s3 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,3,4> dim_s0_s3_s4 ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0> dim_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,4> dim_s0_s0_s4 ;
-
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0> dim_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0> dim_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0> dim_s0_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0_s0 ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0,0,0,0,0,0> dim_s0_s0_s0_s0_s0_s0_s0_s0 ;
-
- // Fully static dimensions should not be larger than an int
- ASSERT_LE( sizeof(dim_0) , sizeof(int) );
- ASSERT_LE( sizeof(dim_s2) , sizeof(int) );
- ASSERT_LE( sizeof(dim_s2_s3) , sizeof(int) );
- ASSERT_LE( sizeof(dim_s2_s3_s4) , sizeof(int) );
-
- // Rank 1 is size_t
- ASSERT_EQ( sizeof(dim_s0) , sizeof(size_t) );
- ASSERT_EQ( sizeof(dim_s0_s3) , sizeof(size_t) );
- ASSERT_EQ( sizeof(dim_s0_s3_s4) , sizeof(size_t) );
-
- // Allow for padding
- ASSERT_LE( sizeof(dim_s0_s0) , 2 * sizeof(size_t) );
- ASSERT_LE( sizeof(dim_s0_s0_s4) , 2 * sizeof(size_t) );
-
- ASSERT_LE( sizeof(dim_s0_s0_s0) , 4 * sizeof(size_t) );
- ASSERT_EQ( sizeof(dim_s0_s0_s0_s0) , 4 * sizeof(unsigned) );
- ASSERT_LE( sizeof(dim_s0_s0_s0_s0_s0) , 6 * sizeof(unsigned) );
- ASSERT_EQ( sizeof(dim_s0_s0_s0_s0_s0_s0) , 6 * sizeof(unsigned) );
- ASSERT_LE( sizeof(dim_s0_s0_s0_s0_s0_s0_s0) , 8 * sizeof(unsigned) );
- ASSERT_EQ( sizeof(dim_s0_s0_s0_s0_s0_s0_s0_s0) , 8 * sizeof(unsigned) );
-
- static_assert( int(dim_0::rank) == int(0) , "" );
- static_assert( int(dim_0::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_0::ArgN0) == 1 , "" );
- static_assert( int(dim_0::ArgN1) == 1 , "" );
- static_assert( int(dim_0::ArgN2) == 1 , "" );
-
- static_assert( int(dim_s2::rank) == int(1) , "" );
- static_assert( int(dim_s2::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_s2::ArgN0) == 2 , "" );
- static_assert( int(dim_s2::ArgN1) == 1 , "" );
-
- static_assert( int(dim_s2_s3::rank) == int(2) , "" );
- static_assert( int(dim_s2_s3::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_s2_s3::ArgN0) == 2 , "" );
- static_assert( int(dim_s2_s3::ArgN1) == 3 , "" );
- static_assert( int(dim_s2_s3::ArgN2) == 1 , "" );
-
- static_assert( int(dim_s2_s3_s4::rank) == int(3) , "" );
- static_assert( int(dim_s2_s3_s4::rank_dynamic) == int(0) , "" );
- static_assert( int(dim_s2_s3_s4::ArgN0) == 2 , "" );
- static_assert( int(dim_s2_s3_s4::ArgN1) == 3 , "" );
- static_assert( int(dim_s2_s3_s4::ArgN2) == 4 , "" );
- static_assert( int(dim_s2_s3_s4::ArgN3) == 1 , "" );
-
- static_assert( int(dim_s0::rank) == int(1) , "" );
- static_assert( int(dim_s0::rank_dynamic) == int(1) , "" );
-
- static_assert( int(dim_s0_s3::rank) == int(2) , "" );
- static_assert( int(dim_s0_s3::rank_dynamic) == int(1) , "" );
- static_assert( int(dim_s0_s3::ArgN0) == 0 , "" );
- static_assert( int(dim_s0_s3::ArgN1) == 3 , "" );
-
- static_assert( int(dim_s0_s3_s4::rank) == int(3) , "" );
- static_assert( int(dim_s0_s3_s4::rank_dynamic) == int(1) , "" );
- static_assert( int(dim_s0_s3_s4::ArgN0) == 0 , "" );
- static_assert( int(dim_s0_s3_s4::ArgN1) == 3 , "" );
- static_assert( int(dim_s0_s3_s4::ArgN2) == 4 , "" );
-
- static_assert( int(dim_s0_s0_s4::rank) == int(3) , "" );
- static_assert( int(dim_s0_s0_s4::rank_dynamic) == int(2) , "" );
- static_assert( int(dim_s0_s0_s4::ArgN0) == 0 , "" );
- static_assert( int(dim_s0_s0_s4::ArgN1) == 0 , "" );
- static_assert( int(dim_s0_s0_s4::ArgN2) == 4 , "" );
-
- static_assert( int(dim_s0_s0_s0::rank) == int(3) , "" );
- static_assert( int(dim_s0_s0_s0::rank_dynamic) == int(3) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0::rank) == int(4) , "" );
- static_assert( int(dim_s0_s0_s0_s0::rank_dynamic) == int(4) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0::rank) == int(5) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0::rank_dynamic) == int(5) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0_s0::rank) == int(6) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(6) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0::rank) == int(7) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(7) , "" );
-
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank) == int(8) , "" );
- static_assert( int(dim_s0_s0_s0_s0_s0_s0_s0_s0::rank_dynamic) == int(8) , "" );
-
- dim_s0 d1( 2, 3, 4, 5, 6, 7, 8, 9 );
+ typedef typename Space::execution_space ExecSpace;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension<> dim_0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 2 > dim_s2;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 2, 3 > dim_s2_s3;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 2, 3, 4 > dim_s2_s3_s4;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0 > dim_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 3 > dim_s0_s3;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 3, 4 > dim_s0_s3_s4;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0 > dim_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 4 > dim_s0_s0_s4;
+
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0 > dim_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0 > dim_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0_s0_s0;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0, 0, 0, 0, 0, 0 > dim_s0_s0_s0_s0_s0_s0_s0_s0;
+
+ // Fully static dimensions should not be larger than an int.
+ ASSERT_LE( sizeof( dim_0 ), sizeof( int ) );
+ ASSERT_LE( sizeof( dim_s2 ), sizeof( int ) );
+ ASSERT_LE( sizeof( dim_s2_s3 ), sizeof( int ) );
+ ASSERT_LE( sizeof( dim_s2_s3_s4 ), sizeof( int ) );
+
+ // Rank 1 is size_t.
+ ASSERT_EQ( sizeof( dim_s0 ), sizeof( size_t ) );
+ ASSERT_EQ( sizeof( dim_s0_s3 ), sizeof( size_t ) );
+ ASSERT_EQ( sizeof( dim_s0_s3_s4 ), sizeof( size_t ) );
+
+ // Allow for padding.
+ ASSERT_LE( sizeof( dim_s0_s0 ), 2 * sizeof( size_t ) );
+ ASSERT_LE( sizeof( dim_s0_s0_s4 ), 2 * sizeof( size_t ) );
+
+ ASSERT_LE( sizeof( dim_s0_s0_s0 ), 4 * sizeof( size_t ) );
+ ASSERT_EQ( sizeof( dim_s0_s0_s0_s0 ), 4 * sizeof( unsigned ) );
+ ASSERT_LE( sizeof( dim_s0_s0_s0_s0_s0 ), 6 * sizeof( unsigned ) );
+ ASSERT_EQ( sizeof( dim_s0_s0_s0_s0_s0_s0 ), 6 * sizeof( unsigned ) );
+ ASSERT_LE( sizeof( dim_s0_s0_s0_s0_s0_s0_s0 ), 8 * sizeof( unsigned ) );
+ ASSERT_EQ( sizeof( dim_s0_s0_s0_s0_s0_s0_s0_s0 ), 8 * sizeof( unsigned ) );
+
+ static_assert( int( dim_0::rank ) == int( 0 ), "" );
+ static_assert( int( dim_0::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_0::ArgN0 ) == 1, "" );
+ static_assert( int( dim_0::ArgN1 ) == 1, "" );
+ static_assert( int( dim_0::ArgN2 ) == 1, "" );
+
+ static_assert( int( dim_s2::rank ) == int( 1 ), "" );
+ static_assert( int( dim_s2::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_s2::ArgN0 ) == 2, "" );
+ static_assert( int( dim_s2::ArgN1 ) == 1, "" );
+
+ static_assert( int( dim_s2_s3::rank ) == int( 2 ), "" );
+ static_assert( int( dim_s2_s3::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_s2_s3::ArgN0 ) == 2, "" );
+ static_assert( int( dim_s2_s3::ArgN1 ) == 3, "" );
+ static_assert( int( dim_s2_s3::ArgN2 ) == 1, "" );
+
+ static_assert( int( dim_s2_s3_s4::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s2_s3_s4::rank_dynamic ) == int( 0 ), "" );
+ static_assert( int( dim_s2_s3_s4::ArgN0 ) == 2, "" );
+ static_assert( int( dim_s2_s3_s4::ArgN1 ) == 3, "" );
+ static_assert( int( dim_s2_s3_s4::ArgN2 ) == 4, "" );
+ static_assert( int( dim_s2_s3_s4::ArgN3 ) == 1, "" );
+
+ static_assert( int( dim_s0::rank ) == int( 1 ), "" );
+ static_assert( int( dim_s0::rank_dynamic ) == int( 1 ), "" );
+
+ static_assert( int( dim_s0_s3::rank ) == int( 2 ), "" );
+ static_assert( int( dim_s0_s3::rank_dynamic ) == int( 1 ), "" );
+ static_assert( int( dim_s0_s3::ArgN0 ) == 0, "" );
+ static_assert( int( dim_s0_s3::ArgN1 ) == 3, "" );
+
+ static_assert( int( dim_s0_s3_s4::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s0_s3_s4::rank_dynamic ) == int( 1 ), "" );
+ static_assert( int( dim_s0_s3_s4::ArgN0 ) == 0, "" );
+ static_assert( int( dim_s0_s3_s4::ArgN1 ) == 3, "" );
+ static_assert( int( dim_s0_s3_s4::ArgN2 ) == 4, "" );
+
+ static_assert( int( dim_s0_s0_s4::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s0_s0_s4::rank_dynamic ) == int( 2 ), "" );
+ static_assert( int( dim_s0_s0_s4::ArgN0 ) == 0, "" );
+ static_assert( int( dim_s0_s0_s4::ArgN1 ) == 0, "" );
+ static_assert( int( dim_s0_s0_s4::ArgN2 ) == 4, "" );
+
+ static_assert( int( dim_s0_s0_s0::rank ) == int( 3 ), "" );
+ static_assert( int( dim_s0_s0_s0::rank_dynamic ) == int( 3 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0::rank ) == int( 4 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0::rank_dynamic ) == int( 4 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0::rank ) == int( 5 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0::rank_dynamic ) == int( 5 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0::rank ) == int( 6 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0::rank_dynamic ) == int( 6 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0::rank ) == int( 7 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0::rank_dynamic ) == int( 7 ), "" );
+
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0_s0::rank ) == int( 8 ), "" );
+ static_assert( int( dim_s0_s0_s0_s0_s0_s0_s0_s0::rank_dynamic ) == int( 8 ), "" );
+
+ dim_s0 d1( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0 d2( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0_s0 d3( 2, 3, 4, 5, 6, 7, 8, 9 );
dim_s0_s0_s0_s0 d4( 2, 3, 4, 5, 6, 7, 8, 9 );
- ASSERT_EQ( d1.N0 , 2 );
- ASSERT_EQ( d2.N0 , 2 );
- ASSERT_EQ( d3.N0 , 2 );
- ASSERT_EQ( d4.N0 , 2 );
+ ASSERT_EQ( d1.N0, 2 );
+ ASSERT_EQ( d2.N0, 2 );
+ ASSERT_EQ( d3.N0, 2 );
+ ASSERT_EQ( d4.N0, 2 );
- ASSERT_EQ( d1.N1 , 1 );
- ASSERT_EQ( d2.N1 , 3 );
- ASSERT_EQ( d3.N1 , 3 );
- ASSERT_EQ( d4.N1 , 3 );
+ ASSERT_EQ( d1.N1, 1 );
+ ASSERT_EQ( d2.N1, 3 );
+ ASSERT_EQ( d3.N1, 3 );
+ ASSERT_EQ( d4.N1, 3 );
- ASSERT_EQ( d1.N2 , 1 );
- ASSERT_EQ( d2.N2 , 1 );
- ASSERT_EQ( d3.N2 , 4 );
- ASSERT_EQ( d4.N2 , 4 );
+ ASSERT_EQ( d1.N2, 1 );
+ ASSERT_EQ( d2.N2, 1 );
+ ASSERT_EQ( d3.N2, 4 );
+ ASSERT_EQ( d4.N2, 4 );
- ASSERT_EQ( d1.N3 , 1 );
- ASSERT_EQ( d2.N3 , 1 );
- ASSERT_EQ( d3.N3 , 1 );
- ASSERT_EQ( d4.N3 , 5 );
+ ASSERT_EQ( d1.N3, 1 );
+ ASSERT_EQ( d2.N3, 1 );
+ ASSERT_EQ( d3.N3, 1 );
+ ASSERT_EQ( d4.N3, 5 );
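
// [Editorial sketch, not part of the patch] The sizeof checks near the top of
// this test rest on a simple storage rule: a compile-time extent needs no
// per-object storage, while a dynamic extent ( '0' in the ViewDimension
// parameter list ) must hold a runtime value. A minimal, hypothetical
// illustration of that rule -- not the Kokkos implementation -- is:

#include <cstddef>

template< std::size_t N >
struct StaticExtentSketch { static constexpr std::size_t value = N; }; // empty type

struct DynamicExtentSketch { std::size_t value; };                      // one size_t

static_assert( sizeof( StaticExtentSketch< 2 > ) == 1, "an empty class occupies one byte" );
static_assert( sizeof( DynamicExtentSketch ) == sizeof( std::size_t ), "" );

// This is why dim_s2_s3_s4 above fits within an int while dim_s0 stores
// exactly one size_t.
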
//----------------------------------------
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s0 , Kokkos::LayoutStride > stride_s0_s0_s0 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s0, Kokkos::LayoutStride > stride_s0_s0_s0;
//----------------------------------------
- // Static dimension
+ // Static dimension.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4 , Kokkos::LayoutLeft > left_s2_s3_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4, Kokkos::LayoutLeft > left_s2_s3_s4;
- ASSERT_EQ( sizeof(left_s2_s3_s4) , sizeof(dim_s2_s3_s4) );
+ ASSERT_EQ( sizeof( left_s2_s3_s4 ), sizeof( dim_s2_s3_s4 ) );
- left_s2_s3_s4 off3 ;
+ left_s2_s3_s4 off3;
- stride_s0_s0_s0 stride3( off3 );
+ stride_s0_s0_s0 stride3( off3 );
- ASSERT_EQ( off3.stride_0() , 1 );
- ASSERT_EQ( off3.stride_1() , 2 );
- ASSERT_EQ( off3.stride_2() , 6 );
- ASSERT_EQ( off3.span() , 24 );
+ ASSERT_EQ( off3.stride_0(), 1 );
+ ASSERT_EQ( off3.stride_1(), 2 );
+ ASSERT_EQ( off3.stride_2(), 6 );
+ ASSERT_EQ( off3.span(), 24 );
- ASSERT_EQ( off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( off3.span() , stride3.span() );
+ ASSERT_EQ( off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( off3.span(), stride3.span() );
- int offset = 0 ;
+ int offset = 0;
- for ( int k = 0 ; k < 4 ; ++k ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int i = 0 ; i < 2 ; ++i , ++offset ){
- ASSERT_EQ( off3(i,j,k) , offset );
- ASSERT_EQ( stride3(i,j,k) , off3(i,j,k) );
- }}}
+ for ( int k = 0; k < 4; ++k )
+ for ( int j = 0; j < 3; ++j )
+ for ( int i = 0; i < 2; ++i, ++offset )
+ {
+ ASSERT_EQ( off3( i, j, k ), offset );
+ ASSERT_EQ( stride3( i, j, k ), off3( i, j, k ) );
+ }
}
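
// [Editorial note, not part of the patch] The loop above verifies the usual
// column-major ( LayoutLeft ) offset formula for the static 2 x 3 x 4 extents:
//
//   offset( i, j, k ) = i + 2 * ( j + 3 * k )
//
// which gives stride_0 == 1, stride_1 == 2, stride_2 == 6 and
// span == 2 * 3 * 4 == 24, exactly the values asserted before the loop.
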
//----------------------------------------
- // Small dimension is unpadded
+ // Small dimension is unpadded.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutLeft > left_s0_s0_s4;
- left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ left_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutLeft( 2, 3, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , 2 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , 2 * 3 * 4 );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, 2 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), 2 * 3 * 4 );
const Kokkos::LayoutLeft layout = dyn_off3.layout();
- ASSERT_EQ( layout.dimension[0] , 2 );
- ASSERT_EQ( layout.dimension[1] , 3 );
- ASSERT_EQ( layout.dimension[2] , 4 );
- ASSERT_EQ( layout.dimension[3] , 1 );
- ASSERT_EQ( layout.dimension[4] , 1 );
- ASSERT_EQ( layout.dimension[5] , 1 );
- ASSERT_EQ( layout.dimension[6] , 1 );
- ASSERT_EQ( layout.dimension[7] , 1 );
-
- ASSERT_EQ( stride3.m_dim.rank , 3 );
- ASSERT_EQ( stride3.m_dim.N0 , 2 );
- ASSERT_EQ( stride3.m_dim.N1 , 3 );
- ASSERT_EQ( stride3.m_dim.N2 , 4 );
- ASSERT_EQ( stride3.m_dim.N3 , 1 );
- ASSERT_EQ( stride3.size() , 2 * 3 * 4 );
-
- int offset = 0 ;
-
- for ( int k = 0 ; k < 4 ; ++k ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int i = 0 ; i < 2 ; ++i , ++offset ){
- ASSERT_EQ( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- }}}
-
- ASSERT_EQ( dyn_off3.span() , offset );
- ASSERT_EQ( stride3.span() , dyn_off3.span() );
+ ASSERT_EQ( layout.dimension[0], 2 );
+ ASSERT_EQ( layout.dimension[1], 3 );
+ ASSERT_EQ( layout.dimension[2], 4 );
+ ASSERT_EQ( layout.dimension[3], 1 );
+ ASSERT_EQ( layout.dimension[4], 1 );
+ ASSERT_EQ( layout.dimension[5], 1 );
+ ASSERT_EQ( layout.dimension[6], 1 );
+ ASSERT_EQ( layout.dimension[7], 1 );
+
+ ASSERT_EQ( stride3.m_dim.rank, 3 );
+ ASSERT_EQ( stride3.m_dim.N0, 2 );
+ ASSERT_EQ( stride3.m_dim.N1, 3 );
+ ASSERT_EQ( stride3.m_dim.N2, 4 );
+ ASSERT_EQ( stride3.m_dim.N3, 1 );
+ ASSERT_EQ( stride3.size(), 2 * 3 * 4 );
+
+ int offset = 0;
+
+ for ( int k = 0; k < 4; ++k )
+ for ( int j = 0; j < 3; ++j )
+ for ( int i = 0; i < 2; ++i, ++offset )
+ {
+ ASSERT_EQ( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ }
+
+ ASSERT_EQ( dyn_off3.span(), offset );
+ ASSERT_EQ( stride3.span(), dyn_off3.span() );
}
- // Large dimension is likely padded
+ //----------------------------------------
+ // Large dimension is likely padded.
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutLeft > left_s0_s0_s4;
- left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ left_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutLeft( N0, N1, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , N0 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , N1 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , N0 * N1 * 4 );
-
- ASSERT_EQ( stride3.m_dim.rank , 3 );
- ASSERT_EQ( stride3.m_dim.N0 , N0 );
- ASSERT_EQ( stride3.m_dim.N1 , N1 );
- ASSERT_EQ( stride3.m_dim.N2 , 4 );
- ASSERT_EQ( stride3.m_dim.N3 , 1 );
- ASSERT_EQ( stride3.size() , N0 * N1 * 4 );
- ASSERT_EQ( stride3.span() , dyn_off3.span() );
-
- int offset = 0 ;
-
- for ( int k = 0 ; k < 4 ; ++k ){
- for ( int j = 0 ; j < N1 ; ++j ){
- for ( int i = 0 ; i < N0 ; ++i ){
- ASSERT_LE( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- offset = dyn_off3(i,j,k) + 1 ;
- }}}
-
- ASSERT_LE( offset , dyn_off3.span() );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, N0 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, N1 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), N0 * N1 * 4 );
+
+ ASSERT_EQ( stride3.m_dim.rank, 3 );
+ ASSERT_EQ( stride3.m_dim.N0, N0 );
+ ASSERT_EQ( stride3.m_dim.N1, N1 );
+ ASSERT_EQ( stride3.m_dim.N2, 4 );
+ ASSERT_EQ( stride3.m_dim.N3, 1 );
+ ASSERT_EQ( stride3.size(), N0 * N1 * 4 );
+ ASSERT_EQ( stride3.span(), dyn_off3.span() );
+
+ int offset = 0;
+
+ for ( int k = 0; k < 4; ++k )
+ for ( int j = 0; j < N1; ++j )
+ for ( int i = 0; i < N0; ++i )
+ {
+ ASSERT_LE( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ offset = dyn_off3( i, j, k ) + 1;
+ }
+
+ ASSERT_LE( offset, dyn_off3.span() );
}
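
// [Editorial sketch, not part of the patch] For a large leading extent the
// allocation may pad the first stride up to an alignment boundary, which is
// why the loop above only checks monotonicity ( ASSERT_LE ) and why the span
// can exceed N0 * N1 * 4. A hypothetical helper -- illustrative only, not the
// Kokkos padding policy -- that rounds an extent up to a whole number of
// alignment units:

#include <cstddef>

constexpr std::size_t pad_extent_sketch( std::size_t n, std::size_t align_elems )
{ return ( ( n + align_elems - 1 ) / align_elems ) * align_elems; }

static_assert( pad_extent_sketch( 2000, 16 ) == 2000, "already a multiple of 16" );
static_assert( pad_extent_sketch( 2001, 16 ) == 2016, "rounded up to the next multiple" );
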
//----------------------------------------
- // Static dimension
+ // Static dimension.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4 , Kokkos::LayoutRight > right_s2_s3_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s2_s3_s4, Kokkos::LayoutRight > right_s2_s3_s4;
- ASSERT_EQ( sizeof(right_s2_s3_s4) , sizeof(dim_s2_s3_s4) );
+ ASSERT_EQ( sizeof( right_s2_s3_s4 ), sizeof( dim_s2_s3_s4 ) );
- right_s2_s3_s4 off3 ;
+ right_s2_s3_s4 off3;
stride_s0_s0_s0 stride3( off3 );
- ASSERT_EQ( off3.stride_0() , 12 );
- ASSERT_EQ( off3.stride_1() , 4 );
- ASSERT_EQ( off3.stride_2() , 1 );
+ ASSERT_EQ( off3.stride_0(), 12 );
+ ASSERT_EQ( off3.stride_1(), 4 );
+ ASSERT_EQ( off3.stride_2(), 1 );
- ASSERT_EQ( off3.dimension_0() , stride3.dimension_0() );
- ASSERT_EQ( off3.dimension_1() , stride3.dimension_1() );
- ASSERT_EQ( off3.dimension_2() , stride3.dimension_2() );
- ASSERT_EQ( off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( off3.span() , stride3.span() );
+ ASSERT_EQ( off3.dimension_0(), stride3.dimension_0() );
+ ASSERT_EQ( off3.dimension_1(), stride3.dimension_1() );
+ ASSERT_EQ( off3.dimension_2(), stride3.dimension_2() );
+ ASSERT_EQ( off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( off3.span(), stride3.span() );
- int offset = 0 ;
+ int offset = 0;
- for ( int i = 0 ; i < 2 ; ++i ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int k = 0 ; k < 4 ; ++k , ++offset ){
- ASSERT_EQ( off3(i,j,k) , offset );
- ASSERT_EQ( off3(i,j,k) , stride3(i,j,k) );
- }}}
+ for ( int i = 0; i < 2; ++i )
+ for ( int j = 0; j < 3; ++j )
+ for ( int k = 0; k < 4; ++k, ++offset )
+ {
+ ASSERT_EQ( off3( i, j, k ), offset );
+ ASSERT_EQ( off3( i, j, k ), stride3( i, j, k ) );
+ }
- ASSERT_EQ( off3.span() , offset );
+ ASSERT_EQ( off3.span(), offset );
}
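
// [Editorial note, not part of the patch] This is the row-major ( LayoutRight )
// counterpart of the earlier LayoutLeft check. For the static 2 x 3 x 4 extents
// the offset formula is
//
//   offset( i, j, k ) = k + 4 * ( j + 3 * i ) = 12 * i + 4 * j + k
//
// so stride_0 == 12, stride_1 == 4, stride_2 == 1 and span == 24, matching the
// assertions above.
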
//----------------------------------------
- // Small dimension is unpadded
+ // Small dimension is unpadded.
{
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutRight > right_s0_s0_s4;
- right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ right_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutRight( 2, 3, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , 2 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , 2 * 3 * 4 );
-
- ASSERT_EQ( dyn_off3.dimension_0() , stride3.dimension_0() );
- ASSERT_EQ( dyn_off3.dimension_1() , stride3.dimension_1() );
- ASSERT_EQ( dyn_off3.dimension_2() , stride3.dimension_2() );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( dyn_off3.span() , stride3.span() );
-
- int offset = 0 ;
-
- for ( int i = 0 ; i < 2 ; ++i ){
- for ( int j = 0 ; j < 3 ; ++j ){
- for ( int k = 0 ; k < 4 ; ++k , ++offset ){
- ASSERT_EQ( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( dyn_off3(i,j,k) , stride3(i,j,k) );
- }}}
-
- ASSERT_EQ( dyn_off3.span() , offset );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, 2 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), 2 * 3 * 4 );
+
+ ASSERT_EQ( dyn_off3.dimension_0(), stride3.dimension_0() );
+ ASSERT_EQ( dyn_off3.dimension_1(), stride3.dimension_1() );
+ ASSERT_EQ( dyn_off3.dimension_2(), stride3.dimension_2() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( dyn_off3.span(), stride3.span() );
+
+ int offset = 0;
+
+ for ( int i = 0; i < 2; ++i )
+ for ( int j = 0; j < 3; ++j )
+ for ( int k = 0; k < 4; ++k, ++offset )
+ {
+ ASSERT_EQ( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( dyn_off3( i, j, k ), stride3( i, j, k ) );
+ }
+
+ ASSERT_EQ( dyn_off3.span(), offset );
}
- // Large dimension is likely padded
+ //----------------------------------------
+ // Large dimension is likely padded.
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutRight > right_s0_s0_s4;
- right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ right_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutRight( N0, N1, 0, 0, 0, 0, 0, 0 ) );
stride_s0_s0_s0 stride3( dyn_off3 );
- ASSERT_EQ( dyn_off3.m_dim.rank , 3 );
- ASSERT_EQ( dyn_off3.m_dim.N0 , N0 );
- ASSERT_EQ( dyn_off3.m_dim.N1 , N1 );
- ASSERT_EQ( dyn_off3.m_dim.N2 , 4 );
- ASSERT_EQ( dyn_off3.m_dim.N3 , 1 );
- ASSERT_EQ( dyn_off3.size() , N0 * N1 * 4 );
-
- ASSERT_EQ( dyn_off3.dimension_0() , stride3.dimension_0() );
- ASSERT_EQ( dyn_off3.dimension_1() , stride3.dimension_1() );
- ASSERT_EQ( dyn_off3.dimension_2() , stride3.dimension_2() );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_EQ( dyn_off3.span() , stride3.span() );
-
- int offset = 0 ;
-
- for ( int i = 0 ; i < N0 ; ++i ){
- for ( int j = 0 ; j < N1 ; ++j ){
- for ( int k = 0 ; k < 4 ; ++k ){
- ASSERT_LE( offset , dyn_off3(i,j,k) );
- ASSERT_EQ( dyn_off3(i,j,k) , stride3(i,j,k) );
- offset = dyn_off3(i,j,k) + 1 ;
- }}}
-
- ASSERT_LE( offset , dyn_off3.span() );
+ ASSERT_EQ( dyn_off3.m_dim.rank, 3 );
+ ASSERT_EQ( dyn_off3.m_dim.N0, N0 );
+ ASSERT_EQ( dyn_off3.m_dim.N1, N1 );
+ ASSERT_EQ( dyn_off3.m_dim.N2, 4 );
+ ASSERT_EQ( dyn_off3.m_dim.N3, 1 );
+ ASSERT_EQ( dyn_off3.size(), N0 * N1 * 4 );
+
+ ASSERT_EQ( dyn_off3.dimension_0(), stride3.dimension_0() );
+ ASSERT_EQ( dyn_off3.dimension_1(), stride3.dimension_1() );
+ ASSERT_EQ( dyn_off3.dimension_2(), stride3.dimension_2() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+ ASSERT_EQ( dyn_off3.span(), stride3.span() );
+
+ int offset = 0;
+
+ for ( int i = 0; i < N0; ++i )
+ for ( int j = 0; j < N1; ++j )
+ for ( int k = 0; k < 4; ++k )
+ {
+ ASSERT_LE( offset, dyn_off3( i, j, k ) );
+ ASSERT_EQ( dyn_off3( i, j, k ), stride3( i, j, k ) );
+ offset = dyn_off3( i, j, k ) + 1;
+ }
+
+ ASSERT_LE( offset, dyn_off3.span() );
}
//----------------------------------------
- // Subview
+ // Subview.
{
// Mapping rank 4 to rank 3
- typedef Kokkos::Experimental::Impl::SubviewExtents<4,3> SubviewExtents ;
+ typedef Kokkos::Experimental::Impl::SubviewExtents< 4, 3 > SubviewExtents;
- constexpr int N0 = 1000 ;
- constexpr int N1 = 2000 ;
- constexpr int N2 = 3000 ;
- constexpr int N3 = 4000 ;
+ constexpr int N0 = 1000;
+ constexpr int N1 = 2000;
+ constexpr int N2 = 3000;
+ constexpr int N3 = 4000;
- Kokkos::Experimental::Impl::ViewDimension<N0,N1,N2,N3> dim ;
+ Kokkos::Experimental::Impl::ViewDimension< N0, N1, N2, N3 > dim;
SubviewExtents tmp( dim
, N0 / 2
, Kokkos::Experimental::ALL
- , std::pair<int,int>( N2 / 4 , 10 + N2 / 4 )
- , Kokkos::pair<int,int>( N3 / 4 , 20 + N3 / 4 )
+ , std::pair< int, int >( N2 / 4, 10 + N2 / 4 )
+ , Kokkos::pair< int, int >( N3 / 4, 20 + N3 / 4 )
);
- ASSERT_EQ( tmp.domain_offset(0) , N0 / 2 );
- ASSERT_EQ( tmp.domain_offset(1) , 0 );
- ASSERT_EQ( tmp.domain_offset(2) , N2 / 4 );
- ASSERT_EQ( tmp.domain_offset(3) , N3 / 4 );
+ ASSERT_EQ( tmp.domain_offset( 0 ), N0 / 2 );
+ ASSERT_EQ( tmp.domain_offset( 1 ), 0 );
+ ASSERT_EQ( tmp.domain_offset( 2 ), N2 / 4 );
+ ASSERT_EQ( tmp.domain_offset( 3 ), N3 / 4 );
- ASSERT_EQ( tmp.range_index(0) , 1 );
- ASSERT_EQ( tmp.range_index(1) , 2 );
- ASSERT_EQ( tmp.range_index(2) , 3 );
+ ASSERT_EQ( tmp.range_index( 0 ), 1 );
+ ASSERT_EQ( tmp.range_index( 1 ), 2 );
+ ASSERT_EQ( tmp.range_index( 2 ), 3 );
- ASSERT_EQ( tmp.range_extent(0) , N1 );
- ASSERT_EQ( tmp.range_extent(1) , 10 );
- ASSERT_EQ( tmp.range_extent(2) , 20 );
+ ASSERT_EQ( tmp.range_extent( 0 ), N1 );
+ ASSERT_EQ( tmp.range_extent( 1 ), 10 );
+ ASSERT_EQ( tmp.range_extent( 2 ), 20 );
}
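
// [Editorial sketch, not part of the patch] SubviewExtents is the internal
// bookkeeping behind Kokkos::Experimental::subview: an integer argument
// collapses a rank, while ALL and pair ranges survive as dimensions of the
// result. A hedged usage sketch with small, made-up extents ( subview_sketch
// is a hypothetical helper; assumes <Kokkos_Core.hpp> and an initialized
// Kokkos runtime ):

void subview_sketch()
{
  Kokkos::View< double****, Kokkos::HostSpace > a( "a", 10, 20, 30, 40 );

  auto s = Kokkos::Experimental::subview( a, 5, Kokkos::Experimental::ALL
                                        , std::pair< int, int >( 7, 17 )
                                        , Kokkos::pair< int, int >( 10, 30 ) );

  // 's' has rank 3 with extents 20, 10 and 20; s( j, k, l ) aliases
  // a( 5, j, 7 + k, 10 + l ).
  (void) s;
}
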
- //----------------------------------------
+
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- constexpr int sub_N0 = 1000 ;
- constexpr int sub_N1 = 200 ;
- constexpr int sub_N2 = 4 ;
+ constexpr int sub_N0 = 1000;
+ constexpr int sub_N1 = 200;
+ constexpr int sub_N2 = 4;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutLeft > left_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutLeft > left_s0_s0_s4;
- left_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ left_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutLeft( N0, N1, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::Experimental::Impl::SubviewExtents< 3 , 3 >
+ Kokkos::Experimental::Impl::SubviewExtents< 3, 3 >
sub( dyn_off3.m_dim
- , Kokkos::pair<int,int>(0,sub_N0)
- , Kokkos::pair<int,int>(0,sub_N1)
- , Kokkos::pair<int,int>(0,sub_N2)
+ , Kokkos::pair< int, int >( 0, sub_N0 )
+ , Kokkos::pair< int, int >( 0, sub_N1 )
+ , Kokkos::pair< int, int >( 0, sub_N2 )
);
- stride_s0_s0_s0 stride3( dyn_off3 , sub );
+ stride_s0_s0_s0 stride3( dyn_off3, sub );
- ASSERT_EQ( stride3.dimension_0() , sub_N0 );
- ASSERT_EQ( stride3.dimension_1() , sub_N1 );
- ASSERT_EQ( stride3.dimension_2() , sub_N2 );
- ASSERT_EQ( stride3.size() , sub_N0 * sub_N1 * sub_N2 );
+ ASSERT_EQ( stride3.dimension_0(), sub_N0 );
+ ASSERT_EQ( stride3.dimension_1(), sub_N1 );
+ ASSERT_EQ( stride3.dimension_2(), sub_N2 );
+ ASSERT_EQ( stride3.size(), sub_N0 * sub_N1 * sub_N2 );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_GE( dyn_off3.span() , stride3.span() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+    ASSERT_GE( dyn_off3.span(), stride3.span() );
- for ( int k = 0 ; k < sub_N2 ; ++k ){
- for ( int j = 0 ; j < sub_N1 ; ++j ){
- for ( int i = 0 ; i < sub_N0 ; ++i ){
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- }}}
+ for ( int k = 0; k < sub_N2; ++k )
+ for ( int j = 0; j < sub_N1; ++j )
+ for ( int i = 0; i < sub_N0; ++i )
+ {
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ }
}
{
- constexpr int N0 = 2000 ;
- constexpr int N1 = 300 ;
+ constexpr int N0 = 2000;
+ constexpr int N1 = 300;
- constexpr int sub_N0 = 1000 ;
- constexpr int sub_N1 = 200 ;
- constexpr int sub_N2 = 4 ;
+ constexpr int sub_N0 = 1000;
+ constexpr int sub_N1 = 200;
+ constexpr int sub_N2 = 4;
- typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4 , Kokkos::LayoutRight > right_s0_s0_s4 ;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dim_s0_s0_s4, Kokkos::LayoutRight > right_s0_s0_s4;
- right_s0_s0_s4 dyn_off3( std::integral_constant<unsigned,sizeof(int)>()
+ right_s0_s0_s4 dyn_off3( std::integral_constant< unsigned, sizeof( int ) >()
, Kokkos::LayoutRight( N0, N1, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::Experimental::Impl::SubviewExtents< 3 , 3 >
+ Kokkos::Experimental::Impl::SubviewExtents< 3, 3 >
sub( dyn_off3.m_dim
- , Kokkos::pair<int,int>(0,sub_N0)
- , Kokkos::pair<int,int>(0,sub_N1)
- , Kokkos::pair<int,int>(0,sub_N2)
+ , Kokkos::pair< int, int >( 0, sub_N0 )
+ , Kokkos::pair< int, int >( 0, sub_N1 )
+ , Kokkos::pair< int, int >( 0, sub_N2 )
);
- stride_s0_s0_s0 stride3( dyn_off3 , sub );
+ stride_s0_s0_s0 stride3( dyn_off3, sub );
- ASSERT_EQ( stride3.dimension_0() , sub_N0 );
- ASSERT_EQ( stride3.dimension_1() , sub_N1 );
- ASSERT_EQ( stride3.dimension_2() , sub_N2 );
- ASSERT_EQ( stride3.size() , sub_N0 * sub_N1 * sub_N2 );
+ ASSERT_EQ( stride3.dimension_0(), sub_N0 );
+ ASSERT_EQ( stride3.dimension_1(), sub_N1 );
+ ASSERT_EQ( stride3.dimension_2(), sub_N2 );
+ ASSERT_EQ( stride3.size(), sub_N0 * sub_N1 * sub_N2 );
- ASSERT_EQ( dyn_off3.stride_0() , stride3.stride_0() );
- ASSERT_EQ( dyn_off3.stride_1() , stride3.stride_1() );
- ASSERT_EQ( dyn_off3.stride_2() , stride3.stride_2() );
- ASSERT_GE( dyn_off3.span() , stride3.span() );
+ ASSERT_EQ( dyn_off3.stride_0(), stride3.stride_0() );
+ ASSERT_EQ( dyn_off3.stride_1(), stride3.stride_1() );
+ ASSERT_EQ( dyn_off3.stride_2(), stride3.stride_2() );
+    ASSERT_GE( dyn_off3.span(), stride3.span() );
- for ( int i = 0 ; i < sub_N0 ; ++i ){
- for ( int j = 0 ; j < sub_N1 ; ++j ){
- for ( int k = 0 ; k < sub_N2 ; ++k ){
- ASSERT_EQ( stride3(i,j,k) , dyn_off3(i,j,k) );
- }}}
+ for ( int i = 0; i < sub_N0; ++i )
+ for ( int j = 0; j < sub_N1; ++j )
+ for ( int k = 0; k < sub_N2; ++k )
+ {
+ ASSERT_EQ( stride3( i, j, k ), dyn_off3( i, j, k ) );
+ }
}
//----------------------------------------
- // view data analysis
+ // View data analysis.
{
- using namespace Kokkos::Experimental::Impl ;
- static_assert( rank_dynamic<>::value == 0 , "" );
- static_assert( rank_dynamic<1>::value == 0 , "" );
- static_assert( rank_dynamic<0>::value == 1 , "" );
- static_assert( rank_dynamic<0,1>::value == 1 , "" );
- static_assert( rank_dynamic<0,0,1>::value == 2 , "" );
+ using namespace Kokkos::Experimental::Impl;
+
+ static_assert( rank_dynamic<>::value == 0, "" );
+ static_assert( rank_dynamic< 1 >::value == 0, "" );
+ static_assert( rank_dynamic< 0 >::value == 1, "" );
+ static_assert( rank_dynamic< 0, 1 >::value == 1, "" );
+ static_assert( rank_dynamic< 0, 0, 1 >::value == 2, "" );
}
{
- using namespace Kokkos::Experimental::Impl ;
-
- typedef ViewArrayAnalysis< int[] > a_int_r1 ;
- typedef ViewArrayAnalysis< int**[4][5][6] > a_int_r5 ;
- typedef ViewArrayAnalysis< const int[] > a_const_int_r1 ;
- typedef ViewArrayAnalysis< const int**[4][5][6] > a_const_int_r5 ;
-
- static_assert( a_int_r1::dimension::rank == 1 , "" );
- static_assert( a_int_r1::dimension::rank_dynamic == 1 , "" );
- static_assert( a_int_r5::dimension::ArgN0 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN1 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN2 == 4 , "" );
- static_assert( a_int_r5::dimension::ArgN3 == 5 , "" );
- static_assert( a_int_r5::dimension::ArgN4 == 6 , "" );
- static_assert( a_int_r5::dimension::ArgN5 == 1 , "" );
-
- static_assert( std::is_same< typename a_int_r1::dimension , ViewDimension<0> >::value , "" );
- static_assert( std::is_same< typename a_int_r1::non_const_value_type , int >::value , "" );
-
- static_assert( a_const_int_r1::dimension::rank == 1 , "" );
- static_assert( a_const_int_r1::dimension::rank_dynamic == 1 , "" );
- static_assert( std::is_same< typename a_const_int_r1::dimension , ViewDimension<0> >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::non_const_value_type , int >::value , "" );
-
- static_assert( a_const_int_r5::dimension::rank == 5 , "" );
- static_assert( a_const_int_r5::dimension::rank_dynamic == 2 , "" );
-
- static_assert( a_const_int_r5::dimension::ArgN0 == 0 , "" );
- static_assert( a_const_int_r5::dimension::ArgN1 == 0 , "" );
- static_assert( a_const_int_r5::dimension::ArgN2 == 4 , "" );
- static_assert( a_const_int_r5::dimension::ArgN3 == 5 , "" );
- static_assert( a_const_int_r5::dimension::ArgN4 == 6 , "" );
- static_assert( a_const_int_r5::dimension::ArgN5 == 1 , "" );
-
- static_assert( std::is_same< typename a_const_int_r5::dimension , ViewDimension<0,0,4,5,6> >::value , "" );
- static_assert( std::is_same< typename a_const_int_r5::non_const_value_type , int >::value , "" );
-
- static_assert( a_int_r5::dimension::rank == 5 , "" );
- static_assert( a_int_r5::dimension::rank_dynamic == 2 , "" );
- static_assert( std::is_same< typename a_int_r5::dimension , ViewDimension<0,0,4,5,6> >::value , "" );
- static_assert( std::is_same< typename a_int_r5::non_const_value_type , int >::value , "" );
+ using namespace Kokkos::Experimental::Impl;
+
+ typedef ViewArrayAnalysis< int[] > a_int_r1;
+ typedef ViewArrayAnalysis< int**[4][5][6] > a_int_r5;
+ typedef ViewArrayAnalysis< const int[] > a_const_int_r1;
+ typedef ViewArrayAnalysis< const int**[4][5][6] > a_const_int_r5;
+
+ static_assert( a_int_r1::dimension::rank == 1, "" );
+ static_assert( a_int_r1::dimension::rank_dynamic == 1, "" );
+ static_assert( a_int_r5::dimension::ArgN0 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN1 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN2 == 4, "" );
+ static_assert( a_int_r5::dimension::ArgN3 == 5, "" );
+ static_assert( a_int_r5::dimension::ArgN4 == 6, "" );
+ static_assert( a_int_r5::dimension::ArgN5 == 1, "" );
+
+ static_assert( std::is_same< typename a_int_r1::dimension, ViewDimension<0> >::value, "" );
+ static_assert( std::is_same< typename a_int_r1::non_const_value_type, int >::value, "" );
+
+ static_assert( a_const_int_r1::dimension::rank == 1, "" );
+ static_assert( a_const_int_r1::dimension::rank_dynamic == 1, "" );
+ static_assert( std::is_same< typename a_const_int_r1::dimension, ViewDimension<0> >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::non_const_value_type, int >::value, "" );
+
+ static_assert( a_const_int_r5::dimension::rank == 5, "" );
+ static_assert( a_const_int_r5::dimension::rank_dynamic == 2, "" );
+
+ static_assert( a_const_int_r5::dimension::ArgN0 == 0, "" );
+ static_assert( a_const_int_r5::dimension::ArgN1 == 0, "" );
+ static_assert( a_const_int_r5::dimension::ArgN2 == 4, "" );
+ static_assert( a_const_int_r5::dimension::ArgN3 == 5, "" );
+ static_assert( a_const_int_r5::dimension::ArgN4 == 6, "" );
+ static_assert( a_const_int_r5::dimension::ArgN5 == 1, "" );
+
+ static_assert( std::is_same< typename a_const_int_r5::dimension, ViewDimension<0, 0, 4, 5, 6> >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r5::non_const_value_type, int >::value, "" );
+
+ static_assert( a_int_r5::dimension::rank == 5, "" );
+ static_assert( a_int_r5::dimension::rank_dynamic == 2, "" );
+ static_assert( std::is_same< typename a_int_r5::dimension, ViewDimension<0, 0, 4, 5, 6> >::value, "" );
+ static_assert( std::is_same< typename a_int_r5::non_const_value_type, int >::value, "" );
}
{
- using namespace Kokkos::Experimental::Impl ;
+ using namespace Kokkos::Experimental::Impl;
- typedef int t_i4[4] ;
+ typedef int t_i4[4];
     // Dimensions of t_i4 are appended to the multidimensional array.
- typedef ViewArrayAnalysis< t_i4 ***[3] > a_int_r5 ;
-
- static_assert( a_int_r5::dimension::rank == 5 , "" );
- static_assert( a_int_r5::dimension::rank_dynamic == 3 , "" );
- static_assert( a_int_r5::dimension::ArgN0 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN1 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN2 == 0 , "" );
- static_assert( a_int_r5::dimension::ArgN3 == 3 , "" );
- static_assert( a_int_r5::dimension::ArgN4 == 4 , "" );
- static_assert( std::is_same< typename a_int_r5::non_const_value_type , int >::value , "" );
+ typedef ViewArrayAnalysis< t_i4 ***[3] > a_int_r5;
+
+ static_assert( a_int_r5::dimension::rank == 5, "" );
+ static_assert( a_int_r5::dimension::rank_dynamic == 3, "" );
+ static_assert( a_int_r5::dimension::ArgN0 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN1 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN2 == 0, "" );
+ static_assert( a_int_r5::dimension::ArgN3 == 3, "" );
+ static_assert( a_int_r5::dimension::ArgN4 == 4, "" );
+ static_assert( std::is_same< typename a_int_r5::non_const_value_type, int >::value, "" );
}
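
// [Editorial sketch, not part of the patch] The ViewArrayAnalysis cases above
// exercise the View data-type convention: each '*' is a runtime extent
// ( leftmost positions only ), each [N] is a compile-time extent, and the
// extents of a typedef'd array element type are appended on the right. At the
// user level the same convention reads, for example ( data_type_sketch is a
// hypothetical helper, illustrative only ):

void data_type_sketch()
{
  // Rank 5: two runtime extents ( 7 and 8 ) followed by static extents 4, 5, 6.
  Kokkos::View< int**[4][5][6], Kokkos::HostSpace > a( "a", 7, 8 );

  static_assert( decltype( a )::Rank == 5, "" );
  (void) a;
}
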
{
- using namespace Kokkos::Experimental::Impl ;
+ using namespace Kokkos::Experimental::Impl;
- typedef ViewDataAnalysis< const int[] , void > a_const_int_r1 ;
+ typedef ViewDataAnalysis< const int[], void > a_const_int_r1;
- static_assert( std::is_same< typename a_const_int_r1::specialize , void >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::dimension , Kokkos::Experimental::Impl::ViewDimension<0> >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r1::specialize, void >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::dimension, Kokkos::Experimental::Impl::ViewDimension<0> >::value, "" );
- static_assert( std::is_same< typename a_const_int_r1::type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::value_type , const int >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r1::type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::value_type, const int >::value, "" );
- static_assert( std::is_same< typename a_const_int_r1::scalar_array_type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::const_type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::const_value_type , const int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::const_scalar_array_type , const int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::non_const_type , int * >::value , "" );
- static_assert( std::is_same< typename a_const_int_r1::non_const_value_type , int >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r1::scalar_array_type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::const_type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::const_value_type, const int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::const_scalar_array_type, const int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::non_const_type, int * >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r1::non_const_value_type, int >::value, "" );
- typedef ViewDataAnalysis< const int**[4] , void > a_const_int_r3 ;
+ typedef ViewDataAnalysis< const int**[4], void > a_const_int_r3;
- static_assert( std::is_same< typename a_const_int_r3::specialize , void >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r3::specialize, void >::value, "" );
- static_assert( std::is_same< typename a_const_int_r3::dimension , Kokkos::Experimental::Impl::ViewDimension<0,0,4> >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r3::dimension, Kokkos::Experimental::Impl::ViewDimension<0, 0, 4> >::value, "" );
- static_assert( std::is_same< typename a_const_int_r3::type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::value_type , const int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::scalar_array_type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::const_type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::const_value_type , const int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::const_scalar_array_type , const int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::non_const_type , int**[4] >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::non_const_value_type , int >::value , "" );
- static_assert( std::is_same< typename a_const_int_r3::non_const_scalar_array_type , int**[4] >::value , "" );
+ static_assert( std::is_same< typename a_const_int_r3::type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::value_type, const int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::scalar_array_type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::const_type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::const_value_type, const int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::const_scalar_array_type, const int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::non_const_type, int**[4] >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::non_const_value_type, int >::value, "" );
+ static_assert( std::is_same< typename a_const_int_r3::non_const_scalar_array_type, int**[4] >::value, "" );
-
- // std::cout << "typeid(const int**[4]).name() = " << typeid(const int**[4]).name() << std::endl ;
+ // std::cout << "typeid( const int**[4] ).name() = " << typeid( const int**[4] ).name() << std::endl;
}
//----------------------------------------
{
- constexpr int N = 10 ;
+ constexpr int N = 10;
- typedef Kokkos::View<int*,Space> T ;
- typedef Kokkos::View<const int*,Space> C ;
+ typedef Kokkos::View< int*, Space > T;
+ typedef Kokkos::View< const int*, Space > C;
- int data[N] ;
+ int data[N];
- T vr1(data,N); // view of non-const
- C cr1(vr1); // view of const from view of non-const
- C cr2( (const int *) data , N );
+ T vr1( data, N ); // View of non-const.
+ C cr1( vr1 ); // View of const from view of non-const.
+ C cr2( (const int *) data, N );
// Generate static_assert error:
// T tmp( cr1 );
- ASSERT_EQ( vr1.span() , N );
- ASSERT_EQ( cr1.span() , N );
- ASSERT_EQ( vr1.data() , & data[0] );
- ASSERT_EQ( cr1.data() , & data[0] );
+ ASSERT_EQ( vr1.span(), N );
+ ASSERT_EQ( cr1.span(), N );
+ ASSERT_EQ( vr1.data(), & data[0] );
+ ASSERT_EQ( cr1.data(), & data[0] );
- ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type, int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::memory_space , typename Space::memory_space >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::reference_type , int & >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::memory_space, typename Space::memory_space >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::reference_type, int & >::value ) );
- ASSERT_EQ( T::Rank , 1 );
+ ASSERT_EQ( T::Rank, 1 );
- ASSERT_TRUE( ( std::is_same< typename C::data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::const_data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::non_const_data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::const_data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::non_const_data_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::const_scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::non_const_scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::const_scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::non_const_scalar_array_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::const_value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::non_const_value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::const_value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::non_const_value_type, int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::memory_space , typename Space::memory_space >::value ) );
- ASSERT_TRUE( ( std::is_same< typename C::reference_type , const int & >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::memory_space, typename Space::memory_space >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename C::reference_type, const int & >::value ) );
- ASSERT_EQ( C::Rank , 1 );
+ ASSERT_EQ( C::Rank, 1 );
- ASSERT_EQ( vr1.dimension_0() , N );
+ ASSERT_EQ( vr1.dimension_0(), N );
- if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , typename Space::memory_space >::accessible ) {
- for ( int i = 0 ; i < N ; ++i ) data[i] = i + 1 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( cr1[i] , i + 1 );
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ for ( int i = 0; i < N; ++i ) data[i] = i + 1;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( cr1[i], i + 1 );
{
T tmp( vr1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 2 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 2 );
+
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) vr1( i ) = i + 2;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 2 );
}
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 2 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 2 );
}
}
-
{
- constexpr int N = 10 ;
- typedef Kokkos::View<int*,Space> T ;
- typedef Kokkos::View<const int*,Space> C ;
+ constexpr int N = 10;
+ typedef Kokkos::View< int*, Space > T;
+ typedef Kokkos::View< const int*, Space > C;
+
+ T vr1( "vr1", N );
+ C cr1( vr1 );
- T vr1("vr1",N);
- C cr1(vr1);
+ ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::data_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_data_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_data_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type, int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::scalar_array_type , int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_scalar_array_type , const int* >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_scalar_array_type , int* >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type, int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::value_type , int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::const_value_type , const int >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::non_const_value_type , int >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::memory_space, typename Space::memory_space >::value ) );
+ ASSERT_TRUE( ( std::is_same< typename T::reference_type, int & >::value ) );
+ ASSERT_EQ( T::Rank, 1 );
- ASSERT_TRUE( ( std::is_same< typename T::memory_space , typename Space::memory_space >::value ) );
- ASSERT_TRUE( ( std::is_same< typename T::reference_type , int & >::value ) );
- ASSERT_EQ( T::Rank , 1 );
-
- ASSERT_EQ( vr1.dimension_0() , N );
+ ASSERT_EQ( vr1.dimension_0(), N );
- if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , typename Space::memory_space >::accessible ) {
- for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 1 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( cr1[i] , i + 1 );
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ for ( int i = 0; i < N; ++i ) vr1( i ) = i + 1;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( cr1[i], i + 1 );
{
T tmp( vr1 );
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 1 );
- for ( int i = 0 ; i < N ; ++i ) vr1(i) = i + 2 ;
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( tmp[i] , i + 2 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 1 );
+ for ( int i = 0; i < N; ++i ) vr1( i ) = i + 2;
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( tmp[i], i + 2 );
}
- for ( int i = 0 ; i < N ; ++i ) ASSERT_EQ( vr1[i] , i + 2 );
+ for ( int i = 0; i < N; ++i ) ASSERT_EQ( vr1[i], i + 2 );
}
}
- // Testing proper handling of zero-length allocations
+ // Testing proper handling of zero-length allocations.
{
- constexpr int N = 0 ;
- typedef Kokkos::View<int*,Space> T ;
- typedef Kokkos::View<const int*,Space> C ;
+ constexpr int N = 0;
+ typedef Kokkos::View< int*, Space > T;
+ typedef Kokkos::View< const int*, Space > C;
- T vr1("vr1",N);
- C cr1(vr1);
+ T vr1( "vr1", N );
+ C cr1( vr1 );
- ASSERT_EQ( vr1.dimension_0() , 0 );
- ASSERT_EQ( cr1.dimension_0() , 0 );
+ ASSERT_EQ( vr1.dimension_0(), 0 );
+ ASSERT_EQ( cr1.dimension_0(), 0 );
}
-
// Testing using space instance for allocation.
- // The execution space of the memory space must be available for view data initialization
-
- if ( std::is_same< ExecSpace , typename ExecSpace::memory_space::execution_space >::value ) {
-
- using namespace Kokkos::Experimental ;
-
- typedef typename ExecSpace::memory_space memory_space ;
- typedef View<int*,memory_space> V ;
-
- constexpr int N = 10 ;
-
- memory_space mem_space ;
-
- V v( "v" , N );
- V va( view_alloc() , N );
- V vb( view_alloc( "vb" ) , N );
- V vc( view_alloc( "vc" , AllowPadding ) , N );
- V vd( view_alloc( "vd" , WithoutInitializing ) , N );
- V ve( view_alloc( "ve" , WithoutInitializing , AllowPadding ) , N );
- V vf( view_alloc( "vf" , mem_space , WithoutInitializing , AllowPadding ) , N );
- V vg( view_alloc( mem_space , "vg" , WithoutInitializing , AllowPadding ) , N );
- V vh( view_alloc( WithoutInitializing , AllowPadding ) , N );
- V vi( view_alloc( WithoutInitializing ) , N );
- V vj( view_alloc( std::string("vj") , AllowPadding ) , N );
- V vk( view_alloc( mem_space , std::string("vk") , AllowPadding ) , N );
+ // The execution space of the memory space must be available for view data initialization.
+ if ( std::is_same< ExecSpace, typename ExecSpace::memory_space::execution_space >::value ) {
+
+ using namespace Kokkos::Experimental;
+
+ typedef typename ExecSpace::memory_space memory_space;
+ typedef View< int*, memory_space > V;
+
+ constexpr int N = 10;
+
+ memory_space mem_space;
+
+ V v( "v", N );
+ V va( view_alloc(), N );
+ V vb( view_alloc( "vb" ), N );
+ V vc( view_alloc( "vc", AllowPadding ), N );
+ V vd( view_alloc( "vd", WithoutInitializing ), N );
+ V ve( view_alloc( "ve", WithoutInitializing, AllowPadding ), N );
+ V vf( view_alloc( "vf", mem_space, WithoutInitializing, AllowPadding ), N );
+ V vg( view_alloc( mem_space, "vg", WithoutInitializing, AllowPadding ), N );
+ V vh( view_alloc( WithoutInitializing, AllowPadding ), N );
+ V vi( view_alloc( WithoutInitializing ), N );
+ V vj( view_alloc( std::string( "vj" ), AllowPadding ), N );
+ V vk( view_alloc( mem_space, std::string( "vk" ), AllowPadding ), N );
}
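
// [Editorial sketch, not part of the patch] The block above enumerates the
// accepted view_alloc() property combinations; a label, a memory-space
// instance, WithoutInitializing and AllowPadding may be passed in any order.
// A typical hedged use, assuming an initialized Kokkos runtime, is to skip the
// default zero initialization when every entry is written immediately
// afterwards ( view_alloc_sketch is a hypothetical helper ):

void view_alloc_sketch( int n )
{
  using namespace Kokkos::Experimental;

  View< double*, Kokkos::HostSpace > x( view_alloc( "x", WithoutInitializing ), n );

  for ( int i = 0; i < n; ++i ) x( i ) = 1.0 * i;  // first write of every entry
}
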
{
- typedef Kokkos::ViewTraits<int***,Kokkos::LayoutStride,ExecSpace> traits_t ;
- typedef Kokkos::Experimental::Impl::ViewDimension<0,0,0> dims_t ;
- typedef Kokkos::Experimental::Impl::ViewOffset< dims_t , Kokkos::LayoutStride > offset_t ;
+ typedef Kokkos::ViewTraits< int***, Kokkos::LayoutStride, ExecSpace > traits_t;
+ typedef Kokkos::Experimental::Impl::ViewDimension< 0, 0, 0 > dims_t;
+ typedef Kokkos::Experimental::Impl::ViewOffset< dims_t, Kokkos::LayoutStride > offset_t;
- Kokkos::LayoutStride stride ;
+ Kokkos::LayoutStride stride;
- stride.dimension[0] = 3 ;
- stride.dimension[1] = 4 ;
- stride.dimension[2] = 5 ;
- stride.stride[0] = 4 ;
- stride.stride[1] = 1 ;
- stride.stride[2] = 12 ;
+ stride.dimension[0] = 3;
+ stride.dimension[1] = 4;
+ stride.dimension[2] = 5;
+ stride.stride[0] = 4;
+ stride.stride[1] = 1;
+ stride.stride[2] = 12;
- const offset_t offset( std::integral_constant<unsigned,0>() , stride );
+ const offset_t offset( std::integral_constant< unsigned, 0 >(), stride );
- ASSERT_EQ( offset.dimension_0() , 3 );
- ASSERT_EQ( offset.dimension_1() , 4 );
- ASSERT_EQ( offset.dimension_2() , 5 );
+ ASSERT_EQ( offset.dimension_0(), 3 );
+ ASSERT_EQ( offset.dimension_1(), 4 );
+ ASSERT_EQ( offset.dimension_2(), 5 );
- ASSERT_EQ( offset.stride_0() , 4 );
- ASSERT_EQ( offset.stride_1() , 1 );
- ASSERT_EQ( offset.stride_2() , 12 );
+ ASSERT_EQ( offset.stride_0(), 4 );
+ ASSERT_EQ( offset.stride_1(), 1 );
+ ASSERT_EQ( offset.stride_2(), 12 );
- ASSERT_EQ( offset.span() , 60 );
+ ASSERT_EQ( offset.span(), 60 );
ASSERT_TRUE( offset.span_is_contiguous() );
- Kokkos::Experimental::Impl::ViewMapping< traits_t , void >
- v( Kokkos::Experimental::Impl::ViewCtorProp<int*>((int*)0), stride );
+ Kokkos::Experimental::Impl::ViewMapping< traits_t, void >
+ v( Kokkos::Experimental::Impl::ViewCtorProp< int* >( (int*) 0 ), stride );
}
{
- typedef Kokkos::View<int**,Space> V ;
- typedef typename V::HostMirror M ;
- typedef typename Kokkos::View<int**,Space>::array_layout layout_type;
+ typedef Kokkos::View< int**, Space > V;
+ typedef typename V::HostMirror M;
+ typedef typename Kokkos::View< int**, Space >::array_layout layout_type;
- constexpr int N0 = 10 ;
- constexpr int N1 = 11 ;
+ constexpr int N0 = 10;
+ constexpr int N1 = 11;
- V a("a",N0,N1);
- M b = Kokkos::Experimental::create_mirror(a);
- M c = Kokkos::Experimental::create_mirror_view(a);
- M d ;
+ V a( "a", N0, N1 );
+ M b = Kokkos::Experimental::create_mirror( a );
+ M c = Kokkos::Experimental::create_mirror_view( a );
+ M d;
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- Kokkos::Experimental::deep_copy( a , b );
- Kokkos::Experimental::deep_copy( c , a );
+ Kokkos::Experimental::deep_copy( a, b );
+ Kokkos::Experimental::deep_copy( c, a );
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ }
- Kokkos::Experimental::resize( b , 5 , 6 );
+ Kokkos::Experimental::resize( b, 5, 6 );
- for ( int i0 = 0 ; i0 < 5 ; ++i0 )
- for ( int i1 = 0 ; i1 < 6 ; ++i1 ) {
+ for ( int i0 = 0; i0 < 5; ++i0 )
+ for ( int i1 = 0; i1 < 6; ++i1 )
+ {
int val = 1 + i0 + i1 * N0;
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
- ASSERT_EQ( b(i0,i1) , val );
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ ASSERT_EQ( b( i0, i1 ), val );
}
- Kokkos::Experimental::realloc( c , 5 , 6 );
- Kokkos::Experimental::realloc( d , 5 , 6 );
+ Kokkos::Experimental::realloc( c, 5, 6 );
+ Kokkos::Experimental::realloc( d, 5, 6 );
- ASSERT_EQ( b.dimension_0() , 5 );
- ASSERT_EQ( b.dimension_1() , 6 );
- ASSERT_EQ( c.dimension_0() , 5 );
- ASSERT_EQ( c.dimension_1() , 6 );
- ASSERT_EQ( d.dimension_0() , 5 );
- ASSERT_EQ( d.dimension_1() , 6 );
+ ASSERT_EQ( b.dimension_0(), 5 );
+ ASSERT_EQ( b.dimension_1(), 6 );
+ ASSERT_EQ( c.dimension_0(), 5 );
+ ASSERT_EQ( c.dimension_1(), 6 );
+ ASSERT_EQ( d.dimension_0(), 5 );
+ ASSERT_EQ( d.dimension_1(), 6 );
- layout_type layout(7,8);
- Kokkos::Experimental::resize( b , layout );
- for ( int i0 = 0 ; i0 < 7 ; ++i0 )
- for ( int i1 = 6 ; i1 < 8 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ layout_type layout( 7, 8 );
+ Kokkos::Experimental::resize( b, layout );
+ for ( int i0 = 0; i0 < 7; ++i0 )
+ for ( int i1 = 6; i1 < 8; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- for ( int i0 = 5 ; i0 < 7 ; ++i0 )
- for ( int i1 = 0 ; i1 < 8 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ for ( int i0 = 5; i0 < 7; ++i0 )
+ for ( int i1 = 0; i1 < 8; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- for ( int i0 = 0 ; i0 < 7 ; ++i0 )
- for ( int i1 = 0 ; i1 < 8 ; ++i1 ) {
+ for ( int i0 = 0; i0 < 7; ++i0 )
+ for ( int i1 = 0; i1 < 8; ++i1 )
+ {
int val = 1 + i0 + i1 * N0;
- ASSERT_EQ( b(i0,i1) , val );
+ ASSERT_EQ( b( i0, i1 ), val );
}
- Kokkos::Experimental::realloc( c , layout );
- Kokkos::Experimental::realloc( d , layout );
-
- ASSERT_EQ( b.dimension_0() , 7 );
- ASSERT_EQ( b.dimension_1() , 8 );
- ASSERT_EQ( c.dimension_0() , 7 );
- ASSERT_EQ( c.dimension_1() , 8 );
- ASSERT_EQ( d.dimension_0() , 7 );
- ASSERT_EQ( d.dimension_1() , 8 );
+ Kokkos::Experimental::realloc( c, layout );
+ Kokkos::Experimental::realloc( d, layout );
+ ASSERT_EQ( b.dimension_0(), 7 );
+ ASSERT_EQ( b.dimension_1(), 8 );
+ ASSERT_EQ( c.dimension_0(), 7 );
+ ASSERT_EQ( c.dimension_1(), 8 );
+ ASSERT_EQ( d.dimension_0(), 7 );
+ ASSERT_EQ( d.dimension_1(), 8 );
}
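
// [Editorial sketch, not part of the patch] The block above walks through the
// mirror workflow: create_mirror always allocates a host copy,
// create_mirror_view may alias the original when it is already host-accessible,
// resize keeps existing entries in the overlapping index range, and realloc
// discards the old contents. A condensed, hedged restatement of that flow
// ( mirror_sketch is a hypothetical helper; assumes an initialized Kokkos
// runtime and a valid Space ):

template< class Space >
void mirror_sketch()
{
  Kokkos::View< int**, Space > a( "a", 10, 11 );

  auto h = Kokkos::Experimental::create_mirror_view( a );  // host-accessible view of 'a'

  for ( int i0 = 0; i0 < 10; ++i0 )
  for ( int i1 = 0; i1 < 11; ++i1 )
  {
    h( i0, i1 ) = 1 + i0 + i1 * 10;
  }

  Kokkos::Experimental::deep_copy( a, h );    // host -> device
  Kokkos::Experimental::resize( h, 5, 6 );    // entries in the 5 x 6 corner survive
  Kokkos::Experimental::realloc( h, 7, 8 );   // fresh allocation, old contents dropped
}
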
{
- typedef Kokkos::View<int**,Kokkos::LayoutStride,Space> V ;
- typedef typename V::HostMirror M ;
- typedef typename Kokkos::View<int**,Kokkos::LayoutStride,Space>::array_layout layout_type;
+ typedef Kokkos::View< int**, Kokkos::LayoutStride, Space > V;
+ typedef typename V::HostMirror M;
+ typedef typename Kokkos::View< int**, Kokkos::LayoutStride, Space >::array_layout layout_type;
- constexpr int N0 = 10 ;
- constexpr int N1 = 11 ;
+ constexpr int N0 = 10;
+ constexpr int N1 = 11;
- const int dimensions[] = {N0,N1};
- const int order[] = {1,0};
+ const int dimensions[] = { N0, N1 };
+ const int order[] = { 1, 0 };
- V a("a",Kokkos::LayoutStride::order_dimensions(2,order,dimensions));
- M b = Kokkos::Experimental::create_mirror(a);
- M c = Kokkos::Experimental::create_mirror_view(a);
- M d ;
+ V a( "a", Kokkos::LayoutStride::order_dimensions( 2, order, dimensions ) );
+ M b = Kokkos::Experimental::create_mirror( a );
+ M c = Kokkos::Experimental::create_mirror_view( a );
+ M d;
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- b(i0,i1) = 1 + i0 + i1 * N0 ;
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ b( i0, i1 ) = 1 + i0 + i1 * N0;
+ }
- Kokkos::Experimental::deep_copy( a , b );
- Kokkos::Experimental::deep_copy( c , a );
+ Kokkos::Experimental::deep_copy( a, b );
+ Kokkos::Experimental::deep_copy( c, a );
- for ( int i0 = 0 ; i0 < N0 ; ++i0 )
- for ( int i1 = 0 ; i1 < N1 ; ++i1 )
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
+ for ( int i0 = 0; i0 < N0; ++i0 )
+ for ( int i1 = 0; i1 < N1; ++i1 )
+ {
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ }
- const int dimensions2[] = {7,8};
- const int order2[] = {1,0};
- layout_type layout = layout_type::order_dimensions(2,order2,dimensions2);
- Kokkos::Experimental::resize( b , layout );
+ const int dimensions2[] = { 7, 8 };
+ const int order2[] = { 1, 0 };
+ layout_type layout = layout_type::order_dimensions( 2, order2, dimensions2 );
+ Kokkos::Experimental::resize( b, layout );
- for ( int i0 = 0 ; i0 < 7 ; ++i0 )
- for ( int i1 = 0 ; i1 < 8 ; ++i1 ) {
+ for ( int i0 = 0; i0 < 7; ++i0 )
+ for ( int i1 = 0; i1 < 8; ++i1 )
+ {
int val = 1 + i0 + i1 * N0;
- ASSERT_EQ( b(i0,i1) , c(i0,i1) );
- ASSERT_EQ( b(i0,i1) , val );
+ ASSERT_EQ( b( i0, i1 ), c( i0, i1 ) );
+ ASSERT_EQ( b( i0, i1 ), val );
}
- Kokkos::Experimental::realloc( c , layout );
- Kokkos::Experimental::realloc( d , layout );
+ Kokkos::Experimental::realloc( c, layout );
+ Kokkos::Experimental::realloc( d, layout );
- ASSERT_EQ( b.dimension_0() , 7 );
- ASSERT_EQ( b.dimension_1() , 8 );
- ASSERT_EQ( c.dimension_0() , 7 );
- ASSERT_EQ( c.dimension_1() , 8 );
- ASSERT_EQ( d.dimension_0() , 7 );
- ASSERT_EQ( d.dimension_1() , 8 );
+ ASSERT_EQ( b.dimension_0(), 7 );
+ ASSERT_EQ( b.dimension_1(), 8 );
+ ASSERT_EQ( c.dimension_0(), 7 );
+ ASSERT_EQ( c.dimension_1(), 8 );
+ ASSERT_EQ( d.dimension_0(), 7 );
+ ASSERT_EQ( d.dimension_1(), 8 );
}
{
- typedef Kokkos::View<int*,Space> V ;
- typedef Kokkos::View<int*,Space,Kokkos::MemoryUnmanaged> U ;
+ typedef Kokkos::View< int*, Space > V;
+ typedef Kokkos::View< int*, Space, Kokkos::MemoryUnmanaged > U;
+ V a( "a", 10 );
- V a("a",10);
+ ASSERT_EQ( a.use_count(), 1 );
- ASSERT_EQ( a.use_count() , 1 );
+ V b = a;
- V b = a ;
-
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
{
- U c = b ; // 'c' is compile-time unmanaged
+ U c = b; // 'c' is compile-time unmanaged.
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
- ASSERT_EQ( c.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
+ ASSERT_EQ( c.use_count(), 2 );
- V d = c ; // 'd' is run-time unmanaged
+ V d = c; // 'd' is run-time unmanaged.
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
- ASSERT_EQ( c.use_count() , 2 );
- ASSERT_EQ( d.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
+ ASSERT_EQ( c.use_count(), 2 );
+ ASSERT_EQ( d.use_count(), 2 );
}
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( b.use_count() , 2 );
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( b.use_count(), 2 );
b = V();
- ASSERT_EQ( a.use_count() , 1 );
- ASSERT_EQ( b.use_count() , 0 );
-
-#if ! defined ( KOKKOS_ENABLE_CUDA_LAMBDA )
- /* Cannot launch host lambda when CUDA lambda is enabled */
-
- typedef typename Kokkos::Impl::HostMirror< Space >::Space::execution_space
- host_exec_space ;
-
- Kokkos::parallel_for(
- Kokkos::RangePolicy< host_exec_space >(0,10) ,
- KOKKOS_LAMBDA( int i ){
- // 'a' is captured by copy and the capture mechanism
- // converts 'a' to an unmanaged copy.
- // When the parallel dispatch accepts a move for the lambda
- // this count should become 1
- ASSERT_EQ( a.use_count() , 2 );
- V x = a ;
- ASSERT_EQ( a.use_count() , 2 );
- ASSERT_EQ( x.use_count() , 2 );
- });
-#endif /* #if ! defined ( KOKKOS_ENABLE_CUDA_LAMBDA ) */
+ ASSERT_EQ( a.use_count(), 1 );
+ ASSERT_EQ( b.use_count(), 0 );
+
+#if !defined( KOKKOS_ENABLE_CUDA_LAMBDA )
+ // Cannot launch host lambda when CUDA lambda is enabled.
+
+ typedef typename Kokkos::Impl::HostMirror< Space >::Space::execution_space host_exec_space;
+
+ Kokkos::parallel_for( Kokkos::RangePolicy< host_exec_space >( 0, 10 ), KOKKOS_LAMBDA ( int i ) {
+ // 'a' is captured by copy, and the capture mechanism converts 'a' to an
+ // unmanaged copy. When the parallel dispatch accepts a move for the
+ // lambda, this count should become 1.
+ ASSERT_EQ( a.use_count(), 2 );
+ V x = a;
+ ASSERT_EQ( a.use_count(), 2 );
+ ASSERT_EQ( x.use_count(), 2 );
+ });
+#endif // #if !defined( KOKKOS_ENABLE_CUDA_LAMBDA )
}
}
template< class Space >
struct TestViewMappingSubview
{
- typedef typename Space::execution_space ExecSpace ;
- typedef typename Space::memory_space MemSpace ;
+ typedef typename Space::execution_space ExecSpace;
+ typedef typename Space::memory_space MemSpace;
- typedef Kokkos::pair<int,int> range ;
+ typedef Kokkos::pair< int, int > range;
enum { AN = 10 };
- typedef Kokkos::View<int*,ExecSpace> AT ;
- typedef Kokkos::View<const int*,ExecSpace> ACT ;
- typedef Kokkos::Subview< AT , range > AS ;
+ typedef Kokkos::View< int*, ExecSpace > AT;
+ typedef Kokkos::View< const int*, ExecSpace > ACT;
+ typedef Kokkos::Subview< AT, range > AS;
- enum { BN0 = 10 , BN1 = 11 , BN2 = 12 };
- typedef Kokkos::View<int***,ExecSpace> BT ;
- typedef Kokkos::Subview< BT , range , range , range > BS ;
+ enum { BN0 = 10, BN1 = 11, BN2 = 12 };
+ typedef Kokkos::View< int***, ExecSpace > BT;
+ typedef Kokkos::Subview< BT, range, range, range > BS;
- enum { CN0 = 10 , CN1 = 11 , CN2 = 12 };
- typedef Kokkos::View<int***[13][14],ExecSpace> CT ;
- typedef Kokkos::Subview< CT , range , range , range , int , int > CS ;
+ enum { CN0 = 10, CN1 = 11, CN2 = 12 };
+ typedef Kokkos::View< int***[13][14], ExecSpace > CT;
+ typedef Kokkos::Subview< CT, range, range, range, int, int > CS;
- enum { DN0 = 10 , DN1 = 11 , DN2 = 12 , DN3 = 13 , DN4 = 14 };
- typedef Kokkos::View<int***[DN3][DN4],ExecSpace> DT ;
- typedef Kokkos::Subview< DT , int , range , range , range , int > DS ;
+ enum { DN0 = 10, DN1 = 11, DN2 = 12, DN3 = 13, DN4 = 14 };
+ typedef Kokkos::View< int***[DN3][DN4], ExecSpace > DT;
+ typedef Kokkos::Subview< DT, int, range, range, range, int > DS;
+ typedef Kokkos::View< int***[13][14], Kokkos::LayoutLeft, ExecSpace > DLT;
+ typedef Kokkos::Subview< DLT, range, int, int, int, int > DLS1;
- typedef Kokkos::View<int***[13][14],Kokkos::LayoutLeft,ExecSpace> DLT ;
- typedef Kokkos::Subview< DLT , range , int , int , int , int > DLS1 ;
-
- static_assert( DLS1::rank == 1 && std::is_same< typename DLS1::array_layout , Kokkos::LayoutLeft >::value
+ static_assert( DLS1::rank == 1 && std::is_same< typename DLS1::array_layout, Kokkos::LayoutLeft >::value
, "Subview layout error for rank 1 subview of left-most range of LayoutLeft" );
- typedef Kokkos::View<int***[13][14],Kokkos::LayoutRight,ExecSpace> DRT ;
- typedef Kokkos::Subview< DRT , int , int , int , int , range > DRS1 ;
+ typedef Kokkos::View< int***[13][14], Kokkos::LayoutRight, ExecSpace > DRT;
+ typedef Kokkos::Subview< DRT, int, int, int, int, range > DRS1;
- static_assert( DRS1::rank == 1 && std::is_same< typename DRS1::array_layout , Kokkos::LayoutRight >::value
+ static_assert( DRS1::rank == 1 && std::is_same< typename DRS1::array_layout, Kokkos::LayoutRight >::value
, "Subview layout error for rank 1 subview of right-most range of LayoutRight" );
- AT Aa ;
- AS Ab ;
- ACT Ac ;
- BT Ba ;
- BS Bb ;
- CT Ca ;
- CS Cb ;
- DT Da ;
- DS Db ;
+ AT Aa;
+ AS Ab;
+ ACT Ac;
+ BT Ba;
+ BS Bb;
+ CT Ca;
+ CS Cb;
+ DT Da;
+ DS Db;
TestViewMappingSubview()
- : Aa("Aa",AN)
- , Ab( Kokkos::Experimental::subview( Aa , std::pair<int,int>(1,AN-1) ) )
- , Ac( Aa , std::pair<int,int>(1,AN-1) )
- , Ba("Ba",BN0,BN1,BN2)
+ : Aa( "Aa", AN )
+ , Ab( Kokkos::Experimental::subview( Aa, std::pair< int, int >( 1, AN - 1 ) ) )
+ , Ac( Aa, std::pair< int, int >( 1, AN - 1 ) )
+ , Ba( "Ba", BN0, BN1, BN2 )
, Bb( Kokkos::Experimental::subview( Ba
- , std::pair<int,int>(1,BN0-1)
- , std::pair<int,int>(1,BN1-1)
- , std::pair<int,int>(1,BN2-1)
+ , std::pair< int, int >( 1, BN0 - 1 )
+ , std::pair< int, int >( 1, BN1 - 1 )
+ , std::pair< int, int >( 1, BN2 - 1 )
) )
- , Ca("Ca",CN0,CN1,CN2)
+ , Ca( "Ca", CN0, CN1, CN2 )
, Cb( Kokkos::Experimental::subview( Ca
- , std::pair<int,int>(1,CN0-1)
- , std::pair<int,int>(1,CN1-1)
- , std::pair<int,int>(1,CN2-1)
+ , std::pair< int, int >( 1, CN0 - 1 )
+ , std::pair< int, int >( 1, CN1 - 1 )
+ , std::pair< int, int >( 1, CN2 - 1 )
, 1
, 2
) )
- , Da("Da",DN0,DN1,DN2)
+ , Da( "Da", DN0, DN1, DN2 )
, Db( Kokkos::Experimental::subview( Da
, 1
- , std::pair<int,int>(1,DN1-1)
- , std::pair<int,int>(1,DN2-1)
- , std::pair<int,int>(1,DN3-1)
+ , std::pair< int, int >( 1, DN1 - 1 )
+ , std::pair< int, int >( 1, DN2 - 1 )
+ , std::pair< int, int >( 1, DN3 - 1 )
, 2
) )
+ {}
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int, long & error_count ) const
+ {
+ auto Ad = Kokkos::Experimental::subview< Kokkos::MemoryUnmanaged >( Aa, Kokkos::pair< int, int >( 1, AN - 1 ) );
+
+    for ( int i = 1; i < AN - 1; ++i ) if ( & Aa[i] != & Ab[i - 1] ) ++error_count;
+    for ( int i = 1; i < AN - 1; ++i ) if ( & Aa[i] != & Ac[i - 1] ) ++error_count;
+    for ( int i = 1; i < AN - 1; ++i ) if ( & Aa[i] != & Ad[i - 1] ) ++error_count;
+
+ for ( int i2 = 1; i2 < BN2 - 1; ++i2 )
+ for ( int i1 = 1; i1 < BN1 - 1; ++i1 )
+ for ( int i0 = 1; i0 < BN0 - 1; ++i0 )
{
+ if ( & Ba( i0, i1, i2 ) != & Bb( i0 - 1, i1 - 1, i2 - 1 ) ) ++error_count;
}
+ for ( int i2 = 1; i2 < CN2 - 1; ++i2 )
+ for ( int i1 = 1; i1 < CN1 - 1; ++i1 )
+ for ( int i0 = 1; i0 < CN0 - 1; ++i0 )
+ {
+ if ( & Ca( i0, i1, i2, 1, 2 ) != & Cb( i0 - 1, i1 - 1, i2 - 1 ) ) ++error_count;
+ }
- KOKKOS_INLINE_FUNCTION
- void operator()( const int , long & error_count ) const
+ for ( int i2 = 1; i2 < DN3 - 1; ++i2 )
+ for ( int i1 = 1; i1 < DN2 - 1; ++i1 )
+ for ( int i0 = 1; i0 < DN1 - 1; ++i0 )
{
- auto Ad = Kokkos::Experimental::subview< Kokkos::MemoryUnmanaged >( Aa , Kokkos::pair<int,int>(1,AN-1) );
-
- for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ab[i-1] ) ++error_count ;
- for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ac[i-1] ) ++error_count ;
- for ( int i = 1 ; i < AN-1 ; ++i ) if( & Aa[i] != & Ad[i-1] ) ++error_count ;
-
- for ( int i2 = 1 ; i2 < BN2-1 ; ++i2 ) {
- for ( int i1 = 1 ; i1 < BN1-1 ; ++i1 ) {
- for ( int i0 = 1 ; i0 < BN0-1 ; ++i0 ) {
- if ( & Ba(i0,i1,i2) != & Bb(i0-1,i1-1,i2-1) ) ++error_count ;
- }}}
-
- for ( int i2 = 1 ; i2 < CN2-1 ; ++i2 ) {
- for ( int i1 = 1 ; i1 < CN1-1 ; ++i1 ) {
- for ( int i0 = 1 ; i0 < CN0-1 ; ++i0 ) {
- if ( & Ca(i0,i1,i2,1,2) != & Cb(i0-1,i1-1,i2-1) ) ++error_count ;
- }}}
-
- for ( int i2 = 1 ; i2 < DN3-1 ; ++i2 ) {
- for ( int i1 = 1 ; i1 < DN2-1 ; ++i1 ) {
- for ( int i0 = 1 ; i0 < DN1-1 ; ++i0 ) {
- if ( & Da(1,i0,i1,i2,2) != & Db(i0-1,i1-1,i2-1) ) ++error_count ;
- }}}
+ if ( & Da( 1, i0, i1, i2, 2 ) != & Db( i0 - 1, i1 - 1, i2 - 1 ) ) ++error_count;
}
+ }
static void run()
{
- TestViewMappingSubview self ;
-
- ASSERT_EQ( self.Aa.dimension_0() , AN );
- ASSERT_EQ( self.Ab.dimension_0() , AN - 2 );
- ASSERT_EQ( self.Ac.dimension_0() , AN - 2 );
- ASSERT_EQ( self.Ba.dimension_0() , BN0 );
- ASSERT_EQ( self.Ba.dimension_1() , BN1 );
- ASSERT_EQ( self.Ba.dimension_2() , BN2 );
- ASSERT_EQ( self.Bb.dimension_0() , BN0 - 2 );
- ASSERT_EQ( self.Bb.dimension_1() , BN1 - 2 );
- ASSERT_EQ( self.Bb.dimension_2() , BN2 - 2 );
-
- ASSERT_EQ( self.Ca.dimension_0() , CN0 );
- ASSERT_EQ( self.Ca.dimension_1() , CN1 );
- ASSERT_EQ( self.Ca.dimension_2() , CN2 );
- ASSERT_EQ( self.Ca.dimension_3() , 13 );
- ASSERT_EQ( self.Ca.dimension_4() , 14 );
- ASSERT_EQ( self.Cb.dimension_0() , CN0 - 2 );
- ASSERT_EQ( self.Cb.dimension_1() , CN1 - 2 );
- ASSERT_EQ( self.Cb.dimension_2() , CN2 - 2 );
-
- ASSERT_EQ( self.Da.dimension_0() , DN0 );
- ASSERT_EQ( self.Da.dimension_1() , DN1 );
- ASSERT_EQ( self.Da.dimension_2() , DN2 );
- ASSERT_EQ( self.Da.dimension_3() , DN3 );
- ASSERT_EQ( self.Da.dimension_4() , DN4 );
-
- ASSERT_EQ( self.Db.dimension_0() , DN1 - 2 );
- ASSERT_EQ( self.Db.dimension_1() , DN2 - 2 );
- ASSERT_EQ( self.Db.dimension_2() , DN3 - 2 );
-
- ASSERT_EQ( self.Da.stride_1() , self.Db.stride_0() );
- ASSERT_EQ( self.Da.stride_2() , self.Db.stride_1() );
- ASSERT_EQ( self.Da.stride_3() , self.Db.stride_2() );
-
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >(0,1) , self , error_count );
- ASSERT_EQ( error_count , 0 );
+ TestViewMappingSubview self;
+
+ ASSERT_EQ( self.Aa.dimension_0(), AN );
+ ASSERT_EQ( self.Ab.dimension_0(), AN - 2 );
+ ASSERT_EQ( self.Ac.dimension_0(), AN - 2 );
+ ASSERT_EQ( self.Ba.dimension_0(), BN0 );
+ ASSERT_EQ( self.Ba.dimension_1(), BN1 );
+ ASSERT_EQ( self.Ba.dimension_2(), BN2 );
+ ASSERT_EQ( self.Bb.dimension_0(), BN0 - 2 );
+ ASSERT_EQ( self.Bb.dimension_1(), BN1 - 2 );
+ ASSERT_EQ( self.Bb.dimension_2(), BN2 - 2 );
+
+ ASSERT_EQ( self.Ca.dimension_0(), CN0 );
+ ASSERT_EQ( self.Ca.dimension_1(), CN1 );
+ ASSERT_EQ( self.Ca.dimension_2(), CN2 );
+ ASSERT_EQ( self.Ca.dimension_3(), 13 );
+ ASSERT_EQ( self.Ca.dimension_4(), 14 );
+ ASSERT_EQ( self.Cb.dimension_0(), CN0 - 2 );
+ ASSERT_EQ( self.Cb.dimension_1(), CN1 - 2 );
+ ASSERT_EQ( self.Cb.dimension_2(), CN2 - 2 );
+
+ ASSERT_EQ( self.Da.dimension_0(), DN0 );
+ ASSERT_EQ( self.Da.dimension_1(), DN1 );
+ ASSERT_EQ( self.Da.dimension_2(), DN2 );
+ ASSERT_EQ( self.Da.dimension_3(), DN3 );
+ ASSERT_EQ( self.Da.dimension_4(), DN4 );
+
+ ASSERT_EQ( self.Db.dimension_0(), DN1 - 2 );
+ ASSERT_EQ( self.Db.dimension_1(), DN2 - 2 );
+ ASSERT_EQ( self.Db.dimension_2(), DN3 - 2 );
+
+ ASSERT_EQ( self.Da.stride_1(), self.Db.stride_0() );
+ ASSERT_EQ( self.Da.stride_2(), self.Db.stride_1() );
+ ASSERT_EQ( self.Da.stride_3(), self.Db.stride_2() );
+
+ long error_count = -1;
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace >( 0, 1 ), self, error_count );
+ ASSERT_EQ( error_count, 0 );
}
-
};
template< class Space >
void test_view_mapping_subview()
{
- typedef typename Space::execution_space ExecSpace ;
+ typedef typename Space::execution_space ExecSpace;
TestViewMappingSubview< ExecSpace >::run();
}
/*--------------------------------------------------------------------------*/
template< class ViewType >
struct TestViewMapOperator {
static_assert( ViewType::reference_type_is_lvalue_reference
, "Test only valid for lvalue reference type" );
- const ViewType v ;
+ const ViewType v;
KOKKOS_INLINE_FUNCTION
- void test_left( size_t i0 , long & error_count ) const
+ void test_left( size_t i0, long & error_count ) const
+ {
+ typename ViewType::value_type * const base_ptr = & v( 0, 0, 0, 0, 0, 0, 0, 0 );
+ const size_t n1 = v.dimension_1();
+ const size_t n2 = v.dimension_2();
+ const size_t n3 = v.dimension_3();
+ const size_t n4 = v.dimension_4();
+ const size_t n5 = v.dimension_5();
+ const size_t n6 = v.dimension_6();
+ const size_t n7 = v.dimension_7();
+
+ long offset = 0;
+
+ for ( size_t i7 = 0; i7 < n7; ++i7 )
+ for ( size_t i6 = 0; i6 < n6; ++i6 )
+ for ( size_t i5 = 0; i5 < n5; ++i5 )
+ for ( size_t i4 = 0; i4 < n4; ++i4 )
+ for ( size_t i3 = 0; i3 < n3; ++i3 )
+ for ( size_t i2 = 0; i2 < n2; ++i2 )
+ for ( size_t i1 = 0; i1 < n1; ++i1 )
{
- typename ViewType::value_type * const base_ptr = & v(0,0,0,0,0,0,0,0);
- const size_t n1 = v.dimension_1();
- const size_t n2 = v.dimension_2();
- const size_t n3 = v.dimension_3();
- const size_t n4 = v.dimension_4();
- const size_t n5 = v.dimension_5();
- const size_t n6 = v.dimension_6();
- const size_t n7 = v.dimension_7();
-
- long offset = 0 ;
-
- for ( size_t i7 = 0 ; i7 < n7 ; ++i7 )
- for ( size_t i6 = 0 ; i6 < n6 ; ++i6 )
- for ( size_t i5 = 0 ; i5 < n5 ; ++i5 )
- for ( size_t i4 = 0 ; i4 < n4 ; ++i4 )
- for ( size_t i3 = 0 ; i3 < n3 ; ++i3 )
- for ( size_t i2 = 0 ; i2 < n2 ; ++i2 )
- for ( size_t i1 = 0 ; i1 < n1 ; ++i1 )
- {
- const long d = & v(i0,i1,i2,i3,i4,i5,i6,i7) - base_ptr ;
- if ( d < offset ) ++error_count ;
- offset = d ;
- }
-
- if ( v.span() <= size_t(offset) ) ++error_count ;
+ const long d = & v( i0, i1, i2, i3, i4, i5, i6, i7 ) - base_ptr;
+ if ( d < offset ) ++error_count;
+ offset = d;
}
+ if ( v.span() <= size_t( offset ) ) ++error_count;
+ }
+
KOKKOS_INLINE_FUNCTION
- void test_right( size_t i0 , long & error_count ) const
+ void test_right( size_t i0, long & error_count ) const
+ {
+ typename ViewType::value_type * const base_ptr = & v( 0, 0, 0, 0, 0, 0, 0, 0 );
+ const size_t n1 = v.dimension_1();
+ const size_t n2 = v.dimension_2();
+ const size_t n3 = v.dimension_3();
+ const size_t n4 = v.dimension_4();
+ const size_t n5 = v.dimension_5();
+ const size_t n6 = v.dimension_6();
+ const size_t n7 = v.dimension_7();
+
+ long offset = 0;
+
+ for ( size_t i1 = 0; i1 < n1; ++i1 )
+ for ( size_t i2 = 0; i2 < n2; ++i2 )
+ for ( size_t i3 = 0; i3 < n3; ++i3 )
+ for ( size_t i4 = 0; i4 < n4; ++i4 )
+ for ( size_t i5 = 0; i5 < n5; ++i5 )
+ for ( size_t i6 = 0; i6 < n6; ++i6 )
+ for ( size_t i7 = 0; i7 < n7; ++i7 )
{
- typename ViewType::value_type * const base_ptr = & v(0,0,0,0,0,0,0,0);
- const size_t n1 = v.dimension_1();
- const size_t n2 = v.dimension_2();
- const size_t n3 = v.dimension_3();
- const size_t n4 = v.dimension_4();
- const size_t n5 = v.dimension_5();
- const size_t n6 = v.dimension_6();
- const size_t n7 = v.dimension_7();
-
- long offset = 0 ;
-
- for ( size_t i1 = 0 ; i1 < n1 ; ++i1 )
- for ( size_t i2 = 0 ; i2 < n2 ; ++i2 )
- for ( size_t i3 = 0 ; i3 < n3 ; ++i3 )
- for ( size_t i4 = 0 ; i4 < n4 ; ++i4 )
- for ( size_t i5 = 0 ; i5 < n5 ; ++i5 )
- for ( size_t i6 = 0 ; i6 < n6 ; ++i6 )
- for ( size_t i7 = 0 ; i7 < n7 ; ++i7 )
- {
- const long d = & v(i0,i1,i2,i3,i4,i5,i6,i7) - base_ptr ;
- if ( d < offset ) ++error_count ;
- offset = d ;
- }
-
- if ( v.span() <= size_t(offset) ) ++error_count ;
+ const long d = & v( i0, i1, i2, i3, i4, i5, i6, i7 ) - base_ptr;
+ if ( d < offset ) ++error_count;
+ offset = d;
}
+ if ( v.span() <= size_t( offset ) ) ++error_count;
+ }
+
KOKKOS_INLINE_FUNCTION
- void operator()( size_t i , long & error_count ) const
- {
- if ( std::is_same< typename ViewType::array_layout , Kokkos::LayoutLeft >::value )
- test_left(i,error_count);
- else if ( std::is_same< typename ViewType::array_layout , Kokkos::LayoutRight >::value )
- test_right(i,error_count);
+ void operator()( size_t i, long & error_count ) const
+ {
+ if ( std::is_same< typename ViewType::array_layout, Kokkos::LayoutLeft >::value ) {
+ test_left( i, error_count );
}
+ else if ( std::is_same< typename ViewType::array_layout, Kokkos::LayoutRight >::value ) {
+ test_right( i, error_count );
+ }
+ }
- constexpr static size_t N0 = 10 ;
- constexpr static size_t N1 = 9 ;
- constexpr static size_t N2 = 8 ;
- constexpr static size_t N3 = 7 ;
- constexpr static size_t N4 = 6 ;
- constexpr static size_t N5 = 5 ;
- constexpr static size_t N6 = 4 ;
- constexpr static size_t N7 = 3 ;
+ constexpr static size_t N0 = 10;
+ constexpr static size_t N1 = 9;
+ constexpr static size_t N2 = 8;
+ constexpr static size_t N3 = 7;
+ constexpr static size_t N4 = 6;
+ constexpr static size_t N5 = 5;
+ constexpr static size_t N6 = 4;
+ constexpr static size_t N7 = 3;
- TestViewMapOperator() : v( "Test" , N0, N1, N2, N3, N4, N5, N6, N7 ) {}
+ TestViewMapOperator() : v( "Test", N0, N1, N2, N3, N4, N5, N6, N7 ) {}
static void run()
- {
- TestViewMapOperator self ;
-
- ASSERT_EQ( self.v.dimension_0() , ( 0 < ViewType::rank ? N0 : 1 ) );
- ASSERT_EQ( self.v.dimension_1() , ( 1 < ViewType::rank ? N1 : 1 ) );
- ASSERT_EQ( self.v.dimension_2() , ( 2 < ViewType::rank ? N2 : 1 ) );
- ASSERT_EQ( self.v.dimension_3() , ( 3 < ViewType::rank ? N3 : 1 ) );
- ASSERT_EQ( self.v.dimension_4() , ( 4 < ViewType::rank ? N4 : 1 ) );
- ASSERT_EQ( self.v.dimension_5() , ( 5 < ViewType::rank ? N5 : 1 ) );
- ASSERT_EQ( self.v.dimension_6() , ( 6 < ViewType::rank ? N6 : 1 ) );
- ASSERT_EQ( self.v.dimension_7() , ( 7 < ViewType::rank ? N7 : 1 ) );
-
- ASSERT_LE( self.v.dimension_0()*
- self.v.dimension_1()*
- self.v.dimension_2()*
- self.v.dimension_3()*
- self.v.dimension_4()*
- self.v.dimension_5()*
- self.v.dimension_6()*
- self.v.dimension_7()
- , self.v.span() );
-
- long error_count ;
- Kokkos::RangePolicy< typename ViewType::execution_space > range(0,self.v.dimension_0());
- Kokkos::parallel_reduce( range , self , error_count );
- ASSERT_EQ( 0 , error_count );
- }
+ {
+ TestViewMapOperator self;
+
+ ASSERT_EQ( self.v.dimension_0(), ( 0 < ViewType::rank ? N0 : 1 ) );
+ ASSERT_EQ( self.v.dimension_1(), ( 1 < ViewType::rank ? N1 : 1 ) );
+ ASSERT_EQ( self.v.dimension_2(), ( 2 < ViewType::rank ? N2 : 1 ) );
+ ASSERT_EQ( self.v.dimension_3(), ( 3 < ViewType::rank ? N3 : 1 ) );
+ ASSERT_EQ( self.v.dimension_4(), ( 4 < ViewType::rank ? N4 : 1 ) );
+ ASSERT_EQ( self.v.dimension_5(), ( 5 < ViewType::rank ? N5 : 1 ) );
+ ASSERT_EQ( self.v.dimension_6(), ( 6 < ViewType::rank ? N6 : 1 ) );
+ ASSERT_EQ( self.v.dimension_7(), ( 7 < ViewType::rank ? N7 : 1 ) );
+
+ ASSERT_LE( self.v.dimension_0() *
+ self.v.dimension_1() *
+ self.v.dimension_2() *
+ self.v.dimension_3() *
+ self.v.dimension_4() *
+ self.v.dimension_5() *
+ self.v.dimension_6() *
+ self.v.dimension_7()
+ , self.v.span() );
+
+ long error_count;
+ Kokkos::RangePolicy< typename ViewType::execution_space > range( 0, self.v.dimension_0() );
+ Kokkos::parallel_reduce( range, self, error_count );
+ ASSERT_EQ( 0, error_count );
+ }
};
-
template< class Space >
void test_view_mapping_operator()
{
- typedef typename Space::execution_space ExecSpace ;
-
- TestViewMapOperator< Kokkos::View<int,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int**,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int***,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int****,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*****,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int******,Kokkos::LayoutLeft,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*******,Kokkos::LayoutLeft,ExecSpace> >::run();
-
- TestViewMapOperator< Kokkos::View<int,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int**,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int***,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int****,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*****,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int******,Kokkos::LayoutRight,ExecSpace> >::run();
- TestViewMapOperator< Kokkos::View<int*******,Kokkos::LayoutRight,ExecSpace> >::run();
+ typedef typename Space::execution_space ExecSpace;
+
+ TestViewMapOperator< Kokkos::View<int, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int**, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int***, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int****, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*****, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int******, Kokkos::LayoutLeft, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*******, Kokkos::LayoutLeft, ExecSpace> >::run();
+
+ TestViewMapOperator< Kokkos::View<int, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int**, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int***, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int****, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*****, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int******, Kokkos::LayoutRight, ExecSpace> >::run();
+ TestViewMapOperator< Kokkos::View<int*******, Kokkos::LayoutRight, ExecSpace> >::run();
}
/*--------------------------------------------------------------------------*/
template< class Space >
struct TestViewMappingAtomic {
- typedef typename Space::execution_space ExecSpace ;
- typedef typename Space::memory_space MemSpace ;
+ typedef typename Space::execution_space ExecSpace;
+ typedef typename Space::memory_space MemSpace;
- typedef Kokkos::MemoryTraits< Kokkos::Atomic > mem_trait ;
+ typedef Kokkos::MemoryTraits< Kokkos::Atomic > mem_trait;
- typedef Kokkos::View< int * , ExecSpace > T ;
- typedef Kokkos::View< int * , ExecSpace , mem_trait > T_atom ;
+ typedef Kokkos::View< int *, ExecSpace > T;
+ typedef Kokkos::View< int *, ExecSpace, mem_trait > T_atom;
- T x ;
- T_atom x_atom ;
+ T x;
+ T_atom x_atom;
- constexpr static size_t N = 100000 ;
+ constexpr static size_t N = 100000;
struct TagInit {};
struct TagUpdate {};
struct TagVerify {};
KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const
- { x(i) = i ; }
+ void operator()( const TagInit &, const int i ) const
+ { x( i ) = i; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagUpdate & , const int i ) const
- { x_atom(i%2) += 1 ; }
+ void operator()( const TagUpdate &, const int i ) const
+ { x_atom( i % 2 ) += 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagVerify & , const int i , long & error_count ) const
- {
- if ( i < 2 ) { if ( x(i) != int(i + N / 2) ) ++error_count ; }
- else { if ( x(i) != int(i) ) ++error_count ; }
- }
+ void operator()( const TagVerify &, const int i, long & error_count ) const
+ {
+ if ( i < 2 ) { if ( x( i ) != int( i + N / 2 ) ) ++error_count; }
+ else { if ( x( i ) != int( i ) ) ++error_count; }
+ }
TestViewMappingAtomic()
- : x("x",N)
+ : x( "x", N )
, x_atom( x )
{}
static void run()
+ {
+ ASSERT_TRUE( T::reference_type_is_lvalue_reference );
+ ASSERT_FALSE( T_atom::reference_type_is_lvalue_reference );
+
+ TestViewMappingAtomic self;
+
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, TagInit >( 0, N ), self );
+ Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace, TagUpdate >( 0, N ), self );
+
+ long error_count = -1;
+
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TagVerify >( 0, N ), self, error_count );
+
+ ASSERT_EQ( 0, error_count );
+
+ typename TestViewMappingAtomic::T_atom::HostMirror x_host = Kokkos::create_mirror_view( self.x );
+ Kokkos::deep_copy( x_host, self.x );
+
+ error_count = -1;
+
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultHostExecutionSpace, TagVerify >( 0, N ),
+ [=] ( const TagVerify &, const int i, long & tmp_error_count )
{
- ASSERT_TRUE( T::reference_type_is_lvalue_reference );
- ASSERT_FALSE( T_atom::reference_type_is_lvalue_reference );
-
- TestViewMappingAtomic self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace , TagInit >(0,N) , self );
- Kokkos::parallel_for( Kokkos::RangePolicy< ExecSpace , TagUpdate >(0,N) , self );
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace , TagVerify >(0,N) , self , error_count );
- ASSERT_EQ( 0 , error_count );
- typename TestViewMappingAtomic::T_atom::HostMirror x_host = Kokkos::create_mirror_view(self.x);
- Kokkos::deep_copy(x_host,self.x);
- error_count = -1;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::DefaultHostExecutionSpace, TagVerify>(0,N),
- [=] ( const TagVerify & , const int i , long & tmp_error_count ) {
- if ( i < 2 ) { if ( x_host(i) != int(i + N / 2) ) ++tmp_error_count ; }
- else { if ( x_host(i) != int(i) ) ++tmp_error_count ; }
- }, error_count);
- ASSERT_EQ( 0 , error_count );
- Kokkos::deep_copy(self.x,x_host);
- }
+ if ( i < 2 ) {
+        if ( x_host( i ) != int( i + N / 2 ) ) ++tmp_error_count;
+ }
+ else {
+        if ( x_host( i ) != int( i ) ) ++tmp_error_count;
+ }
+ }, error_count);
+
+    ASSERT_EQ( 0, error_count );
+ Kokkos::deep_copy( self.x, x_host );
+ }
};
/*--------------------------------------------------------------------------*/
template< class Space >
struct TestViewMappingClassValue {
- typedef typename Space::execution_space ExecSpace ;
- typedef typename Space::memory_space MemSpace ;
+ typedef typename Space::execution_space ExecSpace;
+ typedef typename Space::memory_space MemSpace;
struct ValueType {
KOKKOS_INLINE_FUNCTION
ValueType()
{
#if 0
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
- printf("TestViewMappingClassValue construct on Cuda\n");
+ printf( "TestViewMappingClassValue construct on Cuda\n" );
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- printf("TestViewMappingClassValue construct on Host\n");
+ printf( "TestViewMappingClassValue construct on Host\n" );
#else
- printf("TestViewMappingClassValue construct unknown\n");
+ printf( "TestViewMappingClassValue construct unknown\n" );
#endif
#endif
}
KOKKOS_INLINE_FUNCTION
~ValueType()
{
#if 0
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
- printf("TestViewMappingClassValue destruct on Cuda\n");
+ printf( "TestViewMappingClassValue destruct on Cuda\n" );
#elif defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
- printf("TestViewMappingClassValue destruct on Host\n");
+ printf( "TestViewMappingClassValue destruct on Host\n" );
#else
- printf("TestViewMappingClassValue destruct unknown\n");
+ printf( "TestViewMappingClassValue destruct unknown\n" );
#endif
#endif
}
};
static void run()
{
- using namespace Kokkos::Experimental ;
+ using namespace Kokkos::Experimental;
+
ExecSpace::fence();
{
- View< ValueType , ExecSpace > a("a");
+ View< ValueType, ExecSpace > a( "a" );
ExecSpace::fence();
}
ExecSpace::fence();
}
};
-} /* namespace Test */
-
-/*--------------------------------------------------------------------------*/
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/TestViewOfClass.hpp b/lib/kokkos/core/unit_test/TestViewOfClass.hpp
index 381b8786b..d624c5dda 100644
--- a/lib/kokkos/core/unit_test/TestViewOfClass.hpp
+++ b/lib/kokkos/core/unit_test/TestViewOfClass.hpp
@@ -1,131 +1,121 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
template< class Space >
struct NestedView {
-
- Kokkos::View<int*,Space> member ;
+ Kokkos::View< int*, Space > member;
public:
-
KOKKOS_INLINE_FUNCTION
- NestedView() : member()
- {}
+ NestedView() : member() {}
KOKKOS_INLINE_FUNCTION
- NestedView & operator = ( const Kokkos::View<int*,Space> & lhs )
- {
- member = lhs ;
- if ( member.dimension_0() ) Kokkos::atomic_add( & member(0) , 1 );
- return *this ;
- }
+ NestedView & operator=( const Kokkos::View< int*, Space > & lhs )
+ {
+ member = lhs;
+ if ( member.dimension_0() ) Kokkos::atomic_add( & member( 0 ), 1 );
+ return *this;
+ }
KOKKOS_INLINE_FUNCTION
~NestedView()
- {
+ {
if ( member.dimension_0() ) {
- Kokkos::atomic_add( & member(0) , -1 );
+ Kokkos::atomic_add( & member( 0 ), -1 );
}
}
};
template< class Space >
struct NestedViewFunctor {
- Kokkos::View< NestedView<Space> * , Space > nested ;
- Kokkos::View<int*,Space> array ;
+ Kokkos::View< NestedView<Space> *, Space > nested;
+ Kokkos::View< int*, Space > array;
- NestedViewFunctor(
- const Kokkos::View< NestedView<Space> * , Space > & arg_nested ,
- const Kokkos::View<int*,Space> & arg_array )
+ NestedViewFunctor(
+ const Kokkos::View< NestedView<Space> *, Space > & arg_nested,
+ const Kokkos::View< int*, Space > & arg_array )
: nested( arg_nested )
, array( arg_array )
{}
KOKKOS_INLINE_FUNCTION
- void operator()( int i ) const
- { nested[i] = array ; }
+ void operator()( int i ) const { nested[i] = array; }
};
-
template< class Space >
void view_nested_view()
{
- Kokkos::View<int*,Space> tracking("tracking",1);
+ Kokkos::View< int*, Space > tracking( "tracking", 1 );
- typename Kokkos::View<int*,Space>::HostMirror
- host_tracking = Kokkos::create_mirror( tracking );
+ typename Kokkos::View< int*, Space >::HostMirror host_tracking = Kokkos::create_mirror( tracking );
{
- Kokkos::View< NestedView<Space> * , Space > a("a_nested_view",2);
+ Kokkos::View< NestedView<Space> *, Space > a( "a_nested_view", 2 );
- Kokkos::parallel_for( Kokkos::RangePolicy<Space>(0,2) , NestedViewFunctor<Space>( a , tracking ) );
- Kokkos::deep_copy( host_tracking , tracking );
- ASSERT_EQ( 2 , host_tracking(0) );
+ Kokkos::parallel_for( Kokkos::RangePolicy< Space >( 0, 2 ), NestedViewFunctor< Space >( a, tracking ) );
+ Kokkos::deep_copy( host_tracking, tracking );
+ ASSERT_EQ( 2, host_tracking( 0 ) );
- Kokkos::View< NestedView<Space> * , Space > b("b_nested_view",2);
- Kokkos::parallel_for( Kokkos::RangePolicy<Space>(0,2) , NestedViewFunctor<Space>( b , tracking ) );
- Kokkos::deep_copy( host_tracking , tracking );
- ASSERT_EQ( 4 , host_tracking(0) );
+ Kokkos::View< NestedView<Space> *, Space > b( "b_nested_view", 2 );
+ Kokkos::parallel_for( Kokkos::RangePolicy< Space >( 0, 2 ), NestedViewFunctor< Space >( b, tracking ) );
+ Kokkos::deep_copy( host_tracking, tracking );
+ ASSERT_EQ( 4, host_tracking( 0 ) );
}
- Kokkos::deep_copy( host_tracking , tracking );
- ASSERT_EQ( 0 , host_tracking(0) );
-}
+ Kokkos::deep_copy( host_tracking, tracking );
+ ASSERT_EQ( 0, host_tracking( 0 ) );
}
-/*--------------------------------------------------------------------------*/
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp b/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp
index 09141e582..21ae92e93 100644
--- a/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp
+++ b/lib/kokkos/core/unit_test/TestViewSpaceAssign.hpp
@@ -1,82 +1,76 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
namespace Test {
-template< typename SpaceDst , typename SpaceSrc >
+template< typename SpaceDst, typename SpaceSrc >
void view_space_assign()
{
- Kokkos::View<double*,SpaceDst> a =
- Kokkos::View<double*,SpaceSrc>("a",1);
+ Kokkos::View< double*, SpaceDst > a =
+ Kokkos::View< double*, SpaceSrc >( "a", 1 );
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceDst> b =
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceSrc>("b",1);
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceDst > b =
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceSrc >( "b", 1 );
- Kokkos::View<double*,Kokkos::LayoutRight,SpaceDst> c =
- Kokkos::View<double*,Kokkos::LayoutRight,SpaceSrc>("c",1);
+ Kokkos::View< double*, Kokkos::LayoutRight, SpaceDst > c =
+ Kokkos::View< double*, Kokkos::LayoutRight, SpaceSrc >( "c", 1 );
- Kokkos::View<double*,SpaceDst,Kokkos::MemoryRandomAccess> d =
- Kokkos::View<double*,SpaceSrc>("d",1);
+ Kokkos::View< double*, SpaceDst, Kokkos::MemoryRandomAccess > d =
+ Kokkos::View< double*, SpaceSrc >( "d", 1 );
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceDst,Kokkos::MemoryRandomAccess> e =
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceSrc>("e",1);
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceDst, Kokkos::MemoryRandomAccess > e =
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceSrc >( "e", 1 );
// Rank-one layout can assign:
- Kokkos::View<double*,Kokkos::LayoutRight,SpaceDst> f =
- Kokkos::View<double*,Kokkos::LayoutLeft,SpaceSrc>("f",1);
+ Kokkos::View< double*, Kokkos::LayoutRight, SpaceDst > f =
+ Kokkos::View< double*, Kokkos::LayoutLeft, SpaceSrc >( "f", 1 );
}
-
} // namespace Test
-
-/*--------------------------------------------------------------------------*/
-
diff --git a/lib/kokkos/core/unit_test/TestViewSubview.hpp b/lib/kokkos/core/unit_test/TestViewSubview.hpp
index 1c2575b6f..386301b45 100644
--- a/lib/kokkos/core/unit_test/TestViewSubview.hpp
+++ b/lib/kokkos/core/unit_test/TestViewSubview.hpp
@@ -1,1239 +1,1291 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <stdexcept>
#include <sstream>
#include <iostream>
-/*--------------------------------------------------------------------------*/
-
namespace TestViewSubview {
-template<class Layout, class Space>
+template< class Layout, class Space >
struct getView {
static
- Kokkos::View<double**,Layout,Space> get(int n, int m) {
- return Kokkos::View<double**,Layout,Space>("G",n,m);
+ Kokkos::View< double**, Layout, Space > get( int n, int m ) {
+ return Kokkos::View< double**, Layout, Space >( "G", n, m );
}
};
-template<class Space>
-struct getView<Kokkos::LayoutStride,Space> {
+template< class Space >
+struct getView< Kokkos::LayoutStride, Space > {
static
- Kokkos::View<double**,Kokkos::LayoutStride,Space> get(int n, int m) {
- const int rank = 2 ;
+ Kokkos::View< double**, Kokkos::LayoutStride, Space > get( int n, int m ) {
+ const int rank = 2;
const int order[] = { 0, 1 };
- const unsigned dim[] = { unsigned(n), unsigned(m) };
- Kokkos::LayoutStride stride = Kokkos::LayoutStride::order_dimensions( rank , order , dim );
- return Kokkos::View<double**,Kokkos::LayoutStride,Space>("G",stride);
+ const unsigned dim[] = { unsigned( n ), unsigned( m ) };
+ Kokkos::LayoutStride stride = Kokkos::LayoutStride::order_dimensions( rank, order, dim );
+
+ return Kokkos::View< double**, Kokkos::LayoutStride, Space >( "G", stride );
}
};
-template<class ViewType, class Space>
+template< class ViewType, class Space >
struct fill_1D {
typedef typename Space::execution_space execution_space;
typedef typename ViewType::size_type size_type;
+
ViewType a;
double val;
- fill_1D(ViewType a_, double val_):a(a_),val(val_) {
- }
+
+ fill_1D( ViewType a_, double val_ ) : a( a_ ), val( val_ ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int i) const {
- a(i) = val;
- }
+ void operator()( const int i ) const { a( i ) = val; }
};
-template<class ViewType, class Space>
+template< class ViewType, class Space >
struct fill_2D {
typedef typename Space::execution_space execution_space;
typedef typename ViewType::size_type size_type;
+
ViewType a;
double val;
- fill_2D(ViewType a_, double val_):a(a_),val(val_) {
- }
+
+ fill_2D( ViewType a_, double val_ ) : a( a_ ), val( val_ ) {}
+
KOKKOS_INLINE_FUNCTION
- void operator() (const int i) const{
- for(int j = 0; j < static_cast<int>(a.dimension_1()); j++)
- a(i,j) = val;
+ void operator()( const int i ) const
+ {
+ for ( int j = 0; j < static_cast< int >( a.dimension_1() ); j++ ) {
+ a( i, j ) = val;
+ }
}
};
-template<class Layout, class Space>
+template< class Layout, class Space >
void test_auto_1d ()
{
- typedef Kokkos::View<double**, Layout, Space> mv_type;
+ typedef Kokkos::View< double**, Layout, Space > mv_type;
typedef typename mv_type::size_type size_type;
+
const double ZERO = 0.0;
const double ONE = 1.0;
const double TWO = 2.0;
const size_type numRows = 10;
const size_type numCols = 3;
- mv_type X = getView<Layout,Space>::get(numRows, numCols);
- typename mv_type::HostMirror X_h = Kokkos::create_mirror_view (X);
+ mv_type X = getView< Layout, Space >::get( numRows, numCols );
+ typename mv_type::HostMirror X_h = Kokkos::create_mirror_view( X );
- fill_2D<mv_type,Space> f1(X, ONE);
- Kokkos::parallel_for(X.dimension_0(),f1);
- Kokkos::deep_copy (X_h, X);
- for (size_type j = 0; j < numCols; ++j) {
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == ONE);
+ fill_2D< mv_type, Space > f1( X, ONE );
+ Kokkos::parallel_for( X.dimension_0(), f1 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type j = 0; j < numCols; ++j ) {
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == ONE );
}
}
- fill_2D<mv_type,Space> f2(X, 0.0);
- Kokkos::parallel_for(X.dimension_0(),f2);
- Kokkos::deep_copy (X_h, X);
- for (size_type j = 0; j < numCols; ++j) {
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == ZERO);
+ fill_2D< mv_type, Space > f2( X, 0.0 );
+ Kokkos::parallel_for( X.dimension_0(), f2 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type j = 0; j < numCols; ++j ) {
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == ZERO );
}
}
- fill_2D<mv_type,Space> f3(X, TWO);
- Kokkos::parallel_for(X.dimension_0(),f3);
- Kokkos::deep_copy (X_h, X);
- for (size_type j = 0; j < numCols; ++j) {
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == TWO);
+ fill_2D< mv_type, Space > f3( X, TWO );
+ Kokkos::parallel_for( X.dimension_0(), f3 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type j = 0; j < numCols; ++j ) {
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == TWO );
}
}
- for (size_type j = 0; j < numCols; ++j) {
- auto X_j = Kokkos::subview (X, Kokkos::ALL, j);
+ for ( size_type j = 0; j < numCols; ++j ) {
+ auto X_j = Kokkos::subview( X, Kokkos::ALL, j );
- fill_1D<decltype(X_j),Space> f4(X_j, ZERO);
- Kokkos::parallel_for(X_j.dimension_0(),f4);
- Kokkos::deep_copy (X_h, X);
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,j) == ZERO);
+ fill_1D< decltype( X_j ), Space > f4( X_j, ZERO );
+ Kokkos::parallel_for( X_j.dimension_0(), f4 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, j ) == ZERO );
}
- for (size_type jj = 0; jj < numCols; ++jj) {
- auto X_jj = Kokkos::subview (X, Kokkos::ALL, jj);
- fill_1D<decltype(X_jj),Space> f5(X_jj, ONE);
- Kokkos::parallel_for(X_jj.dimension_0(),f5);
- Kokkos::deep_copy (X_h, X);
- for (size_type i = 0; i < numRows; ++i) {
- ASSERT_TRUE(X_h(i,jj) == ONE);
+ for ( size_type jj = 0; jj < numCols; ++jj ) {
+      auto X_jj = Kokkos::subview( X, Kokkos::ALL, jj );
+ fill_1D< decltype( X_jj ), Space > f5( X_jj, ONE );
+ Kokkos::parallel_for( X_jj.dimension_0(), f5 );
+ Kokkos::deep_copy( X_h, X );
+ for ( size_type i = 0; i < numRows; ++i ) {
+ ASSERT_TRUE( X_h( i, jj ) == ONE );
}
}
}
}
-template<class LD, class LS, class Space>
-void test_1d_strided_assignment_impl(bool a, bool b, bool c, bool d, int n, int m) {
- Kokkos::View<double**,LS,Space> l2d("l2d",n,m);
+template< class LD, class LS, class Space >
+void test_1d_strided_assignment_impl( bool a, bool b, bool c, bool d, int n, int m ) {
+ Kokkos::View< double**, LS, Space > l2d( "l2d", n, m );
- int col = n>2?2:0;
- int row = m>2?2:0;
+ int col = n > 2 ? 2 : 0;
+ int row = m > 2 ? 2 : 0;
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
- if(a) {
- Kokkos::View<double*,LD,Space> l1da = Kokkos::subview(l2d,Kokkos::ALL,row);
- ASSERT_TRUE( & l1da(0) == & l2d(0,row) );
- if(n>1)
- ASSERT_TRUE( & l1da(1) == & l2d(1,row) );
- }
- if(b && n>13) {
- Kokkos::View<double*,LD,Space> l1db = Kokkos::subview(l2d,std::pair<unsigned,unsigned>(2,13),row);
- ASSERT_TRUE( & l1db(0) == & l2d(2,row) );
- ASSERT_TRUE( & l1db(1) == & l2d(3,row) );
- }
- if(c) {
- Kokkos::View<double*,LD,Space> l1dc = Kokkos::subview(l2d,col,Kokkos::ALL);
- ASSERT_TRUE( & l1dc(0) == & l2d(col,0) );
- if(m>1)
- ASSERT_TRUE( & l1dc(1) == & l2d(col,1) );
- }
- if(d && m>13) {
- Kokkos::View<double*,LD,Space> l1dd = Kokkos::subview(l2d,col,std::pair<unsigned,unsigned>(2,13));
- ASSERT_TRUE( & l1dd(0) == & l2d(col,2) );
- ASSERT_TRUE( & l1dd(1) == & l2d(col,3) );
- }
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ if ( a ) {
+ Kokkos::View< double*, LD, Space > l1da = Kokkos::subview( l2d, Kokkos::ALL, row );
+ ASSERT_TRUE( & l1da( 0 ) == & l2d( 0, row ) );
+ if ( n > 1 ) {
+ ASSERT_TRUE( & l1da( 1 ) == & l2d( 1, row ) );
+ }
+ }
+
+ if ( b && n > 13 ) {
+ Kokkos::View< double*, LD, Space > l1db = Kokkos::subview( l2d, std::pair< unsigned, unsigned >( 2, 13 ), row );
+ ASSERT_TRUE( & l1db( 0 ) == & l2d( 2, row ) );
+ ASSERT_TRUE( & l1db( 1 ) == & l2d( 3, row ) );
+ }
+
+ if ( c ) {
+ Kokkos::View< double*, LD, Space > l1dc = Kokkos::subview( l2d, col, Kokkos::ALL );
+ ASSERT_TRUE( & l1dc( 0 ) == & l2d( col, 0 ) );
+      if ( m > 1 ) {
+ ASSERT_TRUE( & l1dc( 1 ) == & l2d( col, 1 ) );
+ }
+ }
+
+ if ( d && m > 13 ) {
+ Kokkos::View< double*, LD, Space > l1dd = Kokkos::subview( l2d, col, std::pair< unsigned, unsigned >( 2, 13 ) );
+ ASSERT_TRUE( & l1dd( 0 ) == & l2d( col, 2 ) );
+ ASSERT_TRUE( & l1dd( 1 ) == & l2d( col, 3 ) );
+ }
}
}
-template<class Space >
+template< class Space >
void test_1d_strided_assignment() {
- test_1d_strided_assignment_impl<Kokkos::LayoutStride,Kokkos::LayoutLeft,Space>(true,true,true,true,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutStride,Kokkos::LayoutRight,Space>(true,true,true,true,17,3);
-
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,false,false,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,false,false,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(false,false,true,true,17,3);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(false,false,true,true,17,3);
-
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,false,false,17,1);
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutLeft,Space>(true,true,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutLeft,Space>(true,true,false,false,17,1);
-
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(true,true,true,true,17,1);
- test_1d_strided_assignment_impl<Kokkos::LayoutLeft,Kokkos::LayoutRight,Space>(false,false,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(false,false,true,true,1,17);
- test_1d_strided_assignment_impl<Kokkos::LayoutRight,Kokkos::LayoutRight,Space>(true,true,true,true,17,1);
+ test_1d_strided_assignment_impl< Kokkos::LayoutStride, Kokkos::LayoutLeft, Space >( true, true, true, true, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutStride, Kokkos::LayoutRight, Space >( true, true, true, true, 17, 3 );
+
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutRight, Space >( false, false, true, true, 17, 3 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutRight, Space >( false, false, true, true, 17, 3 );
+
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 1 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutLeft, Space >( true, true, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutLeft, Space >( true, true, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutLeft, Space >( true, true, false, false, 17, 1 );
+
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutRight, Space >( true, true, true, true, 17, 1 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutLeft, Kokkos::LayoutRight, Space >( false, false, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutRight, Space >( false, false, true, true, 1, 17 );
+ test_1d_strided_assignment_impl< Kokkos::LayoutRight, Kokkos::LayoutRight, Space >( true, true, true, true, 17, 1 );
}
template< class Space >
void test_left_0()
{
- typedef Kokkos::View< int [2][3][4][5][2][3][4][5] , Kokkos::LayoutLeft , Space >
- view_static_8_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int [2][3][4][5][2][3][4][5], Kokkos::LayoutLeft, Space > view_static_8_type;
- view_static_8_type x_static_8("x_static_left_8");
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_static_8_type x_static_8( "x_static_left_8" );
- ASSERT_TRUE( x_static_8.is_contiguous() );
+ ASSERT_TRUE( x_static_8.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x_static_8 , 0, 0, 0, 0, 0, 0, 0, 0 );
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( x_static_8, 0, 0, 0, 0, 0, 0, 0, 0 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & x_static_8(0,0,0,0,0,0,0,0) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & x_static_8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( x_static_8, Kokkos::pair<int,int>(0,2), 1, 2, 3, 0, 1, 2, 3 );
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 =
+ Kokkos::subview( x_static_8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3, 0, 1, 2, 3 );
- ASSERT_TRUE( x1.is_contiguous() );
- ASSERT_TRUE( & x1(0) == & x_static_8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x1(1) == & x_static_8(1,1,2,3,0,1,2,3) );
+ ASSERT_TRUE( x1.is_contiguous() );
+ ASSERT_TRUE( & x1( 0 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( x_static_8, Kokkos::pair<int,int>(0,2), 1, 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( x_static_8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! x2.is_contiguous() );
- ASSERT_TRUE( & x2(0,0) == & x_static_8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(0,1) == & x_static_8(0,1,2,3,1,1,2,3) );
- ASSERT_TRUE( & x2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x_static_8( 0, 1, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x_static_8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- // Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x_static_8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ // Kokkos::View< int**, Kokkos::LayoutLeft, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x_static_8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! sx2.is_contiguous() );
- ASSERT_TRUE( & sx2(0,0) == & x_static_8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x_static_8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! sx2.is_contiguous() );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x_static_8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x_static_8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x_static_8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x_static_8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x_static_8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
- ASSERT_TRUE( ! sx4.is_contiguous() );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x_static_8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
- }
+ ASSERT_TRUE( ! sx4.is_contiguous() );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x_static_8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_left_1()
{
- typedef Kokkos::View< int ****[2][3][4][5] , Kokkos::LayoutLeft , Space >
- view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int ****[2][3][4][5], Kokkos::LayoutLeft, Space > view_type;
- view_type x8("x_left_8",2,3,4,5);
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_type x8( "x_left_8", 2, 3, 4, 5 );
- ASSERT_TRUE( x8.is_contiguous() );
+ ASSERT_TRUE( x8.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x8 , 0, 0, 0, 0, 0, 0, 0, 0 );
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( x8, 0, 0, 0, 0, 0, 0, 0, 0 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & x8(0,0,0,0,0,0,0,0) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & x8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( x8, Kokkos::pair<int,int>(0,2), 1, 2, 3, 0, 1, 2, 3 );
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 =
+ Kokkos::subview( x8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3, 0, 1, 2, 3 );
- ASSERT_TRUE( x1.is_contiguous() );
- ASSERT_TRUE( & x1(0) == & x8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x1(1) == & x8(1,1,2,3,0,1,2,3) );
+ ASSERT_TRUE( x1.is_contiguous() );
+ ASSERT_TRUE( & x1( 0 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( x8, Kokkos::pair<int,int>(0,2), 1, 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( x8, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! x2.is_contiguous() );
- ASSERT_TRUE( & x2(0,0) == & x8(0,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(1,0) == & x8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & x2(0,1) == & x8(0,1,2,3,1,1,2,3) );
- ASSERT_TRUE( & x2(1,1) == & x8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x8( 0, 1, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- // Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ // Kokkos::View< int**, Kokkos::LayoutLeft, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( ! sx2.is_contiguous() );
- ASSERT_TRUE( & sx2(0,0) == & x8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( ! sx2.is_contiguous() );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
- ASSERT_TRUE( ! sx4.is_contiguous() );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
- }
+ ASSERT_TRUE( ! sx4.is_contiguous() );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_left_2()
{
- typedef Kokkos::View< int **** , Kokkos::LayoutLeft , Space > view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
-
- view_type x4("x4",2,3,4,5);
-
- ASSERT_TRUE( x4.is_contiguous() );
-
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( x4 , 0, 0, 0, 0 );
-
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & x4(0,0,0,0) );
-
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( x4, Kokkos::pair<int,int>(0,2), 1, 2, 3 );
-
- ASSERT_TRUE( x1.is_contiguous() );
- ASSERT_TRUE( & x1(0) == & x4(0,1,2,3) );
- ASSERT_TRUE( & x1(1) == & x4(1,1,2,3) );
-
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( x4, Kokkos::pair<int,int>(0,2), 1, Kokkos::pair<int,int>(1,3), 2 );
-
- ASSERT_TRUE( ! x2.is_contiguous() );
- ASSERT_TRUE( & x2(0,0) == & x4(0,1,1,2) );
- ASSERT_TRUE( & x2(1,0) == & x4(1,1,1,2) );
- ASSERT_TRUE( & x2(0,1) == & x4(0,1,2,2) );
- ASSERT_TRUE( & x2(1,1) == & x4(1,1,2,2) );
-
- // Kokkos::View<int**,Kokkos::LayoutLeft,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x4, 1, Kokkos::pair<int,int>(0,2)
- , 2, Kokkos::pair<int,int>(1,4) );
-
- ASSERT_TRUE( ! sx2.is_contiguous() );
- ASSERT_TRUE( & sx2(0,0) == & x4(1,0,2,1) );
- ASSERT_TRUE( & sx2(1,0) == & x4(1,1,2,1) );
- ASSERT_TRUE( & sx2(0,1) == & x4(1,0,2,2) );
- ASSERT_TRUE( & sx2(1,1) == & x4(1,1,2,2) );
- ASSERT_TRUE( & sx2(0,2) == & x4(1,0,2,3) );
- ASSERT_TRUE( & sx2(1,2) == & x4(1,1,2,3) );
-
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x4, Kokkos::pair<int,int>(1,2) /* of [2] */
- , Kokkos::pair<int,int>(1,3) /* of [3] */
- , Kokkos::pair<int,int>(0,4) /* of [4] */
- , Kokkos::pair<int,int>(2,4) /* of [5] */
- );
-
- ASSERT_TRUE( ! sx4.is_contiguous() );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x4( 1+i0, 1+i1, 0+i2, 2+i3 ) );
- }
-
+ typedef Kokkos::View< int ****, Kokkos::LayoutLeft, Space > view_type;
+
+ if ( Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace, typename Space::memory_space>::accessible ) {
+ view_type x4( "x4", 2, 3, 4, 5 );
+
+ ASSERT_TRUE( x4.is_contiguous() );
+
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( x4, 0, 0, 0, 0 );
+
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & x4( 0, 0, 0, 0 ) );
+
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 =
+ Kokkos::subview( x4, Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
+
+ ASSERT_TRUE( x1.is_contiguous() );
+ ASSERT_TRUE( & x1( 0 ) == & x4( 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x4( 1, 1, 2, 3 ) );
+
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( x4, Kokkos::pair< int, int >( 0, 2 ), 1
+ , Kokkos::pair< int, int >( 1, 3 ), 2 );
+
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x4( 0, 1, 1, 2 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x4( 1, 1, 1, 2 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x4( 0, 1, 2, 2 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x4( 1, 1, 2, 2 ) );
+
+ // Kokkos::View< int**, Kokkos::LayoutLeft, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x4, 1, Kokkos::pair< int, int >( 0, 2 )
+ , 2, Kokkos::pair< int, int >( 1, 4 ) );
+
+ ASSERT_TRUE( ! sx2.is_contiguous() );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x4( 1, 0, 2, 1 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x4( 1, 1, 2, 1 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x4( 1, 0, 2, 2 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x4( 1, 1, 2, 2 ) );
+ ASSERT_TRUE( & sx2( 0, 2 ) == & x4( 1, 0, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 2 ) == & x4( 1, 1, 2, 3 ) );
+
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x4, Kokkos::pair< int, int >( 1, 2 ) /* of [2] */
+ , Kokkos::pair< int, int >( 1, 3 ) /* of [3] */
+ , Kokkos::pair< int, int >( 0, 4 ) /* of [4] */
+ , Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
+
+ ASSERT_TRUE( ! sx4.is_contiguous() );
+
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x4( 1 + i0, 1 + i1, 0 + i2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_left_3()
{
- typedef Kokkos::View< int ** , Kokkos::LayoutLeft , Space > view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int **, Kokkos::LayoutLeft, Space > view_type;
- view_type xm("x4",10,5);
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_type xm( "x4", 10, 5 );
- ASSERT_TRUE( xm.is_contiguous() );
+ ASSERT_TRUE( xm.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutLeft,Space> x0 = Kokkos::subview( xm , 5, 3 );
+ Kokkos::View< int, Kokkos::LayoutLeft, Space > x0 = Kokkos::subview( xm, 5, 3 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & xm(5,3) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & xm( 5, 3 ) );
- Kokkos::View<int*,Kokkos::LayoutLeft,Space> x1 =
- Kokkos::subview( xm, Kokkos::ALL, 3 );
+ Kokkos::View< int*, Kokkos::LayoutLeft, Space > x1 = Kokkos::subview( xm, Kokkos::ALL, 3 );
- ASSERT_TRUE( x1.is_contiguous() );
- for ( int i = 0 ; i < int(xm.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x1(i) == & xm(i,3) );
- }
-
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2 =
- Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL );
+ ASSERT_TRUE( x1.is_contiguous() );
+ for ( int i = 0; i < int( xm.dimension_0() ); ++i ) {
+ ASSERT_TRUE( & x1( i ) == & xm( i, 3 ) );
+ }
- ASSERT_TRUE( ! x2.is_contiguous() );
- for ( int j = 0 ; j < int(x2.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2(i,j) == & xm(1+i,j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2 =
+ Kokkos::subview( xm, Kokkos::pair< int, int >( 1, 9 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2c =
- Kokkos::subview( xm, Kokkos::ALL, std::pair<int,int>(2,4) );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ for ( int j = 0; j < int( x2.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2.dimension_0() ); ++i )
+ {
+ ASSERT_TRUE( & x2( i, j ) == & xm( 1 + i, j ) );
+ }
- ASSERT_TRUE( x2c.is_contiguous() );
- for ( int j = 0 ; j < int(x2c.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2c.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2c(i,j) == & xm(i,2+j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2c =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 2, 4 ) );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2_n1 =
- Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL );
+ ASSERT_TRUE( x2c.is_contiguous() );
+ for ( int j = 0; j < int( x2c.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2c.dimension_0() ); ++i )
+ {
+ ASSERT_TRUE( & x2c( i, j ) == & xm( i, 2 + j ) );
+ }
- ASSERT_TRUE( x2_n1.dimension_0() == 0 );
- ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2_n1 =
+ Kokkos::subview( xm, std::pair< int, int >( 1, 1 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutLeft,Space> x2_n2 =
- Kokkos::subview( xm , Kokkos::ALL , std::pair<int,int>(1,1) );
+ ASSERT_TRUE( x2_n1.dimension_0() == 0 );
+ ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
- ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
- ASSERT_TRUE( x2_n2.dimension_1() == 0 );
+ Kokkos::View< int**, Kokkos::LayoutLeft, Space > x2_n2 =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 1, 1 ) );
+ ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
+ ASSERT_TRUE( x2_n2.dimension_1() == 0 );
}
}
//----------------------------------------------------------------------------
template< class Space >
void test_right_0()
{
- typedef Kokkos::View< int [2][3][4][5][2][3][4][5] , Kokkos::LayoutRight , Space >
- view_static_8_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
-
- view_static_8_type x_static_8("x_static_right_8");
-
- Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( x_static_8 , 0, 0, 0, 0, 0, 0, 0, 0 );
-
- ASSERT_TRUE( & x0() == & x_static_8(0,0,0,0,0,0,0,0) );
-
- Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
- Kokkos::subview( x_static_8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
-
- ASSERT_TRUE( x1.dimension_0() == 2 );
- ASSERT_TRUE( & x1(0) == & x_static_8(0,1,2,3,0,1,2,1) );
- ASSERT_TRUE( & x1(1) == & x_static_8(0,1,2,3,0,1,2,2) );
-
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
- Kokkos::subview( x_static_8, 0, 1, 2, Kokkos::pair<int,int>(1,3)
- , 0, 1, 2, Kokkos::pair<int,int>(1,3) );
-
- ASSERT_TRUE( x2.dimension_0() == 2 );
- ASSERT_TRUE( x2.dimension_1() == 2 );
- ASSERT_TRUE( & x2(0,0) == & x_static_8(0,1,2,1,0,1,2,1) );
- ASSERT_TRUE( & x2(1,0) == & x_static_8(0,1,2,2,0,1,2,1) );
- ASSERT_TRUE( & x2(0,1) == & x_static_8(0,1,2,1,0,1,2,2) );
- ASSERT_TRUE( & x2(1,1) == & x_static_8(0,1,2,2,0,1,2,2) );
-
- // Kokkos::View<int**,Kokkos::LayoutRight,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x_static_8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
-
- ASSERT_TRUE( sx2.dimension_0() == 2 );
- ASSERT_TRUE( sx2.dimension_1() == 2 );
- ASSERT_TRUE( & sx2(0,0) == & x_static_8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x_static_8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x_static_8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x_static_8(1,1,2,3,1,1,2,3) );
-
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x_static_8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
-
- ASSERT_TRUE( sx4.dimension_0() == 2 );
- ASSERT_TRUE( sx4.dimension_1() == 2 );
- ASSERT_TRUE( sx4.dimension_2() == 2 );
- ASSERT_TRUE( sx4.dimension_3() == 2 );
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x_static_8(0, 0+i0, 1, 1+i1, 1, 0+i2, 2, 2+i3) );
- }
-
+ typedef Kokkos::View< int [2][3][4][5][2][3][4][5], Kokkos::LayoutRight, Space > view_static_8_type;
+
+ if ( Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace, typename Space::memory_space>::accessible ) {
+ view_static_8_type x_static_8( "x_static_right_8" );
+
+ Kokkos::View< int, Kokkos::LayoutRight, Space > x0 = Kokkos::subview( x_static_8, 0, 0, 0, 0, 0, 0, 0, 0 );
+
+ ASSERT_TRUE( & x0() == & x_static_8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
+
+ Kokkos::View< int*, Kokkos::LayoutRight, Space > x1 =
+ Kokkos::subview( x_static_8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
+
+ ASSERT_TRUE( x1.dimension_0() == 2 );
+ ASSERT_TRUE( & x1( 0 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x_static_8( 0, 1, 2, 3, 0, 1, 2, 2 ) );
+
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2 =
+ Kokkos::subview( x_static_8, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 )
+ , 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
+
+ ASSERT_TRUE( x2.dimension_0() == 2 );
+ ASSERT_TRUE( x2.dimension_1() == 2 );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x_static_8( 0, 1, 2, 1, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x_static_8( 0, 1, 2, 2, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x_static_8( 0, 1, 2, 1, 0, 1, 2, 2 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x_static_8( 0, 1, 2, 2, 0, 1, 2, 2 ) );
+
+ // Kokkos::View< int**, Kokkos::LayoutRight, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x_static_8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
+
+ ASSERT_TRUE( sx2.dimension_0() == 2 );
+ ASSERT_TRUE( sx2.dimension_1() == 2 );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x_static_8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x_static_8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x_static_8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x_static_8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
+
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x_static_8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
+
+ ASSERT_TRUE( sx4.dimension_0() == 2 );
+ ASSERT_TRUE( sx4.dimension_1() == 2 );
+ ASSERT_TRUE( sx4.dimension_2() == 2 );
+ ASSERT_TRUE( sx4.dimension_3() == 2 );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x_static_8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_right_1()
{
- typedef Kokkos::View< int ****[2][3][4][5] , Kokkos::LayoutRight , Space >
- view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int ****[2][3][4][5], Kokkos::LayoutRight, Space > view_type;
- view_type x8("x_right_8",2,3,4,5);
+ if ( Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace, typename Space::memory_space>::accessible ) {
+ view_type x8( "x_right_8", 2, 3, 4, 5 );
- Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( x8 , 0, 0, 0, 0, 0, 0, 0, 0 );
+ Kokkos::View< int, Kokkos::LayoutRight, Space > x0 = Kokkos::subview( x8, 0, 0, 0, 0, 0, 0, 0, 0 );
- ASSERT_TRUE( & x0() == & x8(0,0,0,0,0,0,0,0) );
+ ASSERT_TRUE( & x0() == & x8( 0, 0, 0, 0, 0, 0, 0, 0 ) );
- Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
- Kokkos::subview( x8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair<int,int>(1,3) );
+ Kokkos::View< int*, Kokkos::LayoutRight, Space > x1 =
+ Kokkos::subview( x8, 0, 1, 2, 3, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
- ASSERT_TRUE( & x1(0) == & x8(0,1,2,3,0,1,2,1) );
- ASSERT_TRUE( & x1(1) == & x8(0,1,2,3,0,1,2,2) );
+ ASSERT_TRUE( & x1( 0 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x1( 1 ) == & x8( 0, 1, 2, 3, 0, 1, 2, 2 ) );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
- Kokkos::subview( x8, 0, 1, 2, Kokkos::pair<int,int>(1,3)
- , 0, 1, 2, Kokkos::pair<int,int>(1,3) );
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2 =
+ Kokkos::subview( x8, 0, 1, 2, Kokkos::pair< int, int >( 1, 3 )
+ , 0, 1, 2, Kokkos::pair< int, int >( 1, 3 ) );
- ASSERT_TRUE( & x2(0,0) == & x8(0,1,2,1,0,1,2,1) );
- ASSERT_TRUE( & x2(1,0) == & x8(0,1,2,2,0,1,2,1) );
- ASSERT_TRUE( & x2(0,1) == & x8(0,1,2,1,0,1,2,2) );
- ASSERT_TRUE( & x2(1,1) == & x8(0,1,2,2,0,1,2,2) );
+ ASSERT_TRUE( & x2( 0, 0 ) == & x8( 0, 1, 2, 1, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 1, 0 ) == & x8( 0, 1, 2, 2, 0, 1, 2, 1 ) );
+ ASSERT_TRUE( & x2( 0, 1 ) == & x8( 0, 1, 2, 1, 0, 1, 2, 2 ) );
+ ASSERT_TRUE( & x2( 1, 1 ) == & x8( 0, 1, 2, 2, 0, 1, 2, 2 ) );
- // Kokkos::View<int**,Kokkos::LayoutRight,Space> error_2 =
- Kokkos::View<int**,Kokkos::LayoutStride,Space> sx2 =
- Kokkos::subview( x8, 1, Kokkos::pair<int,int>(0,2), 2, 3
- , Kokkos::pair<int,int>(0,2), 1, 2, 3 );
+ // Kokkos::View< int**, Kokkos::LayoutRight, Space > error_2 =
+ Kokkos::View< int**, Kokkos::LayoutStride, Space > sx2 =
+ Kokkos::subview( x8, 1, Kokkos::pair< int, int >( 0, 2 ), 2, 3
+ , Kokkos::pair< int, int >( 0, 2 ), 1, 2, 3 );
- ASSERT_TRUE( & sx2(0,0) == & x8(1,0,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(1,0) == & x8(1,1,2,3,0,1,2,3) );
- ASSERT_TRUE( & sx2(0,1) == & x8(1,0,2,3,1,1,2,3) );
- ASSERT_TRUE( & sx2(1,1) == & x8(1,1,2,3,1,1,2,3) );
+ ASSERT_TRUE( & sx2( 0, 0 ) == & x8( 1, 0, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 0 ) == & x8( 1, 1, 2, 3, 0, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 0, 1 ) == & x8( 1, 0, 2, 3, 1, 1, 2, 3 ) );
+ ASSERT_TRUE( & sx2( 1, 1 ) == & x8( 1, 1, 2, 3, 1, 1, 2, 3 ) );
- Kokkos::View<int****,Kokkos::LayoutStride,Space> sx4 =
- Kokkos::subview( x8, 0, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 1, Kokkos::pair<int,int>(1,3) /* of [5] */
- , 1, Kokkos::pair<int,int>(0,2) /* of [3] */
- , 2, Kokkos::pair<int,int>(2,4) /* of [5] */
- );
-
- for ( int i0 = 0 ; i0 < (int) sx4.dimension_0() ; ++i0 )
- for ( int i1 = 0 ; i1 < (int) sx4.dimension_1() ; ++i1 )
- for ( int i2 = 0 ; i2 < (int) sx4.dimension_2() ; ++i2 )
- for ( int i3 = 0 ; i3 < (int) sx4.dimension_3() ; ++i3 ) {
- ASSERT_TRUE( & sx4(i0,i1,i2,i3) == & x8(0,0+i0, 1,1+i1, 1,0+i2, 2,2+i3) );
- }
+ Kokkos::View< int****, Kokkos::LayoutStride, Space > sx4 =
+ Kokkos::subview( x8, 0, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 1, Kokkos::pair< int, int >( 1, 3 ) /* of [5] */
+ , 1, Kokkos::pair< int, int >( 0, 2 ) /* of [3] */
+ , 2, Kokkos::pair< int, int >( 2, 4 ) /* of [5] */
+ );
+ for ( int i0 = 0; i0 < (int) sx4.dimension_0(); ++i0 )
+ for ( int i1 = 0; i1 < (int) sx4.dimension_1(); ++i1 )
+ for ( int i2 = 0; i2 < (int) sx4.dimension_2(); ++i2 )
+ for ( int i3 = 0; i3 < (int) sx4.dimension_3(); ++i3 )
+ {
+ ASSERT_TRUE( & sx4( i0, i1, i2, i3 ) == & x8( 0, 0 + i0, 1, 1 + i1, 1, 0 + i2, 2, 2 + i3 ) );
+ }
}
}
template< class Space >
void test_right_3()
{
- typedef Kokkos::View< int ** , Kokkos::LayoutRight , Space > view_type ;
-
- if(Kokkos::Impl::SpaceAccessibility<Kokkos::HostSpace,typename Space::memory_space>::accessible) {
+ typedef Kokkos::View< int **, Kokkos::LayoutRight, Space > view_type;
- view_type xm("x4",10,5);
+ if ( Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, typename Space::memory_space >::accessible ) {
+ view_type xm( "x4", 10, 5 );
- ASSERT_TRUE( xm.is_contiguous() );
+ ASSERT_TRUE( xm.is_contiguous() );
- Kokkos::View<int,Kokkos::LayoutRight,Space> x0 = Kokkos::subview( xm , 5, 3 );
+ Kokkos::View< int, Kokkos::LayoutRight, Space > x0 = Kokkos::subview( xm, 5, 3 );
- ASSERT_TRUE( x0.is_contiguous() );
- ASSERT_TRUE( & x0() == & xm(5,3) );
+ ASSERT_TRUE( x0.is_contiguous() );
+ ASSERT_TRUE( & x0() == & xm( 5, 3 ) );
- Kokkos::View<int*,Kokkos::LayoutRight,Space> x1 =
- Kokkos::subview( xm, 3, Kokkos::ALL );
-
- ASSERT_TRUE( x1.is_contiguous() );
- for ( int i = 0 ; i < int(xm.dimension_1()) ; ++i ) {
- ASSERT_TRUE( & x1(i) == & xm(3,i) );
- }
+ Kokkos::View< int*, Kokkos::LayoutRight, Space > x1 = Kokkos::subview( xm, 3, Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2c =
- Kokkos::subview( xm, Kokkos::pair<int,int>(1,9), Kokkos::ALL );
+ ASSERT_TRUE( x1.is_contiguous() );
+ for ( int i = 0; i < int( xm.dimension_1() ); ++i ) {
+ ASSERT_TRUE( & x1( i ) == & xm( 3, i ) );
+ }
- ASSERT_TRUE( x2c.is_contiguous() );
- for ( int j = 0 ; j < int(x2c.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2c.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2c(i,j) == & xm(1+i,j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2c =
+ Kokkos::subview( xm, Kokkos::pair< int, int >( 1, 9 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2 =
- Kokkos::subview( xm, Kokkos::ALL, std::pair<int,int>(2,4) );
+ ASSERT_TRUE( x2c.is_contiguous() );
+ for ( int j = 0; j < int( x2c.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2c.dimension_0() ); ++i ) {
+ ASSERT_TRUE( & x2c( i, j ) == & xm( 1 + i, j ) );
+ }
- ASSERT_TRUE( ! x2.is_contiguous() );
- for ( int j = 0 ; j < int(x2.dimension_1()) ; ++j )
- for ( int i = 0 ; i < int(x2.dimension_0()) ; ++i ) {
- ASSERT_TRUE( & x2(i,j) == & xm(i,2+j) );
- }
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2 =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 2, 4 ) );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2_n1 =
- Kokkos::subview( xm , std::pair<int,int>(1,1) , Kokkos::ALL );
+ ASSERT_TRUE( ! x2.is_contiguous() );
+ for ( int j = 0; j < int( x2.dimension_1() ); ++j )
+ for ( int i = 0; i < int( x2.dimension_0() ); ++i )
+ {
+ ASSERT_TRUE( & x2( i, j ) == & xm( i, 2 + j ) );
+ }
- ASSERT_TRUE( x2_n1.dimension_0() == 0 );
- ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2_n1 =
+ Kokkos::subview( xm, std::pair< int, int >( 1, 1 ), Kokkos::ALL );
- Kokkos::View<int**,Kokkos::LayoutRight,Space> x2_n2 =
- Kokkos::subview( xm , Kokkos::ALL , std::pair<int,int>(1,1) );
+ ASSERT_TRUE( x2_n1.dimension_0() == 0 );
+ ASSERT_TRUE( x2_n1.dimension_1() == xm.dimension_1() );
- ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
- ASSERT_TRUE( x2_n2.dimension_1() == 0 );
+ Kokkos::View< int**, Kokkos::LayoutRight, Space > x2_n2 =
+ Kokkos::subview( xm, Kokkos::ALL, std::pair< int, int >( 1, 1 ) );
+ ASSERT_TRUE( x2_n2.dimension_0() == xm.dimension_0() );
+ ASSERT_TRUE( x2_n2.dimension_1() == 0 );
}
}
namespace Impl {
-constexpr int N0=113;
-constexpr int N1=11;
-constexpr int N2=17;
-constexpr int N3=5;
-constexpr int N4=7;
+constexpr int N0 = 113;
+constexpr int N1 = 11;
+constexpr int N2 = 17;
+constexpr int N3 = 5;
+constexpr int N4 = 7;
-template<class SubView,class View>
-void test_Check1D(SubView a, View b, std::pair<int,int> range) {
+template< class SubView, class View >
+void test_Check1D( SubView a, View b, std::pair< int, int > range ) {
int errors = 0;
- for(int i=0;i<range.second-range.first;i++) {
- if(a(i)!=b(i+range.first))
- errors++;
+
+ for ( int i = 0; i < range.second - range.first; i++ ) {
+ if ( a( i ) != b( i + range.first ) ) errors++;
+ }
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check1D: " << errors << std::endl;
}
- if(errors>0)
- std::cout << "Error Subviews test_Check1D: " << errors <<std::endl;
+
ASSERT_TRUE( errors == 0 );
}
-template<class SubView,class View>
-void test_Check1D2D(SubView a, View b, int i0, std::pair<int,int> range) {
+template< class SubView, class View >
+void test_Check1D2D( SubView a, View b, int i0, std::pair< int, int > range ) {
int errors = 0;
- for(int i1=0;i1<range.second-range.first;i1++) {
- if(a(i1)!=b(i0,i1+range.first))
- errors++;
+
+ for ( int i1 = 0; i1 < range.second - range.first; i1++ ) {
+ if ( a( i1 ) != b( i0, i1 + range.first ) ) errors++;
}
- if(errors>0)
- std::cout << "Error Subviews test_Check1D2D: " << errors <<std::endl;
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check1D2D: " << errors << std::endl;
+ }
+
ASSERT_TRUE( errors == 0 );
}
-template<class SubView,class View>
-void test_Check2D3D(SubView a, View b, int i0, std::pair<int,int> range1, std::pair<int,int> range2) {
+template< class SubView, class View >
+void test_Check2D3D( SubView a, View b, int i0, std::pair< int, int > range1
+ , std::pair< int, int > range2 )
+{
int errors = 0;
- for(int i1=0;i1<range1.second-range1.first;i1++) {
- for(int i2=0;i2<range2.second-range2.first;i2++) {
- if(a(i1,i2)!=b(i0,i1+range1.first,i2+range2.first))
- errors++;
+
+ for ( int i1 = 0; i1 < range1.second - range1.first; i1++ ) {
+ for ( int i2 = 0; i2 < range2.second - range2.first; i2++ ) {
+ if ( a( i1, i2 ) != b( i0, i1 + range1.first, i2 + range2.first ) ) errors++;
}
}
- if(errors>0)
- std::cout << "Error Subviews test_Check2D3D: " << errors <<std::endl;
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check2D3D: " << errors << std::endl;
+ }
+
ASSERT_TRUE( errors == 0 );
}
-template<class SubView,class View>
-void test_Check3D5D(SubView a, View b, int i0, int i1, std::pair<int,int> range2, std::pair<int,int> range3, std::pair<int,int> range4) {
+template< class SubView, class View >
+void test_Check3D5D( SubView a, View b, int i0, int i1, std::pair< int, int > range2
+ , std::pair< int, int > range3, std::pair< int, int > range4 )
+{
int errors = 0;
- for(int i2=0;i2<range2.second-range2.first;i2++) {
- for(int i3=0;i3<range3.second-range3.first;i3++) {
- for(int i4=0;i4<range4.second-range4.first;i4++) {
- if(a(i2,i3,i4)!=b(i0,i1,i2+range2.first,i3+range3.first,i4+range4.first))
+
+ for ( int i2 = 0; i2 < range2.second - range2.first; i2++ ) {
+ for ( int i3 = 0; i3 < range3.second - range3.first; i3++ ) {
+ for ( int i4 = 0; i4 < range4.second - range4.first; i4++ ) {
+ if ( a( i2, i3, i4 ) != b( i0, i1, i2 + range2.first, i3 + range3.first, i4 + range4.first ) ) {
errors++;
+ }
}
}
}
- if(errors>0)
- std::cout << "Error Subviews test_Check3D5D: " << errors <<std::endl;
+
+ if ( errors > 0 ) {
+ std::cout << "Error Subviews test_Check3D5D: " << errors << std::endl;
+ }
+
ASSERT_TRUE( errors == 0 );
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_1d_assign_impl() {
-
- { //Breaks
- Kokkos::View<int*,LayoutOrg,Space> a_org("A",N0);
- Kokkos::View<int*,LayoutOrg,Space,MemTraits> a(a_org);
+ { // Breaks.
+ Kokkos::View< int*, LayoutOrg, Space > a_org( "A", N0 );
+ Kokkos::View< int*, LayoutOrg, Space, MemTraits > a( a_org );
Kokkos::fence();
- for(int i=0; i<N0; i++)
- a_org(i) = i;
+ for ( int i = 0; i < N0; i++ ) a_org( i ) = i;
- Kokkos::View<int[N0],Layout,Space,MemTraits> a1(a);
+ Kokkos::View< int[N0], Layout, Space, MemTraits > a1( a );
Kokkos::fence();
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
- Kokkos::View<int[N0],LayoutSub,Space,MemTraits> a2(a1);
+ Kokkos::View< int[N0], LayoutSub, Space, MemTraits > a2( a1 );
Kokkos::fence();
- test_Check1D(a2,a,std::pair<int,int>(0,N0));
+ test_Check1D( a2, a, std::pair< int, int >( 0, N0 ) );
a1 = a;
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
- //Runtime Fail expected
- //Kokkos::View<int[N1]> afail1(a);
+ // Runtime Fail expected.
+ //Kokkos::View< int[N1] > afail1( a );
- //Compile Time Fail expected
- //Kokkos::View<int[N1]> afail2(a1);
+ // Compile Time Fail expected.
+ //Kokkos::View< int[N1] > afail2( a1 );
}
- { // Works
- Kokkos::View<int[N0],LayoutOrg,Space,MemTraits> a("A");
- Kokkos::View<int*,Layout,Space,MemTraits> a1(a);
+ { // Works.
+ Kokkos::View< int[N0], LayoutOrg, Space, MemTraits > a( "A" );
+ Kokkos::View< int*, Layout, Space, MemTraits > a1( a );
Kokkos::fence();
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
a1 = a;
Kokkos::fence();
- test_Check1D(a1,a,std::pair<int,int>(0,N0));
+ test_Check1D( a1, a, std::pair< int, int >( 0, N0 ) );
}
}
-template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg,class MemTraits>
+template< class Space, class Type, class TypeSub, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_2d_subview_3d_impl_type() {
- Kokkos::View<int***,LayoutOrg,Space> a_org("A",N0,N1,N2);
- Kokkos::View<Type,Layout,Space,MemTraits> a(a_org);
- for(int i0=0; i0<N0; i0++)
- for(int i1=0; i1<N1; i1++)
- for(int i2=0; i2<N2; i2++)
- a_org(i0,i1,i2) = i0*1000000+i1*1000+i2;
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a1;
- a1 = Kokkos::subview(a,3,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int***, LayoutOrg, Space > a_org( "A", N0, N1, N2 );
+ Kokkos::View< Type, Layout, Space, MemTraits > a( a_org );
+
+ for ( int i0 = 0; i0 < N0; i0++ )
+ for ( int i1 = 0; i1 < N1; i1++ )
+ for ( int i2 = 0; i2 < N2; i2++ )
+ {
+ a_org( i0, i1, i2 ) = i0 * 1000000 + i1 * 1000 + i2;
+ }
+
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a1;
+ a1 = Kokkos::subview( a, 3, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check2D3D(a1,a,3,std::pair<int,int>(0,N1),std::pair<int,int>(0,N2));
+ test_Check2D3D( a1, a, 3, std::pair< int, int >( 0, N1 ), std::pair< int, int >( 0, N2 ) );
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a2(a,3,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a2( a, 3, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check2D3D(a2,a,3,std::pair<int,int>(0,N1),std::pair<int,int>(0,N2));
+ test_Check2D3D( a2, a, 3, std::pair< int, int >( 0, N1 ), std::pair< int, int >( 0, N2 ) );
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_2d_subview_3d_impl_layout() {
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int[N0][N1][N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int[N0][N1][N2], int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int[N0][N1][N2], int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int[N0][N1][N2], int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int* [N1][N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int* [N1][N2], int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int* [N1][N2], int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int* [N1][N2], int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,int** [N2],int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int** [N2],int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int** [N2],int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int** [N2], int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int** [N2], int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int** [N2], int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,int*** ,int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int*** ,int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,int*** ,int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, int*** , int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int*** , int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, int*** , int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int[N0][N1][N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int[N0][N1][N2], const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int[N0][N1][N2], const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int[N0][N1][N2], const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int* [N1][N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int* [N1][N2], const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int* [N1][N2], const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int* [N1][N2], const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int** [N2],const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int** [N2],const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int** [N2],const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int** [N2], const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int** [N2], const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int** [N2], const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
- test_2d_subview_3d_impl_type<Space,const int*** ,const int[N1][N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int*** ,const int* [N2],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_2d_subview_3d_impl_type<Space,const int*** ,const int** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_2d_subview_3d_impl_type< Space, const int*** , const int[N1][N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int*** , const int* [N2], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_2d_subview_3d_impl_type< Space, const int*** , const int** , LayoutSub, Layout, LayoutOrg, MemTraits >();
}
-template<class Space, class Type, class TypeSub,class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class Type, class TypeSub, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_3d_subview_5d_impl_type() {
- Kokkos::View<int*****,LayoutOrg,Space> a_org("A",N0,N1,N2,N3,N4);
- Kokkos::View<Type,Layout,Space,MemTraits> a(a_org);
- for(int i0=0; i0<N0; i0++)
- for(int i1=0; i1<N1; i1++)
- for(int i2=0; i2<N2; i2++)
- for(int i3=0; i3<N3; i3++)
- for(int i4=0; i4<N4; i4++)
- a_org(i0,i1,i2,i3,i4) = i0*1000000+i1*10000+i2*100+i3*10+i4;
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a1;
- a1 = Kokkos::subview(a,3,5,Kokkos::ALL,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int*****, LayoutOrg, Space > a_org( "A", N0, N1, N2, N3, N4 );
+ Kokkos::View< Type, Layout, Space, MemTraits > a( a_org );
+
+ for ( int i0 = 0; i0 < N0; i0++ )
+ for ( int i1 = 0; i1 < N1; i1++ )
+ for ( int i2 = 0; i2 < N2; i2++ )
+ for ( int i3 = 0; i3 < N3; i3++ )
+ for ( int i4 = 0; i4 < N4; i4++ )
+ {
+ a_org( i0, i1, i2, i3, i4 ) = i0 * 1000000 + i1 * 10000 + i2 * 100 + i3 * 10 + i4;
+ }
+
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a1;
+ a1 = Kokkos::subview( a, 3, 5, Kokkos::ALL, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check3D5D(a1,a,3,5,std::pair<int,int>(0,N2),std::pair<int,int>(0,N3),std::pair<int,int>(0,N4));
+ test_Check3D5D( a1, a, 3, 5, std::pair< int, int >( 0, N2 ), std::pair< int, int >( 0, N3 ), std::pair< int, int >( 0, N4 ) );
- Kokkos::View<TypeSub,LayoutSub,Space,MemTraits> a2(a,3,5,Kokkos::ALL,Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< TypeSub, LayoutSub, Space, MemTraits > a2( a, 3, 5, Kokkos::ALL, Kokkos::ALL, Kokkos::ALL );
Kokkos::fence();
- test_Check3D5D(a2,a,3,5,std::pair<int,int>(0,N2),std::pair<int,int>(0,N3),std::pair<int,int>(0,N4));
+ test_Check3D5D( a2, a, 3, 5, std::pair< int, int >( 0, N2 ), std::pair< int, int >( 0, N3 ), std::pair< int, int >( 0, N4 ) );
}
-template<class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits>
+template< class Space, class LayoutSub, class Layout, class LayoutOrg, class MemTraits >
void test_3d_subview_5d_impl_layout() {
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int[N0][N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int* [N1][N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int** [N2][N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int*** [N3][N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int**** [N4],int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int**** [N4],int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int**** [N4],int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int**** [N4],int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, int***** ,int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int***** ,int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int***** ,int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, int***** ,int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int[N0][N1][N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int* [N1][N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int** [N2][N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int*** [N3][N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int**** [N4],const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
-
- test_3d_subview_5d_impl_type<Space, const int***** ,const int[N2][N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int***** ,const int* [N3][N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int***** ,const int** [N4],LayoutSub, Layout, LayoutOrg, MemTraits>();
- test_3d_subview_5d_impl_type<Space, const int***** ,const int*** ,LayoutSub, Layout, LayoutOrg, MemTraits>();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int[N0][N1][N2][N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int* [N1][N2][N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int** [N2][N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int*** [N3][N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int**** [N4], int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, int***** , int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int***** , int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int***** , int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, int***** , int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int[N0][N1][N2][N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int* [N1][N2][N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int** [N2][N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int*** [N3][N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int**** [N4], const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
+
+ test_3d_subview_5d_impl_type< Space, const int***** , const int[N2][N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int***** , const int* [N3][N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int***** , const int** [N4], LayoutSub, Layout, LayoutOrg, MemTraits >();
+ test_3d_subview_5d_impl_type< Space, const int***** , const int*** , LayoutSub, Layout, LayoutOrg, MemTraits >();
}
inline
void test_subview_legal_args_right() {
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutRight,Kokkos::LayoutRight,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
}
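The assertions above pin down the compile-time rule for LayoutRight-to-LayoutRight subviews: only leading dimensions may be dropped with integer indices, the first kept dimension may be a half-open range, and every remaining kept dimension must be Kokkos::ALL. As a minimal sketch, the same trait can be queried outside the test harness; the static_assert below simply mirrors one of the legal cases asserted above and assumes a translation unit that includes the Kokkos headers used by this test.

  static_assert( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<
                   Kokkos::LayoutRight, Kokkos::LayoutRight, 3, 5, 0,
                   int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value,
                 "legal: drop the two leading dimensions, sub-range the first kept one, keep the rest whole" );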
inline
void test_subview_legal_args_left() {
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,int,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,int,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,5,0,int,int,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
-
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(1,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::Impl::ALL_t,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::Impl::ALL_t>::value));
- ASSERT_EQ(0,(Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime<Kokkos::LayoutLeft,Kokkos::LayoutLeft,3,3,0,Kokkos::pair<int,int>,Kokkos::pair<int,int>,Kokkos::pair<int,int>>::value));
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int>, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, int, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, int, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 5, 0, int, int, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 1, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::Impl::ALL_t, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::Impl::ALL_t >::value ) );
+ ASSERT_EQ( 0, ( Kokkos::Experimental::Impl::SubviewLegalArgsCompileTime< Kokkos::LayoutLeft, Kokkos::LayoutLeft, 3, 3, 0, Kokkos::pair<int, int>, Kokkos::pair<int, int>, Kokkos::pair<int, int> >::value ) );
}
-}
+} // namespace Impl
-template< class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_1d_assign() {
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutLeft ,Kokkos::LayoutLeft, MemTraits>();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutLeft ,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft ,Kokkos::LayoutLeft, MemTraits>();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutRight ,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight ,Kokkos::LayoutRight, MemTraits>();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutRight ,Kokkos::LayoutRight, MemTraits>();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutLeft ,Kokkos::LayoutStride,Kokkos::LayoutLeft >();
- //Impl::test_1d_assign_impl<Space,Kokkos::LayoutRight ,Kokkos::LayoutStride,Kokkos::LayoutLeft >();
- Impl::test_1d_assign_impl<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutLeft, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutRight, Kokkos::LayoutLeft, Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutStride, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutLeft, Kokkos::LayoutRight, Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutRight, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutStride, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutLeft, Kokkos::LayoutStride, Kokkos::LayoutLeft >();
+ //Impl::test_1d_assign_impl< Space, Kokkos::LayoutRight, Kokkos::LayoutStride, Kokkos::LayoutLeft >();
+ Impl::test_1d_assign_impl< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutLeft, MemTraits >();
}
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_2d_subview_3d() {
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutRight ,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits>();
- Impl::test_2d_subview_3d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutRight, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ Impl::test_2d_subview_3d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutLeft, MemTraits >();
}
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_3d_subview_5d_right() {
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits>();
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutRight, MemTraits>();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutRight, Kokkos::LayoutRight, MemTraits >();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutRight, MemTraits >();
}
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_3d_subview_5d_left() {
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits>();
- Impl::test_3d_subview_5d_impl_layout<Space,Kokkos::LayoutStride,Kokkos::LayoutStride,Kokkos::LayoutLeft, MemTraits>();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutLeft, Kokkos::LayoutLeft, MemTraits >();
+ Impl::test_3d_subview_5d_impl_layout< Space, Kokkos::LayoutStride, Kokkos::LayoutStride, Kokkos::LayoutLeft, MemTraits >();
}
+namespace Impl {
+template< class Layout, class Space >
+struct FillView_3D {
+ Kokkos::View< int***, Layout, Space > a;
-namespace Impl {
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const
+ {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % a.dimension_0()
+ : ii / ( a.dimension_1() * a.dimension_2() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / a.dimension_0() ) % a.dimension_1()
+ : ( ii / a.dimension_2() ) % a.dimension_1();
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( a.dimension_0() * a.dimension_1() )
+ : ii % a.dimension_2();
- template<class Layout, class Space>
- struct FillView_3D {
- Kokkos::View<int***,Layout,Space> a;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % a.dimension_0(): ii / (a.dimension_1()*a.dimension_2());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / a.dimension_0()) % a.dimension_1() : (ii / a.dimension_2()) % a.dimension_1();
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (a.dimension_0() * a.dimension_1()) : ii % a.dimension_2();
- a(i,j,k) = 1000000 * i + 1000 * j + k;
+ a( i, j, k ) = 1000000 * i + 1000 * j + k;
+ }
+};
+
+template< class Layout, class Space >
+struct FillView_4D {
+ Kokkos::View< int****, Layout, Space > a;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % a.dimension_0()
+ : ii / ( a.dimension_1() * a.dimension_2() * a.dimension_3() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / a.dimension_0() ) % a.dimension_1()
+ : ( ii / ( a.dimension_2() * a.dimension_3() ) % a.dimension_1() );
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ( ii / ( a.dimension_0() * a.dimension_1() ) ) % a.dimension_2()
+ : ( ii / a.dimension_3() ) % a.dimension_2();
+
+ const int l = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( a.dimension_0() * a.dimension_1() * a.dimension_2() )
+ : ii % a.dimension_3();
+
+ a( i, j, k, l ) = 1000000 * i + 10000 * j + 100 * k + l;
+ }
+};
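The fill functors above recover a multi-index from the flat RangePolicy index ii using modulo and integer division by the view extents, with the role of each index chosen per layout. For orientation, a minimal reference sketch of the canonical decomposition for a rank-3 view of extents d0 x d1 x d2 follows; d0, d1, d2 and ii are placeholder names, not identifiers from the patch.

  // LayoutLeft (first index varies fastest):
  //   i = ii % d0;   j = ( ii / d0 ) % d1;   k = ii / ( d0 * d1 );
  // LayoutRight (last index varies fastest):
  //   k = ii % d2;   j = ( ii / d2 ) % d1;   i = ii / ( d1 * d2 );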
+
+template< class Layout, class Space, class MemTraits >
+struct CheckSubviewCorrectness_3D_3D {
+ Kokkos::View< const int***, Layout, Space, MemTraits > a;
+ Kokkos::View< const int***, Layout, Space, MemTraits > b;
+ int offset_0, offset_2;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const
+ {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % b.dimension_0()
+ : ii / ( b.dimension_1() * b.dimension_2() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / b.dimension_0() ) % b.dimension_1()
+ : ( ii / b.dimension_2() ) % b.dimension_1();
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( b.dimension_0() * b.dimension_1() )
+ : ii % b.dimension_2();
+
+ if ( a( i + offset_0, j, k + offset_2 ) != b( i, j, k ) ) {
+ Kokkos::abort( "Error: check_subview_correctness 3D-3D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)" );
}
- };
-
- template<class Layout, class Space>
- struct FillView_4D {
- Kokkos::View<int****,Layout,Space> a;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % a.dimension_0(): ii / (a.dimension_1()*a.dimension_2()*a.dimension_3());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / a.dimension_0()) % a.dimension_1() : (ii / (a.dimension_2()*a.dimension_3()) % a.dimension_1());
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- (ii / (a.dimension_0() * a.dimension_1())) % a.dimension_2() : (ii / a.dimension_3()) % a.dimension_2();
- const int l = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (a.dimension_0() * a.dimension_1() * a.dimension_2()) : ii % a.dimension_3();
- a(i,j,k,l) = 1000000 * i + 10000 * j + 100 * k + l;
+ }
+};
+
+template< class Layout, class Space, class MemTraits >
+struct CheckSubviewCorrectness_3D_4D {
+ Kokkos::View< const int****, Layout, Space, MemTraits > a;
+ Kokkos::View< const int***, Layout, Space, MemTraits > b;
+ int offset_0, offset_2, index;
+
+ KOKKOS_INLINE_FUNCTION
+ void operator()( const int & ii ) const {
+ const int i = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ii % b.dimension_0()
+ : ii / ( b.dimension_1() * b.dimension_2() );
+
+ const int j = std::is_same< Layout, Kokkos::LayoutLeft >::value
+ ? ( ii / b.dimension_0() ) % b.dimension_1()
+ : ( ii / b.dimension_2() ) % b.dimension_1();
+
+ const int k = std::is_same< Layout, Kokkos::LayoutRight >::value
+ ? ii / ( b.dimension_0() * b.dimension_1() )
+ : ii % b.dimension_2();
+
+ int i0, i1, i2, i3;
+
+ if ( std::is_same< Layout, Kokkos::LayoutLeft >::value ) {
+ i0 = i + offset_0;
+ i1 = j;
+ i2 = k + offset_2;
+ i3 = index;
}
- };
-
- template<class Layout, class Space, class MemTraits>
- struct CheckSubviewCorrectness_3D_3D {
- Kokkos::View<const int***,Layout,Space,MemTraits> a;
- Kokkos::View<const int***,Layout,Space,MemTraits> b;
- int offset_0,offset_2;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % b.dimension_0(): ii / (b.dimension_1()*b.dimension_2());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / b.dimension_0()) % b.dimension_1() : (ii / b.dimension_2()) % b.dimension_1();
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (b.dimension_0() * b.dimension_1()) : ii % b.dimension_2();
- if( a(i+offset_0,j,k+offset_2) != b(i,j,k))
- Kokkos::abort("Error: check_subview_correctness 3D-3D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)");
+ else {
+ i0 = index;
+ i1 = i + offset_0;
+ i2 = j;
+ i3 = k + offset_2;
}
- };
-
- template<class Layout, class Space, class MemTraits>
- struct CheckSubviewCorrectness_3D_4D {
- Kokkos::View<const int****,Layout,Space,MemTraits> a;
- Kokkos::View<const int***,Layout,Space,MemTraits> b;
- int offset_0,offset_2,index;
-
- KOKKOS_INLINE_FUNCTION
- void operator() (const int& ii) const {
- const int i = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- ii % b.dimension_0(): ii / (b.dimension_1()*b.dimension_2());
- const int j = std::is_same<Layout,Kokkos::LayoutLeft>::value ?
- (ii / b.dimension_0()) % b.dimension_1() : (ii / b.dimension_2()) % b.dimension_1();
- const int k = std::is_same<Layout,Kokkos::LayoutRight>::value ?
- ii / (b.dimension_0() * b.dimension_1()) : ii % b.dimension_2();
-
- int i0,i1,i2,i3;
- if(std::is_same<Layout,Kokkos::LayoutLeft>::value) {
- i0 = i + offset_0;
- i1 = j;
- i2 = k + offset_2;
- i3 = index;
- } else {
- i0 = index;
- i1 = i + offset_0;
- i2 = j;
- i3 = k + offset_2;
- }
- if( a(i0,i1,i2,i3) != b(i,j,k))
- Kokkos::abort("Error: check_subview_correctness 3D-4D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)");
+
+ if ( a( i0, i1, i2, i3 ) != b( i, j, k ) ) {
+ Kokkos::abort( "Error: check_subview_correctness 3D-4D (LayoutLeft -> LayoutLeft or LayoutRight -> LayoutRight)" );
}
- };
-}
+ }
+};
-template<class Space, class MemTraits = void>
+} // namespace Impl
+
+template< class Space, class MemTraits = void >
void test_layoutleft_to_layoutleft() {
Impl::test_subview_legal_args_left();
{
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> a("A",100,4,3);
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > a( "A", 100, 4, 3 );
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::ALL );
- Impl::FillView_3D<Kokkos::LayoutLeft,Space> fill;
+ Impl::FillView_3D< Kokkos::LayoutLeft, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_3D< Kokkos::LayoutLeft, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 0;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
+
{
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> a("A",100,4,5);
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::pair<int,int>(1,3));
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > a( "A", 100, 4, 5 );
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::pair< int, int >( 1, 3 ) );
- Impl::FillView_3D<Kokkos::LayoutLeft,Space> fill;
+ Impl::FillView_3D< Kokkos::LayoutLeft, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_3D< Kokkos::LayoutLeft, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 1;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
+
{
- Kokkos::View<int****,Kokkos::LayoutLeft,Space> a("A",100,4,5,3);
- Kokkos::View<int***,Kokkos::LayoutLeft,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::pair<int,int>(1,3),1);
+ Kokkos::View< int****, Kokkos::LayoutLeft, Space > a( "A", 100, 4, 5, 3 );
+ Kokkos::View< int***, Kokkos::LayoutLeft, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::pair< int, int >( 1, 3 ), 1 );
- Impl::FillView_4D<Kokkos::LayoutLeft,Space> fill;
+ Impl::FillView_4D< Kokkos::LayoutLeft, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)*a.extent(3)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) * a.extent( 3 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_4D<Kokkos::LayoutLeft,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_4D< Kokkos::LayoutLeft, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 1;
check.index = 1;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
}
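In each block above, offset_0 and offset_2 record where the subview begins inside the parent view, so the check functor compares b( i, j, k ) against a( i + offset_0, j, k + offset_2 ) (with one extra coordinate pinned by index in the 3D-from-4D case). A minimal host-side sketch of the aliasing being verified, assuming Kokkos::HostSpace and reusing the shapes from the second block above:

  Kokkos::View< int***, Kokkos::LayoutLeft, Kokkos::HostSpace > a( "A", 100, 4, 5 );
  Kokkos::View< int***, Kokkos::LayoutLeft, Kokkos::HostSpace > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::pair< int, int >( 1, 3 ) );
  // b has extents 16 x 4 x 2; every b( i, j, k ) refers to the same entry as a( 16 + i, j, 1 + k ).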
-template<class Space, class MemTraits = void>
+template< class Space, class MemTraits = void >
void test_layoutright_to_layoutright() {
Impl::test_subview_legal_args_right();
{
- Kokkos::View<int***,Kokkos::LayoutRight,Space> a("A",100,4,3);
- Kokkos::View<int***,Kokkos::LayoutRight,Space> b(a,Kokkos::pair<int,int>(16,32),Kokkos::ALL,Kokkos::ALL);
+ Kokkos::View< int***, Kokkos::LayoutRight, Space > a( "A", 100, 4, 3 );
+ Kokkos::View< int***, Kokkos::LayoutRight, Space > b( a, Kokkos::pair< int, int >( 16, 32 ), Kokkos::ALL, Kokkos::ALL );
- Impl::FillView_3D<Kokkos::LayoutRight,Space> fill;
+ Impl::FillView_3D< Kokkos::LayoutRight, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_3D<Kokkos::LayoutRight,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_3D< Kokkos::LayoutRight, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 16;
check.offset_2 = 0;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
- {
- Kokkos::View<int****,Kokkos::LayoutRight,Space> a("A",3,4,5,100);
- Kokkos::View<int***,Kokkos::LayoutRight,Space> b(a,1,Kokkos::pair<int,int>(1,3),Kokkos::ALL,Kokkos::ALL);
+ {
+ Kokkos::View< int****, Kokkos::LayoutRight, Space > a( "A", 3, 4, 5, 100 );
+ Kokkos::View< int***, Kokkos::LayoutRight, Space > b( a, 1, Kokkos::pair< int, int >( 1, 3 ), Kokkos::ALL, Kokkos::ALL );
- Impl::FillView_4D<Kokkos::LayoutRight,Space> fill;
+ Impl::FillView_4D< Kokkos::LayoutRight, Space > fill;
fill.a = a;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,a.extent(0)*a.extent(1)*a.extent(2)*a.extent(3)), fill);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, a.extent( 0 ) * a.extent( 1 ) * a.extent( 2 ) * a.extent( 3 ) ), fill );
- Impl::CheckSubviewCorrectness_3D_4D<Kokkos::LayoutRight,Space,MemTraits> check;
+ Impl::CheckSubviewCorrectness_3D_4D< Kokkos::LayoutRight, Space, MemTraits > check;
check.a = a;
check.b = b;
check.offset_0 = 1;
check.offset_2 = 0;
check.index = 1;
- Kokkos::parallel_for(Kokkos::RangePolicy<typename Space::execution_space>(0,b.extent(0)*b.extent(1)*b.extent(2)), check);
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename Space::execution_space >( 0, b.extent( 0 ) * b.extent( 1 ) * b.extent( 2 ) ), check );
}
}
-
-}
-//----------------------------------------------------------------------------
-
+} // namespace TestViewSubview
diff --git a/lib/kokkos/core/unit_test/UnitTestMain.cpp b/lib/kokkos/core/unit_test/UnitTestMain.cpp
index f952ab3db..4f52fc956 100644
--- a/lib/kokkos/core/unit_test/UnitTestMain.cpp
+++ b/lib/kokkos/core/unit_test/UnitTestMain.cpp
@@ -1,50 +1,49 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
-int main(int argc, char *argv[]) {
- ::testing::InitGoogleTest(&argc,argv);
+int main( int argc, char *argv[] ) {
+ ::testing::InitGoogleTest( &argc, argv );
return RUN_ALL_TESTS();
}
-
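Note: the UnitTestMain.cpp driver above only initializes Google Test; in this suite the Kokkos runtime is brought up and torn down by the per-backend fixtures (see the cuda fixture in TestCuda.hpp below). For a standalone harness one could instead wrap the whole test run in Kokkos initialization. A minimal sketch, assuming Kokkos::initialize / Kokkos::finalize from Kokkos_Core.hpp (illustrative only, not part of this patch):

    // Minimal gtest driver that also manages the Kokkos runtime.
    // Illustrative sketch; the patched UnitTestMain.cpp intentionally
    // leaves Kokkos setup to the test fixtures instead.
    #include <gtest/gtest.h>
    #include <Kokkos_Core.hpp>

    int main( int argc, char *argv[] ) {
      ::testing::InitGoogleTest( &argc, argv );  // strip gtest flags first
      Kokkos::initialize( argc, argv );          // then let Kokkos parse its own flags
      const int result = RUN_ALL_TESTS();
      Kokkos::finalize();                        // finalize only after all tests finish
      return result;
    }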
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda.hpp b/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
index 36b9b0688..768b03920 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda.hpp
@@ -1,111 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_CUDA_HPP
#define KOKKOS_TEST_CUDA_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
-
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestViewSpaceAssign.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
-
#include <TestAtomicViews.hpp>
-
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
-// For Some Reason I can only have the definition of SetUp and TearDown in one cpp file ...
+// For some reason I can only have the definition of SetUp and TearDown in one cpp file ...
class cuda : public ::testing::Test {
protected:
static void SetUpTestCase();
static void TearDownTestCase();
};
#ifdef TEST_CUDA_INSTANTIATE_SETUP_TEARDOWN
void cuda::SetUpTestCase()
- {
- Kokkos::Cuda::print_configuration( std::cout );
- Kokkos::HostSpace::execution_space::initialize();
- Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
- }
+{
+ Kokkos::print_configuration( std::cout );
+ Kokkos::HostSpace::execution_space::initialize();
+ Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice( 0 ) );
+}
void cuda::TearDownTestCase()
- {
- Kokkos::Cuda::finalize();
- Kokkos::HostSpace::execution_space::finalize();
- }
-#endif
+{
+ Kokkos::Cuda::finalize();
+ Kokkos::HostSpace::execution_space::finalize();
}
#endif
+
+} // namespace Test
+
+#endif
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
index ff379dc80..7cf19b26d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Atomics.cpp
@@ -1,203 +1,203 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , atomics )
+TEST_F( cuda, atomics )
{
- const int loop_count = 1e3 ;
+ const int loop_count = 1e3;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Cuda>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Cuda >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Cuda >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Cuda >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Cuda>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Cuda >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Cuda >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Cuda >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Cuda>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Cuda >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Cuda >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Cuda >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Cuda>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Cuda >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Cuda >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Cuda >( 100, 3 ) ) );
}
-TEST_F( cuda , atomic_operations )
+TEST_F( cuda, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Cuda>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Cuda>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Cuda>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 1 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 2 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 3 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 4 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 5 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 6 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 7 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 8 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 9 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 11 ) ) );
+      ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Cuda >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Cuda >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Cuda >( start, end - i, 4 ) ) );
}
}
-TEST_F( cuda , atomic_views_integral )
+TEST_F( cuda, atomic_views_integral )
{
const long length = 1000000;
+
{
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Cuda>(length, 8 ) ) );
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Cuda >( length, 8 ) ) );
}
}
-TEST_F( cuda , atomic_views_nonintegral )
+TEST_F( cuda, atomic_views_nonintegral )
{
const long length = 1000000;
- {
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Cuda>(length, 4 ) ) );
+ {
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Cuda >( length, 4 ) ) );
}
}
-
-TEST_F( cuda , atomic_view_api )
+TEST_F( cuda, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::Cuda>();
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Cuda >();
}
-
-} // namespace test
-
+} // namespace Test
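The atomics tests above stress Kokkos::atomic_* operations for many scalar types. Their common shape is: launch a parallel loop in which every iteration atomically updates shared data, then verify the final value on the host. A minimal sketch of that pattern (generic names, not the actual TestAtomic::Loop implementation; assumes device lambda support, otherwise an equivalent functor can be used):

    #include <Kokkos_Core.hpp>

    // Every iteration atomically adds 1 to a single rank-0 View element;
    // the final value must equal the iteration count.
    template< class Device >
    bool atomic_add_smoke_test( const int n )
    {
      Kokkos::View< long, Device > sum( "sum" );

      Kokkos::parallel_for( Kokkos::RangePolicy< Device >( 0, n ),
        KOKKOS_LAMBDA( const int ) {
          Kokkos::atomic_fetch_add( &sum(), long( 1 ) );
        } );

      // Bring the device result back to the host and check it.
      auto host_sum = Kokkos::create_mirror_view( sum );
      Kokkos::deep_copy( host_sum, sum );

      return host_sum() == n;
    }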
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
index aeaa2a0e8..e655193a5 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Other.cpp
@@ -1,189 +1,194 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#define TEST_CUDA_INSTANTIATE_SETUP_TEARDOWN
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , init ) {
+TEST_F( cuda, init )
+{
;
}
-TEST_F( cuda , md_range ) {
- TestMDRange_2D< Kokkos::Cuda >::test_for2(100,100);
-
- TestMDRange_3D< Kokkos::Cuda >::test_for3(100,100,100);
+TEST_F( cuda, mdrange_for ) {
+ TestMDRange_2D< Kokkos::Cuda >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Cuda >::test_for3( 100, 100, 100 );
+ TestMDRange_4D< Kokkos::Cuda >::test_for4( 100, 10, 100, 10 );
+ TestMDRange_5D< Kokkos::Cuda >::test_for5( 100, 10, 10, 10, 5 );
+ TestMDRange_6D< Kokkos::Cuda >::test_for6( 100, 10, 5, 2, 10, 5 );
}
-TEST_F( cuda, policy_construction) {
+TEST_F( cuda, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::Cuda >();
TestTeamPolicyConstruction< Kokkos::Cuda >();
}
-TEST_F( cuda , range_tag )
+TEST_F( cuda, range_tag )
{
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
}
-
//----------------------------------------------------------------------------
-TEST_F( cuda , compiler_macros )
+TEST_F( cuda, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Cuda >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( cuda , memory_pool )
+TEST_F( cuda, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::Cuda >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::Cuda >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::Cuda >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
-TEST_F( cuda , task_fib )
+TEST_F( cuda, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::Cuda >::run(i, (i+1)*(i+1)*10000 );
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Cuda >::run( i, ( i + 1 ) * ( i + 1 ) * 10000 );
}
}
-TEST_F( cuda , task_depend )
+TEST_F( cuda, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::Cuda >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Cuda >::run( i );
}
}
-TEST_F( cuda , task_team )
+TEST_F( cuda, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::Cuda >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::Cuda >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::Cuda >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Cuda >::run( 1000 ); // Put back after testing.
}
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
-TEST_F( cuda , cxx11 )
+TEST_F( cuda, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Cuda >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Cuda >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Cuda >( 4 ) ) );
}
}
#endif
TEST_F( cuda, tile_layout )
{
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Cuda , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Cuda , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Cuda , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::Cuda, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Cuda, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Cuda, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Cuda, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Cuda, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Cuda, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Cuda, 8, 8 >( 9, 11 );
}
-#if defined (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-#if defined (KOKKOS_COMPILER_CLANG)
-TEST_F( cuda , dispatch )
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+#if defined( KOKKOS_COMPILER_CLANG )
+TEST_F( cuda, dispatch )
{
- const int repeat = 100 ;
- for ( int i = 0 ; i < repeat ; ++i ) {
- for ( int j = 0 ; j < repeat ; ++j ) {
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >(0,j)
- , KOKKOS_LAMBDA( int ) {} );
- }}
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
}
#endif
#endif
-} // namespace test
-
+} // namespace Test
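The new mdrange_for cases added above extend coverage from 2D/3D up to 6D iteration spaces. Conceptually they exercise Kokkos' multidimensional range policy, which maps a nested index space onto one parallel dispatch. A rough sketch of the pattern follows (hedged: in the Kokkos 2.x sources this patch touches the policy may still live under Kokkos::Experimental; in later releases it is plain Kokkos::MDRangePolicy, as written here; not the actual TestMDRange implementation):

    #include <Kokkos_Core.hpp>

    // Fill a 2D view over an N0 x N1 index space with a single MDRange dispatch.
    template< class Device >
    void fill_2d( const int N0, const int N1 )
    {
      Kokkos::View< int**, Device > a( "A", N0, N1 );

      Kokkos::parallel_for(
        Kokkos::MDRangePolicy< Device, Kokkos::Rank<2> >( { 0, 0 }, { N0, N1 } ),
        KOKKOS_LAMBDA( const int i, const int j ) {
          a( i, j ) = i * N1 + j;  // each (i, j) pair is handled by one work item
        } );
    }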
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
index b9ab9fe72..01eed4e02 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_a.cpp
@@ -1,56 +1,56 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , reducers )
+TEST_F( cuda, reducers )
{
- TestReducers<int, Kokkos::Cuda>::execute_integer();
- TestReducers<size_t, Kokkos::Cuda>::execute_integer();
- TestReducers<double, Kokkos::Cuda>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Cuda>::execute_basic();
+ TestReducers< int, Kokkos::Cuda >::execute_integer();
+ TestReducers< size_t, Kokkos::Cuda >::execute_integer();
+ TestReducers< double, Kokkos::Cuda >::execute_float();
+ TestReducers< Kokkos::complex<double>, Kokkos::Cuda >::execute_basic();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
index c588d752d..7f4e0973e 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Reductions_b.cpp
@@ -1,130 +1,138 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, long_reduce) {
- TestReduce< long , Kokkos::Cuda >( 0 );
- TestReduce< long , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, long_reduce )
+{
+ TestReduce< long, Kokkos::Cuda >( 0 );
+ TestReduce< long, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, double_reduce) {
- TestReduce< double , Kokkos::Cuda >( 0 );
- TestReduce< double , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, double_reduce )
+{
+ TestReduce< double, Kokkos::Cuda >( 0 );
+ TestReduce< double, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Cuda >( 0 );
- TestReduceDynamic< long , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::Cuda >( 0 );
+ TestReduceDynamic< long, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Cuda >( 0 );
- TestReduceDynamic< double , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::Cuda >( 0 );
+ TestReduceDynamic< double, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Cuda >( 0 );
- TestReduceDynamicView< long , Kokkos::Cuda >( 1000000 );
+TEST_F( cuda, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::Cuda >( 0 );
+ TestReduceDynamicView< long, Kokkos::Cuda >( 1000000 );
}
-TEST_F( cuda , scan )
+TEST_F( cuda, scan )
{
- TestScan< Kokkos::Cuda >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Cuda >::test_range( 1, 1000 );
TestScan< Kokkos::Cuda >( 0 );
TestScan< Kokkos::Cuda >( 100000 );
TestScan< Kokkos::Cuda >( 10000000 );
Kokkos::Cuda::fence();
}
#if 0
-TEST_F( cuda , scan_small )
+TEST_F( cuda, scan_small )
{
- typedef TestScan< Kokkos::Cuda , Kokkos::Impl::CudaExecUseScanSmall > TestScanFunctor ;
- for ( int i = 0 ; i < 1000 ; ++i ) {
+ typedef TestScan< Kokkos::Cuda, Kokkos::Impl::CudaExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
TestScanFunctor( 10 );
TestScanFunctor( 10000 );
}
TestScanFunctor( 1000000 );
TestScanFunctor( 10000000 );
Kokkos::Cuda::fence();
}
#endif
-TEST_F( cuda , team_scan )
+TEST_F( cuda, team_scan )
{
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( cuda , team_long_reduce) {
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( cuda, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( cuda , team_double_reduce) {
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( cuda, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( cuda , reduction_deduction )
+TEST_F( cuda, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::Cuda >();
}
-} // namespace test
-
+} // namespace Test
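The reduction tests above (TestReduce, TestReduceTeam, TestScan) all follow the same idea: many work items contribute to a single reduced value, and the host compares it against a closed-form answer. A minimal sketch of such a check using Kokkos::parallel_reduce (generic names; illustrative only, not the actual TestReduce implementation):

    #include <Kokkos_Core.hpp>

    // Sum 0 + 1 + ... + (n-1) on the device and compare with n*(n-1)/2.
    template< class Device >
    bool sum_reduce_smoke_test( const long n )
    {
      long sum = 0;

      Kokkos::parallel_reduce( Kokkos::RangePolicy< Device >( 0, n ),
        KOKKOS_LAMBDA( const long i, long & update ) {
          update += i;  // per-thread partial sums are combined by Kokkos
        }, sum );

      return sum == n * ( n - 1 ) / 2;
    }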
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
index f3cbc3b88..5bed7640d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Spaces.cpp
@@ -1,399 +1,385 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
__global__
void test_abort()
{
- Kokkos::abort("test_abort");
+ Kokkos::abort( "test_abort" );
}
__global__
void test_cuda_spaces_int_value( int * ptr )
{
- if ( *ptr == 42 ) { *ptr = 2 * 42 ; }
+ if ( *ptr == 42 ) { *ptr = 2 * 42; }
}
-TEST_F( cuda , space_access )
+TEST_F( cuda, space_access )
{
- //--------------------------------------
-
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::HostSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::HostSpace, Kokkos::CudaUVMSpace >::accessible, "" );
//--------------------------------------
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::CudaHostPinnedSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace , Kokkos::HostSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaSpace, Kokkos::HostSpace >::accessible, "" );
//--------------------------------------
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::HostSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::HostSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaUVMSpace, Kokkos::CudaHostPinnedSpace >::accessible, "" );
//--------------------------------------
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaHostPinnedSpace >::assignable , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaHostPinnedSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaSpace >::assignable, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaSpace >::accessible , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace >::assignable , "" );
+ ! Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaUVMSpace >::assignable, "" );
static_assert(
- Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::MemorySpaceAccess< Kokkos::CudaHostPinnedSpace, Kokkos::CudaUVMSpace >::accessible, "" );
//--------------------------------------
static_assert(
- ! Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::HostSpace >::accessible , "" );
+ ! Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::HostSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::CudaUVMSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda , Kokkos::CudaHostPinnedSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::Cuda, Kokkos::CudaHostPinnedSpace >::accessible, "" );
static_assert(
- ! Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaSpace >::accessible , "" );
+ ! Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, Kokkos::CudaSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaUVMSpace >::accessible , "" );
+ Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, Kokkos::CudaUVMSpace >::accessible, "" );
static_assert(
- Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >::accessible , "" );
-
+ Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, Kokkos::CudaHostPinnedSpace >::accessible, "" );
static_assert(
std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaSpace >::Space
- , Kokkos::HostSpace >::value , "" );
+ , Kokkos::HostSpace >::value, "" );
static_assert(
std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaUVMSpace >::Space
, Kokkos::Device< Kokkos::HostSpace::execution_space
- , Kokkos::CudaUVMSpace > >::value , "" );
+ , Kokkos::CudaUVMSpace > >::value, "" );
static_assert(
std::is_same< Kokkos::Impl::HostMirror< Kokkos::CudaHostPinnedSpace >::Space
- , Kokkos::CudaHostPinnedSpace >::value , "" );
+ , Kokkos::CudaHostPinnedSpace >::value, "" );
static_assert(
std::is_same< Kokkos::Device< Kokkos::HostSpace::execution_space
, Kokkos::CudaUVMSpace >
, Kokkos::Device< Kokkos::HostSpace::execution_space
- , Kokkos::CudaUVMSpace > >::value , "" );
+ , Kokkos::CudaUVMSpace > >::value, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::Cuda >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::CudaSpace >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::CudaUVMSpace >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
static_assert(
Kokkos::Impl::SpaceAccessibility
< Kokkos::Impl::HostMirror< Kokkos::CudaHostPinnedSpace >::Space
, Kokkos::HostSpace
- >::accessible , "" );
+ >::accessible, "" );
}
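
The static_asserts above encode which memory spaces can be assigned to or accessed from one another. A minimal sketch (not part of the patch) of how the same trait can guard host-side access in user code, assuming the Kokkos 2.x spelling used here where the trait lives in Kokkos::Impl; the function name is illustrative.

#include <Kokkos_Core.hpp>

// Compiles only when MemSpace is readable from host code, mirroring the
// SpaceAccessibility checks asserted in the test above.
template< class MemSpace >
double first_element_on_host( const Kokkos::View< const double*, MemSpace > & v )
{
  static_assert(
    Kokkos::Impl::SpaceAccessibility< Kokkos::HostSpace, MemSpace >::accessible,
    "MemSpace is not host-accessible; use e.g. CudaUVMSpace or CudaHostPinnedSpace" );
  return v[ 0 ];  // Direct host dereference is legal because the assert passed.
}
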
TEST_F( cuda, uvm )
{
if ( Kokkos::CudaUVMSpace::available() ) {
-    int * uvm_ptr = (int*) Kokkos::kokkos_malloc< Kokkos::CudaUVMSpace >("uvm_ptr",sizeof(int));
-
+    int * uvm_ptr = (int*) Kokkos::kokkos_malloc< Kokkos::CudaUVMSpace >( "uvm_ptr", sizeof( int ) );
- *uvm_ptr = 42 ;
+ *uvm_ptr = 42;
Kokkos::Cuda::fence();
- test_cuda_spaces_int_value<<<1,1>>>(uvm_ptr);
+ test_cuda_spaces_int_value<<< 1, 1 >>>( uvm_ptr );
Kokkos::Cuda::fence();
- EXPECT_EQ( *uvm_ptr, int(2*42) );
-
- Kokkos::kokkos_free< Kokkos::CudaUVMSpace >(uvm_ptr );
+ EXPECT_EQ( *uvm_ptr, int( 2 * 42 ) );
+ Kokkos::kokkos_free< Kokkos::CudaUVMSpace >( uvm_ptr );
}
}
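
The uvm test above writes 42 on the host, fences, doubles the value in a one-thread kernel, fences again, and checks the result on the host. A hedged sketch (not part of the patch) of the same round trip through a rank-0 View, assuming CUDA lambda support is enabled in the build; names are illustrative.

#include <Kokkos_Core.hpp>

void uvm_round_trip_sketch()
{
  // A single int living in CUDA UVM, reachable from both host and device.
  Kokkos::View< int, Kokkos::CudaUVMSpace > value( "value" );

  *value.data() = 42;       // Host write through the UVM pointer.
  Kokkos::Cuda::fence();    // Make the host write visible before the kernel runs.

  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >( 0, 1 ),
    KOKKOS_LAMBDA( const int ) { value() = 2 * 42; } );

  Kokkos::Cuda::fence();    // The kernel must finish before the host reads again.
  // Here *value.data() == 84, matching EXPECT_EQ( *uvm_ptr, int( 2 * 42 ) ) above.
}
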
TEST_F( cuda, uvm_num_allocs )
{
- // The max number of uvm allocations allowed is 65536
+ // The max number of UVM allocations allowed is 65536.
#define MAX_NUM_ALLOCS 65536
if ( Kokkos::CudaUVMSpace::available() ) {
-
struct TestMaxUVMAllocs {
- using view_type = Kokkos::View< double* , Kokkos::CudaUVMSpace >;
- using view_of_view_type = Kokkos::View< view_type[ MAX_NUM_ALLOCS ]
+ using view_type = Kokkos::View< double*, Kokkos::CudaUVMSpace >;
+ using view_of_view_type = Kokkos::View< view_type[ MAX_NUM_ALLOCS ]
, Kokkos::CudaUVMSpace >;
- TestMaxUVMAllocs()
- : view_allocs_test("view_allocs_test")
+ TestMaxUVMAllocs() : view_allocs_test( "view_allocs_test" )
{
-      for ( auto i = 0; i < MAX_NUM_ALLOCS ; ++i ) {
-
-        // Kokkos will throw a runtime exception if an attempt is made to
-        // allocate more than the maximum number of uvm allocations
+      for ( auto i = 0; i < MAX_NUM_ALLOCS; ++i ) {
+        // Kokkos will throw a runtime exception if an attempt is made to
+        // allocate more than the maximum number of UVM allocations.
// In this test, the max num of allocs occurs when i = MAX_NUM_ALLOCS - 1
// since the 'outer' view counts as one UVM allocation, leaving
- // 65535 possible UVM allocations, that is 'i in [0 , 65535)'
+ // 65535 possible UVM allocations, that is 'i in [0, 65535)'.
- // The test will catch the exception thrown in this case and continue
+ // The test will catch the exception thrown in this case and continue.
- if ( i == ( MAX_NUM_ALLOCS - 1) ) {
- EXPECT_ANY_THROW( { view_allocs_test(i) = view_type("inner_view",1); } ) ;
+ if ( i == ( MAX_NUM_ALLOCS - 1 ) ) {
+ EXPECT_ANY_THROW( { view_allocs_test( i ) = view_type( "inner_view", 1 ); } );
}
else {
- if(i<MAX_NUM_ALLOCS - 1000) {
- EXPECT_NO_THROW( { view_allocs_test(i) = view_type("inner_view",1); } ) ;
- } else { // This might or might not throw depending on compilation options.
+ if ( i < MAX_NUM_ALLOCS - 1000 ) {
+ EXPECT_NO_THROW( { view_allocs_test( i ) = view_type( "inner_view", 1 ); } );
+ } else { // This might or might not throw depending on compilation options.
try {
- view_allocs_test(i) = view_type("inner_view",1);
+ view_allocs_test( i ) = view_type( "inner_view", 1 );
}
- catch (...) {}
+ catch ( ... ) {}
}
}
- } //end allocation for loop
+ } // End allocation for loop.
- for ( auto i = 0; i < MAX_NUM_ALLOCS -1; ++i ) {
+ for ( auto i = 0; i < MAX_NUM_ALLOCS - 1; ++i ) {
- view_allocs_test(i) = view_type();
+ view_allocs_test( i ) = view_type();
- } //end deallocation for loop
+ } // End deallocation for loop.
- view_allocs_test = view_of_view_type(); // deallocate the view of views
+ view_allocs_test = view_of_view_type(); // Deallocate the view of views.
}
- // Member
- view_of_view_type view_allocs_test ;
- } ;
-
- // trigger the test via the TestMaxUVMAllocs constructor
- TestMaxUVMAllocs() ;
+ // Member.
+ view_of_view_type view_allocs_test;
+ };
+ // Trigger the test via the TestMaxUVMAllocs constructor.
+ TestMaxUVMAllocs();
}
- #undef MAX_NUM_ALLOCS
+
+ #undef MAX_NUM_ALLOCS
}
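
The comments in uvm_num_allocs state that Kokkos throws a runtime exception once the UVM allocation limit of 65536 is reached (the outer view itself counts as one allocation). A sketch (not part of the patch) of handling that failure outside googletest; it assumes the thrown exception derives from std::exception, whereas the test itself only relies on catch( ... ).

#include <Kokkos_Core.hpp>
#include <exception>
#include <iostream>

// Attempt one more UVM-backed view; report instead of aborting if the
// allocation limit (or any other allocation failure) is hit.
bool try_one_more_uvm_view( Kokkos::View< double*, Kokkos::CudaUVMSpace > & out,
                            const size_t n )
{
  try {
    out = Kokkos::View< double*, Kokkos::CudaUVMSpace >( "one_more_uvm_view", n );
    return true;
  }
  catch ( const std::exception & e ) {
    std::cerr << "UVM allocation failed: " << e.what() << std::endl;
    return false;
  }
}
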
-template< class MemSpace , class ExecSpace >
+template< class MemSpace, class ExecSpace >
struct TestViewCudaAccessible {
-
enum { N = 1000 };
- using V = Kokkos::View<double*,MemSpace> ;
+ using V = Kokkos::View< double*, MemSpace >;
- V m_base ;
+ V m_base;
struct TagInit {};
struct TagTest {};
KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
+ void operator()( const TagInit &, const int i ) const { m_base[i] = i + 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagTest & , const int i , long & error_count ) const
- { if ( m_base[i] != i + 1 ) ++error_count ; }
+ void operator()( const TagTest &, const int i, long & error_count ) const
+ { if ( m_base[i] != i + 1 ) ++error_count; }
TestViewCudaAccessible()
- : m_base("base",N)
+ : m_base( "base", N )
{}
static void run()
- {
- TestViewCudaAccessible self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< typename MemSpace::execution_space , TagInit >(0,N) , self );
- MemSpace::execution_space::fence();
- // Next access is a different execution space, must complete prior kernel.
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace , TagTest >(0,N) , self , error_count );
- EXPECT_EQ( error_count , 0 );
- }
+ {
+ TestViewCudaAccessible self;
+ Kokkos::parallel_for( Kokkos::RangePolicy< typename MemSpace::execution_space, TagInit >( 0, N ), self );
+ MemSpace::execution_space::fence();
+
+ // Next access is a different execution space, must complete prior kernel.
+ long error_count = -1;
+ Kokkos::parallel_reduce( Kokkos::RangePolicy< ExecSpace, TagTest >( 0, N ), self, error_count );
+ EXPECT_EQ( error_count, 0 );
+ }
};
-TEST_F( cuda , impl_view_accessible )
+TEST_F( cuda, impl_view_accessible )
{
- TestViewCudaAccessible< Kokkos::CudaSpace , Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaSpace, Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >::run();
+ TestViewCudaAccessible< Kokkos::CudaUVMSpace, Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaUVMSpace, Kokkos::HostSpace::execution_space >::run();
- TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::Cuda >::run();
- TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >::run();
+ TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda >::run();
+ TestViewCudaAccessible< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace::execution_space >::run();
}
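
TestViewCudaAccessible initializes a view in the memory space's own execution space, fences, and then counts mismatches from a second execution space; the comment about completing the prior kernel is the point of the fence. A minimal sketch (not part of the patch) of that handshake for a CudaUVMSpace view, again assuming CUDA lambda support; the function name is illustrative.

#include <Kokkos_Core.hpp>

long count_uvm_mismatches( const int n )
{
  Kokkos::View< double*, Kokkos::CudaUVMSpace > base( "base", n );

  // Fill on the device ...
  Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda >( 0, n ),
    KOKKOS_LAMBDA( const int i ) { base[ i ] = i + 1; } );

  // ... and fence, because the check below runs in a different execution space.
  Kokkos::Cuda::fence();

  long error_count = 0;
  Kokkos::parallel_reduce(
    Kokkos::RangePolicy< Kokkos::HostSpace::execution_space >( 0, n ),
    KOKKOS_LAMBDA( const int i, long & err ) { if ( base[ i ] != i + 1 ) ++err; },
    error_count );

  return error_count;  // Expected to be 0, as in EXPECT_EQ( error_count, 0 ) above.
}
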
template< class MemSpace >
struct TestViewCudaTexture {
-
enum { N = 1000 };
- using V = Kokkos::View<double*,MemSpace> ;
- using T = Kokkos::View<const double*, MemSpace, Kokkos::MemoryRandomAccess > ;
+ using V = Kokkos::View< double*, MemSpace >;
+ using T = Kokkos::View< const double*, MemSpace, Kokkos::MemoryRandomAccess >;
- V m_base ;
- T m_tex ;
+ V m_base;
+ T m_tex;
struct TagInit {};
struct TagTest {};
KOKKOS_INLINE_FUNCTION
- void operator()( const TagInit & , const int i ) const { m_base[i] = i + 1 ; }
+ void operator()( const TagInit &, const int i ) const { m_base[i] = i + 1; }
KOKKOS_INLINE_FUNCTION
- void operator()( const TagTest & , const int i , long & error_count ) const
- { if ( m_tex[i] != i + 1 ) ++error_count ; }
+ void operator()( const TagTest &, const int i, long & error_count ) const
+ { if ( m_tex[i] != i + 1 ) ++error_count; }
TestViewCudaTexture()
- : m_base("base",N)
+ : m_base( "base", N )
, m_tex( m_base )
{}
static void run()
- {
- EXPECT_TRUE( ( std::is_same< typename V::reference_type
- , double &
- >::value ) );
-
- EXPECT_TRUE( ( std::is_same< typename T::reference_type
- , const double
- >::value ) );
-
- EXPECT_TRUE( V::reference_type_is_lvalue_reference ); // An ordinary view
- EXPECT_FALSE( T::reference_type_is_lvalue_reference ); // Texture fetch returns by value
-
- TestViewCudaTexture self ;
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda , TagInit >(0,N) , self );
- long error_count = -1 ;
- Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda , TagTest >(0,N) , self , error_count );
- EXPECT_EQ( error_count , 0 );
- }
-};
+ {
+ EXPECT_TRUE( ( std::is_same< typename V::reference_type, double & >::value ) );
+ EXPECT_TRUE( ( std::is_same< typename T::reference_type, const double >::value ) );
+
+ EXPECT_TRUE( V::reference_type_is_lvalue_reference ); // An ordinary view.
+ EXPECT_FALSE( T::reference_type_is_lvalue_reference ); // Texture fetch returns by value.
+ TestViewCudaTexture self;
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Cuda, TagInit >( 0, N ), self );
+    long error_count = -1;
+    Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda, TagTest >( 0, N ), self, error_count );
+    EXPECT_EQ( error_count, 0 );
+  }
+};
+
-TEST_F( cuda , impl_view_texture )
+TEST_F( cuda, impl_view_texture )
{
TestViewCudaTexture< Kokkos::CudaSpace >::run();
TestViewCudaTexture< Kokkos::CudaUVMSpace >::run();
}
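
TestViewCudaTexture builds a const, MemoryRandomAccess alias of an ordinary view and checks that its reference_type is a value rather than an lvalue reference (the texture-fetch path). A sketch (not part of the patch) of the same aliasing in user code, with the element count passed in explicitly and CUDA lambdas assumed; names are illustrative.

#include <Kokkos_Core.hpp>

using plain_view   = Kokkos::View< double*, Kokkos::CudaSpace >;
using texture_view = Kokkos::View< const double*, Kokkos::CudaSpace,
                                   Kokkos::MemoryRandomAccess >;

double sum_through_random_access( const plain_view & base, const int n )
{
  texture_view tex( base );  // Shares base's allocation through the read-only path.

  double total = 0.0;
  Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Cuda >( 0, n ),
    KOKKOS_LAMBDA( const int i, double & t ) { t += tex[ i ]; },  // tex[ i ] is a value, not a reference.
    total );
  return total;
}
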
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
index fd8a647ef..0aea35db5 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Cuda >();
+TEST_F( cuda, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Cuda >();
+TEST_F( cuda, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Cuda >();
+TEST_F( cuda, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_assign_strided ) {
+TEST_F( cuda, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::Cuda >();
}
-TEST_F( cuda, view_subview_left_0 ) {
+TEST_F( cuda, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_left_1 ) {
+TEST_F( cuda, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_left_2 ) {
+TEST_F( cuda, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_left_3 ) {
+TEST_F( cuda, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_right_0 ) {
+TEST_F( cuda, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_right_1 ) {
+TEST_F( cuda, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::CudaUVMSpace >();
}
-TEST_F( cuda, view_subview_right_3 ) {
+TEST_F( cuda, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
index 053fcfc20..f31f4cbe6 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_layoutleft_to_layoutleft) {
+TEST_F( cuda, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( cuda, view_subview_layoutright_to_layoutright) {
+TEST_F( cuda, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
index 4c5f2ef72..0213a196e 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
+TEST_F( cuda, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
index aee6f1730..181e1bab2 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
index 2ef48c686..708cc1f5b 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
index aec123ac2..a3db996f8 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d ) {
+TEST_F( cuda, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
index e8ad23199..2f7cffa75 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
index e86b4513f..949c6f3e0 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
index ad9dcc0fd..3e68277a9 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_left ) {
+TEST_F( cuda, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
index f97d97e59..0cd91b779 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
index 2a07f28f8..cd1c13f7d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
index 3c51d9420..22d275354 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_right ) {
+TEST_F( cuda, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
index 835caa7b8..5dc5f87b4 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( cuda, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
index 53bd5eee2..318d8edbb 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( cuda, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::CudaUVMSpace, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
index e4348319f..a2158f06c 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c_all.cpp
@@ -1,12 +1,12 @@
-#include<cuda/TestCuda_SubView_c01.cpp>
-#include<cuda/TestCuda_SubView_c02.cpp>
-#include<cuda/TestCuda_SubView_c03.cpp>
-#include<cuda/TestCuda_SubView_c04.cpp>
-#include<cuda/TestCuda_SubView_c05.cpp>
-#include<cuda/TestCuda_SubView_c06.cpp>
-#include<cuda/TestCuda_SubView_c07.cpp>
-#include<cuda/TestCuda_SubView_c08.cpp>
-#include<cuda/TestCuda_SubView_c09.cpp>
-#include<cuda/TestCuda_SubView_c10.cpp>
-#include<cuda/TestCuda_SubView_c11.cpp>
-#include<cuda/TestCuda_SubView_c12.cpp>
+#include <cuda/TestCuda_SubView_c01.cpp>
+#include <cuda/TestCuda_SubView_c02.cpp>
+#include <cuda/TestCuda_SubView_c03.cpp>
+#include <cuda/TestCuda_SubView_c04.cpp>
+#include <cuda/TestCuda_SubView_c05.cpp>
+#include <cuda/TestCuda_SubView_c06.cpp>
+#include <cuda/TestCuda_SubView_c07.cpp>
+#include <cuda/TestCuda_SubView_c08.cpp>
+#include <cuda/TestCuda_SubView_c09.cpp>
+#include <cuda/TestCuda_SubView_c10.cpp>
+#include <cuda/TestCuda_SubView_c11.cpp>
+#include <cuda/TestCuda_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
index 13834d09a..8d9b9328b 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_Team.cpp
@@ -1,120 +1,126 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , team_tag )
+TEST_F( cuda, team_tag )
{
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( cuda , team_shared_request) {
- TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( cuda, team_shared_request )
+{
+ TestSharedTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-//THis Tests request to much L0 scratch
-//TEST_F( cuda, team_scratch_request) {
-// TestScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
-// TestScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+// This test requests too much L0 scratch.
+//TEST_F( cuda, team_scratch_request )
+//{
+// TestScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+// TestScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
//}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( cuda , team_lambda_shared_request) {
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( cuda, team_lambda_shared_request )
+{
TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
TestLambdaSharedTeam< Kokkos::CudaSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
TestLambdaSharedTeam< Kokkos::CudaUVMSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
- TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+ TestLambdaSharedTeam< Kokkos::CudaHostPinnedSpace, Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( cuda, shmem_size) {
+TEST_F( cuda, shmem_size )
+{
TestShmemSize< Kokkos::Cuda >();
}
-TEST_F( cuda, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::Cuda , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( cuda, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Cuda, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( cuda , team_vector )
+#if !defined(KOKKOS_CUDA_CLANG_WORKAROUND) && !defined(KOKKOS_ARCH_PASCAL)
+TEST_F( cuda, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Cuda >( 10 ) ) );
}
+#endif
TEST_F( cuda, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Cuda >( 8192, 2048, 16, 16 );
}
-
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
index c01ca1c14..be0c4c571 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
@@ -1,59 +1,60 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_a ) {
+TEST_F( cuda, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::CudaSpace >();
test_view_mapping_operator< Kokkos::CudaSpace >();
}
-TEST_F( cuda , view_of_class )
+TEST_F( cuda, view_of_class )
{
TestViewMappingClassValue< Kokkos::CudaSpace >::run();
TestViewMappingClassValue< Kokkos::CudaUVMSpace >::run();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
index 8e821ada0..b4d8e5d95 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_b.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_d ) {
+TEST_F( cuda, impl_view_mapping_d )
+{
test_view_mapping< Kokkos::CudaHostPinnedSpace >();
test_view_mapping_operator< Kokkos::CudaHostPinnedSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
index cf29a68e9..e4e6894c5 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_c.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_c ) {
+TEST_F( cuda, impl_view_mapping_c )
+{
test_view_mapping< Kokkos::CudaUVMSpace >();
test_view_mapping_operator< Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
index db14b5158..82a3dd83e 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_d.cpp
@@ -1,112 +1,116 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , view_nested_view )
+TEST_F( cuda, view_nested_view )
{
::Test::view_nested_view< Kokkos::Cuda >();
}
-
-
-TEST_F( cuda , view_remap )
+TEST_F( cuda, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::CudaUVMSpace > output_type ;
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::CudaUVMSpace > output_type;
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::CudaUVMSpace > input_type ;
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::CudaUVMSpace > input_type;
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::CudaUVMSpace > diff_type ;
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::CudaUVMSpace > diff_type;
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
Kokkos::fence();
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
Kokkos::fence();
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
+ // Kokkos::deep_copy( diff, input ); // Throws with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
Kokkos::fence();
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
+
Kokkos::fence();
}
-//----------------------------------------------------------------------------
-
-TEST_F( cuda , view_aggregate )
+TEST_F( cuda, view_aggregate )
{
TestViewAggregate< Kokkos::Cuda >();
}
-TEST_F( cuda , template_meta_functions )
+TEST_F( cuda, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::Cuda >();
+ TestTemplateMetaFunctions< int, Kokkos::Cuda >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
index 07d425647..27450fa6f 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_e.cpp
@@ -1,63 +1,65 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::CudaSpace , Kokkos::HostSpace::execution_space >();
- test_shared_alloc< Kokkos::CudaUVMSpace , Kokkos::HostSpace::execution_space >();
- test_shared_alloc< Kokkos::CudaHostPinnedSpace , Kokkos::HostSpace::execution_space >();
+TEST_F( cuda, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::CudaSpace, Kokkos::HostSpace::execution_space >();
+ test_shared_alloc< Kokkos::CudaUVMSpace, Kokkos::HostSpace::execution_space >();
+ test_shared_alloc< Kokkos::CudaHostPinnedSpace, Kokkos::HostSpace::execution_space >();
}
-TEST_F( cuda , impl_view_mapping_b ) {
+TEST_F( cuda, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::CudaSpace >();
test_view_mapping_subview< Kokkos::CudaUVMSpace >();
test_view_mapping_subview< Kokkos::CudaHostPinnedSpace >();
TestViewMappingAtomic< Kokkos::CudaSpace >::run();
TestViewMappingAtomic< Kokkos::CudaUVMSpace >::run();
TestViewMappingAtomic< Kokkos::CudaHostPinnedSpace >::run();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
index 34721f02d..56524111a 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_f.cpp
@@ -1,55 +1,56 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_api_a) {
- typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess > > view_texture_managed ;
- typedef Kokkos::View< const int * , Kokkos::Cuda , Kokkos::MemoryTraits< Kokkos::RandomAccess | Kokkos::Unmanaged > > view_texture_unmanaged ;
+TEST_F( cuda, view_api_a )
+{
+ typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> > view_texture_managed;
+ typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess | Kokkos::Unmanaged> > view_texture_unmanaged;
- TestViewAPI< double , Kokkos::Cuda >();
+ TestViewAPI< double, Kokkos::Cuda >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
index abbcf3bf8..d5fd24456 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_g.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_api_b) {
- TestViewAPI< double , Kokkos::CudaUVMSpace >();
+TEST_F( cuda, view_api_b )
+{
+ TestViewAPI< double, Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
index 989964203..649023e4a 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_h.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda, view_api_c) {
- TestViewAPI< double , Kokkos::CudaHostPinnedSpace >();
+TEST_F( cuda, view_api_c )
+{
+ TestViewAPI< double, Kokkos::CudaHostPinnedSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp
index 9bc09ba89..b46b1e5f8 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp
+++ b/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_s.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <cuda/TestCuda.hpp>
namespace Test {
-TEST_F( cuda , view_space_assign ) {
- view_space_assign< Kokkos::HostSpace , Kokkos::CudaHostPinnedSpace >();
- view_space_assign< Kokkos::CudaSpace , Kokkos::CudaUVMSpace >();
+TEST_F( cuda, view_space_assign )
+{
+ view_space_assign< Kokkos::HostSpace, Kokkos::CudaHostPinnedSpace >();
+ view_space_assign< Kokkos::CudaSpace, Kokkos::CudaUVMSpace >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
index 28ae5b41b..ed9bb68cd 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP.hpp
@@ -1,117 +1,112 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_OPENMP_HPP
#define KOKKOS_TEST_OPENMP_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
#include <TestAtomicViews.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
class openmp : public ::testing::Test {
protected:
static void SetUpTestCase()
{
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
- const unsigned threads_count = std::max( 1u , numa_count ) *
- std::max( 2u , ( cores_per_numa * threads_per_core ) / 2 );
+ const unsigned threads_count = std::max( 1u, numa_count ) *
+ std::max( 2u, ( cores_per_numa * threads_per_core ) / 2 );
Kokkos::OpenMP::initialize( threads_count );
- Kokkos::OpenMP::print_configuration( std::cout , true );
- srand(10231);
+ Kokkos::print_configuration( std::cout, true );
+ srand( 10231 );
}
static void TearDownTestCase()
{
Kokkos::OpenMP::finalize();
- omp_set_num_threads(1);
+ omp_set_num_threads( 1 );
- ASSERT_EQ( 1 , omp_get_max_threads() );
+ ASSERT_EQ( 1, omp_get_max_threads() );
}
};
-}
+} // namespace Test
+
#endif
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
index ed6c9f8d1..2585c0197 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Atomics.cpp
@@ -1,204 +1,201 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , atomics )
+TEST_F( openmp, atomics )
{
- const int loop_count = 1e4 ;
+ const int loop_count = 1e4;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::OpenMP>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::OpenMP >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::OpenMP >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::OpenMP >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::OpenMP>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::OpenMP >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::OpenMP >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::OpenMP >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::OpenMP>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::OpenMP >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::OpenMP >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::OpenMP >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::OpenMP>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::OpenMP >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::OpenMP >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::OpenMP >( 100, 3 ) ) );
}
-TEST_F( openmp , atomic_operations )
+TEST_F( openmp, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::OpenMP>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::OpenMP>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::OpenMP>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::OpenMP >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::OpenMP >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::OpenMP >( start, end - i, 4 ) ) );
}
-
}
-
-TEST_F( openmp , atomic_views_integral )
+TEST_F( openmp, atomic_views_integral )
{
const long length = 1000000;
{
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::OpenMP>(length, 8 ) ) );
-
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::OpenMP >( length, 8 ) ) );
}
}
-TEST_F( openmp , atomic_views_nonintegral )
+TEST_F( openmp, atomic_views_nonintegral )
{
const long length = 1000000;
{
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::OpenMP>(length, 4 ) ) );
-
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::OpenMP >( length, 4 ) ) );
}
}
-TEST_F( openmp , atomic_view_api )
+TEST_F( openmp, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::OpenMP>();
+ TestAtomicViews::TestAtomicViewAPI<int, Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
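
[Editor's note, not part of the patch:] The hunks above only reformat the atomic-operations and atomic-views tests (brace placement and spacing inside template argument lists); the assertions themselves are unchanged. As a rough orientation on what the AtomicViews tests exercise, a minimal sketch of an atomically updated Kokkos view on the OpenMP backend follows; the names hist, hist_atomic, and nbins are illustrative only.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const int n     = 1000000;
    const int nbins = 8;

    Kokkos::View<long*, Kokkos::OpenMP> hist( "hist", nbins );

    // Alias the same allocation with the Atomic memory trait: every element
    // update then goes through an atomic read-modify-write, so concurrent
    // increments from different threads do not race.
    Kokkos::View<long*, Kokkos::OpenMP,
                 Kokkos::MemoryTraits<Kokkos::Atomic> > hist_atomic = hist;

    Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::OpenMP>( 0, n ),
      KOKKOS_LAMBDA( const int i ) { hist_atomic( i % nbins ) += 1; } );
    Kokkos::fence();

    long total = 0;
    for ( int b = 0; b < nbins; ++b ) total += hist( b );
    printf( "total = %ld (expected %d)\n", total, n );
  }
  Kokkos::finalize();
  return 0;
}
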
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
index 126d730f0..b4f32dac7 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Other.cpp
@@ -1,189 +1,212 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , init ) {
+TEST_F( openmp, init )
+{
;
}
-TEST_F( openmp , md_range ) {
- TestMDRange_2D< Kokkos::OpenMP >::test_for2(100,100);
+TEST_F( openmp, mdrange_for )
+{
+ Kokkos::Timer timer;
+ TestMDRange_2D< Kokkos::OpenMP >::test_for2( 10000, 1000 );
+ std::cout << " 2D: " << timer.seconds() << std::endl;
+
+ timer.reset();
+ TestMDRange_3D< Kokkos::OpenMP >::test_for3( 100, 100, 1000 );
+ std::cout << " 3D: " << timer.seconds() << std::endl;
- TestMDRange_3D< Kokkos::OpenMP >::test_for3(100,100,100);
+ timer.reset();
+ TestMDRange_4D< Kokkos::OpenMP >::test_for4( 100, 10, 100, 100 );
+ std::cout << " 4D: " << timer.seconds() << std::endl;
+
+ timer.reset();
+ TestMDRange_5D< Kokkos::OpenMP >::test_for5( 100, 10, 10, 100, 50 );
+ std::cout << " 5D: " << timer.seconds() << std::endl;
+
+ timer.reset();
+ TestMDRange_6D< Kokkos::OpenMP >::test_for6( 10, 10, 10, 10, 50, 50 );
+ std::cout << " 6D: " << timer.seconds() << std::endl;
}
-TEST_F( openmp, policy_construction) {
+TEST_F( openmp, mdrange_reduce )
+{
+ TestMDRange_2D< Kokkos::OpenMP >::test_reduce2( 100, 100 );
+ TestMDRange_3D< Kokkos::OpenMP >::test_reduce3( 100, 10, 100 );
+}
+
+TEST_F( openmp, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::OpenMP >();
TestTeamPolicyConstruction< Kokkos::OpenMP >();
}
-TEST_F( openmp , range_tag )
+TEST_F( openmp, range_tag )
{
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(0);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(3);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 0 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 3 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
}
-
//----------------------------------------------------------------------------
-TEST_F( openmp , compiler_macros )
+TEST_F( openmp, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::OpenMP >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( openmp , memory_pool )
+TEST_F( openmp, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::OpenMP >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::OpenMP >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::OpenMP >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
-TEST_F( openmp , task_fib )
+TEST_F( openmp, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::OpenMP >::run(i, (i+1)*(i+1)*10000 );
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::OpenMP >::run( i, ( i + 1 ) * ( i + 1 ) * 10000 );
}
}
-TEST_F( openmp , task_depend )
+TEST_F( openmp, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::OpenMP >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::OpenMP >::run( i );
}
}
-TEST_F( openmp , task_team )
+TEST_F( openmp, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::OpenMP >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::OpenMP >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::OpenMP >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::OpenMP >::run( 1000 ); // Put back after testing.
}
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
-TEST_F( openmp , cxx11 )
+TEST_F( openmp, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::OpenMP >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::OpenMP >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::OpenMP >( 4 ) ) );
}
}
#endif
TEST_F( openmp, tile_layout )
{
- TestTile::test< Kokkos::OpenMP , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::OpenMP , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::OpenMP , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::OpenMP , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::OpenMP , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::OpenMP , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::OpenMP , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::OpenMP, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::OpenMP, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::OpenMP, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::OpenMP, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::OpenMP, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::OpenMP, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::OpenMP, 8, 8 >( 9, 11 );
}
-
-TEST_F( openmp , dispatch )
+TEST_F( openmp, dispatch )
{
- const int repeat = 100 ;
- for ( int i = 0 ; i < repeat ; ++i ) {
- for ( int j = 0 ; j < repeat ; ++j ) {
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >(0,j)
- , KOKKOS_LAMBDA( int ) {} );
- }}
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
}
-
-} // namespace test
-
+} // namespace Test
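
[Editor's note, not part of the patch:] Besides the whitespace cleanup, the TestOpenMP_Other.cpp hunk above replaces the single md_range test with timed mdrange_for loops over 2-D through 6-D index ranges plus an mdrange_reduce test. For orientation, a multidimensional range in Kokkos is typically expressed with MDRangePolicy, roughly as in the sketch below; note that in older Kokkos 2.x snapshots this policy may still live in the Kokkos::Experimental namespace, and the sizes here are placeholders.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const int N0 = 100, N1 = 100;
    using Policy2D = Kokkos::MDRangePolicy< Kokkos::OpenMP, Kokkos::Rank<2> >;

    Kokkos::View<double**, Kokkos::OpenMP> a( "a", N0, N1 );

    // Tightly nested 2-D loop: one lambda invocation per (i,j) pair.
    Kokkos::parallel_for( Policy2D( {{ 0, 0 }}, {{ N0, N1 }} ),
      KOKKOS_LAMBDA( const int i, const int j ) {
        a( i, j ) = 1.0;
      } );

    // Reduction over the same 2-D index space.
    double sum = 0.0;
    Kokkos::parallel_reduce( Policy2D( {{ 0, 0 }}, {{ N0, N1 }} ),
      KOKKOS_LAMBDA( const int i, const int j, double& lsum ) {
        lsum += a( i, j );
      }, sum );

    printf( "sum = %g (expected %d)\n", sum, N0 * N1 );
  }
  Kokkos::finalize();
  return 0;
}
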
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
index d41e1493e..22c29308a 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Reductions.cpp
@@ -1,138 +1,146 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, long_reduce) {
- TestReduce< long , Kokkos::OpenMP >( 0 );
- TestReduce< long , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, long_reduce )
+{
+ TestReduce< long, Kokkos::OpenMP >( 0 );
+ TestReduce< long, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp, double_reduce) {
- TestReduce< double , Kokkos::OpenMP >( 0 );
- TestReduce< double , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, double_reduce )
+{
+ TestReduce< double, Kokkos::OpenMP >( 0 );
+ TestReduce< double, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp , reducers )
+TEST_F( openmp, reducers )
{
- TestReducers<int, Kokkos::OpenMP>::execute_integer();
- TestReducers<size_t, Kokkos::OpenMP>::execute_integer();
- TestReducers<double, Kokkos::OpenMP>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::OpenMP>::execute_basic();
+ TestReducers< int, Kokkos::OpenMP >::execute_integer();
+ TestReducers< size_t, Kokkos::OpenMP >::execute_integer();
+ TestReducers< double, Kokkos::OpenMP >::execute_float();
+ TestReducers< Kokkos::complex<double>, Kokkos::OpenMP >::execute_basic();
}
-TEST_F( openmp, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::OpenMP >( 0 );
- TestReduceDynamic< long , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::OpenMP >( 0 );
+ TestReduceDynamic< long, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::OpenMP >( 0 );
- TestReduceDynamic< double , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::OpenMP >( 0 );
+ TestReduceDynamic< double, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::OpenMP >( 0 );
- TestReduceDynamicView< long , Kokkos::OpenMP >( 1000000 );
+TEST_F( openmp, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::OpenMP >( 0 );
+ TestReduceDynamicView< long, Kokkos::OpenMP >( 1000000 );
}
-TEST_F( openmp , scan )
+TEST_F( openmp, scan )
{
- TestScan< Kokkos::OpenMP >::test_range( 1 , 1000 );
+ TestScan< Kokkos::OpenMP >::test_range( 1, 1000 );
TestScan< Kokkos::OpenMP >( 0 );
TestScan< Kokkos::OpenMP >( 100000 );
TestScan< Kokkos::OpenMP >( 10000000 );
Kokkos::OpenMP::fence();
}
#if 0
-TEST_F( openmp , scan_small )
+TEST_F( openmp, scan_small )
{
- typedef TestScan< Kokkos::OpenMP , Kokkos::Impl::OpenMPExecUseScanSmall > TestScanFunctor ;
- for ( int i = 0 ; i < 1000 ; ++i ) {
+ typedef TestScan< Kokkos::OpenMP, Kokkos::Impl::OpenMPExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
TestScanFunctor( 10 );
TestScanFunctor( 10000 );
}
TestScanFunctor( 1000000 );
TestScanFunctor( 10000000 );
Kokkos::OpenMP::fence();
}
#endif
-TEST_F( openmp , team_scan )
+TEST_F( openmp, team_scan )
{
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( openmp , team_long_reduce) {
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( openmp, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( openmp , team_double_reduce) {
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( openmp, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( openmp , reduction_deduction )
+TEST_F( openmp, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
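
[Editor's note, not part of the patch:] The reductions file above is again a pure reformatting of the existing TestReduce*, TestReducers, TestScan, and team reduce/scan cases. The basic reduction and prefix-scan patterns those tests cover look roughly like the following sketch; n and the view name scan are illustrative only.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const long n = 1000000;

    // Sum of 0 + 1 + ... + (n-1).
    long sum = 0;
    Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::OpenMP>( 0, n ),
      KOKKOS_LAMBDA( const long i, long& lsum ) { lsum += i; }, sum );

    // Exclusive prefix sum: entry i receives the sum of all indices < i.
    Kokkos::View<long*, Kokkos::OpenMP> scan( "scan", n );
    Kokkos::parallel_scan( Kokkos::RangePolicy<Kokkos::OpenMP>( 0, n ),
      KOKKOS_LAMBDA( const long i, long& update, const bool final ) {
        if ( final ) scan( i ) = update;
        update += i;
      } );
    Kokkos::fence();

    printf( "sum = %ld, scan(n-1) = %ld\n", sum, scan( n - 1 ) );
  }
  Kokkos::finalize();
  return 0;
}
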
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
index 9854417e4..fefae0732 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::OpenMP >();
+TEST_F( openmp, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::OpenMP >();
+TEST_F( openmp, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::OpenMP >();
+TEST_F( openmp, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_assign_strided ) {
+TEST_F( openmp, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_0 ) {
+TEST_F( openmp, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_1 ) {
+TEST_F( openmp, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_2 ) {
+TEST_F( openmp, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_left_3 ) {
+TEST_F( openmp, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_right_0 ) {
+TEST_F( openmp, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_right_1 ) {
+TEST_F( openmp, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::OpenMP >();
}
-TEST_F( openmp, view_subview_right_3 ) {
+TEST_F( openmp, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
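
[Editor's note, not part of the patch:] The SubView_a hunk above, like the remaining SubView_* files below, only normalizes brace placement and spacing in tests of Kokkos::subview. A minimal illustration of the feature under test (view names and extents are illustrative only):

#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::OpenMP> a( "a", 10, 4 );

    // A 1-D subview of column 3: it shares a's allocation, no copy is made.
    auto col = Kokkos::subview( a, Kokkos::ALL(), 3 );

    for ( int i = 0; i < 10; ++i ) col( i ) = 2.0 * i;

    // The change is visible through the parent view.
    printf( "a(5,3) = %g\n", a( 5, 3 ) );
  }
  Kokkos::finalize();
  return 0;
}
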
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
index 2aa1fc5c6..7de7ca91b 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_layoutleft_to_layoutleft) {
+TEST_F( openmp, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( openmp, view_subview_layoutright_to_layoutright) {
+TEST_F( openmp, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
index 1a6871cfc..d727ec0ee 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_1d_assign ) {
+TEST_F( openmp, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
index b04edbb99..df43f555d 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
index 765e23583..38f241ebf 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
index 9d8b62708..11a4ea8ac 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_2d_from_3d ) {
+TEST_F( openmp, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
index 9c19cf0e5..a91baa34d 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
index c1bdf7235..20d4d9bd6 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
index 08a3b5a54..528df1c07 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_left ) {
+TEST_F( openmp, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
index 0864ebbda..d9eea8dba 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
index e38dfecbf..f909dc33c 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
index b7e4683d2..59996d5e3 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_right ) {
+TEST_F( openmp, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
index fc3e66fd4..3f9c215d9 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( openmp, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
index e21a13ee5..d3a73483a 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( openmp, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::OpenMP, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
index 9da159ab5..399c6e92e 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_SubView_c_all.cpp
@@ -1,12 +1,12 @@
-#include<openmp/TestOpenMP_SubView_c01.cpp>
-#include<openmp/TestOpenMP_SubView_c02.cpp>
-#include<openmp/TestOpenMP_SubView_c03.cpp>
-#include<openmp/TestOpenMP_SubView_c04.cpp>
-#include<openmp/TestOpenMP_SubView_c05.cpp>
-#include<openmp/TestOpenMP_SubView_c06.cpp>
-#include<openmp/TestOpenMP_SubView_c07.cpp>
-#include<openmp/TestOpenMP_SubView_c08.cpp>
-#include<openmp/TestOpenMP_SubView_c09.cpp>
-#include<openmp/TestOpenMP_SubView_c10.cpp>
-#include<openmp/TestOpenMP_SubView_c11.cpp>
-#include<openmp/TestOpenMP_SubView_c12.cpp>
+#include <openmp/TestOpenMP_SubView_c01.cpp>
+#include <openmp/TestOpenMP_SubView_c02.cpp>
+#include <openmp/TestOpenMP_SubView_c03.cpp>
+#include <openmp/TestOpenMP_SubView_c04.cpp>
+#include <openmp/TestOpenMP_SubView_c05.cpp>
+#include <openmp/TestOpenMP_SubView_c06.cpp>
+#include <openmp/TestOpenMP_SubView_c07.cpp>
+#include <openmp/TestOpenMP_SubView_c08.cpp>
+#include <openmp/TestOpenMP_SubView_c09.cpp>
+#include <openmp/TestOpenMP_SubView_c10.cpp>
+#include <openmp/TestOpenMP_SubView_c11.cpp>
+#include <openmp/TestOpenMP_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
index 38cf0a0f4..216789e8b 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_Team.cpp
@@ -1,122 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , team_tag )
+TEST_F( openmp, team_tag )
{
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( openmp , team_shared_request) {
- TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( openmp, team_shared_request )
+{
+ TestSharedTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( openmp, team_scratch_request) {
- TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( openmp, team_scratch_request )
+{
+ TestScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( openmp , team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( openmp, team_lambda_shared_request )
+{
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( openmp, shmem_size) {
+TEST_F( openmp, shmem_size )
+{
TestShmemSize< Kokkos::OpenMP >();
}
-TEST_F( openmp, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::OpenMP , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( openmp, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::OpenMP, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( openmp , team_vector )
+TEST_F( openmp, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::OpenMP >( 10 ) ) );
}
#ifdef KOKKOS_COMPILER_GNU
#if ( KOKKOS_COMPILER_GNU == 472 )
#define SKIP_TEST
#endif
#endif
#ifndef SKIP_TEST
TEST_F( openmp, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::OpenMP >( 8192, 2048, 16, 16 );
}
#endif
-} // namespace test
-
+} // namespace Test
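(For orientation only, not part of the patch: the TestTeamPolicy, TestSharedTeam and TestTeamVector helpers exercised in the file above live in TestTeam.hpp / TestTeamVector.hpp, which this diff does not touch. A minimal, self-contained sketch of the hierarchical team-policy dispatch those helpers drive on the OpenMP backend might look as follows; the league size, work count and program structure are illustrative assumptions, assuming only a standard Kokkos build with OpenMP enabled.)

// Sketch: team-policy parallel_for with nested per-team parallelism (OpenMP backend).
#include <Kokkos_Core.hpp>
#include <cstdio>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    typedef Kokkos::TeamPolicy< Kokkos::OpenMP > policy_type;
    typedef policy_type::member_type           member_type;

    // A league of 4 teams; let Kokkos choose the team size.
    Kokkos::parallel_for( policy_type( 4, Kokkos::AUTO ),
                          KOKKOS_LAMBDA( const member_type & member )
    {
      // Nested parallelism over 8 work items per team.
      Kokkos::parallel_for( Kokkos::TeamThreadRange( member, 8 ),
                            [&] ( const int i )
      {
        printf( "league %d, team rank %d, item %d\n",
                member.league_rank(), member.team_rank(), i );
      } );
    } );
  }
  Kokkos::finalize();
  return 0;
}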
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
index 82cbf3ea1..aead381a1 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_a.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , impl_view_mapping_a ) {
+TEST_F( openmp, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::OpenMP >();
test_view_mapping_operator< Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
index b2d4f87fd..c802fb79c 100644
--- a/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/openmp/TestOpenMP_ViewAPI_b.cpp
@@ -1,121 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <openmp/TestOpenMP.hpp>
namespace Test {
-TEST_F( openmp , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::OpenMP >();
+TEST_F( openmp, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::OpenMP >();
}
-TEST_F( openmp , impl_view_mapping_b ) {
+TEST_F( openmp, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::OpenMP >();
TestViewMappingAtomic< Kokkos::OpenMP >::run();
}
-TEST_F( openmp, view_api) {
- TestViewAPI< double , Kokkos::OpenMP >();
+TEST_F( openmp, view_api )
+{
+ TestViewAPI< double, Kokkos::OpenMP >();
}
-TEST_F( openmp , view_nested_view )
+TEST_F( openmp, view_nested_view )
{
::Test::view_nested_view< Kokkos::OpenMP >();
}
-
-
-TEST_F( openmp , view_remap )
+TEST_F( openmp, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::OpenMP > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::OpenMP > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::OpenMP > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::OpenMP > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::OpenMP > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::OpenMP > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
}
-//----------------------------------------------------------------------------
-
-TEST_F( openmp , view_aggregate )
+TEST_F( openmp, view_aggregate )
{
TestViewAggregate< Kokkos::OpenMP >();
}
-TEST_F( openmp , template_meta_functions )
+TEST_F( openmp, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::OpenMP >();
+ TestTemplateMetaFunctions< int, Kokkos::OpenMP >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads.hpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads.hpp
similarity index 86%
copy from lib/kokkos/core/unit_test/threads/TestThreads.hpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads.hpp
index 4f611cf99..907fe23ea 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads.hpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads.hpp
@@ -1,115 +1,109 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#ifndef KOKKOS_TEST_THREADS_HPP
-#define KOKKOS_TEST_THREADS_HPP
+
+#ifndef KOKKOS_TEST_QTHREADS_HPP
+#define KOKKOS_TEST_QTHREADS_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
#include <TestAtomicViews.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
-class threads : public ::testing::Test {
+class qthreads : public ::testing::Test {
protected:
static void SetUpTestCase()
{
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
- unsigned threads_count = 0 ;
+ const unsigned threads_count = std::max( 1u, numa_count ) *
+ std::max( 2u, ( cores_per_numa * threads_per_core ) / 2 );
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , cores_per_numa * threads_per_core );
+ Kokkos::Qthreads::initialize( threads_count );
+ Kokkos::print_configuration( std::cout, true );
- Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::print_configuration( std::cout , true /* detailed */ );
+ srand( 10231 );
}
static void TearDownTestCase()
{
- Kokkos::Threads::finalize();
+ Kokkos::Qthreads::finalize();
}
};
+} // namespace Test
-}
#endif
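(Illustration only, not part of the patch: the qthreads fixture above mirrors the threads fixture it was copied from. SetUpTestCase() initializes Kokkos::Qthreads once per test case and TearDownTestCase() finalizes it, so individual TEST_F bodies can dispatch work directly. A minimal sketch of such a test, with a hypothetical name and values, and assuming the Qthreads backend supports parallel_reduce, which is why most bodies in the new files below are still guarded with #if 0, could read:)

#include <qthreads/TestQthreads.hpp>

namespace Test {

// Hypothetical example: the fixture has already initialized Kokkos::Qthreads,
// so the body can launch a reduction directly.
TEST_F( qthreads, example_sum_reduce )
{
  long sum = 0;

  Kokkos::parallel_reduce( Kokkos::RangePolicy< Kokkos::Qthreads >( 0, 100 ),
                           KOKKOS_LAMBDA( const int i, long & update )
                           { update += i; },
                           sum );

  ASSERT_EQ( sum, 4950 ); // 0 + 1 + ... + 99
}

} // namespace Test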
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Atomics.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Atomics.cpp
new file mode 100644
index 000000000..e64c3305d
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Atomics.cpp
@@ -0,0 +1,213 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, atomics )
+{
+#if 0
+ const int loop_count = 1e4;
+
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Qthreads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Qthreads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Qthreads >( loop_count, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Qthreads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Qthreads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Qthreads >( 100, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Qthreads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Qthreads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Qthreads >( 100, 3 ) ) );
+
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Qthreads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Qthreads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Qthreads >( 100, 3 ) ) );
+#endif
+}
+
+TEST_F( qthreads, atomic_operations )
+{
+#if 0
+ const int start = 1; // Avoid zero for division.
+ const int end = 11;
+
+ for ( int i = start; i < end; ++i )
+ {
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Qthreads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Qthreads >( start, end - i, 4 ) ) );
+ }
+#endif
+}
+
+TEST_F( qthreads, atomic_views_integral )
+{
+#if 0
+ const long length = 1000000;
+
+ {
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Qthreads >( length, 8 ) ) );
+ }
+#endif
+}
+
+TEST_F( qthreads, atomic_views_nonintegral )
+{
+#if 0
+ const long length = 1000000;
+
+ {
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Qthreads >( length, 4 ) ) );
+ }
+#endif
+}
+
+TEST_F( qthreads, atomic_view_api )
+{
+#if 0
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Qthreads >();
+#endif
+}
+
+} // namespace Test
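(Aside, not part of the patch: the TestAtomic and TestAtomicViews helpers referenced above come from TestAtomic.hpp / TestAtomicViews.hpp, which this diff does not modify. As a rough sketch of the two atomic idioms they exercise, free atomic functions and views carrying the Atomic memory trait, the following assumes a standard Kokkos build with the OpenMP backend; the counter name and counts are illustrative.)

// Sketch: atomic updates via Kokkos::atomic_fetch_add and via an Atomic-trait view.
#include <Kokkos_Core.hpp>
#include <cassert>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    const int n = 1000;

    // Idiom 1: free atomic function on a shared rank-0 counter.
    Kokkos::View< int, Kokkos::OpenMP > counter( "counter" );
    Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >( 0, n ),
                          KOKKOS_LAMBDA( const int )
    {
      Kokkos::atomic_fetch_add( &counter(), 1 );
    } );

    // Idiom 2: a view whose every access is atomic via the Atomic memory trait.
    Kokkos::View< int, Kokkos::OpenMP,
                  Kokkos::MemoryTraits<Kokkos::Atomic> > atomic_counter( counter );
    Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::OpenMP >( 0, n ),
                          KOKKOS_LAMBDA( const int )
    {
      atomic_counter() += 1;
    } );

    int result = 0;
    Kokkos::deep_copy( result, counter );
    assert( result == 2 * n );
  }
  Kokkos::finalize();
  return 0;
}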
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Other.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Other.cpp
new file mode 100644
index 000000000..0faec8405
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Other.cpp
@@ -0,0 +1,213 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, init )
+{
+ ;
+}
+
+TEST_F( qthreads, md_range )
+{
+#if 0
+ TestMDRange_2D< Kokkos::Qthreads >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Qthreads >::test_for3( 100, 100, 100 );
+#endif
+}
+
+TEST_F( qthreads, policy_construction )
+{
+#if 0
+ TestRangePolicyConstruction< Kokkos::Qthreads >();
+ TestTeamPolicyConstruction< Kokkos::Qthreads >();
+#endif
+}
+
+TEST_F( qthreads, range_tag )
+{
+#if 0
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 0 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 3 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
+#endif
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( qthreads, compiler_macros )
+{
+#if 0
+ ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Qthreads >() ) );
+#endif
+}
+
+//----------------------------------------------------------------------------
+
+TEST_F( qthreads, memory_pool )
+{
+#if 0
+ bool val = TestMemoryPool::test_mempool< Kokkos::Qthreads >( 128, 128000000 );
+ ASSERT_TRUE( val );
+
+ TestMemoryPool::test_mempool2< Kokkos::Qthreads >( 64, 4, 1000000, 2000000 );
+
+ TestMemoryPool::test_memory_exhaustion< Kokkos::Qthreads >();
+#endif
+}
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_TASKDAG )
+
+TEST_F( qthreads, task_fib )
+{
+#if 0
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Qthreads >::run( i, ( i + 1 ) * ( i + 1 ) * 10000 );
+ }
+#endif
+}
+
+TEST_F( qthreads, task_depend )
+{
+#if 0
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Qthreads >::run( i );
+ }
+#endif
+}
+
+TEST_F( qthreads, task_team )
+{
+#if 0
+ TestTaskScheduler::TestTaskTeam< Kokkos::Qthreads >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Qthreads >::run( 1000 ); // Put back after testing.
+#endif
+}
+
+#endif // #if defined( KOKKOS_ENABLE_TASKDAG )
+
+//----------------------------------------------------------------------------
+
+#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_QTHREADS )
+
+TEST_F( qthreads, cxx11 )
+{
+#if 0
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Qthreads >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Qthreads >( 4 ) ) );
+ }
+#endif
+}
+
+#endif
+
+TEST_F( qthreads, tile_layout )
+{
+#if 0
+ TestTile::test< Kokkos::Qthreads, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Qthreads, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Qthreads, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Qthreads, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Qthreads, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Qthreads, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Qthreads, 8, 8 >( 9, 11 );
+#endif
+}
+
+TEST_F( qthreads, dispatch )
+{
+#if 0
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Qthreads >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
+#endif
+}
+
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Reductions.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Reductions.cpp
new file mode 100644
index 000000000..a2470ac15
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Reductions.cpp
@@ -0,0 +1,168 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, long_reduce )
+{
+#if 0
+ TestReduce< long, Kokkos::Qthreads >( 0 );
+ TestReduce< long, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, double_reduce )
+{
+#if 0
+ TestReduce< double, Kokkos::Qthreads >( 0 );
+ TestReduce< double, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, reducers )
+{
+#if 0
+ TestReducers< int, Kokkos::Qthreads >::execute_integer();
+ TestReducers< size_t, Kokkos::Qthreads >::execute_integer();
+ TestReducers< double, Kokkos::Qthreads >::execute_float();
+  TestReducers< Kokkos::complex<double>, Kokkos::Qthreads >::execute_basic();
+#endif
+}
+
+TEST_F( qthreads, long_reduce_dynamic )
+{
+#if 0
+ TestReduceDynamic< long, Kokkos::Qthreads >( 0 );
+ TestReduceDynamic< long, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, double_reduce_dynamic )
+{
+#if 0
+ TestReduceDynamic< double, Kokkos::Qthreads >( 0 );
+ TestReduceDynamic< double, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, long_reduce_dynamic_view )
+{
+#if 0
+ TestReduceDynamicView< long, Kokkos::Qthreads >( 0 );
+ TestReduceDynamicView< long, Kokkos::Qthreads >( 1000000 );
+#endif
+}
+
+TEST_F( qthreads, scan )
+{
+#if 0
+ TestScan< Kokkos::Qthreads >::test_range( 1, 1000 );
+ TestScan< Kokkos::Qthreads >( 0 );
+ TestScan< Kokkos::Qthreads >( 100000 );
+ TestScan< Kokkos::Qthreads >( 10000000 );
+ Kokkos::Qthreads::fence();
+#endif
+}
+
+TEST_F( qthreads, scan_small )
+{
+#if 0
+ typedef TestScan< Kokkos::Qthreads, Kokkos::Impl::QthreadsExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
+ TestScanFunctor( 10 );
+ TestScanFunctor( 10000 );
+ }
+ TestScanFunctor( 1000000 );
+ TestScanFunctor( 10000000 );
+
+ Kokkos::Qthreads::fence();
+#endif
+}
+
+TEST_F( qthreads, team_scan )
+{
+#if 0
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+#endif
+}
+
+TEST_F( qthreads, team_long_reduce )
+{
+#if 0
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+#endif
+}
+
+TEST_F( qthreads, team_double_reduce )
+{
+#if 0
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+#endif
+}
+
+TEST_F( qthreads, reduction_deduction )
+{
+#if 0
+ TestCXX11::test_reduction_deduction< Kokkos::Qthreads >();
+#endif
+}
+
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_a.cpp
similarity index 59%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_a.cpp
index 2df9e19de..ab873359a 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_a.cpp
@@ -1,92 +1,125 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Threads >();
+TEST_F( qthreads, view_subview_auto_1d_left )
+{
+#if 0
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Threads >();
+TEST_F( qthreads, view_subview_auto_1d_right )
+{
+#if 0
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Threads >();
+TEST_F( qthreads, view_subview_auto_1d_stride )
+{
+#if 0
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_assign_strided ) {
- TestViewSubview::test_1d_strided_assignment< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_assign_strided )
+{
+#if 0
+ TestViewSubview::test_1d_strided_assignment< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_0 ) {
- TestViewSubview::test_left_0< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_0 )
+{
+#if 0
+ TestViewSubview::test_left_0< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_1 ) {
- TestViewSubview::test_left_1< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_1 )
+{
+#if 0
+ TestViewSubview::test_left_1< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_2 ) {
- TestViewSubview::test_left_2< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_2 )
+{
+#if 0
+ TestViewSubview::test_left_2< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_left_3 ) {
- TestViewSubview::test_left_3< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_left_3 )
+{
+#if 0
+ TestViewSubview::test_left_3< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_right_0 ) {
- TestViewSubview::test_right_0< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_right_0 )
+{
+#if 0
+ TestViewSubview::test_right_0< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_right_1 ) {
- TestViewSubview::test_right_1< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_right_1 )
+{
+#if 0
+ TestViewSubview::test_right_1< Kokkos::Qthreads >();
+#endif
}
-TEST_F( threads, view_subview_right_3 ) {
- TestViewSubview::test_right_3< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_right_3 )
+{
+#if 0
+ TestViewSubview::test_right_3< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_b.cpp
similarity index 71%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_b.cpp
index c01ca1c14..199c5c795 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_b.cpp
@@ -1,59 +1,66 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda , impl_view_mapping_a ) {
- test_view_mapping< Kokkos::CudaSpace >();
- test_view_mapping_operator< Kokkos::CudaSpace >();
+TEST_F( qthreads, view_subview_layoutleft_to_layoutleft )
+{
+#if 0
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Qthreads >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-TEST_F( cuda , view_of_class )
+TEST_F( qthreads, view_subview_layoutright_to_layoutright )
{
- TestViewMappingClassValue< Kokkos::CudaSpace >::run();
- TestViewMappingClassValue< Kokkos::CudaUVMSpace >::run();
+#if 0
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Qthreads >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c01.cpp
similarity index 92%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c01.cpp
index 4c5f2ef72..f44909f3d 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c01.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, view_subview_1d_assign )
+{
+#if 0
+ TestViewSubview::test_1d_assign< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c02.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c02.cpp
index e340240c4..7bb936f8d 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c02.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_1d_assign_atomic )
+{
+#if 0
+ TestViewSubview::test_1d_assign< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c03.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c03.cpp
index ad27fa0fa..27073dfa8 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c03.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_1d_assign_randomaccess )
+{
+#if 0
+ TestViewSubview::test_1d_assign< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c04.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c04.cpp
index 4c5f2ef72..1b3cf4885 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c04.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, view_subview_2d_from_3d )
+{
+#if 0
+ TestViewSubview::test_2d_subview_3d< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c05.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c05.cpp
index c7dfca941..34dda63e6 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c05.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_2d_from_3d_atomic )
+{
+#if 0
+ TestViewSubview::test_2d_subview_3d< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c06.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c06.cpp
index 38e839491..5a4ee50fb 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c06.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_2d_from_3d_randomaccess )
+{
+#if 0
+ TestViewSubview::test_2d_subview_3d< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c07.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c07.cpp
index 7cef6fa07..fe386e34a 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c07.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads >();
+TEST_F( qthreads, view_subview_3d_from_5d_left )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c08.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c08.cpp
index e8ad23199..a3e0ab252 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c08.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_3d_from_5d_left_atomic )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c09.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c09.cpp
index e86b4513f..df1f570e9 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c09.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::CudaUVMSpace , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_3d_from_5d_left_randomaccess )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c10.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c10.cpp
index 4c5f2ef72..cc3c80d10 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c10.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, view_subview_3d_from_5d_right )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c11.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c11.cpp
index d67bf3157..14b331a45 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c11.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( qthreads, view_subview_3d_from_5d_right_atomic )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c12.cpp
similarity index 91%
copy from lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c12.cpp
index e8a2c825c..571382e66 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c12.cpp
@@ -1,52 +1,55 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <threads/TestThreads.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( qthreads, view_subview_3d_from_5d_right_randomaccess )
+{
+#if 0
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Qthreads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c_all.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c_all.cpp
new file mode 100644
index 000000000..ab984c5f3
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_SubView_c_all.cpp
@@ -0,0 +1,12 @@
+#include <qthreads/TestQthreads_SubView_c01.cpp>
+#include <qthreads/TestQthreads_SubView_c02.cpp>
+#include <qthreads/TestQthreads_SubView_c03.cpp>
+#include <qthreads/TestQthreads_SubView_c04.cpp>
+#include <qthreads/TestQthreads_SubView_c05.cpp>
+#include <qthreads/TestQthreads_SubView_c06.cpp>
+#include <qthreads/TestQthreads_SubView_c07.cpp>
+#include <qthreads/TestQthreads_SubView_c08.cpp>
+#include <qthreads/TestQthreads_SubView_c09.cpp>
+#include <qthreads/TestQthreads_SubView_c10.cpp>
+#include <qthreads/TestQthreads_SubView_c11.cpp>
+#include <qthreads/TestQthreads_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/qthreads/TestQthreads_Team.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Team.cpp
new file mode 100644
index 000000000..e7b81283f
--- /dev/null
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_Team.cpp
@@ -0,0 +1,143 @@
+/*
+//@HEADER
+// ************************************************************************
+//
+// Kokkos v. 2.0
+// Copyright (2014) Sandia Corporation
+//
+// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
+// the U.S. Government retains certain rights in this software.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the Corporation nor the names of the
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
+// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
+// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
+//
+// ************************************************************************
+//@HEADER
+*/
+
+#include <qthreads/TestQthreads.hpp>
+
+namespace Test {
+
+TEST_F( qthreads, team_tag )
+{
+#if 0
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
+
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
+#endif
+}
+
+TEST_F( qthreads, team_shared_request )
+{
+#if 0
+ TestSharedTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+
+TEST_F( qthreads, team_scratch_request )
+{
+#if 0
+ TestScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( qthreads, team_lambda_shared_request )
+{
+#if 0
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+#endif
+
+TEST_F( qthreads, shmem_size )
+{
+#if 0
+ TestShmemSize< Kokkos::Qthreads >();
+#endif
+}
+
+TEST_F( qthreads, multi_level_scratch )
+{
+#if 0
+ TestMultiLevelScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Qthreads, Kokkos::Schedule<Kokkos::Dynamic> >();
+#endif
+}
+
+TEST_F( qthreads, team_vector )
+{
+#if 0
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Qthreads >( 10 ) ) );
+#endif
+}
+
+#ifdef KOKKOS_COMPILER_GNU
+#if ( KOKKOS_COMPILER_GNU == 472 )
+#define SKIP_TEST
+#endif
+#endif
+
+#ifndef SKIP_TEST
+TEST_F( qthreads, triple_nested_parallelism )
+{
+#if 0
+ TestTripleNestedReduce< double, Kokkos::Qthreads >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Qthreads >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Qthreads >( 8192, 2048, 16, 16 );
+#endif
+}
+#endif
+
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_a.cpp
similarity index 90%
copy from lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_a.cpp
index 4c5f2ef72..cd876a36b 100644
--- a/lib/kokkos/core/unit_test/cuda/TestCuda_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_a.cpp
@@ -1,52 +1,56 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
-#include <cuda/TestCuda.hpp>
+
+#include <qthreads/TestQthreads.hpp>
namespace Test {
-TEST_F( cuda, view_subview_1d_assign ) {
- TestViewSubview::test_1d_assign< Kokkos::CudaUVMSpace >();
+TEST_F( qthreads, impl_view_mapping_a )
+{
+#if 0
+ test_view_mapping< Kokkos::Qthreads >();
+ test_view_mapping_operator< Kokkos::Qthreads >();
+#endif
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/perf_test/PerfTestHost.cpp b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_b.cpp
similarity index 51%
copy from lib/kokkos/core/perf_test/PerfTestHost.cpp
copy to lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_b.cpp
index 606177ca5..adf048b61 100644
--- a/lib/kokkos/core/perf_test/PerfTestHost.cpp
+++ b/lib/kokkos/core/unit_test/qthreads/TestQthreads_ViewAPI_b.cpp
@@ -1,115 +1,138 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
-#include <gtest/gtest.h>
-
-#include <Kokkos_Core.hpp>
+#include <qthreads/TestQthreads.hpp>
-#if defined( KOKKOS_ENABLE_OPENMP )
+namespace Test {
-typedef Kokkos::OpenMP TestHostDevice ;
-const char TestHostDeviceName[] = "Kokkos::OpenMP" ;
+TEST_F( qthreads, impl_shared_alloc )
+{
+#if 0
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::Qthreads >();
+#endif
+}
-#elif defined( KOKKOS_ENABLE_PTHREAD )
+TEST_F( qthreads, impl_view_mapping_b )
+{
+#if 0
+ test_view_mapping_subview< Kokkos::Qthreads >();
+ TestViewMappingAtomic< Kokkos::Qthreads >::run();
+#endif
+}
-typedef Kokkos::Threads TestHostDevice ;
-const char TestHostDeviceName[] = "Kokkos::Threads" ;
+TEST_F( qthreads, view_api )
+{
+#if 0
+ TestViewAPI< double, Kokkos::Qthreads >();
+#endif
+}
-#elif defined( KOKKOS_ENABLE_SERIAL )
+TEST_F( qthreads, view_nested_view )
+{
+#if 0
+ ::Test::view_nested_view< Kokkos::Qthreads >();
+#endif
+}
-typedef Kokkos::Serial TestHostDevice ;
-const char TestHostDeviceName[] = "Kokkos::Serial" ;
+TEST_F( qthreads, view_remap )
+{
+#if 0
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
-#else
-# error "You must enable at least one of the following execution spaces in order to build this test: Kokkos::Threads, Kokkos::OpenMP, or Kokkos::Serial."
-#endif
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::Qthreads > output_type;
-#include <impl/Kokkos_Timer.hpp>
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Qthreads > input_type;
-#include <PerfTestHexGrad.hpp>
-#include <PerfTestBlasKernels.hpp>
-#include <PerfTestGramSchmidt.hpp>
-#include <PerfTestDriver.hpp>
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Qthreads > diff_type;
-//------------------------------------------------------------------------
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
-namespace Test {
+ int value = 0;
-class host : public ::testing::Test {
-protected:
- static void SetUpTestCase()
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
{
- if(Kokkos::hwloc::available()) {
- const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
- const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
- const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
-
- unsigned threads_count = 0 ;
-
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , cores_per_numa * threads_per_core );
-
- TestHostDevice::initialize( threads_count );
- } else {
- const unsigned thread_count = 4 ;
- TestHostDevice::initialize( thread_count );
- }
+ input( i0, i1, i2, i3 ) = ++value;
}
- static void TearDownTestCase()
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
{
- TestHostDevice::finalize();
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
}
-};
+#endif
+}
-TEST_F( host, hexgrad ) {
- EXPECT_NO_THROW(run_test_hexgrad< TestHostDevice>( 10, 20, TestHostDeviceName ));
+TEST_F( qthreads, view_aggregate )
+{
+#if 0
+ TestViewAggregate< Kokkos::Qthreads >();
+#endif
}
-TEST_F( host, gramschmidt ) {
- EXPECT_NO_THROW(run_test_gramschmidt< TestHostDevice>( 10, 20, TestHostDeviceName ));
+TEST_F( qthreads, template_meta_functions )
+{
+#if 0
+ TestTemplateMetaFunctions< int, Kokkos::Qthreads >();
+#endif
}
} // namespace Test
-
-
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial.hpp b/lib/kokkos/core/unit_test/serial/TestSerial.hpp
index c0ffa6afb..03da07e06 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial.hpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial.hpp
@@ -1,105 +1,99 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_SERIAL_HPP
#define KOKKOS_TEST_SERIAL_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
-
#include <TestAtomicViews.hpp>
-
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
class serial : public ::testing::Test {
protected:
static void SetUpTestCase()
- {
- Kokkos::HostSpace::execution_space::initialize();
- }
+ {
+ Kokkos::HostSpace::execution_space::initialize();
+ }
+
static void TearDownTestCase()
- {
- Kokkos::HostSpace::execution_space::finalize();
- }
+ {
+ Kokkos::HostSpace::execution_space::finalize();
+ }
};
-}
+} // namespace Test
+
#endif
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
index 729a76556..81ba532a3 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Atomics.cpp
@@ -1,204 +1,204 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , atomics )
+TEST_F( serial, atomics )
{
- const int loop_count = 1e6 ;
+ const int loop_count = 1e6;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Serial>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Serial >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Serial >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Serial >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Serial>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Serial >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Serial >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Serial >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Serial>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Serial >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Serial >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Serial >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Serial>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Serial >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Serial >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Serial >( 100, 3 ) ) );
}
-TEST_F( serial , atomic_operations )
+TEST_F( serial, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Serial>(start, end-i, 12) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Serial>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Serial>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Serial>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Serial >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Serial >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Serial >( start, end - i, 4 ) ) );
}
-
}
-TEST_F( serial , atomic_views_integral )
+TEST_F( serial, atomic_views_integral )
{
const long length = 1000000;
- {
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Serial>(length, 8 ) ) );
+ {
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Serial >( length, 8 ) ) );
}
}
-TEST_F( serial , atomic_views_nonintegral )
+TEST_F( serial, atomic_views_nonintegral )
{
const long length = 1000000;
- {
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Serial>(length, 4 ) ) );
+ {
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Serial >( length, 4 ) ) );
}
}
-TEST_F( serial , atomic_view_api )
+TEST_F( serial, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::Serial>();
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
index 43fc4c358..b40ed3f4a 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Other.cpp
@@ -1,165 +1,172 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , md_range ) {
- TestMDRange_2D< Kokkos::Serial >::test_for2(100,100);
+TEST_F( serial, mdrange_for )
+{
+ TestMDRange_2D< Kokkos::Serial >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Serial >::test_for3( 100, 10, 100 );
+ TestMDRange_4D< Kokkos::Serial >::test_for4( 100, 10, 10, 10 );
+ TestMDRange_5D< Kokkos::Serial >::test_for5( 100, 10, 10, 10, 5 );
+ TestMDRange_6D< Kokkos::Serial >::test_for6( 10, 10, 10, 10, 5, 5 );
+}
- TestMDRange_3D< Kokkos::Serial >::test_for3(100,100,100);
+TEST_F( serial, mdrange_reduce )
+{
+ TestMDRange_2D< Kokkos::Serial >::test_reduce2( 100, 100 );
+ TestMDRange_3D< Kokkos::Serial >::test_reduce3( 100, 10, 100 );
}
-TEST_F( serial, policy_construction) {
+TEST_F( serial, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::Serial >();
TestTeamPolicyConstruction< Kokkos::Serial >();
}
-TEST_F( serial , range_tag )
+TEST_F( serial, range_tag )
{
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
-
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
}
-
//----------------------------------------------------------------------------
-TEST_F( serial , compiler_macros )
+TEST_F( serial, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Serial >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( serial , memory_pool )
+TEST_F( serial, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::Serial >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::Serial >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::Serial >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
-TEST_F( serial , task_fib )
+TEST_F( serial, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::Serial >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Serial >::run( i );
}
}
-TEST_F( serial , task_depend )
+TEST_F( serial, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::Serial >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Serial >::run( i );
}
}
-TEST_F( serial , task_team )
+TEST_F( serial, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::Serial >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::Serial >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::Serial >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Serial >::run( 1000 ); // Put back after testing.
}
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
-TEST_F( serial , cxx11 )
+TEST_F( serial, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Serial >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Serial >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Serial >( 4 ) ) );
}
}
#endif
TEST_F( serial, tile_layout )
{
- TestTile::test< Kokkos::Serial , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Serial , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Serial , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Serial , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Serial , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Serial , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Serial , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::Serial, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Serial, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Serial, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Serial, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Serial, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Serial, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Serial, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Serial, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Serial, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Serial, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Serial, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Serial, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Serial, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Serial, 8, 8 >( 9, 11 );
}
-
-
-
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
index 25b5ac6d1..8a3d518cf 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Reductions.cpp
@@ -1,122 +1,129 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, long_reduce) {
- TestReduce< long , Kokkos::Serial >( 0 );
- TestReduce< long , Kokkos::Serial >( 1000000 );
+TEST_F( serial, long_reduce )
+{
+ TestReduce< long, Kokkos::Serial >( 0 );
+ TestReduce< long, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial, double_reduce) {
- TestReduce< double , Kokkos::Serial >( 0 );
- TestReduce< double , Kokkos::Serial >( 1000000 );
+TEST_F( serial, double_reduce )
+{
+ TestReduce< double, Kokkos::Serial >( 0 );
+ TestReduce< double, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial , reducers )
+TEST_F( serial, reducers )
{
- TestReducers<int, Kokkos::Serial>::execute_integer();
- TestReducers<size_t, Kokkos::Serial>::execute_integer();
- TestReducers<double, Kokkos::Serial>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Serial>::execute_basic();
+ TestReducers< int, Kokkos::Serial >::execute_integer();
+ TestReducers< size_t, Kokkos::Serial >::execute_integer();
+ TestReducers< double, Kokkos::Serial >::execute_float();
+  TestReducers< Kokkos::complex<double>, Kokkos::Serial >::execute_basic();
}
-TEST_F( serial, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Serial >( 0 );
- TestReduceDynamic< long , Kokkos::Serial >( 1000000 );
+TEST_F( serial, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::Serial >( 0 );
+ TestReduceDynamic< long, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Serial >( 0 );
- TestReduceDynamic< double , Kokkos::Serial >( 1000000 );
+TEST_F( serial, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::Serial >( 0 );
+ TestReduceDynamic< double, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Serial >( 0 );
- TestReduceDynamicView< long , Kokkos::Serial >( 1000000 );
+TEST_F( serial, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::Serial >( 0 );
+ TestReduceDynamicView< long, Kokkos::Serial >( 1000000 );
}
-TEST_F( serial , scan )
+TEST_F( serial, scan )
{
- TestScan< Kokkos::Serial >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Serial >::test_range( 1, 1000 );
TestScan< Kokkos::Serial >( 0 );
TestScan< Kokkos::Serial >( 10 );
TestScan< Kokkos::Serial >( 10000 );
}
-TEST_F( serial , team_scan )
+TEST_F( serial, team_scan )
{
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( serial , team_long_reduce) {
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( serial, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( serial , team_double_reduce) {
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( serial, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( serial , reduction_deduction )
+TEST_F( serial, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
index bc838ccde..3dc3e2019 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Serial >();
+TEST_F( serial, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Serial >();
}
-TEST_F( serial, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Serial >();
+TEST_F( serial, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Serial >();
}
-TEST_F( serial, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Serial >();
+TEST_F( serial, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Serial >();
}
-TEST_F( serial, view_subview_assign_strided ) {
+TEST_F( serial, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_0 ) {
+TEST_F( serial, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_1 ) {
+TEST_F( serial, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_2 ) {
+TEST_F( serial, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_left_3 ) {
+TEST_F( serial, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_right_0 ) {
+TEST_F( serial, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_right_1 ) {
+TEST_F( serial, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::Serial >();
}
-TEST_F( serial, view_subview_right_3 ) {
+TEST_F( serial, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
index e6a5b56d3..536c3bf19 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_layoutleft_to_layoutleft) {
+TEST_F( serial, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( serial, view_subview_layoutright_to_layoutright) {
+TEST_F( serial, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
index 0b7a0d3bf..579a12bf7 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_1d_assign ) {
+TEST_F( serial, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
index 8ca7285c1..ff009fef2 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
index 1d156c741..a20478433 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
index ebf0e5c99..a34b26d9f 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_2d_from_3d ) {
+TEST_F( serial, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
index 74acb92f1..6d1882cf0 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
index 8075d46e0..12fb883b6 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
index 9ce822264..8aae20c02 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_left ) {
+TEST_F( serial, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
index c8a5c8f33..e75db8d52 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
index b66f15f17..b9cea2ce8 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
index 5e5e3cf3d..e5dbcead3 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_right ) {
+TEST_F( serial, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
index 55a353bca..3005030f9 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( serial, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
index a168e1e23..fee8cb7af 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( serial, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Serial, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
index a489b0fcb..24dc6b506 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_SubView_c_all.cpp
@@ -1,12 +1,12 @@
-#include<serial/TestSerial_SubView_c01.cpp>
-#include<serial/TestSerial_SubView_c02.cpp>
-#include<serial/TestSerial_SubView_c03.cpp>
-#include<serial/TestSerial_SubView_c04.cpp>
-#include<serial/TestSerial_SubView_c05.cpp>
-#include<serial/TestSerial_SubView_c06.cpp>
-#include<serial/TestSerial_SubView_c07.cpp>
-#include<serial/TestSerial_SubView_c08.cpp>
-#include<serial/TestSerial_SubView_c09.cpp>
-#include<serial/TestSerial_SubView_c10.cpp>
-#include<serial/TestSerial_SubView_c11.cpp>
-#include<serial/TestSerial_SubView_c12.cpp>
+#include <serial/TestSerial_SubView_c01.cpp>
+#include <serial/TestSerial_SubView_c02.cpp>
+#include <serial/TestSerial_SubView_c03.cpp>
+#include <serial/TestSerial_SubView_c04.cpp>
+#include <serial/TestSerial_SubView_c05.cpp>
+#include <serial/TestSerial_SubView_c06.cpp>
+#include <serial/TestSerial_SubView_c07.cpp>
+#include <serial/TestSerial_SubView_c08.cpp>
+#include <serial/TestSerial_SubView_c09.cpp>
+#include <serial/TestSerial_SubView_c10.cpp>
+#include <serial/TestSerial_SubView_c11.cpp>
+#include <serial/TestSerial_SubView_c12.cpp>
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
index df400b4cb..f13b2ce1b 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_Team.cpp
@@ -1,117 +1,122 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , team_tag )
+TEST_F( serial, team_tag )
{
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( serial , team_shared_request) {
- TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( serial, team_shared_request )
+{
+ TestSharedTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( serial, team_scratch_request) {
- TestScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( serial, team_scratch_request )
+{
+ TestScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( serial , team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( serial, team_lambda_shared_request )
+{
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( serial, shmem_size) {
+TEST_F( serial, shmem_size )
+{
TestShmemSize< Kokkos::Serial >();
}
-TEST_F( serial, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::Serial , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( serial, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Serial, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( serial , team_vector )
+TEST_F( serial, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Serial >( 10 ) ) );
}
#ifdef KOKKOS_COMPILER_GNU
#if ( KOKKOS_COMPILER_GNU == 472 )
#define SKIP_TEST
#endif
#endif
#ifndef SKIP_TEST
TEST_F( serial, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Serial >( 8192, 2048, 16, 16 );
}
#endif
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
index 4c655fe77..2192159b8 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_a.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , impl_view_mapping_a ) {
+TEST_F( serial, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::Serial >();
test_view_mapping_operator< Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
index 4947f2eaa..8c48ad2ce 100644
--- a/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/serial/TestSerial_ViewAPI_b.cpp
@@ -1,121 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <serial/TestSerial.hpp>
namespace Test {
-TEST_F( serial , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::Serial >();
+TEST_F( serial, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::Serial >();
}
-TEST_F( serial , impl_view_mapping_b ) {
+TEST_F( serial, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::Serial >();
TestViewMappingAtomic< Kokkos::Serial >::run();
}
-TEST_F( serial, view_api) {
- TestViewAPI< double , Kokkos::Serial >();
+TEST_F( serial, view_api )
+{
+ TestViewAPI< double, Kokkos::Serial >();
}
-TEST_F( serial , view_nested_view )
+TEST_F( serial, view_nested_view )
{
::Test::view_nested_view< Kokkos::Serial >();
}
-
-
-TEST_F( serial , view_remap )
+TEST_F( serial, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Serial > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Serial > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Serial > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::Serial > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Serial > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Serial > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
}
-//----------------------------------------------------------------------------
-
-TEST_F( serial , view_aggregate )
+TEST_F( serial, view_aggregate )
{
TestViewAggregate< Kokkos::Serial >();
}
-TEST_F( serial , template_meta_functions )
+TEST_F( serial, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::Serial >();
+ TestTemplateMetaFunctions< int, Kokkos::Serial >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads.hpp b/lib/kokkos/core/unit_test/threads/TestThreads.hpp
index 4f611cf99..0afd6772f 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads.hpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads.hpp
@@ -1,115 +1,109 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#ifndef KOKKOS_TEST_THREADS_HPP
#define KOKKOS_TEST_THREADS_HPP
+
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
+
#ifdef KOKKOS_LAMBDA
#undef KOKKOS_LAMBDA
#endif
#define KOKKOS_LAMBDA [=]
#include <Kokkos_Core.hpp>
#include <TestTile.hpp>
-
-//----------------------------------------------------------------------------
-
#include <TestSharedAlloc.hpp>
#include <TestViewMapping.hpp>
-
-
#include <TestViewAPI.hpp>
#include <TestViewOfClass.hpp>
#include <TestViewSubview.hpp>
#include <TestAtomic.hpp>
#include <TestAtomicOperations.hpp>
#include <TestAtomicViews.hpp>
#include <TestRange.hpp>
#include <TestTeam.hpp>
#include <TestReduce.hpp>
#include <TestScan.hpp>
#include <TestAggregate.hpp>
#include <TestCompilerMacros.hpp>
#include <TestTaskScheduler.hpp>
#include <TestMemoryPool.hpp>
-
-
#include <TestCXX11.hpp>
#include <TestCXX11Deduction.hpp>
#include <TestTeamVector.hpp>
#include <TestTemplateMetaFunctions.hpp>
-
#include <TestPolicyConstruction.hpp>
-
#include <TestMDRange.hpp>
namespace Test {
class threads : public ::testing::Test {
protected:
static void SetUpTestCase()
{
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
- unsigned threads_count = 0 ;
+ unsigned threads_count = 0;
- threads_count = std::max( 1u , numa_count )
- * std::max( 2u , cores_per_numa * threads_per_core );
+ threads_count = std::max( 1u, numa_count )
+ * std::max( 2u, cores_per_numa * threads_per_core );
Kokkos::Threads::initialize( threads_count );
- Kokkos::Threads::print_configuration( std::cout , true /* detailed */ );
+ Kokkos::print_configuration( std::cout, true /* detailed */ );
}
static void TearDownTestCase()
{
Kokkos::Threads::finalize();
}
};
+} // namespace Test
-}
#endif
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
index 6e24c4973..d2a5ea5d6 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Atomics.cpp
@@ -1,204 +1,200 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , atomics )
+TEST_F( threads, atomics )
{
- const int loop_count = 1e4 ;
+ const int loop_count = 1e4;
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<unsigned long int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< unsigned long int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<long long int,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< long long int, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<double,Kokkos::Threads>(loop_count,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Threads >( loop_count, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Threads >( loop_count, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< double, Kokkos::Threads >( loop_count, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<float,Kokkos::Threads>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Threads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Threads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< float, Kokkos::Threads >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<Kokkos::complex<double> ,Kokkos::Threads>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Threads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Threads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< Kokkos::complex<double>, Kokkos::Threads >( 100, 3 ) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,1) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,2) ) );
- ASSERT_TRUE( ( TestAtomic::Loop<TestAtomic::SuperScalar<4> ,Kokkos::Threads>(100,3) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Threads >( 100, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Threads >( 100, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomic::Loop< TestAtomic::SuperScalar<4>, Kokkos::Threads >( 100, 3 ) ) );
}
-TEST_F( threads , atomic_operations )
+TEST_F( threads, atomic_operations )
{
- const int start = 1; //Avoid zero for division
+ const int start = 1; // Avoid zero for division.
const int end = 11;
- for (int i = start; i < end; ++i)
+ for ( int i = start; i < end; ++i )
{
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<unsigned long int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 8 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 9 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 11 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType<long long int,Kokkos::Threads>(start, end-i, 12 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<double,Kokkos::Threads>(start, end-i, 4 ) ) );
-
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType<float,Kokkos::Threads>(start, end-i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< unsigned long int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 8 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 9 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 11 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestIntegralType< long long int, Kokkos::Threads >( start, end - i, 12 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< double, Kokkos::Threads >( start, end - i, 4 ) ) );
+
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicOperations::AtomicOperationsTestNonIntegralType< float, Kokkos::Threads >( start, end - i, 4 ) ) );
}
-
}
-
-TEST_F( threads , atomic_views_integral )
+TEST_F( threads, atomic_views_integral )
{
const long length = 1000000;
{
- //Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 4 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 5 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 6 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 7 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType<long, Kokkos::Threads>(length, 8 ) ) );
-
+ // Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 4 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 5 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 6 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 7 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestIntegralType< long, Kokkos::Threads >( length, 8 ) ) );
}
}
-TEST_F( threads , atomic_views_nonintegral )
+TEST_F( threads, atomic_views_nonintegral )
{
const long length = 1000000;
{
- //Non-Integral Types
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 1 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 2 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 3 ) ) );
- ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType<double,Kokkos::Threads>(length, 4 ) ) );
-
+ // Non-Integral Types.
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 1 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 2 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 3 ) ) );
+ ASSERT_TRUE( ( TestAtomicViews::AtomicViewsTestNonIntegralType< double, Kokkos::Threads >( length, 4 ) ) );
}
}
-TEST_F( threads , atomic_view_api )
+TEST_F( threads, atomic_view_api )
{
- TestAtomicViews::TestAtomicViewAPI<int, Kokkos::Threads>();
+ TestAtomicViews::TestAtomicViewAPI< int, Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
index ac0356eeb..7d268c145 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Other.cpp
@@ -1,189 +1,196 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , init ) {
+TEST_F( threads, init )
+{
;
}
-TEST_F( threads , md_range ) {
- TestMDRange_2D< Kokkos::Threads >::test_for2(100,100);
+TEST_F( threads , mdrange_for ) {
+ TestMDRange_2D< Kokkos::Threads >::test_for2( 100, 100 );
+ TestMDRange_3D< Kokkos::Threads >::test_for3( 100, 10, 100 );
+ TestMDRange_4D< Kokkos::Threads >::test_for4( 100, 10, 10, 10 );
+ TestMDRange_5D< Kokkos::Threads >::test_for5( 100, 10, 10, 10, 5 );
+ TestMDRange_6D< Kokkos::Threads >::test_for6( 10, 10, 10, 10, 5, 5 );
+}
- TestMDRange_3D< Kokkos::Threads >::test_for3(100,100,100);
+TEST_F( threads , mdrange_reduce ) {
+ TestMDRange_2D< Kokkos::Threads >::test_reduce2( 100, 100 );
+ TestMDRange_3D< Kokkos::Threads >::test_reduce3( 100, 10, 100 );
}
-TEST_F( threads, policy_construction) {
+TEST_F( threads, policy_construction )
+{
TestRangePolicyConstruction< Kokkos::Threads >();
TestTeamPolicyConstruction< Kokkos::Threads >();
}
-TEST_F( threads , range_tag )
+TEST_F( threads, range_tag )
{
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(0);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(0);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(2);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(3);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(3);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_scan(1000);
-
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_scan(1001);
- TestRange< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy(1000);
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 0 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 0 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 2 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 3 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 3 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 3 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 3 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_scan( 1000 );
+
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1001 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1001 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_scan( 1001 );
+ TestRange< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_dynamic_policy( 1000 );
}
-
//----------------------------------------------------------------------------
-TEST_F( threads , compiler_macros )
+TEST_F( threads, compiler_macros )
{
ASSERT_TRUE( ( TestCompilerMacros::Test< Kokkos::Threads >() ) );
}
//----------------------------------------------------------------------------
-TEST_F( threads , memory_pool )
+TEST_F( threads, memory_pool )
{
bool val = TestMemoryPool::test_mempool< Kokkos::Threads >( 128, 128000000 );
ASSERT_TRUE( val );
TestMemoryPool::test_mempool2< Kokkos::Threads >( 64, 4, 1000000, 2000000 );
TestMemoryPool::test_memory_exhaustion< Kokkos::Threads >();
}
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_TASKDAG )
/*
-TEST_F( threads , task_fib )
+TEST_F( threads, task_fib )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestFib< Kokkos::Threads >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestFib< Kokkos::Threads >::run( i );
}
}
-TEST_F( threads , task_depend )
+TEST_F( threads, task_depend )
{
- for ( int i = 0 ; i < 25 ; ++i ) {
- TestTaskScheduler::TestTaskDependence< Kokkos::Threads >::run(i);
+ for ( int i = 0; i < 25; ++i ) {
+ TestTaskScheduler::TestTaskDependence< Kokkos::Threads >::run( i );
}
}
-TEST_F( threads , task_team )
+TEST_F( threads, task_team )
{
- TestTaskScheduler::TestTaskTeam< Kokkos::Threads >::run(1000);
- //TestTaskScheduler::TestTaskTeamValue< Kokkos::Threads >::run(1000); //put back after testing
+ TestTaskScheduler::TestTaskTeam< Kokkos::Threads >::run( 1000 );
+ //TestTaskScheduler::TestTaskTeamValue< Kokkos::Threads >::run( 1000 ); // Put back after testing.
}
*/
#endif /* #if defined( KOKKOS_ENABLE_TASKDAG ) */
//----------------------------------------------------------------------------
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
-TEST_F( threads , cxx11 )
+TEST_F( threads, cxx11 )
{
- if ( std::is_same< Kokkos::DefaultExecutionSpace , Kokkos::Threads >::value ) {
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(1) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(2) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(3) ) );
- ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >(4) ) );
+ if ( std::is_same< Kokkos::DefaultExecutionSpace, Kokkos::Threads >::value ) {
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 1 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 2 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 3 ) ) );
+ ASSERT_TRUE( ( TestCXX11::Test< Kokkos::Threads >( 4 ) ) );
}
}
#endif
TEST_F( threads, tile_layout )
{
- TestTile::test< Kokkos::Threads , 1 , 1 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 1 , 1 >( 2 , 3 );
- TestTile::test< Kokkos::Threads , 1 , 1 >( 9 , 10 );
-
- TestTile::test< Kokkos::Threads , 2 , 2 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 2 , 2 >( 2 , 3 );
- TestTile::test< Kokkos::Threads , 2 , 2 >( 4 , 4 );
- TestTile::test< Kokkos::Threads , 2 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Threads , 2 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Threads , 4 , 2 >( 9 , 9 );
-
- TestTile::test< Kokkos::Threads , 4 , 4 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 4 , 4 >( 4 , 4 );
- TestTile::test< Kokkos::Threads , 4 , 4 >( 9 , 9 );
- TestTile::test< Kokkos::Threads , 4 , 4 >( 9 , 11 );
-
- TestTile::test< Kokkos::Threads , 8 , 8 >( 1 , 1 );
- TestTile::test< Kokkos::Threads , 8 , 8 >( 4 , 4 );
- TestTile::test< Kokkos::Threads , 8 , 8 >( 9 , 9 );
- TestTile::test< Kokkos::Threads , 8 , 8 >( 9 , 11 );
+ TestTile::test< Kokkos::Threads, 1, 1 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 1, 1 >( 2, 3 );
+ TestTile::test< Kokkos::Threads, 1, 1 >( 9, 10 );
+
+ TestTile::test< Kokkos::Threads, 2, 2 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 2, 2 >( 2, 3 );
+ TestTile::test< Kokkos::Threads, 2, 2 >( 4, 4 );
+ TestTile::test< Kokkos::Threads, 2, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Threads, 2, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Threads, 4, 2 >( 9, 9 );
+
+ TestTile::test< Kokkos::Threads, 4, 4 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 4, 4 >( 4, 4 );
+ TestTile::test< Kokkos::Threads, 4, 4 >( 9, 9 );
+ TestTile::test< Kokkos::Threads, 4, 4 >( 9, 11 );
+
+ TestTile::test< Kokkos::Threads, 8, 8 >( 1, 1 );
+ TestTile::test< Kokkos::Threads, 8, 8 >( 4, 4 );
+ TestTile::test< Kokkos::Threads, 8, 8 >( 9, 9 );
+ TestTile::test< Kokkos::Threads, 8, 8 >( 9, 11 );
}
-
-TEST_F( threads , dispatch )
+TEST_F( threads, dispatch )
{
- const int repeat = 100 ;
- for ( int i = 0 ; i < repeat ; ++i ) {
- for ( int j = 0 ; j < repeat ; ++j ) {
- Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >(0,j)
- , KOKKOS_LAMBDA( int ) {} );
- }}
+ const int repeat = 100;
+ for ( int i = 0; i < repeat; ++i ) {
+ for ( int j = 0; j < repeat; ++j ) {
+ Kokkos::parallel_for( Kokkos::RangePolicy< Kokkos::Threads >( 0, j )
+ , KOKKOS_LAMBDA( int ) {} );
+ }
+ }
}
-
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
index a637d1e3a..d2b75ca89 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Reductions.cpp
@@ -1,138 +1,146 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, long_reduce) {
- TestReduce< long , Kokkos::Threads >( 0 );
- TestReduce< long , Kokkos::Threads >( 1000000 );
+TEST_F( threads, long_reduce )
+{
+ TestReduce< long, Kokkos::Threads >( 0 );
+ TestReduce< long, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads, double_reduce) {
- TestReduce< double , Kokkos::Threads >( 0 );
- TestReduce< double , Kokkos::Threads >( 1000000 );
+TEST_F( threads, double_reduce )
+{
+ TestReduce< double, Kokkos::Threads >( 0 );
+ TestReduce< double, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads , reducers )
+TEST_F( threads, reducers )
{
- TestReducers<int, Kokkos::Threads>::execute_integer();
- TestReducers<size_t, Kokkos::Threads>::execute_integer();
- TestReducers<double, Kokkos::Threads>::execute_float();
- TestReducers<Kokkos::complex<double>, Kokkos::Threads>::execute_basic();
+ TestReducers< int, Kokkos::Threads >::execute_integer();
+ TestReducers< size_t, Kokkos::Threads >::execute_integer();
+ TestReducers< double, Kokkos::Threads >::execute_float();
+ TestReducers< Kokkos::complex<double>, Kokkos::Threads >::execute_basic();
}
-TEST_F( threads, long_reduce_dynamic ) {
- TestReduceDynamic< long , Kokkos::Threads >( 0 );
- TestReduceDynamic< long , Kokkos::Threads >( 1000000 );
+TEST_F( threads, long_reduce_dynamic )
+{
+ TestReduceDynamic< long, Kokkos::Threads >( 0 );
+ TestReduceDynamic< long, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads, double_reduce_dynamic ) {
- TestReduceDynamic< double , Kokkos::Threads >( 0 );
- TestReduceDynamic< double , Kokkos::Threads >( 1000000 );
+TEST_F( threads, double_reduce_dynamic )
+{
+ TestReduceDynamic< double, Kokkos::Threads >( 0 );
+ TestReduceDynamic< double, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads, long_reduce_dynamic_view ) {
- TestReduceDynamicView< long , Kokkos::Threads >( 0 );
- TestReduceDynamicView< long , Kokkos::Threads >( 1000000 );
+TEST_F( threads, long_reduce_dynamic_view )
+{
+ TestReduceDynamicView< long, Kokkos::Threads >( 0 );
+ TestReduceDynamicView< long, Kokkos::Threads >( 1000000 );
}
-TEST_F( threads , scan )
+TEST_F( threads, scan )
{
- TestScan< Kokkos::Threads >::test_range( 1 , 1000 );
+ TestScan< Kokkos::Threads >::test_range( 1, 1000 );
TestScan< Kokkos::Threads >( 0 );
TestScan< Kokkos::Threads >( 100000 );
TestScan< Kokkos::Threads >( 10000000 );
Kokkos::Threads::fence();
}
#if 0
-TEST_F( threads , scan_small )
+TEST_F( threads, scan_small )
{
- typedef TestScan< Kokkos::Threads , Kokkos::Impl::ThreadsExecUseScanSmall > TestScanFunctor ;
- for ( int i = 0 ; i < 1000 ; ++i ) {
+ typedef TestScan< Kokkos::Threads, Kokkos::Impl::ThreadsExecUseScanSmall > TestScanFunctor;
+
+ for ( int i = 0; i < 1000; ++i ) {
TestScanFunctor( 10 );
TestScanFunctor( 10000 );
}
TestScanFunctor( 1000000 );
TestScanFunctor( 10000000 );
Kokkos::Threads::fence();
}
#endif
-TEST_F( threads , team_scan )
+TEST_F( threads, team_scan )
{
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 10000 );
- TestScanTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 10 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 10 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 10000 );
+ TestScanTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 10000 );
}
-TEST_F( threads , team_long_reduce) {
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< long , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( threads, team_long_reduce )
+{
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< long, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( threads , team_double_reduce) {
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 0 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 3 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >( 100000 );
- TestReduceTeam< double , Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
+TEST_F( threads, team_double_reduce )
+{
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 0 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 0 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 3 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 3 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >( 100000 );
+ TestReduceTeam< double, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >( 100000 );
}
-TEST_F( threads , reduction_deduction )
+TEST_F( threads, reduction_deduction )
{
TestCXX11::test_reduction_deduction< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
index 2df9e19de..68a9da6ae 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_a.cpp
@@ -1,92 +1,103 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_auto_1d_left ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutLeft,Kokkos::Threads >();
+TEST_F( threads, view_subview_auto_1d_left )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutLeft, Kokkos::Threads >();
}
-TEST_F( threads, view_subview_auto_1d_right ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutRight,Kokkos::Threads >();
+TEST_F( threads, view_subview_auto_1d_right )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutRight, Kokkos::Threads >();
}
-TEST_F( threads, view_subview_auto_1d_stride ) {
- TestViewSubview::test_auto_1d< Kokkos::LayoutStride,Kokkos::Threads >();
+TEST_F( threads, view_subview_auto_1d_stride )
+{
+ TestViewSubview::test_auto_1d< Kokkos::LayoutStride, Kokkos::Threads >();
}
-TEST_F( threads, view_subview_assign_strided ) {
+TEST_F( threads, view_subview_assign_strided )
+{
TestViewSubview::test_1d_strided_assignment< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_0 ) {
+TEST_F( threads, view_subview_left_0 )
+{
TestViewSubview::test_left_0< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_1 ) {
+TEST_F( threads, view_subview_left_1 )
+{
TestViewSubview::test_left_1< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_2 ) {
+TEST_F( threads, view_subview_left_2 )
+{
TestViewSubview::test_left_2< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_left_3 ) {
+TEST_F( threads, view_subview_left_3 )
+{
TestViewSubview::test_left_3< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_right_0 ) {
+TEST_F( threads, view_subview_right_0 )
+{
TestViewSubview::test_right_0< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_right_1 ) {
+TEST_F( threads, view_subview_right_1 )
+{
TestViewSubview::test_right_1< Kokkos::Threads >();
}
-TEST_F( threads, view_subview_right_3 ) {
+TEST_F( threads, view_subview_right_3 )
+{
TestViewSubview::test_right_3< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
index d57dbe97c..c5cf061e8 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_b.cpp
@@ -1,60 +1,62 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_layoutleft_to_layoutleft) {
+TEST_F( threads, view_subview_layoutleft_to_layoutleft )
+{
TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutleft_to_layoutleft< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-TEST_F( threads, view_subview_layoutright_to_layoutright) {
+TEST_F( threads, view_subview_layoutright_to_layoutright )
+{
TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
- TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
+ TestViewSubview::test_layoutright_to_layoutright< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
index 67d998c0e..9018c1f4f 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c01.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign ) {
+TEST_F( threads, view_subview_1d_assign )
+{
TestViewSubview::test_1d_assign< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
index e340240c4..9483abd9c 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c02.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_atomic ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_1d_assign_atomic )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
index ad27fa0fa..e252a2656 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c03.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_1d_assign_randomaccess ) {
- TestViewSubview::test_1d_assign< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_1d_assign_randomaccess )
+{
+ TestViewSubview::test_1d_assign< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
index 6fca47cc4..3e211b1a5 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c04.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d ) {
+TEST_F( threads, view_subview_2d_from_3d )
+{
TestViewSubview::test_2d_subview_3d< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
index c7dfca941..865d50b1a 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c05.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_atomic ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_2d_from_3d_atomic )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
index 38e839491..c5840073b 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c06.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_2d_from_3d_randomaccess ) {
- TestViewSubview::test_2d_subview_3d< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_2d_from_3d_randomaccess )
+{
+ TestViewSubview::test_2d_subview_3d< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
index 1f01fe6b5..7b8825ef6 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c07.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_left ) {
+TEST_F( threads, view_subview_3d_from_5d_left )
+{
TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
index e9a1ccbe3..7bc16a582 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c08.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_left_atomic ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_3d_from_5d_left_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
index c8b6c8743..57b87b609 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c09.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_left_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_3d_from_5d_left_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_left< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
index 7cef6fa07..1875a883d 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c10.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right ) {
+TEST_F( threads, view_subview_3d_from_5d_right )
+{
TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
index d67bf3157..cf6428b18 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c11.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_atomic ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::Atomic> >();
+TEST_F( threads, view_subview_3d_from_5d_right_atomic )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::Atomic> >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
index e8a2c825c..7060fdb27 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_SubView_c12.cpp
@@ -1,52 +1,53 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads, view_subview_3d_from_5d_right_randomaccess ) {
- TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads , Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
+TEST_F( threads, view_subview_3d_from_5d_right_randomaccess )
+{
+ TestViewSubview::test_3d_subview_5d_right< Kokkos::Threads, Kokkos::MemoryTraits<Kokkos::RandomAccess> >();
}
-} // namespace test
-
+} // namespace Test
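The SubView_c01–c12 files above all exercise one pattern through the TestViewSubview helpers: carving a lower-rank Kokkos::View out of a higher-rank one. As a point of reference, a minimal self-contained sketch of that pattern (the extents and fixed indices here are illustrative, not the values the tests use):

```
#include <Kokkos_Core.hpp>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    // A rank-5 view on the default execution space; extents are made up.
    Kokkos::View<double*****> a( "a", 4, 4, 4, 4, 4 );

    // Fix two indices and keep three full ranges: the result is a rank-3
    // subview that aliases the same allocation, so no data is copied.
    auto s = Kokkos::subview( a, 1, Kokkos::ALL(), Kokkos::ALL(), 2, Kokkos::ALL() );
    (void) s;
  }
  Kokkos::finalize();
  return 0;
}
```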
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
index 4690be4d3..d802d6583 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_Team.cpp
@@ -1,122 +1,127 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , team_tag )
+TEST_F( threads, team_tag )
{
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(0);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(0);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(0);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(0);
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 0 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 0 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 0 );
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(2);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(2);
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 2 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 2 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 2 );
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_for(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >::test_reduce(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_for(1000);
- TestTeamPolicy< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce(1000);
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >::test_reduce( 1000 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_for( 1000 );
+ TestTeamPolicy< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >::test_reduce( 1000 );
}
-TEST_F( threads , team_shared_request) {
- TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestSharedTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( threads, team_shared_request )
+{
+ TestSharedTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestSharedTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( threads, team_scratch_request) {
- TestScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( threads, team_scratch_request )
+{
+ TestScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
-TEST_F( threads , team_lambda_shared_request) {
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
+TEST_F( threads, team_lambda_shared_request )
+{
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestLambdaSharedTeam< Kokkos::HostSpace, Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
#endif
-TEST_F( threads, shmem_size) {
+TEST_F( threads, shmem_size )
+{
TestShmemSize< Kokkos::Threads >();
}
-TEST_F( threads, multi_level_scratch) {
- TestMultiLevelScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Static> >();
- TestMultiLevelScratchTeam< Kokkos::Threads , Kokkos::Schedule<Kokkos::Dynamic> >();
+TEST_F( threads, multi_level_scratch )
+{
+ TestMultiLevelScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Static> >();
+ TestMultiLevelScratchTeam< Kokkos::Threads, Kokkos::Schedule<Kokkos::Dynamic> >();
}
-TEST_F( threads , team_vector )
+TEST_F( threads, team_vector )
{
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(0) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(1) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(2) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(3) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(4) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(5) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(6) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(7) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(8) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(9) ) );
- ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >(10) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 0 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 1 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 2 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 3 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 4 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 5 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 6 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 7 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 8 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 9 ) ) );
+ ASSERT_TRUE( ( TestTeamVector::Test< Kokkos::Threads >( 10 ) ) );
}
#ifdef KOKKOS_COMPILER_GNU
#if ( KOKKOS_COMPILER_GNU == 472 )
#define SKIP_TEST
#endif
#endif
#ifndef SKIP_TEST
TEST_F( threads, triple_nested_parallelism )
{
- TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 32 , 32 );
- TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 32 , 16 );
- TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048 , 16 , 16 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048, 32, 32 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048, 32, 16 );
+ TestTripleNestedReduce< double, Kokkos::Threads >( 8192, 2048, 16, 16 );
}
#endif
-} // namespace test
-
+} // namespace Test
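The Team test file above drives TestTeamPolicy, TestSharedTeam, TestScratchTeam, and related helpers on the Threads backend. For orientation, a minimal sketch of the team-policy dispatch pattern those helpers build on, written against the default host execution space rather than Kokkos::Threads directly (league size, team size, and the reduction are illustrative):

```
#include <Kokkos_Core.hpp>

int main( int argc, char* argv[] )
{
  Kokkos::initialize( argc, argv );
  {
    using policy_type = Kokkos::TeamPolicy<Kokkos::DefaultHostExecutionSpace>;
    using member_type = policy_type::member_type;

    long result = 0;

    // Eight teams of one thread each; every team contributes its league
    // rank to the reduction, so result should be 0 + 1 + ... + 7 = 28.
    Kokkos::parallel_reduce( policy_type( 8, 1 ),
      KOKKOS_LAMBDA( const member_type & team, long & update )
      {
        update += team.league_rank();
      }, result );

    (void) result;
  }
  Kokkos::finalize();
  return 0;
}
```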
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
index 46a576b02..36eae2879 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_a.cpp
@@ -1,53 +1,54 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , impl_view_mapping_a ) {
+TEST_F( threads, impl_view_mapping_a )
+{
test_view_mapping< Kokkos::Threads >();
test_view_mapping_operator< Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
index b5d6ac843..8c78d0944 100644
--- a/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
+++ b/lib/kokkos/core/unit_test/threads/TestThreads_ViewAPI_b.cpp
@@ -1,121 +1,124 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
+
#include <threads/TestThreads.hpp>
namespace Test {
-TEST_F( threads , impl_shared_alloc ) {
- test_shared_alloc< Kokkos::HostSpace , Kokkos::Threads >();
+TEST_F( threads, impl_shared_alloc )
+{
+ test_shared_alloc< Kokkos::HostSpace, Kokkos::Threads >();
}
-TEST_F( threads , impl_view_mapping_b ) {
+TEST_F( threads, impl_view_mapping_b )
+{
test_view_mapping_subview< Kokkos::Threads >();
TestViewMappingAtomic< Kokkos::Threads >::run();
}
-TEST_F( threads, view_api) {
- TestViewAPI< double , Kokkos::Threads >();
+TEST_F( threads, view_api )
+{
+ TestViewAPI< double, Kokkos::Threads >();
}
-TEST_F( threads , view_nested_view )
+TEST_F( threads, view_nested_view )
{
::Test::view_nested_view< Kokkos::Threads >();
}
-
-
-TEST_F( threads , view_remap )
+TEST_F( threads, view_remap )
{
- enum { N0 = 3 , N1 = 2 , N2 = 8 , N3 = 9 };
-
- typedef Kokkos::View< double*[N1][N2][N3] ,
- Kokkos::LayoutRight ,
- Kokkos::Threads > output_type ;
-
- typedef Kokkos::View< int**[N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Threads > input_type ;
-
- typedef Kokkos::View< int*[N0][N2][N3] ,
- Kokkos::LayoutLeft ,
- Kokkos::Threads > diff_type ;
-
- output_type output( "output" , N0 );
- input_type input ( "input" , N0 , N1 );
- diff_type diff ( "diff" , N0 );
-
- int value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- input(i0,i1,i2,i3) = ++value ;
- }}}}
-
- // Kokkos::deep_copy( diff , input ); // throw with incompatible shape
- Kokkos::deep_copy( output , input );
-
- value = 0 ;
- for ( size_t i3 = 0 ; i3 < N3 ; ++i3 ) {
- for ( size_t i2 = 0 ; i2 < N2 ; ++i2 ) {
- for ( size_t i1 = 0 ; i1 < N1 ; ++i1 ) {
- for ( size_t i0 = 0 ; i0 < N0 ; ++i0 ) {
- ++value ;
- ASSERT_EQ( value , ((int) output(i0,i1,i2,i3) ) );
- }}}}
+ enum { N0 = 3, N1 = 2, N2 = 8, N3 = 9 };
+
+ typedef Kokkos::View< double*[N1][N2][N3],
+ Kokkos::LayoutRight,
+ Kokkos::Threads > output_type;
+
+ typedef Kokkos::View< int**[N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Threads > input_type;
+
+ typedef Kokkos::View< int*[N0][N2][N3],
+ Kokkos::LayoutLeft,
+ Kokkos::Threads > diff_type;
+
+ output_type output( "output", N0 );
+ input_type input ( "input", N0, N1 );
+ diff_type diff ( "diff", N0 );
+
+ int value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ input( i0, i1, i2, i3 ) = ++value;
+ }
+
+ // Kokkos::deep_copy( diff, input ); // Throw with incompatible shape.
+ Kokkos::deep_copy( output, input );
+
+ value = 0;
+
+ for ( size_t i3 = 0; i3 < N3; ++i3 )
+ for ( size_t i2 = 0; i2 < N2; ++i2 )
+ for ( size_t i1 = 0; i1 < N1; ++i1 )
+ for ( size_t i0 = 0; i0 < N0; ++i0 )
+ {
+ ++value;
+ ASSERT_EQ( value, ( (int) output( i0, i1, i2, i3 ) ) );
+ }
}
-//----------------------------------------------------------------------------
-
-TEST_F( threads , view_aggregate )
+TEST_F( threads, view_aggregate )
{
TestViewAggregate< Kokkos::Threads >();
}
-TEST_F( threads , template_meta_functions )
+TEST_F( threads, template_meta_functions )
{
- TestTemplateMetaFunctions<int, Kokkos::Threads >();
+ TestTemplateMetaFunctions< int, Kokkos::Threads >();
}
-} // namespace test
-
+} // namespace Test
diff --git a/lib/kokkos/doc/design_notes_space_instances.md b/lib/kokkos/doc/design_notes_space_instances.md
index 487fa25bc..0124dfbc8 100644
--- a/lib/kokkos/doc/design_notes_space_instances.md
+++ b/lib/kokkos/doc/design_notes_space_instances.md
@@ -1,166 +1,131 @@
# Design Notes for Execution and Memory Space Instances
+## Objective
-## Execution Spaces
+ * Enable Kokkos interoperability with coarse-grain tasking models
+
+## Requirements
- * Work is *dispatched* to an execution space instance
+ * Backwards compatible with existing Kokkos API
+ * Support existing Host execution spaces (Serial, Threads, OpenMP, maybe Qthreads)
+ * Support DARMA threading model (may require a new Host execution space)
+ * Support Uintah threading model, i.e. independent worker threadpools working off of shared task queues
+
+
+## Execution Space
+ * Parallel work is *dispatched* on an execution space instance
+
+ * Execution space instances are conceptually disjoint/independent from each other
+
-
-## Host Associated Execution Space Instances
-
-Vocabulary and examples assuming C++11 Threads Support Library
+## Host Execution Space Instances
* A host-side *control* thread dispatches work to an instance
- * `this_thread` is the control thread
-
* `main` is the initial control thread
- * An execution space instance is a pool of threads
+ * A host execution space instance is an organized thread pool
- * All instances are disjoint thread pools
+ * All instances are disjoint, i.e. hardware resources are not shared between instances
* Exactly one control thread is associated with
an instance and only that control thread may
dispatch work to to that instance
- * A control thread may be a member of an instance,
- if so then it is also the control thread associated
- with that instance
+ * The control thread is a member of the instance
- * The pool of threads associated with an instances is not mutatable
+ * The pool of threads associated with an instance is not mutable during that instance's existence
* The pool of threads associated with an instance may be masked
- Allows work to be dispatched to a subset of the pool
- Example: only one hyperthread per core of the instance
- - When a mask is applied to an instance that mask
- remains until cleared or another mask is applied
-
- - Masking is portable by defining it as using a fraction
- of the available resources (threads)
-
- * Instances are shared (referenced counted) objects,
- just like `Kokkos::View`
-
-```
-struct StdThread {
- void mask( float fraction );
- void unmask() { mask( 1.0 ); }
-};
-```
-
-
-
-### Requesting an Execution Space Instance
-
- * `Space::request(` *who* `,` *what* `,` *control-opt* `)`
-
- * *who* is an identifier for subsquent queries regarding
- who requested each instance
-
- * *what* is the number of threads and how they should be placed
-
- - Placement within locality-topology hierarchy; e.g., HWLOC
-
- - Compact within a level of hierarchy, or striped across that level;
- e.g., socket or NUMA region
-
- - Granularity of request is core
-
- * *control-opt* optionally specifies whether the instance
- has a new control thread
-
- - *control-opt* includes a control function / closure
-
- - The new control thread is a member of the instance
-
- - The control function is called by the new control thread
- and is passed a `const` instance
-
- - The instance is **not** returned to the creating control thread
-
- * `std::thread` that is not a member of an instance is
- *hard blocked* on a `std::mutex`
-
- - One global mutex or one mutex per thread?
-
- * `std::thread` that is a member of an instance is
- *spinning* waiting for work, or are working
-
-```
-struct StdThread {
-
- struct Resource ;
-
- static StdThread request(); // default
+ - A mask can be applied during the policy creation of a parallel algorithm
+
+ - Masking is portable by defining it as the ceiling of a fraction in [0.0, 1.0]
+ of the available resources
- static StdThread request( const std::string & , const Resource & );
-
- // If the instance can be reserved then
- // allocate a copy of ControlClosure and invoke
- // ControlClosure::operator()( const StdThread intance ) const
- template< class ControlClosure >
- static bool request( const std::string & , const Resource &
- , const ControlClosure & );
-};
```
-
-### Relinquishing an Execution Space Instance
-
- * De-referencing the last reference-counted instance
- relinquishes the pool of threads
-
- * If a control thread was created for the instance then
- it is relinquished when that control thread returns
- from the control function
-
- - Requires the reference count to be zero, an error if not
-
- * No *forced* relinquish
-
-
-
-## CUDA Associated Execution Space Instances
-
- * Only a signle CUDA architecture
-
- * An instance is a device + stream
-
- * A stream is exclusive to an instance
-
- * Only a host-side control thread can dispatch work to an instance
-
- * Finite number of streams per device
-
- * ISSUE: How to use CUDA `const` memory with multiple streams?
-
- * Masking can be mapped to restricting the number of CUDA blocks
- to the fraction of available resources; e.g., maximum resident blocks
-
-
-### Requesting an Execution Space Instance
-
- * `Space::request(` *who* `,` *what* `)`
-
- * *who* is an identifier for subsquent queries regarding
- who requested each instance
-
- * *what* is which device, the stream is a requested/relinquished resource
-
+class ExecutionSpace {
+public:
+ using execution_space = ExecutionSpace;
+ using memory_space = ...;
+ using device_type = Kokkos::Device<execution_space, memory_space>;
+ using array_layout = ...;
+ using size_type = ...;
+ using scratch_memory_space = ...;
+
+
+ class Instance
+ {
+ int thread_pool_size( int depth = 0 );
+ ...
+ };
+
+ class InstanceRequest
+ {
+ public:
+ using Control = std::function< void( Instance * )>;
+
+ InstanceRequest( Control control
+ , unsigned thread_count
+ , unsigned use_numa_count = 0
+ , unsigned use_cores_per_numa = 0
+ );
+
+ };
+
+ static bool in_parallel();
+
+ static bool sleep();
+ static bool wake();
+
+ static void fence();
+
+ static void print_configuration( std::ostream &, const bool detailed = false );
+
+ static void initialize( unsigned thread_count = 0
+ , unsigned use_numa_count = 0
+ , unsigned use_cores_per_numa = 0
+ );
+
+ // Partition the current instance into the requested instances
+ // and run the given functions on the corresponding instances.
+ // This call will block until all the partitioned instances complete,
+ // and then the original instance will be restored.
+ //
+ // Requires that the space has already been initialized
+ // Requires that the request can be satisfied by the current instance,
+ // i.e. the sum of the requested thread counts must be less than
+ // max_hardware_threads
+ //
+ // Each control functor will accept a handle to its new default instance
+ // Each instance must be independent of all other instances,
+ // i.e. no assumption on scheduling between instances
+ // The user is responsible for checking the return code for errors
+ static int run_instances( std::vector< InstanceRequest> const& requests );
+
+ static void finalize();
+
+ static int is_initialized();
+
+ static int concurrency();
+
+ static int thread_pool_size( int depth = 0 );
+
+ static int thread_pool_rank();
+
+ static int max_hardware_threads();
+
+ static int hardware_thread_id();
+
+ };
```
-struct Cuda {
+
- struct Resource ;
-
- static Cuda request();
-
- static Cuda request( const std::string & , const Resource & );
-};
-```
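The run_instances interface added above is a design sketch only; nothing in this patch implements it. Purely to illustrate the intended call pattern, here is a hypothetical usage example built from the note's own placeholder names (ExecutionSpace, Instance, InstanceRequest, and run_instances are not an existing Kokkos API):

```
#include <functional>
#include <vector>

// Hypothetical: uses only the interface sketched in the design note above.
void partition_example()
{
  using Space = ExecutionSpace;  // the note's placeholder execution space type

  std::vector< Space::InstanceRequest > requests;

  // Request two disjoint instances, 8 threads and 4 threads; each control
  // closure receives a handle to its own default instance.
  requests.emplace_back( []( Space::Instance * inst ) { /* dispatch work on inst */ }, 8u );
  requests.emplace_back( []( Space::Instance * inst ) { /* dispatch work on inst */ }, 4u );

  // Blocks until both partitions complete, then the original instance is
  // restored; the caller is responsible for checking the return code.
  int err = Space::run_instances( requests );
  (void) err;
}
```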
diff --git a/lib/kokkos/example/md_skeleton/types.h b/lib/kokkos/example/md_skeleton/types.h
index 7f92b7cd0..c9689188a 100644
--- a/lib/kokkos/example/md_skeleton/types.h
+++ b/lib/kokkos/example/md_skeleton/types.h
@@ -1,118 +1,118 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#ifndef TYPES_H_
#define TYPES_H_
/* Determine default device type and necessary includes */
#include <Kokkos_Core.hpp>
typedef Kokkos::DefaultExecutionSpace execution_space ;
-#if ! defined( KOKKOS_HAVE_CUDA )
+#if ! defined( KOKKOS_ENABLE_CUDA )
struct double2 {
double x, y;
KOKKOS_INLINE_FUNCTION
double2(double xinit, double yinit) {
x = xinit;
y = yinit;
}
KOKKOS_INLINE_FUNCTION
double2() {
x = 0.0;
y = 0.0;
}
KOKKOS_INLINE_FUNCTION
double2& operator += (const double2& src) {
x+=src.x;
y+=src.y;
return *this;
}
KOKKOS_INLINE_FUNCTION
volatile double2& operator += (const volatile double2& src) volatile {
x+=src.x;
y+=src.y;
return *this;
}
};
#endif
#include <impl/Kokkos_Timer.hpp>
/* Define types used throughout the code */
//Position arrays
typedef Kokkos::View<double*[3], Kokkos::LayoutRight, execution_space> t_x_array ;
typedef t_x_array::HostMirror t_x_array_host ;
typedef Kokkos::View<const double*[3], Kokkos::LayoutRight, execution_space> t_x_array_const ;
typedef Kokkos::View<const double*[3], Kokkos::LayoutRight, execution_space, Kokkos::MemoryRandomAccess > t_x_array_randomread ;
//Force array
typedef Kokkos::View<double*[3], execution_space> t_f_array ;
//Neighborlist
typedef Kokkos::View<int**, execution_space > t_neighbors ;
typedef Kokkos::View<const int**, execution_space > t_neighbors_const ;
typedef Kokkos::View<int*, execution_space, Kokkos::MemoryUnmanaged > t_neighbors_sub ;
typedef Kokkos::View<const int*, execution_space, Kokkos::MemoryUnmanaged > t_neighbors_const_sub ;
//1d int array
typedef Kokkos::View<int*, execution_space > t_int_1d ;
typedef t_int_1d::HostMirror t_int_1d_host ;
typedef Kokkos::View<const int*, execution_space > t_int_1d_const ;
typedef Kokkos::View<int*, execution_space , Kokkos::MemoryUnmanaged> t_int_1d_um ;
typedef Kokkos::View<const int* , execution_space , Kokkos::MemoryUnmanaged> t_int_1d_const_um ;
//2d int array
typedef Kokkos::View<int**, Kokkos::LayoutRight, execution_space > t_int_2d ;
typedef t_int_2d::HostMirror t_int_2d_host ;
//Scalar ints
typedef Kokkos::View<int[1], Kokkos::LayoutLeft, execution_space> t_int_scalar ;
typedef t_int_scalar::HostMirror t_int_scalar_host ;
#endif /* TYPES_H_ */
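The typedefs above are the whole content of types.h; as a rough sketch of how they are typically combined (the function name and the value of natoms below are made up for illustration, not part of the skeleton code), one might write:

    #include "types.h"

    void init_positions(const int natoms) {
      // Device-side positions, allocated and zero-initialized in parallel.
      t_x_array x("X", natoms);

      // Host mirror for filling coordinates on the CPU, then copy to the device.
      t_x_array_host h_x = Kokkos::create_mirror_view(x);
      for (int i = 0; i < natoms; ++i) {
        h_x(i,0) = 0.1*i;  h_x(i,1) = 0.2*i;  h_x(i,2) = 0.3*i;
      }
      Kokkos::deep_copy(x, h_x);

      // Read-only, random-access view of the same allocation (no data copy).
      t_x_array_randomread x_read = x;
      (void) x_read;
    }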
diff --git a/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp b/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
index 326d06410..249d44ab5 100644
--- a/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
+++ b/lib/kokkos/example/tutorial/01_hello_world_lambda/hello_world_lambda.cpp
@@ -1,112 +1,112 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
#include <typeinfo>
//
// "Hello world" parallel_for example:
// 1. Start up Kokkos
// 2. Execute a parallel for loop in the default execution space,
// using a C++11 lambda to define the loop body
// 3. Shut down Kokkos
//
// This example only builds if C++11 is enabled. Compare this example
// to 01_hello_world, which uses functors (explicitly defined classes)
// to define the loop body of the parallel_for. Both functors and
// lambdas have their places.
//
int main (int argc, char* argv[]) {
// You must call initialize() before you may use Kokkos.
//
// With no arguments, this initializes the default execution space
// (and potentially its host execution space) with default
// parameters. You may also pass in argc and argv, analogously to
// MPI_Init(). It reads and removes command-line arguments that
// start with "--kokkos-".
Kokkos::initialize (argc, argv);
// Print the name of Kokkos' default execution space. We're using
// typeid here, so the name might get a bit mangled by the compiler,
// but you should still be able to figure out what it is.
printf ("Hello World on Kokkos execution space %s\n",
typeid (Kokkos::DefaultExecutionSpace).name ());
// Run lambda on the default Kokkos execution space in parallel,
// with a parallel for loop count of 15. The lambda's argument is
// an integer which is the parallel for's loop index. As you learn
// about different kinds of parallelism, you will find out that
// there are other valid argument types as well.
//
// For a single level of parallelism, we prefer that you use the
// KOKKOS_LAMBDA macro. If CUDA is disabled, this just turns into
// [=]. That captures variables from the surrounding scope by
// value. Do NOT capture them by reference! If CUDA is enabled,
// this macro may have a special definition that makes the lambda
// work correctly with CUDA. Compare to the KOKKOS_INLINE_FUNCTION
// macro, which has a special meaning if CUDA is enabled.
//
// The following parallel_for would look like this if we were using
// OpenMP by itself, instead of Kokkos:
//
// #pragma omp parallel for
// for (int i = 0; i < 15; ++i) {
// printf ("Hello from i = %i\n", i);
// }
//
// You may notice that the printed numbers do not print out in
// order. Parallel for loops may execute in any order.
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
-#if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_for (15, KOKKOS_LAMBDA (const int i) {
// printf works in a CUDA parallel kernel; std::ostream does not.
printf ("Hello from i = %i\n", i);
});
#endif
// You must call finalize() after you are done using Kokkos.
Kokkos::finalize ();
}
diff --git a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
index 70eea4324..f7f467ad2 100644
--- a/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
+++ b/lib/kokkos/example/tutorial/02_simple_reduce_lambda/simple_reduce_lambda.cpp
@@ -1,94 +1,94 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
//
// First reduction (parallel_reduce) example:
// 1. Start up Kokkos
// 2. Execute a parallel_reduce loop in the default execution space,
// using a C++11 lambda to define the loop body
// 3. Shut down Kokkos
//
// This example only builds if C++11 is enabled. Compare this example
// to 02_simple_reduce, which uses a functor to define the loop body
// of the parallel_reduce.
//
int main (int argc, char* argv[]) {
Kokkos::initialize (argc, argv);
const int n = 10;
// Compute the sum of squares of integers from 0 to n-1, in
// parallel, using Kokkos. This time, use a lambda instead of a
// functor. The lambda takes the same arguments as the functor's
// operator().
int sum = 0;
// The KOKKOS_LAMBDA macro replaces the capture-by-value clause [=].
// It also handles any other syntax needed for CUDA.
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+ #if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_reduce (n, KOKKOS_LAMBDA (const int i, int& lsum) {
lsum += i*i;
}, sum);
#endif
printf ("Sum of squares of integers from 0 to %i, "
"computed in parallel, is %i\n", n - 1, sum);
// Compare to a sequential loop.
int seqSum = 0;
for (int i = 0; i < n; ++i) {
seqSum += i*i;
}
printf ("Sum of squares of integers from 0 to %i, "
"computed sequentially, is %i\n", n - 1, seqSum);
Kokkos::finalize ();
-#if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+#if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
return (sum == seqSum) ? 0 : -1;
#else
return 0;
#endif
}
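For readers who do not have 02_simple_reduce at hand, a rough sketch of the functor-based equivalent that these comments refer to (illustrative only, not copied from that example) looks like:

    struct squaresum {
      // Plays the role of the lambda body; lsum is this thread's partial result.
      KOKKOS_INLINE_FUNCTION
      void operator() (const int i, int& lsum) const {
        lsum += i*i;
      }
    };

    // ... and inside main(), instead of the lambda:
    // Kokkos::parallel_reduce (n, squaresum(), sum);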
diff --git a/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp b/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
index dd0641be5..3450ad1bb 100644
--- a/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
+++ b/lib/kokkos/example/tutorial/03_simple_view_lambda/simple_view_lambda.cpp
@@ -1,120 +1,120 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
//
// First Kokkos::View (multidimensional array) example:
// 1. Start up Kokkos
// 2. Allocate a Kokkos::View
// 3. Execute a parallel_for and a parallel_reduce over that View's data
// 4. Shut down Kokkos
//
// Compare this example to 03_simple_view, which uses functors to
// define the loop bodies of the parallel_for and parallel_reduce.
//
#include <Kokkos_Core.hpp>
#include <cstdio>
// A Kokkos::View is an array of zero or more dimensions. The number
// of dimensions is specified at compile time, as part of the type of
// the View. This array has two dimensions. The first one
// (represented by the asterisk) is a run-time dimension, and the
// second (represented by [3]) is a compile-time dimension. Thus,
// this View type is an N x 3 array of type double, where N is
// specified at run time in the View's constructor.
//
// The first dimension of the View is the dimension over which it is
// efficient for Kokkos to parallelize.
typedef Kokkos::View<double*[3]> view_type;
int main (int argc, char* argv[]) {
Kokkos::initialize (argc, argv);
// Allocate the View. The first dimension is a run-time parameter
// N. We set N = 10 here. The second dimension is a compile-time
// parameter, 3. We don't specify it here because we already set it
// by declaring the type of the View.
//
// Views get initialized to zero by default. This happens in
// parallel, using the View's memory space's default execution
// space. Parallel initialization ensures first-touch allocation.
// There is a way to shut off default initialization.
//
// You may NOT allocate a View inside of a parallel_{for, reduce,
// scan}. Treat View allocation as a "thread collective."
//
// The string "A" is just the label; it only matters for debugging.
// Different Views may have the same label.
view_type a ("A", 10);
// Fill the View with some data. The parallel_for loop will iterate
// over the View's first dimension N.
//
// Note that the View is passed by value into the lambda. The macro
// KOKKOS_LAMBDA includes the "capture by value" clause [=]. This
// tells the lambda to "capture all variables in the enclosing scope
// by value." Views have "view semantics"; they behave like
// pointers, not like std::vector. Passing them by value does a
// shallow copy. A deep copy never happens unless you explicitly
// ask for one.
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+ #if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
Kokkos::parallel_for (10, KOKKOS_LAMBDA (const int i) {
// Access the View just like a Fortran array. The layout depends
// on the View's memory space, so don't rely on the View's
// physical memory layout unless you know what you're doing.
a(i,0) = 1.0*i;
a(i,1) = 1.0*i*i;
a(i,2) = 1.0*i*i*i;
});
// Reduction functor that reads the View given to its constructor.
double sum = 0;
Kokkos::parallel_reduce (10, KOKKOS_LAMBDA (const int i, double& lsum) {
lsum += a(i,0)*a(i,1)/(a(i,2)+0.1);
}, sum);
printf ("Result: %f\n", sum);
#endif
Kokkos::finalize ();
}
diff --git a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
index 216db7f12..9ea5e8b70 100644
--- a/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
+++ b/lib/kokkos/example/tutorial/Hierarchical_Parallelism/01_thread_teams_lambda/thread_teams_lambda.cpp
@@ -1,97 +1,97 @@
/*
//@HEADER
// ************************************************************************
-//
+//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
-//
+//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
-//
+//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <cstdio>
// Demonstrate a parallel reduction using thread teams (TeamPolicy).
//
// A thread team consists of 1 to n threads. The hardware determines
// the maximum value of n. On a dual-socket CPU machine with 8 cores
// per socket, the maximum size of a team is 8. The number of teams
// (the league_size) is not limited by physical constraints (up to
// some reasonable bound, which eventually depends upon the hardware
// and programming model implementation).
int main (int narg, char* args[]) {
using Kokkos::parallel_reduce;
typedef Kokkos::TeamPolicy<> team_policy;
typedef typename team_policy::member_type team_member;
Kokkos::initialize (narg, args);
// Set up a policy that launches 12 teams, with the maximum number
// of threads per team.
const team_policy policy (12, Kokkos::AUTO);
// This is a reduction with a team policy. The team policy changes
// the first argument of the lambda. Rather than an integer index
// (as with RangePolicy), it's now TeamPolicy::member_type. This
// object provides all information to identify a thread uniquely.
// It also provides some team-related function calls such as a team
// barrier (which a subsequent example will use).
//
// Every member of the team contributes to the total sum. It is
// helpful to think of the lambda's body as a "team parallel
// region." That is, every team member is active and will execute
// the body of the lambda.
int sum = 0;
// We also need to protect the usage of a lambda against compiling
// with a backend which doesn't support it (i.e. Cuda 6.5/7.0).
- #if (KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
+ #if defined(KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA)
parallel_reduce (policy, KOKKOS_LAMBDA (const team_member& thread, int& lsum) {
lsum += 1;
// TeamPolicy<>::member_type provides functions to query the
// multidimensional index of a thread, as well as the number of
// thread teams and the size of each team.
printf ("Hello World: %i %i // %i %i\n", thread.league_rank (),
thread.team_rank (), thread.league_size (), thread.team_size ());
}, sum);
#endif
// The result will be 12*team_policy::team_size_max([=]{})
printf ("Result %i\n",sum);
Kokkos::finalize ();
}
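The member_type handle passed to the lambda also enables nested parallelism. As a hedged sketch (the inner loop bound n and the variable names are invented here, not taken from the tutorial), a per-team reduction over a TeamThreadRange inside the same kind of team-level lambda could look like:

    parallel_reduce (policy, KOKKOS_LAMBDA (const team_member& thread, int& lsum) {
      const int n = 32;   // amount of inner work per team, arbitrary for this sketch
      int team_sum = 0;
      // Distribute the inner range [0,n) over the threads of this team.
      Kokkos::parallel_reduce (Kokkos::TeamThreadRange (thread, n),
                               [=] (const int i, int& inner) { inner += i; },
                               team_sum);
      // Let only one thread per team add the team's partial result.
      Kokkos::single (Kokkos::PerTeam (thread), [&] () { lsum += team_sum; });
    }, sum);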
diff --git a/lib/kokkos/generate_makefile.bash b/lib/kokkos/generate_makefile.bash
index e7bd9da36..e671293ff 100755
--- a/lib/kokkos/generate_makefile.bash
+++ b/lib/kokkos/generate_makefile.bash
@@ -1,413 +1,437 @@
#!/bin/bash
KOKKOS_DEVICES=""
MAKE_J_OPTION="32"
while [[ $# > 0 ]]
do
-key="$1"
+ key="$1"
-case $key in
+ case $key in
--kokkos-path*)
- KOKKOS_PATH="${key#*=}"
- ;;
+ KOKKOS_PATH="${key#*=}"
+ ;;
+ --qthreads-path*)
+ QTHREADS_PATH="${key#*=}"
+ ;;
--prefix*)
- PREFIX="${key#*=}"
- ;;
+ PREFIX="${key#*=}"
+ ;;
--with-cuda)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
- CUDA_PATH_NVCC=`which nvcc`
- CUDA_PATH=${CUDA_PATH_NVCC%/bin/nvcc}
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
+ CUDA_PATH_NVCC=`which nvcc`
+ CUDA_PATH=${CUDA_PATH_NVCC%/bin/nvcc}
+ ;;
# Catch this before '--with-cuda*'
--with-cuda-options*)
- KOKKOS_CUDA_OPT="${key#*=}"
- ;;
+ KOKKOS_CUDA_OPT="${key#*=}"
+ ;;
--with-cuda*)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
- CUDA_PATH="${key#*=}"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Cuda"
+ CUDA_PATH="${key#*=}"
+ ;;
--with-openmp)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},OpenMP"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},OpenMP"
+ ;;
--with-pthread)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Pthread"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Pthread"
+ ;;
--with-serial)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Serial"
- ;;
- --with-qthread*)
- KOKKOS_DEVICES="${KOKKOS_DEVICES},Qthread"
- QTHREAD_PATH="${key#*=}"
- ;;
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Serial"
+ ;;
+ --with-qthreads*)
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},Qthreads"
+ if [ -z "$QTHREADS_PATH" ]; then
+ QTHREADS_PATH="${key#*=}"
+ fi
+ ;;
--with-devices*)
- DEVICES="${key#*=}"
- KOKKOS_DEVICES="${KOKKOS_DEVICES},${DEVICES}"
- ;;
+ DEVICES="${key#*=}"
+ KOKKOS_DEVICES="${KOKKOS_DEVICES},${DEVICES}"
+ ;;
--with-gtest*)
- GTEST_PATH="${key#*=}"
- ;;
+ GTEST_PATH="${key#*=}"
+ ;;
--with-hwloc*)
- HWLOC_PATH="${key#*=}"
- ;;
+ HWLOC_PATH="${key#*=}"
+ ;;
--arch*)
- KOKKOS_ARCH="${key#*=}"
- ;;
+ KOKKOS_ARCH="${key#*=}"
+ ;;
--cxxflags*)
- CXXFLAGS="${key#*=}"
- ;;
+ CXXFLAGS="${key#*=}"
+ ;;
--ldflags*)
- LDFLAGS="${key#*=}"
- ;;
+ LDFLAGS="${key#*=}"
+ ;;
--debug|-dbg)
- KOKKOS_DEBUG=yes
- ;;
+ KOKKOS_DEBUG=yes
+ ;;
--make-j*)
- MAKE_J_OPTION="${key#*=}"
- ;;
+ MAKE_J_OPTION="${key#*=}"
+ ;;
--compiler*)
- COMPILER="${key#*=}"
- CNUM=`which ${COMPILER} 2>&1 >/dev/null | grep "no ${COMPILER}" | wc -l`
- if [ ${CNUM} -gt 0 ]; then
- echo "Invalid compiler by --compiler command: '${COMPILER}'"
- exit
- fi
- if [[ ! -n ${COMPILER} ]]; then
- echo "Empty compiler specified by --compiler command."
- exit
- fi
- CNUM=`which ${COMPILER} | grep ${COMPILER} | wc -l`
- if [ ${CNUM} -eq 0 ]; then
- echo "Invalid compiler by --compiler command: '${COMPILER}'"
- exit
- fi
- ;;
- --with-options*)
- KOKKOS_OPT="${key#*=}"
- ;;
+ COMPILER="${key#*=}"
+ CNUM=`which ${COMPILER} 2>&1 >/dev/null | grep "no ${COMPILER}" | wc -l`
+ if [ ${CNUM} -gt 0 ]; then
+ echo "Invalid compiler by --compiler command: '${COMPILER}'"
+ exit
+ fi
+ if [[ ! -n ${COMPILER} ]]; then
+ echo "Empty compiler specified by --compiler command."
+ exit
+ fi
+ CNUM=`which ${COMPILER} | grep ${COMPILER} | wc -l`
+ if [ ${CNUM} -eq 0 ]; then
+ echo "Invalid compiler by --compiler command: '${COMPILER}'"
+ exit
+ fi
+ ;;
+ --with-options*)
+ KOKKOS_OPT="${key#*=}"
+ ;;
--help)
- echo "Kokkos configure options:"
- echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory"
- echo "--prefix=/Install/Path: Path to where the Kokkos library should be installed"
- echo ""
- echo "--with-cuda[=/Path/To/Cuda]: enable Cuda and set path to Cuda Toolkit"
- echo "--with-openmp: enable OpenMP backend"
- echo "--with-pthread: enable Pthreads backend"
- echo "--with-serial: enable Serial backend"
- echo "--with-qthread=/Path/To/Qthread: enable Qthread backend"
- echo "--with-devices: explicitly add a set of backends"
- echo ""
- echo "--arch=[OPTIONS]: set target architectures. Options are:"
- echo " ARMv80 = ARMv8.0 Compatible CPU"
- echo " ARMv81 = ARMv8.1 Compatible CPU"
- echo " ARMv8-ThunderX = ARMv8 Cavium ThunderX CPU"
- echo " SNB = Intel Sandy/Ivy Bridge CPUs"
- echo " HSW = Intel Haswell CPUs"
- echo " BDW = Intel Broadwell Xeon E-class CPUs"
- echo " SKX = Intel Sky Lake Xeon E-class HPC CPUs (AVX512)"
- echo " KNC = Intel Knights Corner Xeon Phi"
- echo " KNL = Intel Knights Landing Xeon Phi"
- echo " Kepler30 = NVIDIA Kepler generation CC 3.0"
- echo " Kepler35 = NVIDIA Kepler generation CC 3.5"
- echo " Kepler37 = NVIDIA Kepler generation CC 3.7"
- echo " Pascal60 = NVIDIA Pascal generation CC 6.0"
- echo " Pascal61 = NVIDIA Pascal generation CC 6.1"
- echo " Maxwell50 = NVIDIA Maxwell generation CC 5.0"
- echo " Power8 = IBM POWER8 CPUs"
- echo " Power9 = IBM POWER9 CPUs"
- echo ""
- echo "--compiler=/Path/To/Compiler set the compiler"
- echo "--debug,-dbg: enable Debugging"
- echo "--cxxflags=[FLAGS] overwrite CXXFLAGS for library build and test build"
- echo " This will still set certain required flags via"
- echo " KOKKOS_CXXFLAGS (such as -fopenmp, --std=c++11, etc.)"
- echo "--ldflags=[FLAGS] overwrite LDFLAGS for library build and test build"
- echo " This will still set certain required flags via"
- echo " KOKKOS_LDFLAGS (such as -fopenmp, -lpthread, etc.)"
- echo "--with-gtest=/Path/To/Gtest: set path to gtest (used in unit and performance tests"
- echo "--with-hwloc=/Path/To/Hwloc: set path to hwloc"
- echo "--with-options=[OPTIONS]: additional options to Kokkos:"
- echo " aggressive_vectorization = add ivdep on loops"
- echo "--with-cuda-options=[OPT]: additional options to CUDA:"
- echo " force_uvm, use_ldg, enable_lambda, rdc"
- echo "--make-j=[NUM]: set -j flag used during build."
- exit 0
- ;;
+ echo "Kokkos configure options:"
+ echo "--kokkos-path=/Path/To/Kokkos: Path to the Kokkos root directory."
+ echo "--qthreads-path=/Path/To/Qthreads: Path to Qthreads install directory."
+ echo " Overrides path given by --with-qthreads."
+ echo "--prefix=/Install/Path: Path to install the Kokkos library."
+ echo ""
+ echo "--with-cuda[=/Path/To/Cuda]: Enable Cuda and set path to Cuda Toolkit."
+ echo "--with-openmp: Enable OpenMP backend."
+ echo "--with-pthread: Enable Pthreads backend."
+ echo "--with-serial: Enable Serial backend."
+ echo "--with-qthreads[=/Path/To/Qthreads]: Enable Qthreads backend."
+ echo "--with-devices: Explicitly add a set of backends."
+ echo ""
+ echo "--arch=[OPT]: Set target architectures. Options are:"
+ echo " ARMv80 = ARMv8.0 Compatible CPU"
+ echo " ARMv81 = ARMv8.1 Compatible CPU"
+ echo " ARMv8-ThunderX = ARMv8 Cavium ThunderX CPU"
+ echo " SNB = Intel Sandy/Ivy Bridge CPUs"
+ echo " HSW = Intel Haswell CPUs"
+ echo " BDW = Intel Broadwell Xeon E-class CPUs"
+ echo " SKX = Intel Sky Lake Xeon E-class HPC CPUs (AVX512)"
+ echo " KNC = Intel Knights Corner Xeon Phi"
+ echo " KNL = Intel Knights Landing Xeon Phi"
+ echo " Kepler30 = NVIDIA Kepler generation CC 3.0"
+ echo " Kepler35 = NVIDIA Kepler generation CC 3.5"
+ echo " Kepler37 = NVIDIA Kepler generation CC 3.7"
+ echo " Pascal60 = NVIDIA Pascal generation CC 6.0"
+ echo " Pascal61 = NVIDIA Pascal generation CC 6.1"
+ echo " Maxwell50 = NVIDIA Maxwell generation CC 5.0"
+ echo " Power8 = IBM POWER8 CPUs"
+ echo " Power9 = IBM POWER9 CPUs"
+ echo ""
+ echo "--compiler=/Path/To/Compiler Set the compiler."
+ echo "--debug,-dbg: Enable Debugging."
+ echo "--cxxflags=[FLAGS] Overwrite CXXFLAGS for library build and test"
+ echo " build. This will still set certain required"
+ echo " flags via KOKKOS_CXXFLAGS (such as -fopenmp,"
+ echo " --std=c++11, etc.)."
+ echo "--ldflags=[FLAGS] Overwrite LDFLAGS for library build and test"
+ echo " build. This will still set certain required"
+ echo " flags via KOKKOS_LDFLAGS (such as -fopenmp,"
+ echo " -lpthread, etc.)."
+ echo "--with-gtest=/Path/To/Gtest: Set path to gtest. (Used in unit and performance"
+ echo " tests.)"
+ echo "--with-hwloc=/Path/To/Hwloc: Set path to hwloc."
+ echo "--with-options=[OPT]: Additional options to Kokkos:"
+ echo " aggressive_vectorization = add ivdep on loops"
+ echo "--with-cuda-options=[OPT]: Additional options to CUDA:"
+ echo " force_uvm, use_ldg, enable_lambda, rdc"
+ echo "--make-j=[NUM]: Set -j flag used during build."
+ exit 0
+ ;;
*)
- echo "warning: ignoring unknown option $key"
- ;;
-esac
-shift
+ echo "warning: ignoring unknown option $key"
+ ;;
+ esac
+
+ shift
done
-# If KOKKOS_PATH undefined, assume parent dir of this
-# script is the KOKKOS_PATH
+# Remove leading ',' from KOKKOS_DEVICES.
+KOKKOS_DEVICES=$(echo $KOKKOS_DEVICES | sed 's/^,//')
+
+# If KOKKOS_PATH undefined, assume parent dir of this script is the KOKKOS_PATH.
if [ -z "$KOKKOS_PATH" ]; then
- KOKKOS_PATH=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
+ KOKKOS_PATH=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
else
- # Ensure KOKKOS_PATH is abs path
- KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
+ # Ensure KOKKOS_PATH is abs path
+ KOKKOS_PATH=$( cd $KOKKOS_PATH && pwd )
fi
if [ "${KOKKOS_PATH}" = "${PWD}" ] || [ "${KOKKOS_PATH}" = "${PWD}/" ]; then
-echo "Running generate_makefile.sh in the Kokkos root directory is not allowed"
-exit
+ echo "Running generate_makefile.sh in the Kokkos root directory is not allowed"
+ exit
fi
KOKKOS_SRC_PATH=${KOKKOS_PATH}
KOKKOS_SETTINGS="KOKKOS_SRC_PATH=${KOKKOS_SRC_PATH}"
#KOKKOS_SETTINGS="KOKKOS_PATH=${KOKKOS_PATH}"
if [ ${#COMPILER} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXX=${COMPILER}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXX=${COMPILER}"
fi
+
if [ ${#KOKKOS_DEVICES} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEVICES=${KOKKOS_DEVICES}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEVICES=${KOKKOS_DEVICES}"
fi
+
if [ ${#KOKKOS_ARCH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_ARCH=${KOKKOS_ARCH}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_ARCH=${KOKKOS_ARCH}"
fi
+
if [ ${#KOKKOS_DEBUG} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEBUG=${KOKKOS_DEBUG}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_DEBUG=${KOKKOS_DEBUG}"
fi
+
if [ ${#CUDA_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CUDA_PATH=${CUDA_PATH}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CUDA_PATH=${CUDA_PATH}"
fi
+
if [ ${#CXXFLAGS} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXXFLAGS=\"${CXXFLAGS}\""
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} CXXFLAGS=\"${CXXFLAGS}\""
fi
+
if [ ${#LDFLAGS} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} LDFLAGS=\"${LDFLAGS}\""
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} LDFLAGS=\"${LDFLAGS}\""
fi
+
if [ ${#GTEST_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
else
-GTEST_PATH=${KOKKOS_PATH}/tpls/gtest
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
+ GTEST_PATH=${KOKKOS_PATH}/tpls/gtest
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} GTEST_PATH=${GTEST_PATH}"
fi
+
if [ ${#HWLOC_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} HWLOC_PATH=${HWLOC_PATH} KOKKOS_USE_TPLS=hwloc"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} HWLOC_PATH=${HWLOC_PATH} KOKKOS_USE_TPLS=hwloc"
fi
-if [ ${#QTHREAD_PATH} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} QTHREAD_PATH=${QTHREAD_PATH}"
+
+if [ ${#QTHREADS_PATH} -gt 0 ]; then
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} QTHREADS_PATH=${QTHREADS_PATH}"
fi
+
if [ ${#KOKKOS_OPT} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_OPTIONS=${KOKKOS_OPT}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_OPTIONS=${KOKKOS_OPT}"
fi
+
if [ ${#KOKKOS_CUDA_OPT} -gt 0 ]; then
-KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPT}"
+ KOKKOS_SETTINGS="${KOKKOS_SETTINGS} KOKKOS_CUDA_OPTIONS=${KOKKOS_CUDA_OPT}"
fi
KOKKOS_SETTINGS_NO_KOKKOS_PATH="${KOKKOS_SETTINGS}"
KOKKOS_TEST_INSTALL_PATH="${PWD}/install"
if [ ${#PREFIX} -gt 0 ]; then
-KOKKOS_INSTALL_PATH="${PREFIX}"
+ KOKKOS_INSTALL_PATH="${PREFIX}"
else
-KOKKOS_INSTALL_PATH=${KOKKOS_TEST_INSTALL_PATH}
+ KOKKOS_INSTALL_PATH=${KOKKOS_TEST_INSTALL_PATH}
fi
mkdir install
echo "#Makefile to satisfy existens of target kokkos-clean before installing the library" > install/Makefile.kokkos
echo "kokkos-clean:" >> install/Makefile.kokkos
echo "" >> install/Makefile.kokkos
mkdir core
mkdir core/unit_test
mkdir core/perf_test
mkdir containers
mkdir containers/unit_tests
mkdir containers/performance_tests
mkdir algorithms
mkdir algorithms/unit_tests
mkdir algorithms/performance_tests
mkdir example
mkdir example/fixture
mkdir example/feint
mkdir example/fenl
mkdir example/tutorial
if [ ${#KOKKOS_ENABLE_EXAMPLE_ICHOL} -gt 0 ]; then
-mkdir example/ichol
+ mkdir example/ichol
fi
KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_PATH}"
# Generate subdirectory makefiles.
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "all:" >> core/unit_test/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS}" >> core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "test: all" >> core/unit_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS} test" >> core/unit_test/Makefile
echo "" >> core/unit_test/Makefile
echo "clean:" >> core/unit_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/unit_test/Makefile ${KOKKOS_SETTINGS} clean" >> core/unit_test/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "all:" >> core/perf_test/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS}" >> core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "test: all" >> core/perf_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS} test" >> core/perf_test/Makefile
echo "" >> core/perf_test/Makefile
echo "clean:" >> core/perf_test/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/perf_test/Makefile ${KOKKOS_SETTINGS} clean" >> core/perf_test/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "all:" >> containers/unit_tests/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS}" >> containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "test: all" >> containers/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS} test" >> containers/unit_tests/Makefile
echo "" >> containers/unit_tests/Makefile
echo "clean:" >> containers/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/unit_tests/Makefile ${KOKKOS_SETTINGS} clean" >> containers/unit_tests/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "all:" >> containers/performance_tests/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS}" >> containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "test: all" >> containers/performance_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS} test" >> containers/performance_tests/Makefile
echo "" >> containers/performance_tests/Makefile
echo "clean:" >> containers/performance_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/containers/performance_tests/Makefile ${KOKKOS_SETTINGS} clean" >> containers/performance_tests/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "all:" >> algorithms/unit_tests/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS}" >> algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "test: all" >> algorithms/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS} test" >> algorithms/unit_tests/Makefile
echo "" >> algorithms/unit_tests/Makefile
echo "clean:" >> algorithms/unit_tests/Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/algorithms/unit_tests/Makefile ${KOKKOS_SETTINGS} clean" >> algorithms/unit_tests/Makefile
KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_TEST_INSTALL_PATH}"
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "all:" >> example/fixture/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS}" >> example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "test: all" >> example/fixture/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS} test" >> example/fixture/Makefile
echo "" >> example/fixture/Makefile
echo "clean:" >> example/fixture/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fixture/Makefile ${KOKKOS_SETTINGS} clean" >> example/fixture/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/feint/Makefile
echo "" >> example/feint/Makefile
echo "all:" >> example/feint/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS}" >> example/feint/Makefile
echo "" >> example/feint/Makefile
echo "test: all" >> example/feint/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS} test" >> example/feint/Makefile
echo "" >> example/feint/Makefile
echo "clean:" >> example/feint/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/feint/Makefile ${KOKKOS_SETTINGS} clean" >> example/feint/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "all:" >> example/fenl/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS}" >> example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "test: all" >> example/fenl/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS} test" >> example/fenl/Makefile
echo "" >> example/fenl/Makefile
echo "clean:" >> example/fenl/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/fenl/Makefile ${KOKKOS_SETTINGS} clean" >> example/fenl/Makefile
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/tutorial/Makefile
echo "" >> example/tutorial/Makefile
echo "build:" >> example/tutorial/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} build">> example/tutorial/Makefile
echo "" >> example/tutorial/Makefile
echo "test: build" >> example/tutorial/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} test" >> example/tutorial/Makefile
echo "" >> example/tutorial/Makefile
echo "clean:" >> example/tutorial/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/tutorial/Makefile KOKKOS_SETTINGS='${KOKKOS_SETTINGS}' KOKKOS_PATH=${KOKKOS_PATH} clean" >> example/tutorial/Makefile
if [ ${#KOKKOS_ENABLE_EXAMPLE_ICHOL} -gt 0 ]; then
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "all:" >> example/ichol/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS}" >> example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "test: all" >> example/ichol/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS} test" >> example/ichol/Makefile
echo "" >> example/ichol/Makefile
echo "clean:" >> example/ichol/Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/example/ichol/Makefile ${KOKKOS_SETTINGS} clean" >> example/ichol/Makefile
fi
KOKKOS_SETTINGS="${KOKKOS_SETTINGS_NO_KOKKOS_PATH} KOKKOS_PATH=${KOKKOS_PATH}"
# Generate top level directory makefile.
echo "Generating Makefiles with options " ${KOKKOS_SETTINGS}
echo "KOKKOS_SETTINGS=${KOKKOS_SETTINGS}" > Makefile
echo "" >> Makefile
echo "kokkoslib:" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_INSTALL_PATH} build-lib" >> Makefile
echo "" >> Makefile
echo "install: kokkoslib" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_INSTALL_PATH} install" >> Makefile
echo "" >> Makefile
echo "kokkoslib-test:" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_TEST_INSTALL_PATH} build-lib" >> Makefile
echo "" >> Makefile
echo "install-test: kokkoslib-test" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -j ${MAKE_J_OPTION} -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} PREFIX=${KOKKOS_TEST_INSTALL_PATH} install" >> Makefile
echo "" >> Makefile
echo "build-test: install-test" >> Makefile
echo -e "\tmake -C core/unit_test" >> Makefile
echo -e "\tmake -C core/perf_test" >> Makefile
echo -e "\tmake -C containers/unit_tests" >> Makefile
echo -e "\tmake -C containers/performance_tests" >> Makefile
echo -e "\tmake -C algorithms/unit_tests" >> Makefile
echo -e "\tmake -C example/fixture" >> Makefile
echo -e "\tmake -C example/feint" >> Makefile
echo -e "\tmake -C example/fenl" >> Makefile
echo -e "\tmake -C example/tutorial build" >> Makefile
echo "" >> Makefile
echo "test: build-test" >> Makefile
echo -e "\tmake -C core/unit_test test" >> Makefile
echo -e "\tmake -C core/perf_test test" >> Makefile
echo -e "\tmake -C containers/unit_tests test" >> Makefile
echo -e "\tmake -C containers/performance_tests test" >> Makefile
echo -e "\tmake -C algorithms/unit_tests test" >> Makefile
echo -e "\tmake -C example/fixture test" >> Makefile
echo -e "\tmake -C example/feint test" >> Makefile
echo -e "\tmake -C example/fenl test" >> Makefile
echo -e "\tmake -C example/tutorial test" >> Makefile
echo "" >> Makefile
echo "unit-tests-only:" >> Makefile
echo -e "\tmake -C core/unit_test test" >> Makefile
echo -e "\tmake -C containers/unit_tests test" >> Makefile
echo -e "\tmake -C algorithms/unit_tests test" >> Makefile
echo "" >> Makefile
echo "clean:" >> Makefile
echo -e "\tmake -C core/unit_test clean" >> Makefile
echo -e "\tmake -C core/perf_test clean" >> Makefile
echo -e "\tmake -C containers/unit_tests clean" >> Makefile
echo -e "\tmake -C containers/performance_tests clean" >> Makefile
echo -e "\tmake -C algorithms/unit_tests clean" >> Makefile
echo -e "\tmake -C example/fixture clean" >> Makefile
echo -e "\tmake -C example/feint clean" >> Makefile
echo -e "\tmake -C example/fenl clean" >> Makefile
echo -e "\tmake -C example/tutorial clean" >> Makefile
echo -e "\tcd core; \\" >> Makefile
echo -e "\tmake -f ${KOKKOS_PATH}/core/src/Makefile ${KOKKOS_SETTINGS} clean" >> Makefile
diff --git a/lib/linalg/Install.py b/lib/linalg/Install.py
new file mode 100644
index 000000000..c7076ca52
--- /dev/null
+++ b/lib/linalg/Install.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+
+# install.py tool to do build of the linear algebra library
+# used to automate the steps described in the README file in this dir
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# make the library
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.%s clean; make -f Makefile.%s" % (machine,machine)
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
diff --git a/lib/linalg/README b/lib/linalg/README
index 20f3ff094..725df86c4 100644
--- a/lib/linalg/README
+++ b/lib/linalg/README
@@ -1,23 +1,29 @@
This directory has BLAS and LAPACK files needed by the USER-ATC and
USER-AWPMD packages, and possibly by other packages in the future.
Note that this is an *incomplete* subset of full BLAS/LAPACK.
-You should only need to build and use the resulting library in this
-directory if you want to build LAMMPS with the USER-ATC and/or
-USER-AWPMD packages AND you do not have any other suitable BLAS and
-LAPACK libraries installed on your system. E.g. ATLAS, GOTO-BLAS,
-OpenBLAS, ACML, or MKL.
+You should only need to build and use the library in this directory if
+you want to build LAMMPS with the USER-ATC and/or USER-AWPMD packages
+AND you do not have any other suitable BLAS and LAPACK libraries
+installed on your system. E.g. ATLAS, GOTO-BLAS, OpenBLAS, ACML, or
+MKL.
+
+You can type "make lib-linalg" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.gfortran
When you are done building this library, one file should exist in this
directory:
liblinalg.a the library LAMMPS will link against
You can then include this library and its path in the Makefile.lammps
-file of any packages that need it, e.g. in lib/atc/Makefile.lammps.
+file of any packages that need it. As an example, see the
+lib/atc/Makefile.lammps.linalg file.
diff --git a/lib/meam/Install.py b/lib/meam/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/meam/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/meam/README b/lib/meam/README
index 436259ee8..b3111c131 100644
--- a/lib/meam/README
+++ b/lib/meam/README
@@ -1,46 +1,51 @@
MEAM (modified embedded atom method) library
Greg Wagner, Sandia National Labs
gjwagne at sandia.gov
Jan 2007
This library is an implementation of the MEAM potential, specifically
designed to work with LAMMPS.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the MEAM package.
This library must be built with a F90 compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-meam" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.gfortran
When you are done building this library, two files should
exist in this directory:
libmeam.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-meam_SYSINC = leave blank for this package
user-meam_SYSLIB = auxiliary F90 libs needed to link a F90 lib with
a C++ program (LAMMPS) via a C++ compiler
user-meam_SYSPATH = path(s) to where those libraries are
Because you have a F90 compiler on your system, you should have these
libraries. But you will have to figure out which ones are needed and
where they are. Examples of common configurations are in the
Makefile.lammps.* files.
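As a rough example, with gfortran the user-meam_SYSINC setting is typically left blank and user-meam_SYSLIB usually needs little more than the Fortran runtime library (e.g. -lgfortran); treat this only as a hint and rely on the bundled Makefile.lammps.* files for settings known to work.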
diff --git a/lib/molfile/Makefile.lammps b/lib/molfile/Makefile.lammps
index 08118991a..a181f48ae 100644
--- a/lib/molfile/Makefile.lammps
+++ b/lib/molfile/Makefile.lammps
@@ -1,37 +1,43 @@
# This file contains the hooks to build and link LAMMPS with the VMD
# molfile plugins described here:
#
# http://www.ks.uiuc.edu/Research/vmd/plugins/molfile
#
# When you build LAMMPS with the USER-MOLFILE package installed, it will
# use the 3 settings in this file. They should be set as follows.
#
+# The molfile_SYSINC setting points to the folder containing the VMD
+# plugin headers. By default it points to the headers bundled in this folder.
+#
# The molfile_SYSLIB setting is for a system dynamic loading library
# that will be used to load the molfile plugins. It contains functions
# like dlopen(), dlsym() and so on for dynamic linking of executable
# code into an executable. For Linux and most current Unix-like
# operating systems, the setting of "-ldl" will work. On some platforms
# you may need "-ldld". For compilation on Windows, a different
# mechanism is used that is part of the Windows programming environment
# and thus molfile_SYSLIB can be left blank.
#
# The molfile_SYSINC and molfile_SYSPATH variables do not typically need
# to be set. If the dl library is not in a place the linker can find
# it, specify its directory via the molfile_SYSPATH variable, e.g.
# -Ldir.
# -----------------------------------------------------------
# Settings that the LAMMPS build will import when this package is installed
-molfile_SYSINC =
+# Change this to -I/path/to/your/lib/vmd/plugins/include if the bundled
+# header files are incompatible with your VMD plugins.
+molfile_SYSINC =-I../../lib/molfile
+#
ifneq ($(LIBOBJDIR),/Obj_mingw32)
ifneq ($(LIBOBJDIR),/Obj_mingw64)
ifneq ($(LIBOBJDIR),/Obj_mingw32-mpi)
ifneq ($(LIBOBJDIR),/Obj_mingw64-mpi)
molfile_SYSLIB = -ldl
endif
endif
endif
endif
molfile_SYSPATH =
diff --git a/lib/molfile/README b/lib/molfile/README
index 09ea3cc5c..9e8260c20 100644
--- a/lib/molfile/README
+++ b/lib/molfile/README
@@ -1,22 +1,35 @@
This directory has a Makefile.lammps file with settings that allows
LAMMPS to dynamically link to the VMD molfile library. This is
required to use the USER-MOLFILE package and its interface to the dump
and write_dump commands in a LAMMPS input script.
More information about the VMD molfile plugins can be found at
http://www.ks.uiuc.edu/Research/vmd/plugins/molfile.
-More specifically, to be able to dynamically load and execute the
-plugins from inside LAMMPS, you need to link with a system library
-containing functions like dlopen(), dlsym() and so on for dynamic
-linking of executable code into an executable. This library is
-defined by setting the molfile_SYSLIB variable in the Makefile.lammps
-file in this dir.
+NOTE: while the programming interface (API) of the VMD molfile plugins
+is backward compatible (i.e. you can expect to be able to compile this
+package for plugins from newer VMD packages), the binary interface
+(ABI) is not. So it is necessary to compile this package with the
+VMD molfile plugin header files (vmdplugin.h and molfile_plugin.h)
+matching the VMD installation that the (binary) plugin files are taken from.
+These header files can be found inside the VMD installation tree under
+"plugins/include". For convenience, this package includes a set of
+header files that is compatible with VMD 1.9.3 (the current version
+in April 2017). You need to adjust the molfile_SYSINC variable in the
+Makefile.lammps file in this directory if you want to use VMD
+molfile plugins from a different version. The interface is compatible
+with plugins starting from VMD version 1.8.4.
+
+In order to be able to dynamically load and execute the plugins from
+inside LAMMPS, you need to link with a system library containing functions
+like dlopen(), dlsym() and so on for dynamic linking of executable code
+into an executable. This library is defined by setting the molfile_SYSLIB
+variable in the Makefile.lammps file in this dir.
For Linux and most current unix-like operating systems, this can be
kept at the default setting of "-ldl" (on some platforms this library
is called "-ldld"). For compilation on Windows, a slightly different
mechanism is used that is part of the Windows programming environment
-and this library is not needed.
+and this kind of library is not needed.
See the header of Makefile.lammps for more info.
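Purely as an illustration of what the dl library provides (the plugin path and symbol name below are placeholders, not the ones LAMMPS actually uses), dynamic loading comes down to a few calls; on Linux, compile such code with -ldl:

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
      // Open a shared object at run time; RTLD_NOW resolves its symbols immediately.
      void *handle = dlopen("/path/to/someplugin.so", RTLD_NOW);
      if (!handle) { fprintf(stderr, "dlopen failed: %s\n", dlerror()); return 1; }

      // Look up a function by name and call it through a function pointer.
      typedef int (*init_fn)(void);
      init_fn init = (init_fn) dlsym(handle, "plugin_init");
      if (init) init();

      dlclose(handle);
      return 0;
    }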
diff --git a/src/USER-MOLFILE/molfile_plugin.h b/lib/molfile/molfile_plugin.h
similarity index 92%
rename from src/USER-MOLFILE/molfile_plugin.h
rename to lib/molfile/molfile_plugin.h
index 7a2d7ca42..c79e7a5ab 100644
--- a/src/USER-MOLFILE/molfile_plugin.h
+++ b/lib/molfile/molfile_plugin.h
@@ -1,890 +1,903 @@
/***************************************************************************
*cr
*cr (C) Copyright 1995-2006 The Board of Trustees of the
*cr University of Illinois
*cr All Rights Reserved
*cr
***************************************************************************/
/***************************************************************************
* RCS INFORMATION:
*
* $RCSfile: molfile_plugin.h,v $
* $Author: johns $ $Locker: $ $State: Exp $
- * $Revision: 1.103 $ $Date: 2011/03/05 03:56:11 $
+ * $Revision: 1.108 $ $Date: 2016/02/26 03:17:01 $
*
***************************************************************************/
/** @file
* API for C extensions to define a way to load structure, coordinate,
* trajectory, and volumetric data files
*/
#ifndef MOL_FILE_PLUGIN_H
#define MOL_FILE_PLUGIN_H
#include "vmdplugin.h"
#if defined(DESRES_READ_TIMESTEP2)
/* includes needed for large integer types used for frame counts */
#include <sys/types.h>
typedef ssize_t molfile_ssize_t; /**< for frame counts */
#endif
/**
* Define a common plugin type to be used when registering the plugin.
*/
#define MOLFILE_PLUGIN_TYPE "mol file reader"
/**
* File converter plugins use the same API but register under a different
* type so that regular file readers can have priority.
*/
#define MOLFILE_CONVERTER_PLUGIN_TYPE "mol file converter"
/* File plugin symbolic constants for better code readability */
#define MOLFILE_SUCCESS 0 /**< succeeded in reading file */
#define MOLFILE_EOF -1 /**< end of file */
#define MOLFILE_ERROR -1 /**< error reading/opening a file */
#define MOLFILE_NOSTRUCTUREDATA -2 /**< no structure data in this file */
#define MOLFILE_NUMATOMS_UNKNOWN -1 /**< unknown number of atoms */
#define MOLFILE_NUMATOMS_NONE 0 /**< no atoms in this file type */
/**
* Maximum string size macro
*/
#define MOLFILE_BUFSIZ 81 /**< maximum chars in string data */
#define MOLFILE_BIGBUFSIZ 4096 /**< maximum chars in long strings */
#define MOLFILE_MAXWAVEPERTS 25 /**< maximum number of wavefunctions
* per timestep */
+/**
+ * Hard-coded direct-I/O page size constants for use by both VMD
+ * and the plugins that want to use direct, unbuffered I/O for high
+ * performance with SSDs etc. We use two constants to define the
+ * range of hardware page sizes that we can support, so that we can
+ * add support for larger 8KB or 16KB page sizes in the future
+ * as they become more prevalent in high-end storage systems.
+ *
+ * At present, VMD uses a hard-coded 4KB page size to reduce memory
+ * fragmentation, but these constants will make it easier to enable the
+ * use of larger page sizes in the future if it becomes necessary.
+ */
+#define MOLFILE_DIRECTIO_MIN_BLOCK_SIZE 4096
+#define MOLFILE_DIRECTIO_MAX_BLOCK_SIZE 4096
+
/**
* File level comments, origin information, and annotations.
*/
typedef struct {
char database[81]; /**< database of origin, if any */
char accession[81]; /**< database accession code, if any */
char date[81]; /**< date/time stamp for this data */
char title[81]; /**< brief title for this data */
int remarklen; /**< length of remarks string */
char *remarks; /**< free-form remarks about data */
} molfile_metadata_t;
/*
* Struct for specifying atoms in a molecular structure. The first
* six components are required, the rest are optional and their presence is
 * indicated by setting the corresponding bit in optflags. When omitted,
* the application (for read_structure) or plugin (for write_structure)
* must be able to supply default values if the missing parameters are
* part of its internal data structure.
* Note that it is not possible to specify coordinates with this structure.
* This is intentional; all coordinate I/O is done with the read_timestep and
* write_timestep functions.
*/
/**
* Per-atom attributes and information.
*/
typedef struct {
/* these fields absolutely must be set or initialized to empty */
char name[16]; /**< required atom name string */
char type[16]; /**< required atom type string */
char resname[8]; /**< required residue name string */
int resid; /**< required integer residue ID */
char segid[8]; /**< required segment name string, or "" */
+#if 0 && vmdplugin_ABIVERSION > 17
+ /* The new PDB file formats allow for much larger structures, */
+ /* which can therefore require longer chain ID strings. The */
+ /* new PDBx/mmCIF file formats do not have length limits on */
+ /* fields, so PDB chains could be arbitrarily long strings */
+ /* in such files. At present, we know we need at least 3-char */
+ /* chains for existing PDBx/mmCIF files. */
+ char chain[4]; /**< required chain name, or "" */
+#else
char chain[2]; /**< required chain name, or "" */
-
+#endif
/* rest are optional; use optflags to specify what's present */
char altloc[2]; /**< optional PDB alternate location code */
char insertion[2]; /**< optional PDB insertion code */
float occupancy; /**< optional occupancy value */
float bfactor; /**< optional B-factor value */
float mass; /**< optional mass value */
float charge; /**< optional charge value */
float radius; /**< optional radius value */
int atomicnumber; /**< optional element atomic number */
+
+#if 0
+ char complex[16];
+ char assembly[16];
+ int qmregion;
+ int qmregionlink;
+ int qmlayer;
+ int qmlayerlink;
+ int qmfrag;
+ int qmfraglink;
+ string qmecp;
+ int qmadapt;
+ int qmect; /**< boolean */
+ int qmparam;
+ int autoparam;
+#endif
+
#if defined(DESRES_CTNUMBER)
int ctnumber; /**< mae ct block, 0-based, including meta */
#endif
} molfile_atom_t;
/*@{*/
/** Plugin optional data field availability flag */
#define MOLFILE_NOOPTIONS 0x0000 /**< no optional data */
#define MOLFILE_INSERTION 0x0001 /**< insertion codes provided */
#define MOLFILE_OCCUPANCY 0x0002 /**< occupancy data provided */
#define MOLFILE_BFACTOR 0x0004 /**< B-factor data provided */
#define MOLFILE_MASS 0x0008 /**< Atomic mass provided */
#define MOLFILE_CHARGE 0x0010 /**< Atomic charge provided */
#define MOLFILE_RADIUS 0x0020 /**< Atomic VDW radius provided */
#define MOLFILE_ALTLOC 0x0040 /**< Multiple conformations present */
#define MOLFILE_ATOMICNUMBER 0x0080 /**< Atomic element number provided */
#define MOLFILE_BONDSSPECIAL 0x0100 /**< Only non-standard bonds provided */
#if defined(DESRES_CTNUMBER)
#define MOLFILE_CTNUMBER 0x0200 /**< ctnumber provided */
#endif
#define MOLFILE_BADOPTIONS 0xFFFFFFFF /**< Detect badly behaved plugins */
/*@}*/
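/*
 * Illustrative sketch (not part of the plugin API): roughly how a plugin's
 * read_structure() implementation might fill the required fields of one
 * molfile_atom_t and report which of the optional fields it actually
 * provides through the optflags bitmask.  All values and the helper name
 * are made up for the example.
 */
#if 0  /* example only */
#include <string.h>
static int example_fill_atom(molfile_atom_t *a, int *optflags)
{
  strncpy(a->name,    "CA",  sizeof(a->name));
  strncpy(a->type,    "CA",  sizeof(a->type));
  strncpy(a->resname, "ALA", sizeof(a->resname));
  a->resid = 1;
  a->segid[0] = '\0';
  a->chain[0] = '\0';
  /* optional fields: only flag the ones that were really set */
  a->occupancy = 1.0f;
  a->bfactor   = 0.0f;
  *optflags = MOLFILE_OCCUPANCY | MOLFILE_BFACTOR;
  return MOLFILE_SUCCESS;
}
#endif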
/*@{*/
/** Flags indicating availability of optional data fields
* for QM timesteps
*/
#define MOLFILE_QMTS_NOOPTIONS 0x0000 /**< no optional data */
#define MOLFILE_QMTS_GRADIENT 0x0001 /**< energy gradients provided */
#define MOLFILE_QMTS_SCFITER 0x0002
/*@}*/
-#if vmdplugin_ABIVERSION > 10
typedef struct molfile_timestep_metadata {
unsigned int count; /**< total # timesteps; -1 if unknown */
  unsigned int avg_bytes_per_timestep; /**< bytes per timestep */
int has_velocities; /**< if timesteps have velocities */
} molfile_timestep_metadata_t;
-#endif
/*
* Per-timestep atom coordinates and periodic cell information
*/
typedef struct {
float *coords; /**< coordinates of all atoms, arranged xyzxyzxyz */
-#if vmdplugin_ABIVERSION > 10
float *velocities; /**< space for velocities of all atoms; same layout */
/**< NULL unless has_velocities is set */
-#endif
/*@{*/
/**
* Unit cell specification of the form A, B, C, alpha, beta, gamma.
* notes: A, B, C are side lengths of the unit cell
* alpha = angle between b and c
* beta = angle between a and c
* gamma = angle between a and b
*/
float A, B, C, alpha, beta, gamma;
/*@}*/
-#if vmdplugin_ABIVERSION > 10
double physical_time; /**< physical time point associated with this frame */
-#endif
#if defined(DESRES_READ_TIMESTEP2)
/* HACK to support generic trajectory information */
double total_energy;
double potential_energy;
double kinetic_energy;
double extended_energy;
double force_energy;
double total_pressure;
#endif
} molfile_timestep_t;
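/*
 * Illustrative sketch (not part of the plugin API): how a calling
 * application typically prepares a molfile_timestep_t before passing it to
 * a plugin's read_next_timestep() callback.  natoms is assumed to come from
 * a prior open_file_read() call; the helper name is made up.
 */
#if 0  /* example only */
#include <stdlib.h>
static molfile_timestep_t *example_alloc_timestep(int natoms)
{
  molfile_timestep_t *ts = (molfile_timestep_t *) calloc(1, sizeof(*ts));
  if (!ts) return NULL;
  ts->coords = (float *) malloc(3 * (size_t) natoms * sizeof(float));
  if (!ts->coords) { free(ts); return NULL; }
  ts->velocities = NULL;          /* only needed when has_velocities is set */
  ts->alpha = ts->beta = ts->gamma = 90.0f;
  return ts;
}
#endif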
/**
* Metadata for volumetric datasets, read initially and used for subsequent
* memory allocations and file loading.
*/
typedef struct {
char dataname[256]; /**< name of volumetric data set */
  float origin[3]; /**< origin: origin of volume (x=0, y=0, z=0 corner) */
/*
* x/y/z axis:
* These the three cell sides, providing both direction and length
* (not unit vectors) for the x, y, and z axes. In the simplest
 * case, these would be <size,0,0> <0,size,0> and <0,0,size> for
 * an orthogonal cubic volume set. For other cell shapes these
 * axes can be oriented non-orthogonally, and the parallelepiped
* may have different side lengths, not just a cube/rhombus.
*/
float xaxis[3]; /**< direction (and length) for X axis */
float yaxis[3]; /**< direction (and length) for Y axis */
float zaxis[3]; /**< direction (and length) for Z axis */
/*
* x/y/z size:
* Number of grid cells along each axis. This is _not_ the
* physical size of the box, this is the number of voxels in each
* direction, independent of the shape of the volume set.
*/
- int xsize; /**< number of grid cells along the X axis */
- int ysize; /**< number of grid cells along the Y axis */
- int zsize; /**< number of grid cells along the Z axis */
-
- int has_color; /**< flag indicating presence of voxel color data */
+ int xsize; /**< number of grid cells along the X axis */
+ int ysize; /**< number of grid cells along the Y axis */
+ int zsize; /**< number of grid cells along the Z axis */
+
+#if vmdplugin_ABIVERSION > 16
+ int has_scalar; /**< flag indicating presence of scalar volume */
+ int has_gradient; /**< flag indicating presence of vector volume */
+ int has_variance; /**< flag indicating presence of variance map */
+#endif
+ int has_color; /**< flag indicating presence of voxel color data */
} molfile_volumetric_t;
+#if vmdplugin_ABIVERSION > 16
+/**
+ * Volumetric dataset read/write structure with both flag/parameter sets
+ * and VMD-allocated pointers for fields to be used by the plugin.
+ */
+typedef struct {
+ int setidx; /**< volumetric dataset index to load/save */
+ float *scalar; /**< scalar density/potential field data */
+ float *gradient; /**< gradient vector field */
+ float *variance; /**< variance map indicating signal/noise */
+ float *rgb3f; /**< RGB floating point color texture map */
+ unsigned char *rgb3u; /**< RGB unsigned byte color texture map */
+} molfile_volumetric_readwrite_t;
+#endif
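/*
 * Illustrative sketch (not part of the plugin API): the datablock handed to
 * read_volumetric_data() must provide xsize*ysize*zsize floats, and a
 * colorblock with three RGB floats per voxel is only needed when has_color
 * is nonzero.  The helper name is made up for the example.
 */
#if 0  /* example only */
#include <stdlib.h>
static float *example_alloc_voxels(const molfile_volumetric_t *meta,
                                   float **colorblock)
{
  size_t nvox = (size_t) meta->xsize * meta->ysize * meta->zsize;
  *colorblock = meta->has_color ? (float *) malloc(3 * nvox * sizeof(float))
                                : NULL;
  return (float *) malloc(nvox * sizeof(float));
}
#endif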
/**************************************************************
**************************************************************
**** ****
**** Data structures for QM files ****
**** ****
**************************************************************
**************************************************************/
-#if vmdplugin_ABIVERSION > 9
-
-
/* macros for the convergence status of a QM calculation. */
#define MOLFILE_QMSTATUS_UNKNOWN -1 /* don't know yet */
#define MOLFILE_QMSTATUS_OPT_CONV 0 /* optimization converged */
#define MOLFILE_QMSTATUS_SCF_NOT_CONV 1 /* SCF convergence failed */
#define MOLFILE_QMSTATUS_OPT_NOT_CONV 2 /* optimization not converged */
#define MOLFILE_QMSTATUS_FILE_TRUNCATED 3 /* file was truncated */
/* macros describing the SCF method (SCFTYP in GAMESS) */
#define MOLFILE_SCFTYPE_UNKNOWN -1 /* no info about the method */
#define MOLFILE_SCFTYPE_NONE 0 /* calculation didn't make use of SCF */
#define MOLFILE_SCFTYPE_RHF 1 /* restricted Hartree-Fock */
#define MOLFILE_SCFTYPE_UHF 2 /* unrestricted Hartree-Fock */
#define MOLFILE_SCFTYPE_ROHF 3 /* restricted open-shell Hartree-Fock */
#define MOLFILE_SCFTYPE_GVB 4 /* generalized valence bond orbitals */
#define MOLFILE_SCFTYPE_MCSCF 5 /* multi-configuration SCF */
#define MOLFILE_SCFTYPE_FF 6 /* classical force-field based sim. */
/* macros describing the type of calculation (RUNTYP in GAMESS) */
#define MOLFILE_RUNTYPE_UNKNOWN 0 /* single point run */
#define MOLFILE_RUNTYPE_ENERGY 1 /* single point run */
#define MOLFILE_RUNTYPE_OPTIMIZE 2 /* geometry optimization */
#define MOLFILE_RUNTYPE_SADPOINT 3 /* saddle point search */
#define MOLFILE_RUNTYPE_HESSIAN 4 /* Hessian/frequency calculation */
#define MOLFILE_RUNTYPE_SURFACE 5 /* potential surface scan */
#define MOLFILE_RUNTYPE_GRADIENT 6 /* energy gradient calculation */
#define MOLFILE_RUNTYPE_MEX 7 /* minimum energy crossing */
#define MOLFILE_RUNTYPE_DYNAMICS 8 /* Any type of molecular dynamics
 * e.g. Born-Oppenheimer, Car-Parrinello,
* or classical MD */
#define MOLFILE_RUNTYPE_PROPERTIES 9 /* Properties were calculated from a
* wavefunction that was read from file */
/**
* Sizes of various QM-related, timestep independent data arrays
* which must be allocated by the caller (VMD) so that the plugin
* can fill in the arrays with data.
*/
typedef struct {
/* hessian data */
int nimag; /**< number of imaginary modes */
  int nintcoords; /**< number of internal coordinates */
  int ncart; /**< number of cartesian coordinates */
/* orbital/basisset data */
int num_basis_funcs; /**< number of uncontracted basis functions in basis array */
int num_basis_atoms; /**< number of atoms in basis set */
int num_shells; /**< total number of atomic shells */
int wavef_size; /**< size of the wavefunction
* i.e. size of secular eq. or
* # of cartesian contracted
* gaussian basis functions */
/* everything else */
int have_sysinfo;
int have_carthessian; /**< hessian in cartesian coords available */
int have_inthessian; /**< hessian in internal coords available */
int have_normalmodes; /**< normal modes available */
} molfile_qm_metadata_t;
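/*
 * Illustrative sketch (not part of the plugin API): the caller (VMD) sizes
 * the Hessian and normal-mode arrays from the metadata reported here before
 * calling read_qm_rundata().  The helper name is made up for the example.
 */
#if 0  /* example only */
#include <stdlib.h>
static int example_alloc_hessian(const molfile_qm_metadata_t *m,
                                 double **carthessian, float **wavenumbers)
{
  *carthessian = (double *) malloc((size_t) m->ncart * m->ncart * sizeof(double));
  *wavenumbers = (float *)  malloc((size_t) m->ncart * sizeof(float));
  return (*carthessian && *wavenumbers) ? MOLFILE_SUCCESS : MOLFILE_ERROR;
}
#endif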
/**
* QM run info. Parameters that stay unchanged during a single file.
*/
typedef struct {
int nproc; /**< number of processors used. */
int memory; /**< amount of memory used in Mbyte. */
int runtype; /**< flag indicating the calculation method. */
int scftype; /**< SCF type: RHF, UHF, ROHF, GVB or MCSCF wfn. */
  int status; /**< indicates whether SCF and geometry optimization
* have converged properly. */
int num_electrons; /**< number of electrons. XXX: can be fractional in some DFT codes */
int totalcharge; /**< total charge of system. XXX: can be fractional in some DFT codes */
int num_occupied_A; /**< number of occupied alpha orbitals */
int num_occupied_B; /**< number of occupied beta orbitals */
double *nuc_charge; /**< array(natom) containing the nuclear charge of atom i */
char basis_string[MOLFILE_BUFSIZ]; /**< basis name as "nice" string. */
char runtitle[MOLFILE_BIGBUFSIZ]; /**< title of run. */
char geometry[MOLFILE_BUFSIZ]; /**< type of provided geometry, XXX: remove?
* e.g. UNIQUE, ZMT, CART, ... */
char version_string[MOLFILE_BUFSIZ]; /**< QM code version information. */
} molfile_qm_sysinfo_t;
/**
* Data for QM basis set
*/
typedef struct {
int *num_shells_per_atom; /**< number of shells per atom */
  int *num_prim_per_shell; /**< number of shell primitives per shell */
  float *basis; /**< contraction coefficients and exponents for
* the basis functions in the form
* {exp(1), c-coeff(1), exp(2), c-coeff(2), ...};
* array size = 2*num_basis_funcs
* The basis must NOT be normalized. */
int *atomic_number; /**< atomic numbers (chem. element) of atoms in basis set */
  int *angular_momentum; /**< 3 ints per wave function coefficient to describe the
* cartesian components of the angular momentum.
* E.g. S={0 0 0}, Px={1 0 0}, Dxy={1 1 0}, or Fyyz={0 2 1}.
*/
int *shell_types; /**< type for each shell in basis */
} molfile_qm_basis_t;
/**
* Data from QM Hessian/normal mode runs
*
* A noteworthy comment from one of Axel's emails:
* The molfile_qm_hessian_t, I'd rename to molfile_hessian_t (one
* can do vibrational analysis without QM) and would make this a
* completely separate entity. This could then be also used to
* read in data from, say, principal component analysis or normal
* mode analysis and VMD could contain code to either project a
* trajectory on the contained eigenvectors or animate them and
* so on. There is a bunch of possible applications...
*/
typedef struct {
double *carthessian; /**< hessian matrix in cartesian coordinates (ncart)*(ncart)
* as a single array of doubles (row(1), ...,row(natoms)) */
int *imag_modes; /**< list(nimag) of imaginary modes */
double *inthessian; /**< hessian matrix in internal coordinates
* (nintcoords*nintcoords) as a single array of
* doubles (row(1), ...,row(nintcoords)) */
float *wavenumbers; /**< array(ncart) of wavenumbers of normal modes */
float *intensities; /**< array(ncart) of intensities of normal modes */
float *normalmodes; /**< matrix(ncart*ncart) of normal modes */
} molfile_qm_hessian_t;
/**
* QM related information that is timestep independent
*/
typedef struct {
molfile_qm_sysinfo_t run; /* system info */
molfile_qm_basis_t basis; /* basis set info */
molfile_qm_hessian_t hess; /* hessian info */
} molfile_qm_t;
/**
* Enumeration of all of the wavefunction types that can be read
* from QM file reader plugins.
*
 * CANON = canonical (i.e. diagonalized) wavefunction
* GEMINAL = GVB-ROHF geminal pairs
* MCSCFNAT = Multi-Configuration SCF natural orbitals
* MCSCFOPT = Multi-Configuration SCF optimized orbitals
* CINATUR = Configuration-Interaction natural orbitals
* BOYS = Boys localization
* RUEDEN = Ruedenberg localization
* PIPEK = Pipek-Mezey population localization
*
* NBO related localizations:
* --------------------------
* NAO = Natural Atomic Orbitals
* PNAO = pre-orthogonal NAOs
* NBO = Natural Bond Orbitals
* PNBO = pre-orthogonal NBOs
* NHO = Natural Hybrid Orbitals
* PNHO = pre-orthogonal NHOs
* NLMO = Natural Localized Molecular Orbitals
* PNLMO = pre-orthogonal NLMOs
*
* UNKNOWN = Use this for any type not listed here
* You can use the string field for description
*/
enum molfile_qm_wavefunc_type {
MOLFILE_WAVE_CANON, MOLFILE_WAVE_GEMINAL,
MOLFILE_WAVE_MCSCFNAT, MOLFILE_WAVE_MCSCFOPT,
MOLFILE_WAVE_CINATUR,
MOLFILE_WAVE_PIPEK, MOLFILE_WAVE_BOYS, MOLFILE_WAVE_RUEDEN,
MOLFILE_WAVE_NAO, MOLFILE_WAVE_PNAO, MOLFILE_WAVE_NHO,
MOLFILE_WAVE_PNHO, MOLFILE_WAVE_NBO, MOLFILE_WAVE_PNBO,
MOLFILE_WAVE_PNLMO, MOLFILE_WAVE_NLMO, MOLFILE_WAVE_MOAO,
MOLFILE_WAVE_NATO, MOLFILE_WAVE_UNKNOWN
};
/**
* Enumeration of all of the supported QM related charge
* types
*/
enum molfile_qm_charge_type {
MOLFILE_QMCHARGE_UNKNOWN,
MOLFILE_QMCHARGE_MULLIKEN, MOLFILE_QMCHARGE_LOWDIN,
MOLFILE_QMCHARGE_ESP, MOLFILE_QMCHARGE_NPA
};
/**
* Sizes of various QM-related, per-timestep data arrays
* which must be allocated by the caller (VMD) so that the plugin
* can fill in the arrays with data.
*/
typedef struct molfile_qm_timestep_metadata {
unsigned int count; /**< total # timesteps; -1 if unknown */
unsigned int avg_bytes_per_timestep; /**< bytes per timestep */
int has_gradient; /**< if timestep contains gradient */
int num_scfiter; /**< # scf iterations for this ts */
int num_orbitals_per_wavef[MOLFILE_MAXWAVEPERTS]; /**< # orbitals for each wavefunction */
int has_orben_per_wavef[MOLFILE_MAXWAVEPERTS]; /**< orbital energy flags */
int has_occup_per_wavef[MOLFILE_MAXWAVEPERTS]; /**< orbital occupancy flags */
int num_wavef ; /**< # wavefunctions in this ts */
int wavef_size; /**< size of one wavefunction
* (# of gaussian basis fctns) */
int num_charge_sets; /**< # of charge values per atom */
} molfile_qm_timestep_metadata_t;
/**
* QM wavefunction
*/
typedef struct {
int type; /**< MOLFILE_WAVE_CANON, MOLFILE_WAVE_MCSCFNAT, ... */
int spin; /**< 1 for alpha, -1 for beta */
int excitation; /**< 0 for ground state, 1,2,3,... for excited states */
int multiplicity; /**< spin multiplicity of the state, zero if unknown */
char info[MOLFILE_BUFSIZ]; /**< string for additional type info */
double energy; /**< energy of the electronic state.
* i.e. HF-SCF energy, CI state energy,
* MCSCF energy, etc. */
float *wave_coeffs; /**< expansion coefficients for wavefunction in the
* form {orbital1(c1),orbital1(c2),.....,orbitalM(cN)} */
float *orbital_energies; /**< list of orbital energies for wavefunction */
float *occupancies; /**< orbital occupancies */
int *orbital_ids; /**< orbital ID numbers; If NULL then VMD will
* assume 1,2,3,...num_orbs. */
} molfile_qm_wavefunction_t;
/**
* QM per trajectory timestep info
* Note that each timestep can contain multiple wavefunctions.
*/
typedef struct {
molfile_qm_wavefunction_t *wave; /**< array of wavefunction objects */
float *gradient; /**< force on each atom (=gradient of energy) */
double *scfenergies; /**< energies from the SCF cycles */
double *charges; /**< per-atom charges */
int *charge_types; /**< type of each charge set */
} molfile_qm_timestep_t;
-#endif
-
/**************************************************************
**************************************************************/
/**
* Enumeration of all of the supported graphics objects that can be read
* from graphics file reader plugins.
*/
enum molfile_graphics_type {
MOLFILE_POINT, MOLFILE_TRIANGLE, MOLFILE_TRINORM, MOLFILE_NORMS,
MOLFILE_LINE, MOLFILE_CYLINDER, MOLFILE_CAPCYL, MOLFILE_CONE,
MOLFILE_SPHERE, MOLFILE_TEXT, MOLFILE_COLOR, MOLFILE_TRICOLOR
};
/**
* Individual graphics object/element data
*/
typedef struct {
int type; /* One of molfile_graphics_type */
int style; /* A general style parameter */
float size; /* A general size parameter */
float data[9]; /* All data for the element */
} molfile_graphics_t;
/*
* Types for raw graphics elements stored in files. Data for each type
* should be stored by the plugin as follows:
type data style size
---- ---- ----- ----
point x, y, z pixel size
triangle x1,y1,z1,x2,y2,z2,x3,y3,z3
trinorm x1,y1,z1,x2,y2,z2,x3,y3,z3
the next array element must be NORMS
tricolor x1,y1,z1,x2,y2,z2,x3,y3,z3
the next array elements must be NORMS
the following element must be COLOR, with three RGB triples
norms x1,y1,z1,x2,y2,z2,x3,y3,z3
line x1,y1,z1,x2,y2,z2 0=solid pixel width
1=stippled
cylinder x1,y1,z1,x2,y2,z2 resolution radius
capcyl x1,y1,z1,x2,y2,z2 resolution radius
sphere x1,y1,z1 resolution radius
text x, y, z, up to 24 bytes of text pixel size
color r, g, b
*/
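/*
 * Illustrative sketch (not part of the plugin API): filling one raw
 * graphics element following the table above, here a sphere of radius 1.5
 * centered at the origin with a resolution of 12.  Values are made up for
 * the example.
 */
#if 0  /* example only */
static void example_fill_sphere(molfile_graphics_t *g)
{
  g->type    = MOLFILE_SPHERE;
  g->style   = 12;       /* resolution */
  g->size    = 1.5f;     /* radius */
  g->data[0] = 0.0f;     /* x */
  g->data[1] = 0.0f;     /* y */
  g->data[2] = 0.0f;     /* z */
}
#endif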
/**
* Main file reader API. Any function in this struct may be NULL
* if not implemented by the plugin; the application checks this to determine
* what functionality is present in the plugin.
*/
typedef struct {
/**
* Required header
*/
vmdplugin_HEAD
/**
* Filename extension for this file type. May be NULL if no filename
* extension exists and/or is known. For file types that match several
* common extensions, list them in a comma separated list such as:
* "pdb,ent,foo,bar,baz,ban"
* The comma separated list will be expanded when filename extension matching
* is performed. If multiple plugins solicit the same filename extensions,
* the one that lists the extension earliest in its list is selected. In the
* case of a "tie", the first one tried/checked "wins".
*/
const char *filename_extension;
/**
* Try to open the file for reading. Return an opaque handle, or NULL on
* failure. Set the number of atoms; if the number of atoms cannot be
* determined, set natoms to MOLFILE_NUMATOMS_UNKNOWN.
* Filetype should be the name under which this plugin was registered;
* this is provided so that plugins can provide the same function pointer
* to handle multiple file types.
*/
void *(* open_file_read)(const char *filepath, const char *filetype,
int *natoms);
/**
* Read molecular structure from the given file handle. atoms is allocated
* by the caller and points to space for natoms.
* On success, place atom information in the passed-in pointer.
* optflags specifies which optional fields in the atoms will be set by
* the plugin.
*/
int (*read_structure)(void *, int *optflags, molfile_atom_t *atoms);
/**
* Read bond information for the molecule. On success the arrays from
* and to should point to the (one-based) indices of bonded atoms.
* Each unique bond should be specified only once, so file formats that list
* bonds twice will need post-processing before the results are returned to
* the caller.
* If the plugin provides bond information, but the file loaded doesn't
* actually contain any bond info, the nbonds parameter should be
* set to 0 and from/to should be set to NULL to indicate that no bond
* information was actually present, and automatic bond search should be
* performed.
*
* If the plugin provides bond order information, the bondorder array
* will contain the bond order for each from/to pair. If not, the bondorder
* pointer should be set to NULL, in which case the caller will provide a
* default bond order value of 1.0.
*
* If the plugin provides bond type information, the bondtype array
* will contain the bond type index for each from/to pair. These numbers
* are consecutive integers starting from 0.
 * The bondtypenames list contains the corresponding names, if available,
 * as a NULL-terminated list of strings. nbondtypes is provided for convenience
* and consistency checking.
*
* These arrays must be freed by the plugin in the close_file_read function.
* This function can be called only after read_structure().
* Return MOLFILE_SUCCESS if no errors occur.
*/
-#if vmdplugin_ABIVERSION > 14
int (*read_bonds)(void *, int *nbonds, int **from, int **to, float **bondorder,
int **bondtype, int *nbondtypes, char ***bondtypename);
-#else
- int (*read_bonds)(void *, int *nbonds, int **from, int **to, float **bondorder);
-#endif
/**
 * XXX this function will be augmented and possibly superseded by a
* new QM-capable version named read_timestep(), when finished.
*
* Read the next timestep from the file. Return MOLFILE_SUCCESS, or
* MOLFILE_EOF on EOF. If the molfile_timestep_t argument is NULL, then
* the frame should be skipped. Otherwise, the application must prepare
* molfile_timestep_t by allocating space in coords for the corresponding
* number of coordinates.
* The natoms parameter exists because some coordinate file formats
* (like CRD) cannot determine for themselves how many atoms are in a
* timestep; the app must therefore obtain this information elsewhere
* and provide it to the plugin.
*/
int (* read_next_timestep)(void *, int natoms, molfile_timestep_t *);
/**
* Close the file and release all data. The handle cannot be reused.
*/
void (* close_file_read)(void *);
/**
* Open a coordinate file for writing using the given header information.
* Return an opaque handle, or NULL on failure. The application must
* specify the number of atoms to be written.
* filetype should be the name under which this plugin was registered.
*/
void *(* open_file_write)(const char *filepath, const char *filetype,
int natoms);
/**
* Write structure information. Return success.
*/
int (* write_structure)(void *, int optflags, const molfile_atom_t *atoms);
/**
* Write a timestep to the coordinate file. Return MOLFILE_SUCCESS if no
* errors occur. If the file contains structure information in each
* timestep (like a multi-entry PDB), it will have to cache the information
* from the initial calls from write_structure.
*/
int (* write_timestep)(void *, const molfile_timestep_t *);
/**
* Close the file and release all data. The handle cannot be reused.
*/
void (* close_file_write)(void *);
/**
* Retrieve metadata pertaining to volumetric datasets in this file.
* Set nsets to the number of volumetric data sets, and set *metadata
* to point to an array of molfile_volumetric_t. The array is owned by
* the plugin and should be freed by close_file_read(). The application
* may call this function any number of times.
*/
int (* read_volumetric_metadata)(void *, int *nsets,
molfile_volumetric_t **metadata);
/**
* Read the specified volumetric data set into the space pointed to by
* datablock. The set is specified with a zero-based index. The space
* allocated for the datablock must be equal to
* xsize * ysize * zsize. No space will be allocated for colorblock
* unless has_color is nonzero; in that case, colorblock should be
* filled in with three RGB floats per datapoint.
*/
int (* read_volumetric_data)(void *, int set, float *datablock,
float *colorblock);
+#if vmdplugin_ABIVERSION > 16
+ int (* read_volumetric_data_ex)(void *, molfile_volumetric_readwrite_t *v);
+#endif
/**
* Read raw graphics data stored in this file. Return the number of data
* elements and the data itself as an array of molfile_graphics_t in the
* pointer provided by the application. The plugin is responsible for
* freeing the data when the file is closed.
*/
int (* read_rawgraphics)(void *, int *nelem, const molfile_graphics_t **data);
/**
* Read molecule metadata such as what database (if any) this file/data
* came from, what the accession code for the database is, textual remarks
* and other notes pertaining to the contained structure/trajectory/volume
* and anything else that's informative at the whole file level.
*/
int (* read_molecule_metadata)(void *, molfile_metadata_t **metadata);
/**
* Write bond information for the molecule. The arrays from
* and to point to the (one-based) indices of bonded atoms.
* Each unique bond will be specified only once by the caller.
* File formats that list bonds twice will need to emit both the
* from/to and to/from versions of each.
* This function must be called before write_structure().
*
* Like the read_bonds() routine, the bondorder pointer is set to NULL
* if the caller doesn't have such information, in which case the
* plugin should assume a bond order of 1.0 if the file format requires
* bond order information.
*
* Support for bond types follows the bondorder rules. bondtype is
* an integer array of the size nbonds that contains the bond type
* index (consecutive integers starting from 0) and bondtypenames
* contain the corresponding strings, in case the naming/numbering
* scheme is different from the index numbers.
 * If the pointers are set to NULL, then this information is not available.
 * bondtypenames can only be used if bondtypes is also given.
* Return MOLFILE_SUCCESS if no errors occur.
*/
-#if vmdplugin_ABIVERSION > 14
int (* write_bonds)(void *, int nbonds, int *from, int *to, float *bondorder,
int *bondtype, int nbondtypes, char **bondtypename);
-#else
- int (* write_bonds)(void *, int nbonds, int *from, int *to, float *bondorder);
-#endif
-#if vmdplugin_ABIVERSION > 9
/**
* Write the specified volumetric data set into the space pointed to by
 * datablock. The space allocated for the datablock must be equal to
* xsize * ysize * zsize. No space will be allocated for colorblock
* unless has_color is nonzero; in that case, colorblock should be
* filled in with three RGB floats per datapoint.
*/
int (* write_volumetric_data)(void *, molfile_volumetric_t *metadata,
float *datablock, float *colorblock);
+#if vmdplugin_ABIVERSION > 16
+ int (* write_volumetric_data_ex)(void *, molfile_volumetric_t *metadata,
+ molfile_volumetric_readwrite_t *v);
+#endif
-#if vmdplugin_ABIVERSION > 15
/**
* Read in Angles, Dihedrals, Impropers, and Cross Terms and optionally types.
* (Cross terms pertain to the CHARMM/NAMD CMAP feature)
*/
int (* read_angles)(void *handle, int *numangles, int **angles, int **angletypes,
int *numangletypes, char ***angletypenames, int *numdihedrals,
int **dihedrals, int **dihedraltypes, int *numdihedraltypes,
char ***dihedraltypenames, int *numimpropers, int **impropers,
int **impropertypes, int *numimpropertypes, char ***impropertypenames,
int *numcterms, int **cterms, int *ctermcols, int *ctermrows);
/**
* Write out Angles, Dihedrals, Impropers, and Cross Terms
* (Cross terms pertain to the CHARMM/NAMD CMAP feature)
*/
int (* write_angles)(void *handle, int numangles, const int *angles, const int *angletypes,
int numangletypes, const char **angletypenames, int numdihedrals,
const int *dihedrals, const int *dihedraltypes, int numdihedraltypes,
const char **dihedraltypenames, int numimpropers,
const int *impropers, const int *impropertypes, int numimpropertypes,
const char **impropertypenames, int numcterms, const int *cterms,
int ctermcols, int ctermrows);
-#else
- /**
- * Read in Angles, Dihedrals, Impropers, and Cross Terms
- * Forces are in Kcal/mol
- * (Cross terms pertain to the CHARMM/NAMD CMAP feature, forces are given
- * as a 2-D matrix)
- */
- int (* read_angles)(void *,
- int *numangles, int **angles, double **angleforces,
- int *numdihedrals, int **dihedrals, double **dihedralforces,
- int *numimpropers, int **impropers, double **improperforces,
- int *numcterms, int **cterms,
- int *ctermcols, int *ctermrows, double **ctermforces);
-
- /**
- * Write out Angles, Dihedrals, Impropers, and Cross Terms
- * Forces are in Kcal/mol
- * (Cross terms pertain to the CHARMM/NAMD CMAP feature, forces are given
- * as a 2-D matrix)
- */
- int (* write_angles)(void *,
- int numangles, const int *angles, const double *angleforces,
- int numdihedrals, const int *dihedrals, const double *dihedralforces,
- int numimpropers, const int *impropers, const double *improperforces,
- int numcterms, const int *cterms,
- int ctermcols, int ctermrows, const double *ctermforces);
-#endif
/**
* Retrieve metadata pertaining to timestep independent
* QM datasets in this file.
*
* The metadata are the sizes of the QM related data structure
* arrays that will be populated by the plugin when
* read_qm_rundata() is called. Since the allocation of these
* arrays is done by VMD rather than the plugin, VMD needs to
* know the sizes beforehand. Consequently read_qm_metadata()
* has to be called before read_qm_rundata().
*/
int (* read_qm_metadata)(void *, molfile_qm_metadata_t *metadata);
/**
* Read timestep independent QM data.
*
* Typical data that are defined only once per trajectory are
* general info about the calculation (such as the used method),
* the basis set and normal modes.
* The data structures to be populated must have been allocated
* before by VMD according to sizes obtained through
* read_qm_metadata().
*/
int (* read_qm_rundata)(void *, molfile_qm_t *qmdata);
/**
* Read the next timestep from the file. Return MOLFILE_SUCCESS, or
* MOLFILE_EOF on EOF. If the molfile_timestep_t or molfile_qm_metadata_t
* arguments are NULL, then the coordinate or qm data should be skipped.
* Otherwise, the application must prepare molfile_timestep_t and
* molfile_qm_timestep_t by allocating space for the corresponding
* number of coordinates, orbital wavefunction coefficients, etc.
* Since it is common for users to want to load only the final timestep
* data from a QM run, the application may provide any combination of
* valid, or NULL pointers for the molfile_timestep_t and
* molfile_qm_timestep_t parameters, depending on what information the
* user is interested in.
* The natoms and qm metadata parameters exist because some file formats
* cannot determine for themselves how many atoms etc are in a
* timestep; the app must therefore obtain this information elsewhere
* and provide it to the plugin.
*/
int (* read_timestep)(void *, int natoms, molfile_timestep_t *,
molfile_qm_metadata_t *, molfile_qm_timestep_t *);
-#endif
-#if vmdplugin_ABIVERSION > 10
int (* read_timestep_metadata)(void *, molfile_timestep_metadata_t *);
-#endif
-#if vmdplugin_ABIVERSION > 11
int (* read_qm_timestep_metadata)(void *, molfile_qm_timestep_metadata_t *);
-#endif
#if defined(DESRES_READ_TIMESTEP2)
/**
* Read a specified timestep!
*/
int (* read_timestep2)(void *, molfile_ssize_t index, molfile_timestep_t *);
/**
 * Read up to count times beginning at index start into the given
 * space. Return the number read, or -1 on error.
*/
molfile_ssize_t (* read_times)( void *,
molfile_ssize_t start,
molfile_ssize_t count,
double * times );
#endif
-#if vmdplugin_ABIVERSION > 13
/**
* Console output, READ-ONLY function pointer.
* Function pointer that plugins can use for printing to the host
* application's text console. This provides a clean way for plugins
* to send message strings back to the calling application, giving the
* caller the ability to prioritize, buffer, and redirect console messages
* to an appropriate output channel, window, etc. This enables the use of
* graphical consoles like TkCon without losing console output from plugins.
* If the function pointer is NULL, no console output service is provided
* by the calling application, and the output should default to stdout
* stream. If the function pointer is non-NULL, all output will be
* subsequently dealt with by the calling application.
*
* XXX this should really be put into a separate block of
* application-provided read-only function pointers for any
* application-provided services
*/
int (* cons_fputs)(const int, const char*);
-#endif
} molfile_plugin_t;
#endif
+
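To tie the pieces of the molfile API above together, here is a hedged sketch
of the call sequence an application (such as the USER-MOLFILE interface) goes
through once it holds a molfile_plugin_t pointer obtained through plugin
registration. The file name, file type, and helper name are assumptions made
for the example; error handling is abbreviated.

  #include <stdio.h>
  #include <stdlib.h>
  #include "molfile_plugin.h"

  static void example_read_trajectory(molfile_plugin_t *p)
  {
    int natoms = 0;
    void *h = p->open_file_read("traj.dcd", "dcd", &natoms);  /* hypothetical file */
    if (!h || natoms <= 0) return;

    molfile_timestep_t ts;
    ts.coords = (float *) malloc(3 * (size_t) natoms * sizeof(float));
    ts.velocities = NULL;

    int nframes = 0;
    while (ts.coords && p->read_next_timestep(h, natoms, &ts) == MOLFILE_SUCCESS)
      ++nframes;

    printf("read %d frames with %d atoms each\n", nframes, natoms);
    free(ts.coords);
    p->close_file_read(h);
  }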
diff --git a/src/USER-MOLFILE/vmdplugin.h b/lib/molfile/vmdplugin.h
similarity index 98%
rename from src/USER-MOLFILE/vmdplugin.h
rename to lib/molfile/vmdplugin.h
index 37299408f..842d1e431 100644
--- a/src/USER-MOLFILE/vmdplugin.h
+++ b/lib/molfile/vmdplugin.h
@@ -1,191 +1,191 @@
/***************************************************************************
*cr
*cr (C) Copyright 1995-2006 The Board of Trustees of the
*cr University of Illinois
*cr All Rights Reserved
*cr
***************************************************************************/
/***************************************************************************
* RCS INFORMATION:
*
* $RCSfile: vmdplugin.h,v $
* $Author: johns $ $Locker: $ $State: Exp $
- * $Revision: 1.32 $ $Date: 2009/02/24 05:12:35 $
+ * $Revision: 1.33 $ $Date: 2015/10/29 05:10:54 $
*
***************************************************************************/
/** @file
* This header must be included by every VMD plugin library. It defines the
* API for every plugin so that VMD can organize the plugins it finds.
*/
#ifndef VMD_PLUGIN_H
#define VMD_PLUGIN_H
/*
* Preprocessor tricks to make it easier for us to redefine the names of
* functions when building static plugins.
*/
#if !defined(VMDPLUGIN)
/**
* macro defining VMDPLUGIN if it hasn't already been set to the name of
* a static plugin that is being compiled. This is the catch-all case.
*/
#define VMDPLUGIN vmdplugin
#endif
/** concatenation macro, joins args x and y together as a single string */
#define xcat(x, y) cat(x, y)
/** concatenation macro, joins args x and y together as a single string */
#define cat(x, y) x ## y
/*
* macros to correctly define plugin function names depending on whether
* the plugin is being compiled for static linkage or dynamic loading.
* When compiled for static linkage, each plugin needs to have unique
* function names for all of its entry points. When compiled for dynamic
* loading, the plugins must name their entry points consistently so that
* the plugin loading mechanism can find the register, register_tcl, init,
* and fini routines via dlopen() or similar operating system interfaces.
*/
/*@{*/
/** Macro names entry points correctly for static linkage or dynamic loading */
#define VMDPLUGIN_register xcat(VMDPLUGIN, _register)
#define VMDPLUGIN_register_tcl xcat(VMDPLUGIN, _register_tcl)
#define VMDPLUGIN_init xcat(VMDPLUGIN, _init)
#define VMDPLUGIN_fini xcat(VMDPLUGIN, _fini)
/*@}*/
/** "WIN32" is defined on both WIN32 and WIN64 platforms... */
#if (defined(WIN32))
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#if !defined(STATIC_PLUGIN)
#if defined(VMDPLUGIN_EXPORTS)
/**
* Only define DllMain for plugins, not in VMD or in statically linked plugins
* VMDPLUGIN_EXPORTS is only defined when compiling dynamically loaded plugins
*/
BOOL APIENTRY DllMain( HANDLE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
return TRUE;
}
#define VMDPLUGIN_API __declspec(dllexport)
#else
#define VMDPLUGIN_API __declspec(dllimport)
#endif /* VMDPLUGIN_EXPORTS */
#else /* ! STATIC_PLUGIN */
#define VMDPLUGIN_API
#endif /* ! STATIC_PLUGIN */
#else
/** If we're not compiling on Windows, then this macro is defined empty */
#define VMDPLUGIN_API
#endif
/** define plugin linkage correctly for both C and C++ based plugins */
#ifdef __cplusplus
#define VMDPLUGIN_EXTERN extern "C" VMDPLUGIN_API
#else
#define VMDPLUGIN_EXTERN extern VMDPLUGIN_API
#endif /* __cplusplus */
/*
* Plugin API functions start here
*/
/**
* Init routine: called the first time the library is loaded by the
* application and before any other API functions are referenced.
* Return 0 on success.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_init(void);
/**
* Macro for creating a struct header used in all plugin structures.
*
* This header should be placed at the top of every plugin API definition
* so that it can be treated as a subtype of the base plugin type.
*
* abiversion: Defines the ABI for the base plugin type (not for other plugins)
* type: A string descriptor of the plugin type.
* name: A name for the plugin.
* author: A string identifier, possibly including newlines.
* Major and minor version.
* is_reentrant: Whether this library can be run concurrently with itself.
*/
#define vmdplugin_HEAD \
int abiversion; \
const char *type; \
const char *name; \
const char *prettyname; \
const char *author; \
int majorv; \
int minorv; \
int is_reentrant;
/**
* Typedef for generic plugin header, individual plugins can
* make their own structures as long as the header info remains
* the same as the generic plugin header, most easily done by
* using the vmdplugin_HEAD macro.
*/
typedef struct {
vmdplugin_HEAD
} vmdplugin_t;
/**
* Use this macro to initialize the abiversion member of each plugin
*/
-#define vmdplugin_ABIVERSION 16
+#define vmdplugin_ABIVERSION 17
/*@{*/
/** Use this macro to indicate a plugin's thread-safety at registration time */
#define VMDPLUGIN_THREADUNSAFE 0
#define VMDPLUGIN_THREADSAFE 1
/*@}*/
/*@{*/
/** Error return code for use in the plugin registration and init functions */
#define VMDPLUGIN_SUCCESS 0
#define VMDPLUGIN_ERROR -1
/*@}*/
/**
* Function pointer typedef for register callback functions
*/
typedef int (*vmdplugin_register_cb)(void *, vmdplugin_t *);
/**
* Allow the library to register plugins with the application.
* The callback should be called using the passed-in void pointer, which
* should not be interpreted in any way by the library. Each vmdplugin_t
* pointer passed to the application should point to statically-allocated
* or heap-allocated memory and should never be later modified by the plugin.
 * Applications must be permitted to retain only a copy of the plugin
* pointer, without making any deep copy of the items in the struct.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_register(void *, vmdplugin_register_cb);
/**
* Allow the library to register Tcl extensions.
* This API is optional; if found by dlopen, it will be called after first
* calling init and register.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_register_tcl(void *, void *tcl_interp,
vmdplugin_register_cb);
/**
* The Fini method is called when the application will no longer use
* any plugins in the library.
*/
VMDPLUGIN_EXTERN int VMDPLUGIN_fini(void);
#endif /* VMD_PLUGIN_H */
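
As a hedged illustration of how the macros above fit together, the sketch
below shows the minimal entry points a dynamically loaded plugin library
would implement. The plugin name, author string, and field values are
placeholders invented for the example; real plugins usually register a larger
struct (e.g. a molfile_plugin_t) whose first members are vmdplugin_HEAD.

  #include "vmdplugin.h"

  static vmdplugin_t example_plugin;   /* placeholder plugin descriptor */

  VMDPLUGIN_EXTERN int VMDPLUGIN_init(void) {
    example_plugin.abiversion   = vmdplugin_ABIVERSION;
    example_plugin.type         = "mol file reader";
    example_plugin.name         = "example";
    example_plugin.prettyname   = "Example reader";
    example_plugin.author       = "A. Nobody";        /* placeholder */
    example_plugin.majorv       = 1;
    example_plugin.minorv       = 0;
    example_plugin.is_reentrant = VMDPLUGIN_THREADSAFE;
    return VMDPLUGIN_SUCCESS;
  }

  VMDPLUGIN_EXTERN int VMDPLUGIN_register(void *v, vmdplugin_register_cb cb) {
    cb(v, &example_plugin);
    return VMDPLUGIN_SUCCESS;
  }

  VMDPLUGIN_EXTERN int VMDPLUGIN_fini(void) { return VMDPLUGIN_SUCCESS; }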
diff --git a/lib/mscg/Install.py b/lib/mscg/Install.py
new file mode 100644
index 000000000..e54723261
--- /dev/null
+++ b/lib/mscg/Install.py
@@ -0,0 +1,122 @@
+#!/usr/bin/env python
+
+# Install.py tool to download, unpack, build, and link to the MS-CG library
+# used to automate the steps described in the README file in this dir
+
+import sys,os,re,commands
+
+# help message
+
+help = """
+Syntax: python Install.py -h hpath hdir -g -b [suffix] -l
+ specify one or more options, order does not matter
+ -h = set home dir of MS-CG to be hpath/hdir
+ hpath can be full path, contain '~' or '.' chars
+ default hpath = . = lib/mscg
+ default hdir = MSCG-release-master = what GitHub zipfile unpacks to
+ -g = grab (download) zipfile from MS-CG GitHub website
+ unpack it to hpath/hdir
+ hpath must already exist
+ if hdir already exists, it will be deleted before unpack
+ -b = build MS-CG library in its src dir
+ optional suffix specifies which src/Make/Makefile.suffix to use
+ default suffix = g++_simple
+ -l = create 2 softlinks (includelink,liblink) in lib/mscg to MS-CG src dir
+"""
+
+# settings
+
+url = "https://github.com/uchicago-voth/MSCG-release/archive/master.zip"
+zipfile = "MS-CG-master.zip"
+zipdir = "MSCG-release-master"
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# expand to full path name
+# process leading '~' or relative path
+
+def fullpath(path):
+ return os.path.abspath(os.path.expanduser(path))
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+homepath = "."
+homedir = zipdir
+
+grabflag = 0
+buildflag = 0
+msuffix = "g++_simple"
+linkflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-h":
+ if iarg+3 > nargs: error()
+ homepath = args[iarg+1]
+ homedir = args[iarg+2]
+ iarg += 3
+ elif args[iarg] == "-g":
+ grabflag = 1
+ iarg += 1
+ elif args[iarg] == "-b":
+ buildflag = 1
+ if iarg+1 < nargs and args[iarg+1][0] != '-':
+ msuffix = args[iarg+1]
+ iarg += 1
+ iarg += 1
+ elif args[iarg] == "-l":
+ linkflag = 1
+ iarg += 1
+ else: error()
+
+homepath = fullpath(homepath)
+if not os.path.isdir(homepath): error("MS-CG path does not exist")
+homedir = "%s/%s" % (homepath,homedir)
+
+# download and unpack MS-CG zipfile
+
+if grabflag:
+ print "Downloading MS-CG ..."
+ cmd = "curl -L %s > %s/%s" % (url,homepath,zipfile)
+ print cmd
+ print commands.getoutput(cmd)
+
+ print "Unpacking MS-CG zipfile ..."
+ if os.path.exists("%s/%s" % (homepath,zipdir)):
+ commands.getoutput("rm -rf %s/%s" % (homepath,zipdir))
+ cmd = "cd %s; unzip %s" % (homepath,zipfile)
+ commands.getoutput(cmd)
+ if os.path.basename(homedir) != zipdir:
+ if os.path.exists(homedir): commands.getoutput("rm -rf %s" % homedir)
+ os.rename("%s/%s" % (homepath,zipdir),homedir)
+
+# build MS-CG
+
+if buildflag:
+ print "Building MS-CG ..."
+ cmd = "cd %s/src; cp Make/Makefile.%s .; make -f Makefile.%s" % \
+ (homedir,msuffix,msuffix)
+ txt = commands.getoutput(cmd)
+ print txt
+
+# create 2 links in lib/mscg to MS-CG src dir
+
+if linkflag:
+ print "Creating links to MS-CG include and lib files"
+ if os.path.isfile("includelink") or os.path.islink("includelink"):
+ os.remove("includelink")
+ if os.path.isfile("liblink") or os.path.islink("liblink"):
+ os.remove("liblink")
+ cmd = "ln -s %s/src includelink" % homedir
+ commands.getoutput(cmd)
+ cmd = "ln -s %s/src liblink" % homedir
+ commands.getoutput(cmd)
diff --git a/lib/mscg/Makefile.lammps b/lib/mscg/Makefile.lammps
index 0aa55b087..f0d9a9b8a 100644
--- a/lib/mscg/Makefile.lammps
+++ b/lib/mscg/Makefile.lammps
@@ -1,5 +1,5 @@
# Settings that the LAMMPS build will import when this package library is used
-mscg_SYSINC =
-mscg_SYSLIB = -lm -lgsl -llapack -lcblas
+mscg_SYSINC = -std=c++11
+mscg_SYSLIB = -lm -lgsl -llapack -lgslcblas
mscg_SYSPATH =
diff --git a/lib/mscg/README b/lib/mscg/README
index cc4fc9a66..b73c8563c 100755
--- a/lib/mscg/README
+++ b/lib/mscg/README
@@ -1,53 +1,67 @@
This directory contains links to the Multi-scale Coarse-graining
(MS-CG) library which is required to use the MSCG package and its fix
command in a LAMMPS input script.
The MS-CG library is available at
https://github.com/uchicago-voth/MSCG-release and was developed by
Jacob Wagner in Greg Voth's group at the University of Chicago.
+This library requires a compiler with C++11 support (e.g., g++ v4.9+),
+LAPACK, and the GNU Scientific Library (GSL v2.1+).
+
+You can type "make lib-mscg" from the src directory to see help on how
+to download and build this library via make commands, or you can do
+the same thing by typing "python Install.py" from within this
+directory, or you can do it manually by following the instructions
+below.
+
-----------------
You must perform the following steps yourself.
1. Download MS-CG at https://github.com/uchicago-voth/MSCG-release
either as a tarball or via SVN, and unpack the tarball either in
this /lib/mscg directory or somewhere else on your system.
-2. Compile MS-CG from within its home directory using your makefile choice:
+2. Ensure that you have LAPACK and GSL (or Intel MKL) as well as a compiler
+ with support for C++11.
+
+3. Compile MS-CG from within its home directory using your makefile of choice:
% make -f Makefile."name" libmscg.a
+ It is recommended that you start with Makefile.g++_simple
+ for most machines.
-3. There is no need to install MS-CG if you only wish
+4. There is no need to install MS-CG if you only wish
to use it from LAMMPS.
-4. Create two soft links in this dir (lib/mscg) to the MS-CG src
+5. Create two soft links in this dir (lib/mscg) to the MS-CG src
directory. E.g. if you built MS-CG in this dir:
- % ln -s mscgfm-master/src includelink
- % ln -s mscgfm-master/src liblink
+ % ln -s src includelink
+ % ln -s src liblink
These links could instead be set to the include and lib
directories created by a MS-CG install, e.g.
% ln -s /usr/local/include includelink
% ln -s /usr/local/lib liblink
-----------------
When these steps are complete you can build LAMMPS with the MS-CG
package installed:
% cd lammps/src
% make yes-USER-MSCG
% make g++ (or whatever target you wish)
Note that if you download and unpack a new LAMMPS tarball, the
"includelink" and "liblink" files will be lost and you will need to
re-create them (step 5). If you built MS-CG in this directory (as
opposed to somewhere else on your system) and did not install it
somewhere else, you will also need to repeat steps 1-3.
The Makefile.lammps file in this directory is there for compatibility
with the way other libraries under the lib dir are linked with by
-LAMMPS. MS-CG requires the GSL, LAPACK, and BLAS libraries as listed
-in Makefile.lammps. If they are not in default locations where your
+LAMMPS. MS-CG requires the GSL and LAPACK libraries as listed in
+Makefile.lammps. If they are not in default locations where your
LD_LIBRARY_PATH environment settings can find them, then you should
add the appropriate -L paths to the mscg_SYSPATH variable in
Makefile.lammps.
diff --git a/lib/netcdf/README b/lib/netcdf/README
index 00db8df00..b18ea1d27 100644
--- a/lib/netcdf/README
+++ b/lib/netcdf/README
@@ -1,43 +1,46 @@
The Makefile.lammps file in this directory is used when building
LAMMPS with packages that make use of the NetCDF library or its
-parallel version. The file has several settings needed to compile
+parallel version. One example is the USER-NETCDF package, which adds
+the dump netcdf and dump netcdf/mpiio commands.
+
+The file has several settings needed to compile
and link LAMMPS with the NetCDF and parallel NetCDF support.
For any regular NetCDF installation, all required flags should be
autodetected. Please note that parallel NetCDF support is
beneficial only when you run on a machine with very many processors
like an IBM BlueGene or Cray. For most people regular NetCDF
support should be sufficient and not cause any performance
penalties.
If you have problems compiling or linking, you may have to set
the flags manually. There are three makefile variables
1) netcdf_SYSINC
This is for setting preprocessor options and include file paths.
Set -DLMP_HAS_NETCDF, if you have NetCDF installed.
Set -DLMP_HAS_PNETCDF, if you have parallel NetCDF installed.
You can have either or both defines set. If none of these are
set, LAMMPS will compile, but the NetCDF enabled functionality
will not be available.
In addition you may have to point to the folder with the include
files using -I/path/to/netcdf/include
Example for a Fedora 24 machine with serial NetCDF installed as
netcdf-devel-4.4.0-3.fc24.x86_64 RPM package:
netcdf_SYSINC = -DLMP_HAS_NETCDF -I/usr/include -I/usr/include/hdf
2) netcdf_SYSLIB
This is the setting for all required libraries that need to be linked to.
Example for a Fedora 24 machine with serial NetCDF installed as
netcdf-devel-4.4.0-3.fc24.x86_64 RPM package:
netcdf_SYSLIB = -lnetcdf
3) netcdf_SYSPATH
This is the setting for the path of directories with the NetCDF libraries.
Typically, this will be of the form -L/path/to/netcdf/lib
In the example from above, it can be left empty, because the Linux
distribution provided libraries are installed in a system library location.
diff --git a/lib/poems/Install.py b/lib/poems/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/poems/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+ if args[iarg] == "-m":
+ if iarg+2 > nargs: error()
+ machine = args[iarg+1]
+ iarg += 2
+ elif args[iarg] == "-e":
+ if iarg+2 > nargs: error()
+ extraflag = 1
+ suffix = args[iarg+1]
+ iarg += 2
+ else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+ words = line.split()
+ if len(words) == 3 and extraflag and \
+ words[0] == "EXTRAMAKE" and words[1] == '=':
+ line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+ print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/poems/README b/lib/poems/README
index 836595bdd..e0ded85e4 100644
--- a/lib/poems/README
+++ b/lib/poems/README
@@ -1,65 +1,70 @@
POEMS (Parallelizable Open source Efficient Multibody Software) library
Rudranarayan Mukherjee, RPI
mukher at rpi.edu
June 2006
This is version 1.0 of the POEMS library, a general purpose distributed
multibody dynamics software, which is able to simulate the dynamics of
articulated body systems.
POEMS is supported by the funding agencies listed in the Grants' List.
POEMS is an open source program distributed under the Rensselaer
Scorec License.
The authors listed in the Authors' List reserve the right to decline
requests for technical support for copies of POEMS obtained free of charge.
We are happy to hear from you about bugs, ideas for improvement, and
other suggestions. We keep improving POEMS. Check the POEMS web
site (www.rpi.edu/~anderk5/POEMS) for recent changes.
All correspondence regarding the POEMS should be sent to:
By email: (preferred)
Prof. Kurt Anderson (anderk5@rpi.edu) or
Rudranarayan Mukherjee (mukher@rpi.edu) - include "[POEMS]" in the subject
or by mail:
Prof. Kurt S. Anderson
4006 Jonsson Engineering Center
Rensselaer Polytechnic Institute
110 8th Street,
Troy, NY 12180-3510, U.S.A.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the POEMS package.
This library must be built with a C++ compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-poems" from the src directory to see help on
+how to build this library via make commands, or you can do the same
+thing by typing "python Install.py" from within this directory, or you
+can do it manually by following the instructions below.
+
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.g++
When you are done building this library, two files should
exist in this directory:
libpoems.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
Makefile.lammps has settings for 3 variables:
user-poems_SYSINC = leave blank for this package
user-poems_SYSLIB = leave blank for this package
user-poems_SYSPATH = leave blank for this package
Because this library does not currently need the additional settings
the settings in Makefile.lammps.empty should work.
diff --git a/lib/qmmm/Install.py b/lib/qmmm/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/qmmm/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+  -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+  if args[iarg] == "-m":
+    if iarg+2 > nargs: error()
+    machine = args[iarg+1]
+    iarg += 2
+  elif args[iarg] == "-e":
+    if iarg+2 > nargs: error()
+    extraflag = 1
+    suffix = args[iarg+1]
+    iarg += 2
+  else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+  words = line.split()
+  if len(words) == 3 and extraflag and \
+     words[0] == "EXTRAMAKE" and words[1] == '=':
+    line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+  print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/qmmm/README b/lib/qmmm/README
index b50f25ed6..2746c9e86 100644
--- a/lib/qmmm/README
+++ b/lib/qmmm/README
@@ -1,187 +1,196 @@
QM/MM support library
Axel Kohlmeyer, akohlmey@gmail.com
Temple University, Philadelphia and ICTP, Trieste
with contributions by
Carlo Cavazzoni & Mariella Ippolito
Cineca, Italy
This library provides the basic glue code to combine LAMMPS with the
Quantum ESPRESSO package plane wave density functional theory code for
performing QM/MM molecular dynamics simulations. More information on
Quantum ESPRESSO can be found at: http://www.quantum-espresso.org
The interface code itself is designed so it can also be combined with
other QM codes; however, Quantum ESPRESSO is currently the only
supported option. Adding support for a different QM code will require
writing a new version of the top-level wrapper code, pwqmmm.c, and
also an interface layer into the QM code similar to the one in QE.
+You can type "make lib-qmmm" from the src directory to see help on how
+to build this library (steps 1 and 2 below) via make commands, or you
+can do the same thing by typing "python Install.py" from within this
+directory, or you can do it manually by following the instructions
+below.
+
+However you perform steps 1 and 2, you will need to perform steps 3
+and 4 manually, as outlined below.
+
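For example, a minimal invocation of the install script to build the
coupling library in this directory, assuming the provided gfortran
makefile (any other lib/qmmm/Makefile.<compiler> suffix works equally
well), is:
python Install.py -m gfortran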
-------------------------------------------------
WARNING: This is experimental code under development and is provided
at this early stage to encourage others to write interfaces to other
QM codes. Please test *very* carefully before using this software for
production calculations. At the time of the last update of this README
(July 2016) you have to download a QE snapshot (revision 12611) from
the QE subversion repository.
At this point, both mechanical and multipole based electrostatic
coupling have been successfully tested on a cluster of water
molecules as included in the two example folders.
-------------------------------------------------
Building the QM/MM executable has to be done in multiple stages.
Step 1)
Build the qmmm coupling library in this directory using one of the
provided Makefile.<compiler> files or create your own, specific to
your compiler and system. For example with:
make -f Makefile.gfortran
When you are done building this library, two new files should
exist in this directory:
libqmmm.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command by simply copying the
Makefile.lammps.empty file. Currently no additional dependencies for
this library exist.
Step 2)
Build a standalone LAMMPS executable as described in the LAMMPS
documentation and include the USER-QMMM package. This executable
is not functional for QM/MM, but it will usually be needed to
run all MM calculations for equilibration and testing and also
to confirm that the classical part of the code is set up correctly.
Step 3)
Build a standalone pw.x executable in the Quantum ESPRESSO directory
and also make the "couple" target. At the time of this writing
(July 2016) you have to download a QE snapshot (revision 12611)
from the SVN repository, since no official release with the
completed QM/MM support code has been made available yet. The current
plan is to have a usable QM/MM interface released with the next
Quantum ESPRESSO release version 6.0. Building the standalone pw.x
binary is also needed to confirm that corresponding QM input is
working correctly and to run test calculations on QM atoms only.
Step 4)
To compile and link the final QM/MM executable, which combines the
compiled sources from both packages, you have to return to the lib/qmmm
directory and edit the Makefile.<compiler> matching the Makefile
configuration used to compile LAMMPS, and also update the directory
and library settings for the Quantum ESPRESSO installation.
The makefile variable MPILIBS needs to be set to include all linker
flags that will need to be used in addition to the various libraries
from _both_ packages. Please see the provided example(s).
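As a purely illustrative sketch (the path and library names below are
placeholders that depend on your MPI installation; see the provided
Makefile.<compiler> files for real examples), the edited line might look
like:
MPILIBS = -L/usr/lib64/openmpi/lib -lmpi_mpifh -lmpi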
"make -f Makefile.<compiler> all" will now recurse through both the
Quantum ESPRESSO and LAMMPS directories to compile all files that
require recompilation and then link the combined QM/MM executable.
If you want to only update the local objects and the QM/MM executable,
you can use "make -f Makefile.<compiler> pwqmmm.x"
Please refer to the specific LAMMPS and Quantum ESPRESSO documentation
for details on how to set up compilation for each package and make
sure you have a set of settings and flags that allow you to build
each package successfully, so that it can run on its own.
-------------------------------------------------
How it works.
This directory has the source files for an interface layer and a
toplevel code that combines objects/libraries from the QM code and
LAMMPS to build a QM/MM executable. LAMMPS will act as the MD "driver"
and will delegate the computation of forces for the QM subset to the QM
code, i.e. currently Quantum ESPRESSO. While the code is combined into
a single executable, this executable can only act as either "QM slave",
"MM slave" or "MM master", and information is exchanged between those
roles solely via MPI. Thus MPI is required to make it work, and both
codes have to be configured to use the same MPI library.
The toplevel code provided here will split the total number of cpus
into three partitions: the first for running a DFT calculation, the
second for running the "master" classical MD calculation, and the
third for a "slave" classical MD calculation. Each calculation will
have to be run in its own subdirectory with its own specific input
data and will write its output there as well. This and other settings
are provided in the QM/MM input file that is the mandatory argument to the
QM/MM executable. The number of MM cpus is provided as the optional
second argument. The MM "slave" partition is always run with only 1
cpu; thus the minimum required number of MM cpus is 2, which is also
the default. Therefore a QM/MM calculation with this code requires at
least 3 processes.
Thus the overall calling sequence is like this:
mpirun -np <total #cpus> ./pwqmmm.x <QM/MM input> [<#cpus for MM>]
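For example, a hypothetical run on 6 processes, where 2 of them are used
for the MM master and MM slave partitions (the minimum) and the remaining
4 run the QM (DFT) calculation, and where qmmm.inp is a placeholder name
for the QM/MM input file, would be started as:
mpirun -np 6 ./pwqmmm.x qmmm.inp 2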
A commented example QM/MM input file is given below.
-------------------------------------------------
To run a QM/MM calculation, you need to set up 4 inputs, each of which is
best placed in a separate subdirectory:
1: the total system as classical MD input. This becomes the MM master
and in addition to the regular MD setup it needs to define a group,
e.g. "wat" for the atoms that are treated as QM atoms and then add
the QM/MM fix like this:
fix 1 wat qmmm
2: the QM system as classical MD input
This system must only contain the atoms (and bonds, angles, etc.) for
the subsystem that is supposed to be treated with the QM code. This
will become the MM slave run and here the QM/MM fix needs to be
applied to all atoms:
fix 1 all qmmm
3: the QM system as QM input
This needs to be a cluster calculation for the QM subset, i.e. the
same atoms as in the MM slave configuration. For Quantum ESPRESSO
this is a regular input which in addition contains the line
tqmmm = .true.
in the &CONTROL namelist. This will make the QE code
connect to the LAMMPS code and receive updated positions while
it sends QM forces back to the MM code.
4: the fourth input is the QM/MM configuration file which tells the
QM/MM wrapper code where to find the other 3 inputs, where to place
the corresponding output of the partitions and how many MD steps are
to run with this setup.
-------------------------------------------------
# configuration file for QMMM wrapper
mode mech # coupling choices: o(ff), m(echanical), e(lectrostatic)
steps 20 # number of QM/MM (MD) steps
verbose 1 # verbosity level (0=no QM/MM output during run)
restart water.restart # checkpoint/restart file to write out at end
# QM system config
qmdir qm-pw # directory to run QM system in
qminp water.in # input file for QM code
qmout NULL # output file for QM code (or NULL to print to screen)
# MM master config
madir mm-master # directory to run MM master in
mainp water.in # input file for MM master
maout water.out # output file for MM master (or NULL to print to screen)
# MM slave config
sldir mm-slave # directory to run MM slave in
slinp water_single.in # input file for MM slave
slout water_single.out # output file for MM slave (or NULL to print to screen)
diff --git a/lib/reax/Install.py b/lib/reax/Install.py
new file mode 100644
index 000000000..18b426f92
--- /dev/null
+++ b/lib/reax/Install.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+
+# Install.py tool to do a generic build of a library
+# soft linked to by many of the lib/Install.py files
+# used to automate the steps described in the corresponding lib/README
+
+import sys,commands,os
+
+# help message
+
+help = """
+Syntax: python Install.py -m machine -e suffix
+ specify -m and optionally -e, order does not matter
+ -m = perform a clean followed by "make -f Makefile.machine"
+ machine = suffix of a lib/Makefile.* file
+ -e = set EXTRAMAKE variable in Makefile.machine to Makefile.lammps.suffix
+ does not alter existing Makefile.machine
+"""
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+machine = None
+extraflag = 0
+
+iarg = 0
+while iarg < nargs:
+  if args[iarg] == "-m":
+    if iarg+2 > nargs: error()
+    machine = args[iarg+1]
+    iarg += 2
+  elif args[iarg] == "-e":
+    if iarg+2 > nargs: error()
+    extraflag = 1
+    suffix = args[iarg+1]
+    iarg += 2
+  else: error()
+
+# set lib from working dir
+
+cwd = os.getcwd()
+lib = os.path.basename(cwd)
+
+# create Makefile.auto as copy of Makefile.machine
+# reset EXTRAMAKE if requested
+
+if not os.path.exists("Makefile.%s" % machine):
+ error("lib/%s/Makefile.%s does not exist" % (lib,machine))
+
+lines = open("Makefile.%s" % machine,'r').readlines()
+fp = open("Makefile.auto",'w')
+
+for line in lines:
+  words = line.split()
+  if len(words) == 3 and extraflag and \
+     words[0] == "EXTRAMAKE" and words[1] == '=':
+    line = line.replace(words[2],"Makefile.lammps.%s" % suffix)
+  print >>fp,line,
+
+fp.close()
+
+# make the library via Makefile.auto
+
+print "Building lib%s.a ..." % lib
+cmd = "make -f Makefile.auto clean; make -f Makefile.auto"
+txt = commands.getoutput(cmd)
+print txt
+
+if os.path.exists("lib%s.a" % lib): print "Build was successful"
+else: error("Build of lib/%s/lib%s.a was NOT successful" % (lib,lib))
+if not os.path.exists("Makefile.lammps"):
+ print "lib/%s/Makefile.lammps was NOT created" % lib
diff --git a/lib/reax/README b/lib/reax/README
index 2840a242a..f21a47061 100644
--- a/lib/reax/README
+++ b/lib/reax/README
@@ -1,73 +1,78 @@
ReaxFF library
Aidan Thompson, Sandia National Labs
athomps at sandia.gov
Jan 2008
This library is an implementation of the ReaxFF potential,
specifically designed to work with LAMMPS. It is derived from Adri van
Duin's original serial code, with intervening incarnations in CMDF and
GRASP.
-------------------------------------------------
This directory has source files to build a library that LAMMPS
links against when using the REAX package.
This library must be built with a F90 compiler, before LAMMPS is
built, so LAMMPS can link against it.
+You can type "make lib-reax" from the src directory to see help on how
+to build this library via make commands, or you can do the same thing
+by typing "python Install.py" from within this directory, or you can
+do it manually by following the instructions below.
+
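For example, a minimal invocation of the install script, assuming you want
the provided gfortran makefile (the same one used in the manual
instructions below; any other lib/reax/Makefile.* suffix works equally
well), is:
python Install.py -m gfortran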
Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system. For example:
make -f Makefile.gfortran
When you are done building this library, two files should
exist in this directory:
libreax.a the library LAMMPS will link against
Makefile.lammps settings the LAMMPS Makefile will import
Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files. See the EXTRAMAKE setting at the top of the
Makefile.* files.
IMPORTANT: You must examine the final Makefile.lammps to ensure it is
correct for your system, else the LAMMPS build will likely fail.
Makefile.lammps has settings for 3 variables:
user-reax_SYSINC = leave blank for this package
user-reax_SYSLIB = auxiliary F90 libs needed to link a F90 lib with
a C++ program (LAMMPS) via a C++ compiler
user-reax_SYSPATH = path(s) to where those libraries are
Because you have a F90 compiler on your system, you should have these
libraries. But you will have to figure out which ones are needed and
where they are. Examples of common configurations are in the
Makefile.lammps.* files.
-------------------------------------------------
Additional build notes:
The include file reax_defs.h is used by both the ReaxFF library source
files and the LAMMPS pair_reax.cpp source file (in package src/REAX).
It contains dimensions of statically-allocated arrays created by the
ReaxFF library. The size of these arrays must be set small enough to
avoid exceeding the available machine memory, and large enough to fit
the actual data generated by ReaxFF. If you change the values in
reax_defs.h, you must first rebuild the library and then rebuild
LAMMPS.
This library is called by functions in pair_reax.cpp. The C++ to
FORTRAN function calls in pair_reax.cpp assume that FORTRAN object
names are converted to C object names by appending an underscore
character. This is generally the case, but on machines that do not
conform to this convention, you will need to modify either the C++
code or your compiler settings. The name conversion is handled by the
preprocessor macro called FORTRAN in the file pair_reax_fortran.h,
which is included by pair_reax.cpp. Different definitions of this
macro can be obtained by adding a machine-specific macro definition to
the CCFLAGS variable in your LAMMPS Makefile, e.g. -D_IBM. See
pair_reax_fortran.h for more info.
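As a purely illustrative sketch (the makefile name and the optimization
flags are placeholders; only -D_IBM is taken from the note above), the
corresponding line in a src/MAKE machine makefile on such a system might
look like:
CCFLAGS = -g -O3 -D_IBM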
diff --git a/lib/smd/Install.py b/lib/smd/Install.py
new file mode 100644
index 000000000..dc0a3187c
--- /dev/null
+++ b/lib/smd/Install.py
@@ -0,0 +1,103 @@
+#!/usr/bin/env python
+
+# Install.py tool to download, unpack, and point to the Eigen library
+# used to automate the steps described in the README file in this dir
+
+import sys,os,re,glob,commands
+
+# help message
+
+help = """
+Syntax: python Install.py -h hpath hdir -g -l
+ specify one or more options, order does not matter
+ -h = set home dir of Eigen to be hpath/hdir
+ hpath can be full path, contain '~' or '.' chars
+ default hpath = . = lib/smd
+ default hdir = "ee" = what tarball unpacks to (eigen-eigen-*)
+ -g = grab (download) tarball from http://eigen.tuxfamily.org website
+ unpack it to hpath/hdir
+ hpath must already exist
+ if hdir already exists, it will be deleted before unpack
+ -l = create softlink (includelink) in lib/smd to Eigen src dir
+"""
+
+# settings
+
+url = "http://bitbucket.org/eigen/eigen/get/3.3.3.tar.gz"
+tarball = "eigen.tar.gz"
+
+# print error message or help
+
+def error(str=None):
+ if not str: print help
+ else: print "ERROR",str
+ sys.exit()
+
+# expand to full path name
+# process leading '~' or relative path
+
+def fullpath(path):
+ return os.path.abspath(os.path.expanduser(path))
+
+# parse args
+
+args = sys.argv[1:]
+nargs = len(args)
+if nargs == 0: error()
+
+homepath = "."
+homedir = "ee"
+
+grabflag = 0
+linkflag = 0
+
+iarg = 0
+while iarg < nargs:
+  if args[iarg] == "-h":
+    if iarg+3 > nargs: error()
+    homepath = args[iarg+1]
+    homedir = args[iarg+2]
+    iarg += 3
+  elif args[iarg] == "-g":
+    grabflag = 1
+    iarg += 1
+  elif args[iarg] == "-l":
+    linkflag = 1
+    iarg += 1
+  else: error()
+
+homepath = fullpath(homepath)
+if not os.path.isdir(homepath): error("Eigen path does not exist")
+
+# download and unpack Eigen tarball
+# glob to find name of dir it unpacks to
+
+if grabflag:
+ print "Downloading Eigen ..."
+ cmd = "curl -L %s > %s/%s" % (url,homepath,tarball)
+ print cmd
+ print commands.getoutput(cmd)
+
+ print "Unpacking Eigen tarball ..."
+ edir = glob.glob("%s/eigen-eigen-*" % homepath)
+ for one in edir:
+ if os.path.isdir(one): commands.getoutput("rm -rf %s" % one)
+ cmd = "cd %s; tar zxvf %s" % (homepath,tarball)
+ commands.getoutput(cmd)
+ if homedir != "ee":
+ if os.path.exists(homedir): commands.getoutput("rm -rf %s" % homedir)
+ edir = glob.glob("%s/eigen-eigen-*" % homepath)
+ os.rename(edir[0],"%s/%s" % (homepath,homedir))
+
+# create link in lib/smd to Eigen src dir
+
+if linkflag:
+ print "Creating link to Eigen files"
+ if os.path.isfile("includelink") or os.path.islink("includelink"):
+ os.remove("includelink")
+ if homedir == "ee":
+ edir = glob.glob("%s/eigen-eigen-*" % homepath)
+ linkdir = edir[0]
+ else: linkdir = "%s/%s" % (homepath,homedir)
+ cmd = "ln -s %s includelink" % linkdir
+ commands.getoutput(cmd)
diff --git a/lib/smd/README b/lib/smd/README
index 846c440da..1bd5902a1 100644
--- a/lib/smd/README
+++ b/lib/smd/README
@@ -1,41 +1,44 @@
This directory contains links to the Eigen library which is required
to use the USER-SMD package in a LAMMPS input script.
The Eigen library is available at http://eigen.tuxfamily.org. It's
a general C++ template library for linear algebra.
-You must perform the following steps yourself, or you can use the
-install.py Python script to automate any or all steps of the process.
-Type "python install.py" for instructions.
+You can type "make lib-smd" from the src directory to see help on how
+to download and build this library via make commands, or you can do the
+same thing by typing "python Install.py" from within this directory,
+or you can do it manually by following the instructions below.
+
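For example, to download Eigen into this directory and create the
"includelink" soft link in one step (a minimal sketch using the flags
documented by Install.py), you could type:
% python Install.py -g -l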
+Instructions:
1. Download the Eigen tarball at http://eigen.tuxfamily.org and
unpack the tarball either in this /lib/smd directory or somewhere
else on your system. It should unpack into a directory with
a name similar to eigen-eigen-bdd17ee3b1b3. You can rename
the directory to just "eigen" if you wish. Note that Eigen is a
template library, so you do not have to build it.
2. Create a soft link in this dir (lib/smd)
to the eigen directory. E.g. if you unpacked Eigen in this dir:
% ln -s eigen-eigen-bdd17ee3b1b3 includelink
If you unpacked Eigen somewhere else and renamed
the resulting directory to just eigen, then do something like this:
% ln -s /home/sjplimp/tools/eigen includelink
When these steps are complete you can build LAMMPS
with the USER-SMD package installed:
% cd lammps/src
% make yes-user-smd
% make g++ (or whatever target you wish)
Note that if you download and unpack a new LAMMPS tarball, the
"includelink" and "liblink" files will be lost and you will need to
re-create them (step 2). If you unpacked the Eigen library in this
directory (as opposed to somewhere else on your system), you will also
need to repeat step 1.
The Makefile.lammps file in this directory is there for compatibility
with the way other libraries under the lib dir are linked to by
LAMMPS. However, the Eigen library requires no auxiliary files or
settings, so its variables are blank.
diff --git a/lib/voronoi/Install.py b/lib/voronoi/Install.py
index 8ae08917e..7d847183b 100644
--- a/lib/voronoi/Install.py
+++ b/lib/voronoi/Install.py
@@ -1,116 +1,118 @@
#!/usr/bin/env python
-# install.py tool to download, unpack, build, and link to the Voro++ library
+# Install.py tool to download, unpack, build, and link to the Voro++ library
# used to automate the steps described in the README file in this dir
import sys,os,re,urllib,commands
# help message
help = """
-Syntax: install.py -v version -g gdir [gname] -b bdir -l ldir
+Syntax: python Install.py -v version -h hpath hdir -g -b -l
specify one or more options, order does not matter
- gdir,bdir,ldir can be paths relative to lib/latte, full paths, or contain ~
-v = version of Voro++ to download and build
- default = voro++-0.4.6 (current as of Jan 2015)
- -g = grab (download) from math.lbl.gov/voro++ website
- unpack tarfile in gdir to produce version dir (e.g. voro++-0.4.6)
- if optional gname specified, rename version dir to gname within gdir
- -b = build Voro++, bdir = Voro++ home directory
- note that bdir must include the version suffix unless renamed
- -l = create 2 softlinks (includelink,liblink)
- in lib/voronoi to src dir of ldir = Voro++ home directory
- note that ldir must include the version suffix unless renamed
+ default version = voro++-0.4.6 (current as of Jan 2015)
+ -h = set home dir of Voro++ to be hpath/hdir
+ hpath can be full path, contain '~' or '.' chars
+ default hpath = . = lib/voronoi
+ default hdir = voro++-0.4.6 = what tarball unpacks to
+ -g = grab (download) tarball from math.lbl.gov/voro++ website
+ unpack it to hpath/hdir
+ hpath must already exist
+ if hdir already exists, it will be deleted before unpack
+ -b = build Voro++ library in its src dir
+ -l = create 2 softlinks (includelink,liblink) in lib/voronoi to Voro++ src dir
"""
# settings
version = "voro++-0.4.6"
url = "http://math.lbl.gov/voro++/download/dir/%s.tar.gz" % version
# print error message or help
def error(str=None):
if not str: print help
else: print "ERROR",str
sys.exit()
# expand to full path name
# process leading '~' or relative path
def fullpath(path):
return os.path.abspath(os.path.expanduser(path))
# parse args
args = sys.argv[1:]
nargs = len(args)
if nargs == 0: error()
+homepath = "."
+homedir = version
+
grabflag = 0
buildflag = 0
linkflag = 0
iarg = 0
while iarg < nargs:
if args[iarg] == "-v":
if iarg+2 > nargs: error()
version = args[iarg+1]
- iarg += 2
+ iarg += 2
+ elif args[iarg] == "-h":
+ if iarg+3 > nargs: error()
+ homepath = args[iarg+1]
+ homedir = args[iarg+2]
+ iarg += 3
elif args[iarg] == "-g":
- if iarg+2 > nargs: error()
grabflag = 1
- grabdir = args[iarg+1]
- grabname = None
- if iarg+2 < nargs and args[iarg+2][0] != '-':
- grabname = args[iarg+2]
- iarg += 1
- iarg += 2
+ iarg += 1
elif args[iarg] == "-b":
- if iarg+2 > nargs: error()
buildflag = 1
- builddir = args[iarg+1]
- iarg += 2
+ iarg += 1
elif args[iarg] == "-l":
- if iarg+2 > nargs: error()
linkflag = 1
- linkdir = args[iarg+1]
- iarg += 2
+ iarg += 1
else: error()
+homepath = fullpath(homepath)
+if not os.path.isdir(homepath): error("Voro++ path does not exist")
+homedir = "%s/%s" % (homepath,homedir)
+
# download and unpack Voro++ tarball
if grabflag:
print "Downloading Voro++ ..."
- grabdir = fullpath(grabdir)
- if not os.path.isdir(grabdir): error("Grab directory does not exist")
- urllib.urlretrieve(url,"%s/%s.tar.gz" % (grabdir,version))
+ urllib.urlretrieve(url,"%s/%s.tar.gz" % (homepath,version))
print "Unpacking Voro++ tarball ..."
- tardir = "%s/%s" % (grabdir,version)
- if os.path.exists(tardir): commands.getoutput("rm -rf %s" % tardir)
- cmd = "cd %s; tar zxvf %s.tar.gz" % (grabdir,version)
- txt = commands.getoutput(cmd)
- print tardir,grabdir,grabname
- if grabname: os.rename(tardir,"%s/%s" % (grabdir,grabname))
+ if os.path.exists("%s/%s" % (homepath,version)):
+ commands.getoutput("rm -rf %s/%s" % (homepath,version))
+ cmd = "cd %s; tar zxvf %s.tar.gz" % (homepath,version)
+ commands.getoutput(cmd)
+ if os.path.basename(homedir) != version:
+ if os.path.exists(homedir): commands.getoutput("rm -rf %s" % homedir)
+ os.rename("%s/%s" % (homepath,version),homedir)
# build Voro++
if buildflag:
print "Building Voro++ ..."
- cmd = "cd %s; make" % builddir
+ cmd = "cd %s; make" % homedir
txt = commands.getoutput(cmd)
print txt
# create 2 links in lib/voronoi to Voro++ src dir
if linkflag:
print "Creating links to Voro++ include and lib files"
if os.path.isfile("includelink") or os.path.islink("includelink"):
os.remove("includelink")
if os.path.isfile("liblink") or os.path.islink("liblink"):
os.remove("liblink")
- cmd = "ln -s %s/src includelink" % linkdir
+ cmd = "ln -s %s/src includelink" % homedir
commands.getoutput(cmd)
- cmd = "ln -s %s/src liblink" % linkdir
+ cmd = "ln -s %s/src liblink" % homedir
commands.getoutput(cmd)
diff --git a/lib/voronoi/README b/lib/voronoi/README
index 2507a9bae..9863632be 100644
--- a/lib/voronoi/README
+++ b/lib/voronoi/README
@@ -1,59 +1,63 @@
This directory contains links to the Voro++ library which is required
to use the VORONOI package and its compute voronoi/atom command in a
LAMMPS input script.
The Voro++ library is available at http://math.lbl.gov/voro++ and was
developed by Chris H. Rycroft while at UC Berkeley / Lawrence Berkeley
Laboratory.
+You can type "make lib-voronoi" from the src directory to see help on
+how to download and build this library via make commands, or you can
+do the same thing by typing "python Install.py" from within this
+directory, or you can do it manually by following the instructions
+below.
+
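For example, to download, build, and link to Voro++ in one step (a
minimal sketch using the flags documented by Install.py), you could type
the following from within this directory:
% python Install.py -g -b -l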
-----------------
-You must perform the following steps yourself, or you can use the
-Install.py Python script to automate any or all steps of the process.
-Type "python Install.py" for instructions.
+Instructions:
1. Download Voro++ at http://math.lbl.gov/voro++/download
either as a tarball or via SVN, and unpack the
tarball either in this /lib/voronoi directory
or somewhere else on your system.
2. Compile Voro++ from within its home directory
% make
3. There is no need to install Voro++ if you only wish
to use it from LAMMPS. You can install it if you
wish to use it stand-alone or from other codes:
a) install under the default /usr/local
% sudo make install
b) install under a user-writeable location by first
changing the PREFIX variable in the config.mk file, then
% make install
4. Create two soft links in this dir (lib/voronoi)
to where the Voro++ src directory is. E.g. if you built Voro++ in this dir:
% ln -s voro++-0.4.6/src includelink
% ln -s voro++-0.4.6/src liblink
These links could instead be set to the include and lib
directories created by a Voro++ install, e.g.
% ln -s /usr/local/include includelink
% ln -s /usr/local/lib liblink
-----------------
When these steps are complete you can build LAMMPS
with the VORONOI package installed:
% cd lammps/src
% make yes-voronoi
% make g++ (or whatever target you wish)
Note that if you download and unpack a new LAMMPS tarball, the
"includelink" and "liblink" files will be lost and you will need to
re-create them (step 4). If you built Voro++ in this directory (as
opposed to somewhere else on your system) and did not install it
somewhere else, you will also need to repeat steps 1,2,3.
The Makefile.lammps file in this directory is there for compatibility
with the way other libraries under the lib dir are linked to by
LAMMPS. However, Voro++ requires no auxiliary files or settings, so
its variables are blank.
diff --git a/lib/vtk/Makefile.lammps b/lib/vtk/Makefile.lammps
index e3b28ed92..b86856a9c 100644
--- a/lib/vtk/Makefile.lammps
+++ b/lib/vtk/Makefile.lammps
@@ -1,13 +1,12 @@
# Settings that the LAMMPS build will import when this package library is used
-#
+
# settings for VTK-5.8.0 on RHEL/CentOS 6.x
vtk_SYSINC = -I/usr/include/vtk
vtk_SYSLIB = -lvtkCommon -lvtkIO
vtk_SYSPATH = -L/usr/lib64/vtk
-#
+
# settings for VTK 6.2.0 on Fedora 23
#vtk_SYSINC = -I/usr/include/vtk
#vtk_SYSLIB = -lvtkCommonCore -lvtkIOCore -lvtkCommonDataModel -lvtkIOXML -lvtkIOLegacy -lvtkIOParallelXML
#vtk_SYSPATH = -L/usr/lib64/vtk
-#
diff --git a/lib/vtk/README b/lib/vtk/README
index 11add94f5..61e2a40c2 100644
--- a/lib/vtk/README
+++ b/lib/vtk/README
@@ -1,28 +1,30 @@
-The Makefile.lammps file in this directory is used when building LAMMPS with
-its USER-VTK package installed. The file has several settings needed to
-compile and link LAMMPS with the VTK library. You should choose a
-Makefile.lammps.* file compatible with your system and your version of VTK, and
-copy it to Makefile.lammps before building LAMMPS itself. You may need to edit
-one of the provided files to match your system.
+The Makefile.lammps file in this directory is used when building
+LAMMPS with its USER-VTK package installed. The file has several
+settings needed to compile and link LAMMPS with the VTK library. You
+should choose a Makefile.lammps.* file compatible with your system and
+your version of VTK, and copy it to Makefile.lammps before building
+LAMMPS itself. You may need to edit one of the provided files to
+match your system.
-If you create a new Makefile.lammps file suitable for some version of VTK on
-some system, that is not a match to one of the provided Makefile.lammps.*
-files, you can send it to the developers, and we can include it in the
-distribution for others to use.
+If you create a new Makefile.lammps file suitable for some version of
+VTK on some system, that is not a match to one of the provided
+Makefile.lammps.* files, you can send it to the developers, and we can
+include it in the distribution for others to use.
To illustrate, these are example settings from the
Makefile.lammps.ubuntu14.04_vtk6 file:
vtk_SYSINC = -I/usr/include/vtk-6.0
vtk_SYSLIB = -lvtkCommonCore-6.0 -lvtkIOCore-6.0 -lvtkIOXML-6.0 -lvtkIOLegacy-6.0 -lvtkCommonDataModel-6.0
vtk_SYSPATH =
vtk_SYSINC refers to the include directory of the installed VTK library
-vtk_SYSLIB refers to the libraries needed to link to from an application
-(LAMMPS in this case) to "embed" VTK in the application. VTK consists of
-multiple shared libraries which are needed when using the USER-VTK package.
+vtk_SYSLIB refers to the libraries that an application (LAMMPS in this
+case) needs to link against in order to "embed" VTK in the
+application. VTK consists of multiple shared libraries which are
+needed when using the USER-VTK package.
-vtk_SYSPATH = refers to the path (e.g. -L/usr/local/lib) where the VTK library
-can be found. You may not need this setting if the path is already included in
-your LD_LIBRARY_PATH environment variable.
+vtk_SYSPATH refers to the path (e.g. -L/usr/local/lib) where the VTK
+library can be found. You may not need this setting if the path is
+already included in your LD_LIBRARY_PATH environment variable.
diff --git a/src/.gitignore b/src/.gitignore
index bb6f0a392..1327704e4 100644
--- a/src/.gitignore
+++ b/src/.gitignore
@@ -1,1066 +1,1068 @@
/Makefile.package
/Makefile.package.settings
/MAKE/MINE
/Make.py.last
/lmp_*
/style_*.h
/*_gpu.h
/*_gpu.cpp
/*_intel.h
/*_intel.cpp
/*_kokkos.h
/*_kokkos.cpp
/*_omp.h
/*_omp.cpp
/*_tally.h
/*_tally.cpp
/*_rx.h
/*_rx.cpp
/*_ssa.h
/*_ssa.cpp
/kokkos.cpp
/kokkos.h
/kokkos_type.h
/kokkos_few.h
/manifold*.cpp
/manifold*.h
/fix_*manifold*.cpp
/fix_*manifold*.h
/fix_qeq*.cpp
/fix_qeq*.h
/compute_test_nbl.cpp
/compute_test_nbl.h
/pair_multi_lucy.cpp
/pair_multi_lucy.h
/colvarproxy_lammps.cpp
/colvarproxy_lammps.h
/fix_colvars.cpp
/fix_colvars.h
/dump_molfile.cpp
/dump_molfile.h
/molfile_interface.cpp
/molfile_interface.h
-/molfile_plugin.h
-/vmdplugin.h
/type_detector.h
/intel_buffers.cpp
/intel_buffers.h
/intel_intrinsics.h
/intel_preprocess.h
/intel_simd.h
/compute_sna_atom.cpp
/compute_sna_atom.h
/compute_snad_atom.cpp
/compute_snad_atom.h
/compute_snav_atom.cpp
/compute_snav_atom.h
/openmp_snap.h
/pair_snap.cpp
/pair_snap.h
/sna.cpp
/sna.h
/atom_vec_wavepacket.cpp
/atom_vec_wavepacket.h
/fix_nve_awpmd.cpp
/fix_nve_awpmd.h
/pair_awpmd_cut.cpp
/pair_awpmd_cut.h
-/dihedral_charmmfsh.cpp
-/dihedral_charmmfsh.h
+/dihedral_charmmfsw.cpp
+/dihedral_charmmfsw.h
/pair_lj_charmmfsw_coul_charmmfsh.cpp
/pair_lj_charmmfsw_coul_charmmfsh.h
/pair_lj_charmmfsw_coul_long.cpp
/pair_lj_charmmfsw_coul_long.h
/angle_cg_cmm.cpp
/angle_cg_cmm.h
/angle_charmm.cpp
/angle_charmm.h
/angle_class2.cpp
/angle_class2.h
/angle_cosine.cpp
/angle_cosine.h
/angle_cosine_delta.cpp
/angle_cosine_delta.h
/angle_cosine_periodic.cpp
/angle_cosine_periodic.h
/angle_cosine_shift.cpp
/angle_cosine_shift.h
/angle_cosine_shift_exp.cpp
/angle_cosine_shift_exp.h
/angle_cosine_squared.cpp
/angle_cosine_squared.h
/angle_dipole.cpp
/angle_dipole.h
/angle_fourier.cpp
/angle_fourier.h
/angle_fourier_simple.cpp
/angle_fourier_simple.h
/angle_harmonic.cpp
/angle_harmonic.h
/angle_quartic.cpp
/angle_quartic.h
/angle_sdk.cpp
/angle_sdk.h
/angle_table.cpp
/angle_table.h
/atom_vec_angle.cpp
/atom_vec_angle.h
/atom_vec_bond.cpp
/atom_vec_bond.h
/atom_vec_colloid.cpp
/atom_vec_colloid.h
/atom_vec_dipole.cpp
/atom_vec_dipole.h
/atom_vec_dpd.cpp
/atom_vec_dpd.h
/atom_vec_electron.cpp
/atom_vec_electron.h
/atom_vec_ellipsoid.cpp
/atom_vec_ellipsoid.h
/atom_vec_full.cpp
/atom_vec_full.h
/atom_vec_full_hars.cpp
/atom_vec_full_hars.h
/atom_vec_granular.cpp
/atom_vec_granular.h
/atom_vec_meso.cpp
/atom_vec_meso.h
/atom_vec_molecular.cpp
/atom_vec_molecular.h
/atom_vec_peri.cpp
/atom_vec_peri.h
/atom_vec_template.cpp
/atom_vec_template.h
/body_nparticle.cpp
/body_nparticle.h
/bond_class2.cpp
/bond_class2.h
/bond_fene.cpp
/bond_fene.h
/bond_fene_expand.cpp
/bond_fene_expand.h
/bond_harmonic.cpp
/bond_harmonic.h
/bond_harmonic_shift.cpp
/bond_harmonic_shift.h
/bond_harmonic_shift_cut.cpp
/bond_harmonic_shift_cut.h
/bond_morse.cpp
/bond_morse.h
/bond_nonlinear.cpp
/bond_nonlinear.h
/bond_oxdna_fene.cpp
/bond_oxdna_fene.h
+/bond_oxdna2_fene.cpp
+/bond_oxdna2_fene.h
/bond_quartic.cpp
/bond_quartic.h
/bond_table.cpp
/bond_table.h
/cg_cmm_parms.cpp
/cg_cmm_parms.h
/commgrid.cpp
/commgrid.h
/compute_ackland_atom.cpp
/compute_ackland_atom.h
/compute_basal_atom.cpp
/compute_basal_atom.h
/compute_body_local.cpp
/compute_body_local.h
/compute_cna_atom2.cpp
/compute_cna_atom2.h
/compute_damage_atom.cpp
/compute_damage_atom.h
/compute_dilatation_atom.cpp
/compute_dilatation_atom.h
/compute_dpd.cpp
/compute_dpd.h
/compute_dpd_atom.cpp
/compute_dpd_atom.h
/compute_erotate_asphere.cpp
/compute_erotate_asphere.h
/compute_erotate_rigid.cpp
/compute_erotate_rigid.h
/compute_event_displace.cpp
/compute_event_displace.h
/compute_fep.cpp
/compute_fep.h
/compute_force_tally.cpp
/compute_force_tally.h
/compute_heat_flux_tally.cpp
/compute_heat_flux_tally.h
/compute_ke_atom_eff.cpp
/compute_ke_atom_eff.h
/compute_ke_eff.cpp
/compute_ke_eff.h
/compute_ke_rigid.cpp
/compute_ke_rigid.h
/compute_meso_e_atom.cpp
/compute_meso_e_atom.h
/compute_meso_rho_atom.cpp
/compute_meso_rho_atom.h
/compute_meso_t_atom.cpp
/compute_meso_t_atom.h
/compute_msd_nongauss.cpp
/compute_msd_nongauss.h
/compute_pe_tally.cpp
/compute_pe_tally.h
/compute_plasticity_atom.cpp
/compute_plasticity_atom.h
/compute_pressure_grem.cpp
/compute_pressure_grem.h
/compute_rigid_local.cpp
/compute_rigid_local.h
/compute_spec_atom.cpp
/compute_spec_atom.h
/compute_stress_tally.cpp
/compute_stress_tally.h
/compute_temp_asphere.cpp
/compute_temp_asphere.h
/compute_temp_body.cpp
/compute_temp_body.h
/compute_temp_deform_eff.cpp
/compute_temp_deform_eff.h
/compute_temp_eff.cpp
/compute_temp_eff.h
/compute_temp_region_eff.cpp
/compute_temp_region_eff.h
/compute_temp_rotate.cpp
/compute_temp_rotate.h
/compute_ti.cpp
/compute_ti.h
/compute_voronoi_atom.cpp
/compute_voronoi_atom.h
/dihedral_charmm.cpp
/dihedral_charmm.h
/dihedral_class2.cpp
/dihedral_class2.h
/dihedral_cosine_shift_exp.cpp
/dihedral_cosine_shift_exp.h
/dihedral_fourier.cpp
/dihedral_fourier.h
/dihedral_harmonic.cpp
/dihedral_harmonic.h
/dihedral_helix.cpp
/dihedral_helix.h
/dihedral_hybrid.cpp
/dihedral_hybrid.h
/dihedral_multi_harmonic.cpp
/dihedral_multi_harmonic.h
/dihedral_nharmonic.cpp
/dihedral_nharmonic.h
/dihedral_opls.cpp
/dihedral_opls.h
/dihedral_quadratic.cpp
/dihedral_quadratic.h
/dihedral_spherical.cpp
/dihedral_spherical.h
/dihedral_table.cpp
/dihedral_table.h
/dump_atom_gz.cpp
/dump_atom_gz.h
/dump_xyz_gz.cpp
/dump_xyz_gz.h
/dump_atom_mpiio.cpp
/dump_atom_mpiio.h
/dump_cfg_gz.cpp
/dump_cfg_gz.h
/dump_cfg_mpiio.cpp
/dump_cfg_mpiio.h
/dump_custom_gz.cpp
/dump_custom_gz.h
/dump_custom_mpiio.cpp
/dump_custom_mpiio.h
/dump_custom_vtk.cpp
/dump_custom_vtk.h
/dump_h5md.cpp
/dump_h5md.h
/dump_nc.cpp
/dump_nc.h
/dump_nc_mpiio.cpp
/dump_nc_mpiio.h
/dump_xtc.cpp
/dump_xtc.h
/dump_xyz_mpiio.cpp
/dump_xyz_mpiio.h
/ewald.cpp
/ewald.h
/ewald_cg.cpp
/ewald_cg.h
/ewald_disp.cpp
/ewald_disp.h
/ewald_n.cpp
/ewald_n.h
/fft3d.cpp
/fft3d.h
/fft3d_wrap.cpp
/fft3d_wrap.h
/fix_adapt_fep.cpp
/fix_adapt_fep.h
/fix_addtorque.cpp
/fix_addtorque.h
/fix_append_atoms.cpp
/fix_append_atoms.h
/fix_atc.cpp
/fix_atc.h
/fix_ave_correlate_long.cpp
/fix_ave_correlate_long.h
/fix_bond_break.cpp
/fix_bond_break.h
/fix_bond_create.cpp
/fix_bond_create.h
/fix_bond_swap.cpp
/fix_bond_swap.h
/fix_cmap.cpp
/fix_cmap.h
/fix_deposit.cpp
/fix_deposit.h
/fix_dpd_energy.cpp
/fix_dpd_energy.h
/fix_efield.cpp
/fix_efield.h
/fix_eos_cv.cpp
/fix_eos_cv.h
/fix_eos_table.cpp
/fix_eos_table.h
/fix_evaporate.cpp
/fix_evaporate.h
/fix_filter_corotate.cpp
/fix_filter_corotate.h
/fix_viscosity.cpp
/fix_viscosity.h
/fix_ehex.cpp
/fix_ehex.h
/fix_event.cpp
/fix_event.h
/fix_event_prd.cpp
/fix_event_prd.h
/fix_event_tad.cpp
/fix_event_tad.h
/fix_flow_gauss.cpp
/fix_flow_gauss.h
/fix_freeze.cpp
/fix_freeze.h
/fix_gcmc.cpp
/fix_gcmc.h
/fix_gld.cpp
/fix_gld.h
/fix_gle.cpp
/fix_gle.h
/fix_gpu.cpp
/fix_gpu.h
/fix_grem.cpp
/fix_grem.h
/fix_imd.cpp
/fix_imd.h
/fix_ipi.cpp
/fix_ipi.h
/fix_lambdah_calc.cpp
/fix_lambdah_calc.h
/fix_langevin_eff.cpp
/fix_langevin_eff.h
/fix_lb_fluid.cpp
/fix_lb_fluid.h
/fix_lb_momentum.cpp
/fix_lb_momentum.h
/fix_lb_pc.cpp
/fix_lb_pc.h
/fix_lb_rigid_pc_sphere.cpp
/fix_lb_rigid_pc_sphere.h
/fix_lb_viscous.cpp
/fix_lb_viscous.h
/fix_load_report.cpp
/fix_load_report.h
/fix_meso.cpp
/fix_meso.h
/fix_meso_stationary.cpp
/fix_meso_stationary.h
/fix_mscg.cpp
/fix_mscg.h
/fix_msst.cpp
/fix_msst.h
/fix_neb.cpp
/fix_neb.h
/fix_nh_asphere.cpp
/fix_nh_asphere.h
/fix_nph_asphere.cpp
/fix_nph_asphere.h
/fix_npt_asphere.cpp
/fix_npt_asphere.h
/fix_nve_asphere.cpp
/fix_nve_asphere.h
/fix_nve_asphere_noforce.cpp
/fix_nve_asphere_noforce.h
/fix_nve_dot.cpp
/fix_nve_dot.h
/fix_nve_dotc_langevin.cpp
/fix_nve_dotc_langevin.h
/fix_nh_body.cpp
/fix_nh_body.h
/fix_nph_body.cpp
/fix_nph_body.h
/fix_npt_body.cpp
/fix_npt_body.h
/fix_nvk.cpp
/fix_nvk.h
/fix_nvt_body.cpp
/fix_nvt_body.h
/fix_nve_body.cpp
/fix_nve_body.h
/fix_nvt_asphere.cpp
/fix_nvt_asphere.h
/fix_nh_eff.cpp
/fix_nh_eff.h
/fix_nph_eff.cpp
/fix_nph_eff.h
/fix_nphug.cpp
/fix_nphug.h
/fix_npt_eff.cpp
/fix_npt_eff.h
/fix_nve_eff.cpp
/fix_nve_eff.h
/fix_nve_line.cpp
/fix_nve_line.h
/fix_nvt_eff.cpp
/fix_nvt_eff.h
/fix_nvt_sllod_eff.cpp
/fix_nvt_sllod_eff.h
/fix_nve_tri.cpp
/fix_nve_tri.h
/fix_oneway.cpp
/fix_oneway.h
/fix_orient_bcc.cpp
/fix_orient_bcc.h
/fix_orient_fcc.cpp
/fix_orient_fcc.h
/fix_peri_neigh.cpp
/fix_peri_neigh.h
/fix_phonon.cpp
/fix_phonon.h
/fix_poems.cpp
/fix_poems.h
/fix_pour.cpp
/fix_pour.h
/fix_qeq_comb.cpp
/fix_qeq_comb.h
/fix_qeq_reax.cpp
/fix_qeq_fire.cpp
/fix_qeq_fire.h
/fix_qeq_reax.h
/fix_qmmm.cpp
/fix_qmmm.h
/fix_reax_bonds.cpp
/fix_reax_bonds.h
/fix_reax_c.cpp
/fix_reax_c.h
/fix_reaxc_bonds.cpp
/fix_reaxc_bonds.h
/fix_reaxc_species.cpp
/fix_reaxc_species.h
/fix_rigid.cpp
/fix_rigid.h
/fix_rigid_nh.cpp
/fix_rigid_nh.h
/fix_rigid_nph.cpp
/fix_rigid_nph.h
/fix_rigid_npt.cpp
/fix_rigid_npt.h
/fix_rigid_nve.cpp
/fix_rigid_nve.h
/fix_rigid_nvt.cpp
/fix_rigid_nvt.h
/fix_rigid_nh_small.cpp
/fix_rigid_nh_small.h
/fix_rigid_nph_small.cpp
/fix_rigid_nph_small.h
/fix_rigid_npt_small.cpp
/fix_rigid_npt_small.h
/fix_rigid_nve_small.cpp
/fix_rigid_nve_small.h
/fix_rigid_nvt_small.cpp
/fix_rigid_nvt_small.h
/fix_rigid_small.cpp
/fix_rigid_small.h
/fix_shake.cpp
/fix_shake.h
/fix_shardlow.cpp
/fix_shardlow.h
/fix_smd.cpp
/fix_smd.h
/fix_species.cpp
/fix_species.h
/fix_spring_pull.cpp
/fix_spring_pull.h
/fix_srd.cpp
/fix_srd.h
/fix_temp_rescale_eff.cpp
/fix_temp_rescale_eff.h
/fix_thermal_conductivity.cpp
/fix_thermal_conductivity.h
/fix_ti_rs.cpp
/fix_ti_rs.h
/fix_ti_spring.cpp
/fix_ti_spring.h
/fix_ttm.cpp
/fix_ttm.h
/fix_tune_kspace.cpp
/fix_tune_kspace.h
/fix_wall_colloid.cpp
/fix_wall_colloid.h
/fix_wall_gran.cpp
/fix_wall_gran.h
/fix_wall_gran_region.cpp
/fix_wall_gran_region.h
/fix_wall_piston.cpp
/fix_wall_piston.h
/fix_wall_srd.cpp
/fix_wall_srd.h
/gpu_extra.h
/gridcomm.cpp
/gridcomm.h
/group_ndx.cpp
/group_ndx.h
/ndx_group.cpp
/ndx_group.h
/improper_class2.cpp
/improper_class2.h
/improper_cossq.cpp
/improper_cossq.h
/improper_cvff.cpp
/improper_cvff.h
/improper_distance.cpp
/improper_distance.h
/improper_fourier.cpp
/improper_fourier.h
/improper_harmonic.cpp
/improper_harmonic.h
/improper_hybrid.cpp
/improper_hybrid.h
/improper_ring.cpp
/improper_ring.h
/improper_umbrella.cpp
/improper_umbrella.h
/kissfft.h
/lj_sdk_common.h
/math_complex.h
/math_vector.h
/mgpt_*.cpp
/mgpt_*.h
/msm.cpp
/msm.h
/msm_cg.cpp
/msm_cg.h
/neb.cpp
/neb.h
/pair_adp.cpp
/pair_adp.h
/pair_agni.cpp
/pair_agni.h
/pair_airebo.cpp
/pair_airebo.h
/pair_airebo_morse.cpp
/pair_airebo_morse.h
/pair_body.cpp
/pair_body.h
/pair_bop.cpp
/pair_bop.h
/pair_born_coul_long.cpp
/pair_born_coul_long.h
/pair_born_coul_msm.cpp
/pair_born_coul_msm.h
/pair_brownian.cpp
/pair_brownian.h
/pair_brownian_poly.cpp
/pair_brownian_poly.h
/pair_buck_coul_long.cpp
/pair_buck_coul_long.h
/pair_buck_coul_msm.cpp
/pair_buck_coul_msm.h
/pair_buck_coul.cpp
/pair_buck_coul.h
/pair_buck_long_coul_long.cpp
/pair_buck_long_coul_long.h
/pair_cdeam.cpp
/pair_cdeam.h
/pair_cg_cmm.cpp
/pair_cg_cmm.h
/pair_cg_cmm_coul_cut.cpp
/pair_cg_cmm_coul_cut.h
/pair_cg_cmm_coul_long.cpp
/pair_cg_cmm_coul_long.h
/pair_cmm_common.cpp
/pair_cmm_common.h
/pair_cg_cmm_coul_msm.cpp
/pair_cg_cmm_coul_msm.h
/pair_comb.cpp
/pair_comb.h
/pair_comb3.cpp
/pair_comb3.h
/pair_colloid.cpp
/pair_colloid.h
/pair_coul_diel.cpp
/pair_coul_diel.h
/pair_coul_long.cpp
/pair_coul_long.h
/pair_coul_msm.cpp
/pair_coul_msm.h
/pair_dipole_cut.cpp
/pair_dipole_cut.h
/pair_dipole_sf.cpp
/pair_dipole_sf.h
/pair_dpd_mt.cpp
/pair_dpd_mt.h
/pair_dsmc.cpp
/pair_dsmc.h
/pair_eam.cpp
/pair_eam.h
/pair_eam_opt.cpp
/pair_eam_opt.h
/pair_eam_alloy.cpp
/pair_eam_alloy.h
/pair_eam_alloy_opt.cpp
/pair_eam_alloy_opt.h
/pair_eam_fs.cpp
/pair_eam_fs.h
/pair_eam_fs_opt.cpp
/pair_eam_fs_opt.h
/pair_edip.cpp
/pair_edip.h
/pair_eff_cut.cpp
/pair_eff_cut.h
/pair_eff_inline.h
/pair_eim.cpp
/pair_eim.h
/pair_gauss_cut.cpp
/pair_gauss_cut.h
/pair_gayberne.cpp
/pair_gayberne.h
/pair_gran_easy.cpp
/pair_gran_easy.h
/pair_gran_hertz_history.cpp
/pair_gran_hertz_history.h
/pair_gran_hooke.cpp
/pair_gran_hooke.h
/pair_gran_hooke_history.cpp
/pair_gran_hooke_history.h
/pair_gw.cpp
/pair_gw.h
/pair_gw_zbl.cpp
/pair_gw_zbl.h
/pair_hbond_dreiding_lj.cpp
/pair_hbond_dreiding_lj.h
/pair_hbond_dreiding_morse.cpp
/pair_hbond_dreiding_morse.h
/pair_kolmogorov_crespi_z.cpp
/pair_kolmogorov_crespi_z.h
/pair_lcbop.cpp
/pair_lcbop.h
/pair_line_lj.cpp
/pair_line_lj.h
/pair_list.cpp
/pair_list.h
/pair_lj_charmm_coul_charmm.cpp
/pair_lj_charmm_coul_charmm.h
/pair_lj_charmm_coul_charmm_implicit.cpp
/pair_lj_charmm_coul_charmm_implicit.h
/pair_lj_charmm_coul_long.cpp
/pair_lj_charmm_coul_long.h
/pair_lj_charmm_coul_long_opt.cpp
/pair_lj_charmm_coul_long_opt.h
/pair_lj_charmm_coul_long_soft.cpp
/pair_lj_charmm_coul_long_soft.h
/pair_lj_charmm_coul_msm.cpp
/pair_lj_charmm_coul_msm.h
/pair_lj_class2.cpp
/pair_lj_class2.h
/pair_lj_class2_coul_cut.cpp
/pair_lj_class2_coul_cut.h
/pair_lj_class2_coul_long.cpp
/pair_lj_class2_coul_long.h
/pair_lj_coul.cpp
/pair_lj_coul.h
/pair_coul_cut_soft.cpp
/pair_coul_cut_soft.h
/pair_coul_long_soft.cpp
/pair_coul_long_soft.h
/pair_lj_cut_coul_cut_soft.cpp
/pair_lj_cut_coul_cut_soft.h
/pair_lj_cut_tip4p_cut.cpp
/pair_lj_cut_tip4p_cut.h
/pair_lj_cut_coul_long.cpp
/pair_lj_cut_coul_long.h
/pair_lj_cut_coul_long_opt.cpp
/pair_lj_cut_coul_long_opt.h
/pair_lj_cut_coul_long_soft.cpp
/pair_lj_cut_coul_long_soft.h
/pair_lj_cut_coul_msm.cpp
/pair_lj_cut_coul_msm.h
/pair_lj_cut_dipole_cut.cpp
/pair_lj_cut_dipole_cut.h
/pair_lj_cut_dipole_long.cpp
/pair_lj_cut_dipole_long.h
/pair_lj_cut_*hars_*.cpp
/pair_lj_cut_*hars_*.h
/pair_lj_cut_soft.cpp
/pair_lj_cut_soft.h
/pair_lj_cut_tip4p_long.cpp
/pair_lj_cut_tip4p_long.h
/pair_lj_cut_tip4p_long_opt.cpp
/pair_lj_cut_tip4p_long_opt.h
/pair_lj_cut_tip4p_long_soft.cpp
/pair_lj_cut_tip4p_long_soft.h
/pair_lj_long_coul_long.cpp
/pair_lj_long_coul_long.h
/pair_lj_long_coul_long_opt.cpp
/pair_lj_long_coul_long_opt.h
/pair_lj_long_dipole_long.cpp
/pair_lj_long_dipole_long.h
/pair_lj_long_tip4p_long.cpp
/pair_lj_long_tip4p_long.h
/pair_lj_cut_opt.cpp
/pair_lj_cut_opt.h
/pair_lj_cut_tgpu.cpp
/pair_lj_cut_tgpu.h
/pair_lj_sdk.cpp
/pair_lj_sdk.h
/pair_lj_sdk_coul_long.cpp
/pair_lj_sdk_coul_long.h
/pair_lj_sdk_coul_msm.cpp
/pair_lj_sdk_coul_msm.h
/pair_lj_sf.cpp
/pair_lj_sf.h
/pair_lj_sf_dipole_sf.cpp
/pair_lj_sf_dipole_sf.h
/pair_lubricateU.cpp
/pair_lubricateU.h
/pair_lubricateU_poly.cpp
/pair_lubricateU_poly.h
/pair_lubricate_poly.cpp
/pair_lubricate_poly.h
/pair_lubricate.cpp
/pair_lubricate.h
/pair_meam.cpp
/pair_meam.h
/pair_meam_spline.cpp
/pair_meam_spline.h
/pair_meam_sw_spline.cpp
/pair_meam_sw_spline.h
/pair_morse_opt.cpp
/pair_morse_opt.h
/pair_morse_soft.cpp
/pair_morse_soft.h
/pair_nb3b_harmonic.cpp
/pair_nb3b_harmonic.h
/pair_nm_cut.cpp
/pair_nm_cut.h
/pair_nm_cut_coul_cut.cpp
/pair_nm_cut_coul_cut.h
/pair_nm_cut_coul_long.cpp
/pair_nm_cut_coul_long.h
/pair_oxdna_*.cpp
/pair_oxdna_*.h
+/pair_oxdna2_*.cpp
+/pair_oxdna2_*.h
/mf_oxdna.h
/pair_peri_eps.cpp
/pair_peri_eps.h
/pair_peri_lps.cpp
/pair_peri_lps.h
/pair_peri_pmb.cpp
/pair_peri_pmb.h
/pair_peri_ves.cpp
/pair_peri_ves.h
/pair_reax.cpp
/pair_reax.h
/pair_reax_fortran.h
/pair_reax_c.cpp
/pair_reax_c.h
/pair_rebo.cpp
/pair_rebo.h
/pair_resquared.cpp
/pair_resquared.h
/pair_sph_heatconduction.cpp
/pair_sph_heatconduction.h
/pair_sph_idealgas.cpp
/pair_sph_idealgas.h
/pair_sph_lj.cpp
/pair_sph_lj.h
/pair_sph_rhosum.cpp
/pair_sph_rhosum.h
/pair_sph_taitwater.cpp
/pair_sph_taitwater.h
/pair_sph_taitwater_morris.cpp
/pair_sph_taitwater_morris.h
/pair_sw.cpp
/pair_sw.h
/pair_tersoff.cpp
/pair_tersoff.h
/pair_tersoff_mod.cpp
/pair_tersoff_mod.h
/pair_tersoff_mod_c.cpp
/pair_tersoff_mod_c.h
/pair_tersoff_table.cpp
/pair_tersoff_table.h
/pair_tersoff_zbl.cpp
/pair_tersoff_zbl.h
/pair_tip4p_cut.cpp
/pair_tip4p_cut.h
/pair_tip4p_long.cpp
/pair_tip4p_long.h
/pair_tip4p_long_soft.cpp
/pair_tip4p_long_soft.h
/pair_tri_lj.cpp
/pair_tri_lj.h
/pair_yukawa_colloid.cpp
/pair_yukawa_colloid.h
/pair_momb.cpp
/pair_momb.h
/pppm.cpp
/pppm.h
/pppm_cg.cpp
/pppm_cg.h
/pppm_disp.cpp
/pppm_disp.h
/pppm_disp_tip4p.cpp
/pppm_disp_tip4p.h
/pppm_old.cpp
/pppm_old.h
/pppm_proxy.cpp
/pppm_proxy.h
/pppm_stagger.cpp
/pppm_stagger.h
/pppm_tip4p.cpp
/pppm_tip4p.h
/pppm_tip4p_proxy.cpp
/pppm_tip4p_proxy.h
/pppm_tip4p_cg.cpp
/pppm_tip4p_cg.h
/prd.cpp
/prd.h
/python_impl.cpp
/python_impl.h
/reader_molfile.cpp
/reader_molfile.h
/reaxc_allocate.cpp
/reaxc_allocate.h
/reaxc_basic_comm.cpp
/reaxc_basic_comm.h
/reaxc_bond_orders.cpp
/reaxc_bond_orders.h
/reaxc_bonds.cpp
/reaxc_bonds.h
/reaxc_control.cpp
/reaxc_control.h
/reaxc_defs.h
/reaxc_ffield.cpp
/reaxc_ffield.h
/reaxc_forces.cpp
/reaxc_forces.h
/reaxc_hydrogen_bonds.cpp
/reaxc_hydrogen_bonds.h
/reaxc_init_md.cpp
/reaxc_init_md.h
/reaxc_io_tools.cpp
/reaxc_io_tools.h
/reaxc_list.cpp
/reaxc_list.h
/reaxc_lookup.cpp
/reaxc_lookup.h
/reaxc_multi_body.cpp
/reaxc_multi_body.h
/reaxc_nonbonded.cpp
/reaxc_nonbonded.h
/reaxc_reset_tools.cpp
/reaxc_reset_tools.h
/reaxc_system_props.cpp
/reaxc_system_props.h
/reaxc_tool_box.cpp
/reaxc_tool_box.h
/reaxc_torsion_angles.cpp
/reaxc_torsion_angles.h
/reaxc_traj.cpp
/reaxc_traj.h
/reaxc_types.h
/reaxc_valence_angles.cpp
/reaxc_valence_angles.h
/reaxc_vector.cpp
/reaxc_vector.h
/remap.cpp
/remap.h
/remap_wrap.cpp
/remap_wrap.h
/restart_mpiio.cpp
/restart_mpiio.h
/smd_kernels.h
/smd_material_models.cpp
/smd_material_models.h
/smd_math.h
/tad.cpp
/tad.h
/temper.cpp
/temper.h
/temper_grem.cpp
/temper_grem.h
/thr_data.cpp
/thr_data.h
/verlet_split.cpp
/verlet_split.h
/write_dump.cpp
/write_dump.h
/xdr_compat.cpp
/xdr_compat.h
/atom_vec_smd.cpp
/atom_vec_smd.h
/compute_saed.cpp
/compute_saed.h
/compute_saed_consts.h
/compute_smd_contact_radius.cpp
/compute_smd_contact_radius.h
/compute_smd_damage.cpp
/compute_smd_damage.h
/compute_smd_hourglass_error.cpp
/compute_smd_hourglass_error.h
/compute_smd_internal_energy.cpp
/compute_smd_internal_energy.h
/compute_smd_plastic_strain.cpp
/compute_smd_plastic_strain.h
/compute_smd_plastic_strain_rate.cpp
/compute_smd_plastic_strain_rate.h
/compute_smd_rho.cpp
/compute_smd_rho.h
/compute_smd_tlsph_defgrad.cpp
/compute_smd_tlsph_defgrad.h
/compute_smd_tlsph_dt.cpp
/compute_smd_tlsph_dt.h
/compute_smd_tlsph_num_neighs.cpp
/compute_smd_tlsph_num_neighs.h
/compute_smd_tlsph_shape.cpp
/compute_smd_tlsph_shape.h
/compute_smd_tlsph_strain.cpp
/compute_smd_tlsph_strain.h
/compute_smd_tlsph_strain_rate.cpp
/compute_smd_tlsph_strain_rate.h
/compute_smd_tlsph_stress.cpp
/compute_smd_tlsph_stress.h
/compute_smd_triangle_mesh_vertices.cpp
/compute_smd_triangle_mesh_vertices.h
/compute_smd_ulsph_effm.cpp
/compute_smd_ulsph_effm.h
/compute_smd_ulsph_num_neighs.cpp
/compute_smd_ulsph_num_neighs.h
/compute_smd_ulsph_strain.cpp
/compute_smd_ulsph_strain.h
/compute_smd_ulsph_strain_rate.cpp
/compute_smd_ulsph_strain_rate.h
/compute_smd_ulsph_stress.cpp
/compute_smd_ulsph_stress.h
/compute_smd_vol.cpp
/compute_smd_vol.h
/compute_temp_cs.cpp
/compute_temp_cs.h
/compute_temp_drude.cpp
/compute_temp_drude.h
/compute_xrd.cpp
/compute_xrd.h
/compute_xrd_consts.h
/fix_atom_swap.cpp
/fix_atom_swap.h
/fix_ave_spatial_sphere.cpp
/fix_ave_spatial_sphere.h
/fix_drude.cpp
/fix_drude.h
/fix_drude_transform.cpp
/fix_drude_transform.h
/fix_langevin_drude.cpp
/fix_langevin_drude.h
/fix_pimd.cpp
/fix_pimd.h
/fix_qbmsst.cpp
/fix_qbmsst.h
/fix_qtb.cpp
/fix_qtb.h
/fix_rattle.cpp
/fix_rattle.h
/fix_saed_vtk.cpp
/fix_saed_vtk.h
/fix_smd_adjust_dt.cpp
/fix_smd_adjust_dt.h
/fix_smd_integrate_tlsph.cpp
/fix_smd_integrate_tlsph.h
/fix_smd_integrate_ulsph.cpp
/fix_smd_integrate_ulsph.h
/fix_smd_move_triangulated_surface.cpp
/fix_smd_move_triangulated_surface.h
/fix_smd_setvel.cpp
/fix_smd_setvel.h
/fix_smd_tlsph_reference_configuration.cpp
/fix_smd_tlsph_reference_configuration.h
/fix_smd_wall_surface.cpp
/fix_smd_wall_surface.h
/fix_srp.cpp
/fix_srp.h
/fix_tfmc.cpp
/fix_tfmc.h
/fix_ttm_mod.cpp
/fix_ttm_mod.h
/pair_born_coul_long_cs.cpp
/pair_born_coul_long_cs.h
/pair_born_coul_dsf_cs.cpp
/pair_born_coul_dsf_cs.h
/pair_buck_coul_long_cs.cpp
/pair_buck_coul_long_cs.h
/pair_coul_long_cs.cpp
/pair_coul_long_cs.h
/pair_lj_cut_thole_long.cpp
/pair_lj_cut_thole_long.h
/pair_plum_hb.cpp
/pair_plum_hb.h
/pair_plum_hp.cpp
/pair_plum_hp.h
/pair_polymorphic.cpp
/pair_polymorphic.h
/pair_smd_hertz.cpp
/pair_smd_hertz.h
/pair_smd_tlsph.cpp
/pair_smd_tlsph.h
/pair_smd_triangulated_surface.cpp
/pair_smd_triangulated_surface.h
/pair_smd_ulsph.cpp
/pair_smd_ulsph.h
/pair_srp.cpp
/pair_srp.h
/pair_thole.cpp
/pair_thole.h
/pair_buck_mdf.cpp
/pair_buck_mdf.h
/pair_dpd_conservative.cpp
/pair_dpd_conservative.h
/pair_dpd_fdt.cpp
/pair_dpd_fdt.h
/pair_dpd_fdt_energy.cpp
/pair_dpd_fdt_energy.h
/pair_lennard_mdf.cpp
/pair_lennard_mdf.h
/pair_lj_cut_coul_long_cs.cpp
/pair_lj_cut_coul_long_cs.h
/pair_lj_mdf.cpp
/pair_lj_mdf.h
/pair_mgpt.cpp
/pair_mgpt.h
/pair_morse_smooth_linear.cpp
/pair_morse_smooth_linear.h
/pair_smtbq.cpp
/pair_smtbq.h
/pair_vashishta*.cpp
/pair_vashishta*.h
diff --git a/src/Depend.sh b/src/Depend.sh
index 5a48a7c16..520d9ae2b 100644
--- a/src/Depend.sh
+++ b/src/Depend.sh
@@ -1,129 +1,129 @@
# Depend.sh = Install/unInstall files due to package dependencies
# this script is invoked after any package is installed/uninstalled
# enforce using portable C locale
LC_ALL=C
export LC_ALL
# all parent/child package dependencies should be listed below
# parent package = has files that files in another package derive from
# child package = has files that derive from files in another package
# update child packages that depend on the parent,
# but only if the child package is already installed
# this is necessary to ensure the child package installs
# only child files whose parent package files are now installed
# decisions on (un)installing individual child files are made by
# the Install.sh script in the child package
# depend function: arg = child-package
# checks if child-package is installed, if not just return
# otherwise invoke update of child package via its Install.sh
depend () {
cd $1
installed=0
for file in *.cpp *.h; do
if (test -e ../$file) then
installed=1
fi
done
cd ..
if (test $installed = 0) then
return
fi
echo " updating package $1"
if (test -e $1/Install.sh) then
cd $1; /bin/sh Install.sh 2; cd ..
else
cd $1; /bin/sh ../Install.sh 2; cd ..
fi
}
# add one if statement per parent package
# add one depend() call per child package that depends on that parent
if (test $1 = "ASPHERE") then
depend GPU
depend USER-OMP
depend USER-CGDNA
depend USER-INTEL
fi
if (test $1 = "CLASS2") then
depend GPU
depend KOKKOS
depend USER-OMP
fi
if (test $1 = "COLLOID") then
depend GPU
depend USER-OMP
fi
if (test $1 = "DIPOLE") then
depend USER-MISC
depend USER-OMP
fi
if (test $1 = "GRANULAR") then
depend USER-OMP
fi
if (test $1 = "KSPACE") then
depend CORESHELL
depend GPU
depend KOKKOS
depend OPT
depend USER-OMP
depend USER-INTEL
depend USER-PHONON
depend USER-FEP
fi
if (test $1 = "MANYBODY") then
depend GPU
depend KOKKOS
depend OPT
depend USER-MISC
depend USER-OMP
fi
if (test $1 = "MOLECULE") then
depend GPU
depend KOKKOS
depend USER-MISC
depend USER-OMP
depend USER-FEP
depend USER-CGDNA
depend USER-INTEL
fi
if (test $1 = "PERI") then
depend USER-OMP
fi
if (test $1 = "RIGID") then
depend USER-OMP
fi
-if (test $1 = "USER-CG-CMM") then
+if (test $1 = "USER-CGSDK") then
depend GPU
depend KOKKOS
depend USER-OMP
fi
if (test $1 = "USER-FEP") then
depend USER-OMP
fi
if (test $1 = "USER-MISC") then
depend GPU
depend USER-OMP
fi
if (test $1 = "USER-REAXC") then
depend KOKKOS
fi
diff --git a/src/GPU/pair_lj_sdk_coul_long_gpu.cpp b/src/GPU/pair_lj_sdk_coul_long_gpu.cpp
index 0b8d0f3b3..77c0dc066 100644
--- a/src/GPU/pair_lj_sdk_coul_long_gpu.cpp
+++ b/src/GPU/pair_lj_sdk_coul_long_gpu.cpp
@@ -1,352 +1,352 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Mike Brown (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "pair_lj_sdk_coul_long_gpu.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "integrate.h"
#include "memory.h"
#include "error.h"
#include "neigh_request.h"
#include "universe.h"
#include "update.h"
#include "domain.h"
#include <string.h>
#include "kspace.h"
#include "gpu_extra.h"
#define EWALD_F 1.12837917
#define EWALD_P 0.3275911
#define A1 0.254829592
#define A2 -0.284496736
#define A3 1.421413741
#define A4 -1.453152027
#define A5 1.061405429
using namespace LAMMPS_NS;
// External functions from cuda library for atom decomposition
-int cmml_gpu_init(const int ntypes, double **cutsq, int **lj_type,
+int sdkl_gpu_init(const int ntypes, double **cutsq, int **lj_type,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen, double **host_cut_ljsq, double host_cut_coulsq,
double *host_special_coul, const double qqrd2e,
const double g_ewald);
-void cmml_gpu_clear();
-int ** cmml_gpu_compute_n(const int ago, const int inum, const int nall,
+void sdkl_gpu_clear();
+int ** sdkl_gpu_compute_n(const int ago, const int inum, const int nall,
double **host_x, int *host_type, double *sublo,
double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum, const double cpu_time,
bool &success, double *host_q, double *boxlo,
double *prd);
-void cmml_gpu_compute(const int ago, const int inum, const int nall,
+void sdkl_gpu_compute(const int ago, const int inum, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success, double *host_q,
const int nlocal, double *boxlo, double *prd);
-double cmml_gpu_bytes();
+double sdkl_gpu_bytes();
#include "lj_sdk_common.h"
using namespace LJSDKParms;
/* ---------------------------------------------------------------------- */
PairLJSDKCoulLongGPU::PairLJSDKCoulLongGPU(LAMMPS *lmp) :
PairLJSDKCoulLong(lmp), gpu_mode(GPU_FORCE)
{
respa_enable = 0;
reinitflag = 0;
cpu_time = 0.0;
GPU_EXTRA::gpu_ready(lmp->modify, lmp->error);
}
/* ----------------------------------------------------------------------
free all arrays
------------------------------------------------------------------------- */
PairLJSDKCoulLongGPU::~PairLJSDKCoulLongGPU()
{
- cmml_gpu_clear();
+ sdkl_gpu_clear();
}
/* ---------------------------------------------------------------------- */
void PairLJSDKCoulLongGPU::compute(int eflag, int vflag)
{
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
int nall = atom->nlocal + atom->nghost;
int inum, host_start;
bool success = true;
int *ilist, *numneigh, **firstneigh;
if (gpu_mode != GPU_FORCE) {
inum = atom->nlocal;
- firstneigh = cmml_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
+ firstneigh = sdkl_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
atom->type, domain->sublo, domain->subhi,
atom->tag, atom->nspecial, atom->special,
eflag, vflag, eflag_atom, vflag_atom,
host_start, &ilist, &numneigh, cpu_time,
success, atom->q, domain->boxlo,
domain->prd);
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
- cmml_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
+ sdkl_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
ilist, numneigh, firstneigh, eflag, vflag, eflag_atom,
vflag_atom, host_start, cpu_time, success, atom->q,
atom->nlocal, domain->boxlo, domain->prd);
}
if (!success)
error->one(FLERR,"Insufficient memory on accelerator");
if (host_start<inum) {
cpu_time = MPI_Wtime();
if (evflag) {
if (eflag) cpu_compute<1,1>(host_start, inum, ilist, numneigh, firstneigh);
else cpu_compute<1,0>(host_start, inum, ilist, numneigh, firstneigh);
} else cpu_compute<0,0>(host_start, inum, ilist, numneigh, firstneigh);
cpu_time = MPI_Wtime() - cpu_time;
}
}
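// The cpu_compute<EVFLAG,EFLAG>() dispatch above selects a template
// instantiation at compile time, so the per-pair energy/virial accumulation
// branches are removed entirely from the instantiations that do not need them.
// A minimal sketch of the same pattern (hypothetical kernel(), not LAMMPS API):
//
//   template <int EFLAG> static void kernel(double &e, double de) {
//     if (EFLAG) e += de;   // dead code when EFLAG == 0
//   }
//   // runtime choice made once per call: eflag ? kernel<1>(e,de) : kernel<0>(e,de);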
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJSDKCoulLongGPU::init_style()
{
if (!atom->q_flag)
error->all(FLERR,"Pair style lj/sdk/coul/long/gpu requires atom attribute q");
if (force->newton_pair)
error->all(FLERR,"Cannot use newton pair with lj/sdk/coul/long/gpu pair style");
// Repeat the cutsq calculation here because it is done after the call to init_style
double maxcut = -1.0;
double cut;
for (int i = 1; i <= atom->ntypes; i++) {
for (int j = i; j <= atom->ntypes; j++) {
if (setflag[i][j] != 0 || (setflag[i][i] != 0 && setflag[j][j] != 0)) {
cut = init_one(i,j);
cut *= cut;
if (cut > maxcut)
maxcut = cut;
cutsq[i][j] = cutsq[j][i] = cut;
} else
cutsq[i][j] = cutsq[j][i] = 0.0;
}
}
double cell_size = sqrt(maxcut) + neighbor->skin;
cut_coulsq = cut_coul * cut_coul;
// ensure use of a KSpace long-range solver, set g_ewald
if (force->kspace == NULL)
error->all(FLERR,"Pair style is incompatible with KSpace style");
g_ewald = force->kspace->g_ewald;
// setup force tables
if (ncoultablebits) init_tables(cut_coul,NULL);
int maxspecial=0;
if (atom->molecular)
maxspecial=atom->maxspecial;
- int success = cmml_gpu_init(atom->ntypes+1, cutsq, lj_type, lj1, lj2, lj3,
+ int success = sdkl_gpu_init(atom->ntypes+1, cutsq, lj_type, lj1, lj2, lj3,
lj4, offset, force->special_lj, atom->nlocal,
atom->nlocal+atom->nghost, 300, maxspecial,
cell_size, gpu_mode, screen, cut_ljsq,
cut_coulsq, force->special_coul,
force->qqrd2e, g_ewald);
GPU_EXTRA::check_flag(success,error,world);
if (gpu_mode == GPU_FORCE) {
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
}
}
/* ---------------------------------------------------------------------- */
double PairLJSDKCoulLongGPU::memory_usage()
{
double bytes = Pair::memory_usage();
- return bytes + cmml_gpu_bytes();
+ return bytes + sdkl_gpu_bytes();
}
/* ---------------------------------------------------------------------- */
template <int EVFLAG, int EFLAG>
void PairLJSDKCoulLongGPU::cpu_compute(int start, int inum, int *ilist,
int *numneigh, int **firstneigh)
{
int i,j,ii,jj;
double qtmp,xtmp,ytmp,ztmp;
double r2inv,forcecoul,forcelj,factor_coul,factor_lj;
const double * const * const x = atom->x;
double * const * const f = atom->f;
const double * const q = atom->q;
const int * const type = atom->type;
const double * const special_coul = force->special_coul;
const double * const special_lj = force->special_lj;
const double qqrd2e = force->qqrd2e;
double fxtmp,fytmp,fztmp;
// loop over neighbors of my atoms
for (ii = start; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
fxtmp=fytmp=fztmp=0.0;
const int itype = type[i];
const int * const jlist = firstneigh[i];
const int jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
const double delx = xtmp - x[j][0];
const double dely = ytmp - x[j][1];
const double delz = ztmp - x[j][2];
const double rsq = delx*delx + dely*dely + delz*delz;
const int jtype = type[j];
double evdwl = 0.0;
double ecoul = 0.0;
double fpair = 0.0;
if (rsq < cutsq[itype][jtype]) {
r2inv = 1.0/rsq;
const int ljt = lj_type[itype][jtype];
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
const double r = sqrt(rsq);
const double grij = g_ewald * r;
const double expm2 = exp(-grij*grij);
const double t = 1.0 / (1.0 + EWALD_P*grij);
const double erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
const double prefactor = qqrd2e * qtmp*q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (EFLAG) ecoul = prefactor*erfc;
if (factor_coul < 1.0) {
forcecoul -= (1.0-factor_coul)*prefactor;
if (EFLAG) ecoul -= (1.0-factor_coul)*prefactor;
}
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
int itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
const double fraction = (rsq_lookup.f - rtable[itable]) *
drtable[itable];
const double table = ftable[itable] + fraction*dftable[itable];
forcecoul = qtmp*q[j] * table;
if (EFLAG) {
const double table2 = etable[itable] + fraction*detable[itable];
ecoul = qtmp*q[j] * table2;
}
if (factor_coul < 1.0) {
const double table2 = ctable[itable] + fraction*dctable[itable];
const double prefactor = qtmp*q[j] * table2;
forcecoul -= (1.0-factor_coul)*prefactor;
if (EFLAG) ecoul -= (1.0-factor_coul)*prefactor;
}
}
} else {
forcecoul = 0.0;
ecoul = 0.0;
}
if (rsq < cut_ljsq[itype][jtype]) {
if (ljt == LJ12_4) {
const double r4inv=r2inv*r2inv;
forcelj = r4inv*(lj1[itype][jtype]*r4inv*r4inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r4inv*(lj3[itype][jtype]*r4inv*r4inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ9_6) {
const double r3inv = r2inv*sqrt(r2inv);
const double r6inv = r3inv*r3inv;
forcelj = r6inv*(lj1[itype][jtype]*r3inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r3inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ12_6) {
const double r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv*(lj1[itype][jtype]*r6inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r6inv
- lj4[itype][jtype]) - offset[itype][jtype];
}
if (EFLAG) evdwl *= factor_lj;
} else {
forcelj=0.0;
evdwl = 0.0;
}
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
fxtmp += delx*fpair;
fytmp += dely*fpair;
fztmp += delz*fpair;
if (EVFLAG) ev_tally_full(i,evdwl,ecoul,fpair,delx,dely,delz);
}
}
f[i][0] += fxtmp;
f[i][1] += fytmp;
f[i][2] += fztmp;
}
}
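// Note on the tabulated-Coulomb branch in cpu_compute() above: when
// ncoultablebits is set, rsq is reinterpreted through union_int_float_t so the
// leading bits of its float representation (masked with ncoulmask and shifted
// by ncoulshiftbits) become the table index, and "fraction" then interpolates
// linearly inside that bin via rtable/drtable. A minimal sketch of the idea,
// assuming a plain union rather than the LAMMPS types:
//
//   union { float f; int i; } u;
//   u.f = (float) rsq;
//   int itable   = (u.i & mask) >> shiftbits;   // bin index from the float bit pattern
//   double frac  = (u.f - rtable[itable]) * drtable[itable];
//   double ftbl  = ftable[itable] + frac*dftable[itable];   // interpolated force factor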
diff --git a/src/GPU/pair_lj_sdk_coul_long_gpu.h b/src/GPU/pair_lj_sdk_coul_long_gpu.h
index 61de27297..3248e9497 100644
--- a/src/GPU/pair_lj_sdk_coul_long_gpu.h
+++ b/src/GPU/pair_lj_sdk_coul_long_gpu.h
@@ -1,69 +1,68 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/long/gpu,PairLJSDKCoulLongGPU)
-PairStyle(cg/cmm/coul/long/gpu,PairLJSDKCoulLongGPU)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_LONG_GPU_H
#define LMP_PAIR_LJ_SDK_COUL_LONG_GPU_H
#include "pair_lj_sdk_coul_long.h"
namespace LAMMPS_NS {
class PairLJSDKCoulLongGPU : public PairLJSDKCoulLong {
public:
PairLJSDKCoulLongGPU(LAMMPS *lmp);
~PairLJSDKCoulLongGPU();
template <int, int>
void cpu_compute(int, int, int *, int *, int **);
void compute(int, int);
void init_style();
double memory_usage();
enum { GPU_FORCE, GPU_NEIGH, GPU_HYB_NEIGH };
private:
int gpu_mode;
double cpu_time;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Insufficient memory on accelerator
There is insufficient memory on one of the devices specified for the gpu
package
E: Pair style lj/sdk/coul/long/gpu requires atom attribute q
The atom style defined does not have this attribute.
E: Cannot use newton pair with lj/sdk/coul/long/gpu pair style
Self-explanatory.
E: Pair style is incompatible with KSpace style
If a pair style with a long-range Coulombic component is selected,
then a kspace style must also be used.
*/
diff --git a/src/GPU/pair_lj_sdk_gpu.cpp b/src/GPU/pair_lj_sdk_gpu.cpp
index e7e9b690f..67103181d 100644
--- a/src/GPU/pair_lj_sdk_gpu.cpp
+++ b/src/GPU/pair_lj_sdk_gpu.cpp
@@ -1,262 +1,262 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Mike Brown (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "pair_lj_sdk_gpu.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "integrate.h"
#include "memory.h"
#include "error.h"
#include "neigh_request.h"
#include "universe.h"
#include "update.h"
#include "domain.h"
#include <string.h>
#include "gpu_extra.h"
using namespace LAMMPS_NS;
// External functions from cuda library for atom decomposition
-int cmm_gpu_init(const int ntypes, double **cutsq, int **cg_types,
+int sdk_gpu_init(const int ntypes, double **cutsq, int **cg_types,
double **host_lj1, double **host_lj2, double **host_lj3,
double **host_lj4, double **offset, double *special_lj,
const int nlocal, const int nall, const int max_nbors,
const int maxspecial, const double cell_size, int &gpu_mode,
FILE *screen);
-void cmm_gpu_clear();
-int ** cmm_gpu_compute_n(const int ago, const int inum, const int nall,
+void sdk_gpu_clear();
+int ** sdk_gpu_compute_n(const int ago, const int inum, const int nall,
double **host_x, int *host_type, double *sublo,
double *subhi, tagint *tag, int **nspecial,
tagint **special, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
int **ilist, int **jnum,
const double cpu_time, bool &success);
-void cmm_gpu_compute(const int ago, const int inum, const int nall,
+void sdk_gpu_compute(const int ago, const int inum, const int nall,
double **host_x, int *host_type, int *ilist, int *numj,
int **firstneigh, const bool eflag, const bool vflag,
const bool eatom, const bool vatom, int &host_start,
const double cpu_time, bool &success);
-double cmm_gpu_bytes();
+double sdk_gpu_bytes();
#include "lj_sdk_common.h"
using namespace LJSDKParms;
/* ---------------------------------------------------------------------- */
PairLJSDKGPU::PairLJSDKGPU(LAMMPS *lmp) : PairLJSDK(lmp), gpu_mode(GPU_FORCE)
{
respa_enable = 0;
reinitflag = 0;
cpu_time = 0.0;
GPU_EXTRA::gpu_ready(lmp->modify, lmp->error);
}
/* ----------------------------------------------------------------------
free all arrays
------------------------------------------------------------------------- */
PairLJSDKGPU::~PairLJSDKGPU()
{
- cmm_gpu_clear();
+ sdk_gpu_clear();
}
/* ---------------------------------------------------------------------- */
void PairLJSDKGPU::compute(int eflag, int vflag)
{
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
int nall = atom->nlocal + atom->nghost;
int inum, host_start;
bool success = true;
int *ilist, *numneigh, **firstneigh;
if (gpu_mode != GPU_FORCE) {
inum = atom->nlocal;
- firstneigh = cmm_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
+ firstneigh = sdk_gpu_compute_n(neighbor->ago, inum, nall, atom->x,
atom->type, domain->sublo, domain->subhi,
atom->tag, atom->nspecial, atom->special,
eflag, vflag, eflag_atom, vflag_atom,
host_start, &ilist, &numneigh, cpu_time,
success);
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
- cmm_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
+ sdk_gpu_compute(neighbor->ago, inum, nall, atom->x, atom->type,
ilist, numneigh, firstneigh, eflag, vflag, eflag_atom,
vflag_atom, host_start, cpu_time, success);
}
if (!success)
error->one(FLERR,"Insufficient memory on accelerator");
if (host_start<inum) {
cpu_time = MPI_Wtime();
if (evflag) {
if (eflag) cpu_compute<1,1>(host_start, inum, ilist, numneigh, firstneigh);
else cpu_compute<1,0>(host_start, inum, ilist, numneigh, firstneigh);
} else cpu_compute<0,0>(host_start, inum, ilist, numneigh, firstneigh);
cpu_time = MPI_Wtime() - cpu_time;
}
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJSDKGPU::init_style()
{
if (force->newton_pair)
error->all(FLERR,"Cannot use newton pair with lj/sdk/gpu pair style");
// Repeat the cutsq calculation here because it is done after the call to init_style
double maxcut = -1.0;
double cut;
for (int i = 1; i <= atom->ntypes; i++) {
for (int j = i; j <= atom->ntypes; j++) {
if (setflag[i][j] != 0 || (setflag[i][i] != 0 && setflag[j][j] != 0)) {
cut = init_one(i,j);
cut *= cut;
if (cut > maxcut)
maxcut = cut;
cutsq[i][j] = cutsq[j][i] = cut;
} else
cutsq[i][j] = cutsq[j][i] = 0.0;
}
}
double cell_size = sqrt(maxcut) + neighbor->skin;
int maxspecial=0;
if (atom->molecular)
maxspecial=atom->maxspecial;
- int success = cmm_gpu_init(atom->ntypes+1,cutsq,lj_type,lj1,lj2,lj3,lj4,
+ int success = sdk_gpu_init(atom->ntypes+1,cutsq,lj_type,lj1,lj2,lj3,lj4,
offset, force->special_lj, atom->nlocal,
atom->nlocal+atom->nghost, 300, maxspecial,
cell_size, gpu_mode, screen);
GPU_EXTRA::check_flag(success,error,world);
if (gpu_mode == GPU_FORCE) {
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
}
}
/* ---------------------------------------------------------------------- */
double PairLJSDKGPU::memory_usage()
{
double bytes = Pair::memory_usage();
- return bytes + cmm_gpu_bytes();
+ return bytes + sdk_gpu_bytes();
}
/* ---------------------------------------------------------------------- */
template <int EVFLAG, int EFLAG>
void PairLJSDKGPU::cpu_compute(int start, int inum, int *ilist,
int *numneigh, int **firstneigh)
{
int i,j,ii,jj,jtype;
double xtmp,ytmp,ztmp,delx,dely,delz,evdwl,fpair;
double rsq,r2inv,forcelj,factor_lj;
const double * const * const x = atom->x;
double * const * const f = atom->f;
const int * const type = atom->type;
const double * const special_lj = force->special_lj;
double fxtmp,fytmp,fztmp;
evdwl=0.0;
// loop over neighbors of my atoms
for (ii = start; ii < inum; ii++) {
i = ilist[ii];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
fxtmp=fytmp=fztmp=0.0;
const int itype = type[i];
const int * const jlist = firstneigh[i];
const int jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
jtype = type[j];
if (rsq < cutsq[itype][jtype]) {
r2inv = 1.0/rsq;
const int ljt = lj_type[itype][jtype];
if (ljt == LJ12_4) {
const double r4inv=r2inv*r2inv;
forcelj = r4inv*(lj1[itype][jtype]*r4inv*r4inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r4inv*(lj3[itype][jtype]*r4inv*r4inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ9_6) {
const double r3inv = r2inv*sqrt(r2inv);
const double r6inv = r3inv*r3inv;
forcelj = r6inv*(lj1[itype][jtype]*r3inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r3inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else if (ljt == LJ12_6) {
const double r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv*(lj1[itype][jtype]*r6inv
- lj2[itype][jtype]);
if (EFLAG)
evdwl = r6inv*(lj3[itype][jtype]*r6inv
- lj4[itype][jtype]) - offset[itype][jtype];
} else continue;
fpair = factor_lj*forcelj*r2inv;
fxtmp += delx*fpair;
fytmp += dely*fpair;
fztmp += delz*fpair;
if (EVFLAG) ev_tally_full(i,evdwl,0.0,fpair,delx,dely,delz);
}
}
f[i][0] += fxtmp;
f[i][1] += fytmp;
f[i][2] += fztmp;
}
}
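// For reference, the three lj_type branches above evaluate the SDK (CG-CMM)
// pair forms with their prefactors folded into lj1..lj4 by init_one(); assuming
// the standard SDK parameterization these correspond to
//
//   LJ12-4 : E(r) = (3*sqrt(3)/2) * eps * ( (sig/r)^12 - (sig/r)^4 )
//   LJ9-6  : E(r) = (27/4)        * eps * ( (sig/r)^9  - (sig/r)^6 )
//   LJ12-6 : E(r) = 4             * eps * ( (sig/r)^12 - (sig/r)^6 )
//
// and the pair force enters as fpair = factor_lj*forcelj*r2inv = F/r, so that
// delx*fpair is the x-component of the force on atom i.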
diff --git a/src/GPU/pair_lj_sdk_gpu.h b/src/GPU/pair_lj_sdk_gpu.h
index 610fb8b0e..3865b3404 100644
--- a/src/GPU/pair_lj_sdk_gpu.h
+++ b/src/GPU/pair_lj_sdk_gpu.h
@@ -1,60 +1,59 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/gpu,PairLJSDKGPU)
-PairStyle(cg/cmm/gpu,PairLJSDKGPU)
#else
#ifndef LMP_PAIR_LJ_SDK_GPU_H
#define LMP_PAIR_LJ_SDK_GPU_H
#include "pair_lj_sdk.h"
namespace LAMMPS_NS {
class PairLJSDKGPU : public PairLJSDK {
public:
PairLJSDKGPU(LAMMPS *lmp);
~PairLJSDKGPU();
template <int, int>
void cpu_compute(int, int, int *, int *, int **);
void compute(int, int);
void init_style();
double memory_usage();
enum { GPU_FORCE, GPU_NEIGH, GPU_HYB_NEIGH };
private:
int gpu_mode;
double cpu_time;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Insufficient memory on accelerator
There is insufficient memory on one of the devices specified for the gpu
package
E: Cannot use newton pair with lj/sdk/gpu pair style
Self-explanatory.
*/
diff --git a/src/KOKKOS/Install.sh b/src/KOKKOS/Install.sh
index bbebc36c1..790b9224c 100644
--- a/src/KOKKOS/Install.sh
+++ b/src/KOKKOS/Install.sh
@@ -1,256 +1,256 @@
# Install/unInstall package files in LAMMPS
# mode = 0/1/2 for uninstall/install/update
mode=$1
# enforce using portable C locale
LC_ALL=C
export LC_ALL
# arg1 = file, arg2 = file it depends on
action () {
if (test $mode = 0) then
rm -f ../$1
elif (! cmp -s $1 ../$1) then
if (test -z "$2" || test -e ../$2) then
cp $1 ..
if (test $mode = 2) then
echo " updating src/$1"
fi
fi
elif (test -n "$2") then
if (test ! -e ../$2) then
rm -f ../$1
fi
fi
}
# force rebuild of files with LMP_KOKKOS switch
touch ../accelerator_kokkos.h
touch ../memory.h
# list of files with optional dependencies
action angle_charmm_kokkos.cpp angle_charmm.cpp
action angle_charmm_kokkos.h angle_charmm.h
action angle_class2_kokkos.cpp angle_class2.cpp
action angle_class2_kokkos.h angle_class2.h
action angle_harmonic_kokkos.cpp angle_harmonic.cpp
action angle_harmonic_kokkos.h angle_harmonic.h
action atom_kokkos.cpp
action atom_kokkos.h
action atom_vec_angle_kokkos.cpp atom_vec_angle.cpp
action atom_vec_angle_kokkos.h atom_vec_angle.h
action atom_vec_atomic_kokkos.cpp
action atom_vec_atomic_kokkos.h
action atom_vec_bond_kokkos.cpp atom_vec_bond.cpp
action atom_vec_bond_kokkos.h atom_vec_bond.h
action atom_vec_charge_kokkos.cpp
action atom_vec_charge_kokkos.h
action atom_vec_full_kokkos.cpp atom_vec_full.cpp
action atom_vec_full_kokkos.h atom_vec_full.h
action atom_vec_kokkos.cpp
action atom_vec_kokkos.h
action atom_vec_molecular_kokkos.cpp atom_vec_molecular.cpp
action atom_vec_molecular_kokkos.h atom_vec_molecular.h
action bond_class2_kokkos.cpp bond_class2.cpp
action bond_class2_kokkos.h bond_class2.h
action bond_fene_kokkos.cpp bond_fene.cpp
action bond_fene_kokkos.h bond_fene.h
action bond_harmonic_kokkos.cpp bond_harmonic.cpp
action bond_harmonic_kokkos.h bond_harmonic.h
action comm_kokkos.cpp
action comm_kokkos.h
action comm_tiled_kokkos.cpp
action comm_tiled_kokkos.h
action compute_temp_kokkos.cpp
action compute_temp_kokkos.h
action dihedral_charmm_kokkos.cpp dihedral_charmm.cpp
action dihedral_charmm_kokkos.h dihedral_charmm.h
action dihedral_class2_kokkos.cpp dihedral_class2.cpp
action dihedral_class2_kokkos.h dihedral_class2.h
action dihedral_opls_kokkos.cpp dihedral_opls.cpp
action dihedral_opls_kokkos.h dihedral_opls.h
action domain_kokkos.cpp
action domain_kokkos.h
action fix_deform_kokkos.cpp
action fix_deform_kokkos.h
action fix_langevin_kokkos.cpp
action fix_langevin_kokkos.h
action fix_nh_kokkos.cpp
action fix_nh_kokkos.h
action fix_nph_kokkos.cpp
action fix_nph_kokkos.h
action fix_npt_kokkos.cpp
action fix_npt_kokkos.h
action fix_nve_kokkos.cpp
action fix_nve_kokkos.h
action fix_nvt_kokkos.cpp
action fix_nvt_kokkos.h
action fix_qeq_reax_kokkos.cpp fix_qeq_reax.cpp
action fix_qeq_reax_kokkos.h fix_qeq_reax.h
action fix_reaxc_bonds_kokkos.cpp fix_reaxc_bonds.cpp
action fix_reaxc_bonds_kokkos.h fix_reaxc_bonds.h
action fix_reaxc_species_kokkos.cpp fix_reaxc_species.cpp
action fix_reaxc_species_kokkos.h fix_reaxc_species.h
action fix_setforce_kokkos.cpp
action fix_setforce_kokkos.h
action fix_momentum_kokkos.cpp
action fix_momentum_kokkos.h
action fix_wall_reflect_kokkos.cpp
action fix_wall_reflect_kokkos.h
action gridcomm_kokkos.cpp gridcomm.cpp
action gridcomm_kokkos.h gridcomm.h
action improper_class2_kokkos.cpp improper_class2.cpp
action improper_class2_kokkos.h improper_class2.h
action improper_harmonic_kokkos.cpp improper_harmonic.cpp
action improper_harmonic_kokkos.h improper_harmonic.h
action kokkos.cpp
action kokkos.h
action kokkos_type.h
action kokkos_few.h
action memory_kokkos.h
action modify_kokkos.cpp
action modify_kokkos.h
action neigh_bond_kokkos.cpp
action neigh_bond_kokkos.h
action neigh_list_kokkos.cpp
action neigh_list_kokkos.h
action neighbor_kokkos.cpp
action neighbor_kokkos.h
action npair_copy_kokkos.cpp
action npair_copy_kokkos.h
action npair_kokkos.cpp
action npair_kokkos.h
action nbin_kokkos.cpp
action nbin_kokkos.h
action math_special_kokkos.cpp
action math_special_kokkos.h
action pair_buck_coul_cut_kokkos.cpp
action pair_buck_coul_cut_kokkos.h
action pair_buck_coul_long_kokkos.cpp pair_buck_coul_long.cpp
action pair_buck_coul_long_kokkos.h pair_buck_coul_long.h
action pair_buck_kokkos.cpp
action pair_buck_kokkos.h
action pair_coul_cut_kokkos.cpp
action pair_coul_cut_kokkos.h
action pair_coul_debye_kokkos.cpp
action pair_coul_debye_kokkos.h
action pair_coul_dsf_kokkos.cpp
action pair_coul_dsf_kokkos.h
action pair_coul_long_kokkos.cpp pair_coul_long.cpp
action pair_coul_long_kokkos.h pair_coul_long.h
action pair_coul_wolf_kokkos.cpp
action pair_coul_wolf_kokkos.h
action pair_eam_kokkos.cpp pair_eam.cpp
action pair_eam_kokkos.h pair_eam.h
action pair_eam_alloy_kokkos.cpp pair_eam_alloy.cpp
action pair_eam_alloy_kokkos.h pair_eam_alloy.h
action pair_eam_fs_kokkos.cpp pair_eam_fs.cpp
action pair_eam_fs_kokkos.h pair_eam_fs.h
action pair_kokkos.h
action pair_lj_charmm_coul_charmm_implicit_kokkos.cpp pair_lj_charmm_coul_charmm_implicit.cpp
action pair_lj_charmm_coul_charmm_implicit_kokkos.h pair_lj_charmm_coul_charmm_implicit.h
action pair_lj_charmm_coul_charmm_kokkos.cpp pair_lj_charmm_coul_charmm.cpp
action pair_lj_charmm_coul_charmm_kokkos.h pair_lj_charmm_coul_charmm.h
action pair_lj_charmm_coul_long_kokkos.cpp pair_lj_charmm_coul_long.cpp
action pair_lj_charmm_coul_long_kokkos.h pair_lj_charmm_coul_long.h
action pair_lj_class2_coul_cut_kokkos.cpp pair_lj_class2_coul_cut.cpp
action pair_lj_class2_coul_cut_kokkos.h pair_lj_class2_coul_cut.h
action pair_lj_class2_coul_long_kokkos.cpp pair_lj_class2_coul_long.cpp
action pair_lj_class2_coul_long_kokkos.h pair_lj_class2_coul_long.h
action pair_lj_class2_kokkos.cpp pair_lj_class2.cpp
action pair_lj_class2_kokkos.h pair_lj_class2.h
action pair_lj_cut_coul_cut_kokkos.cpp
action pair_lj_cut_coul_cut_kokkos.h
action pair_lj_cut_coul_debye_kokkos.cpp
action pair_lj_cut_coul_debye_kokkos.h
action pair_lj_cut_coul_dsf_kokkos.cpp
action pair_lj_cut_coul_dsf_kokkos.h
action pair_lj_cut_coul_long_kokkos.cpp pair_lj_cut_coul_long.cpp
action pair_lj_cut_coul_long_kokkos.h pair_lj_cut_coul_long.h
action pair_lj_cut_kokkos.cpp
action pair_lj_cut_kokkos.h
action pair_lj_expand_kokkos.cpp
action pair_lj_expand_kokkos.h
action pair_lj_gromacs_coul_gromacs_kokkos.cpp
action pair_lj_gromacs_coul_gromacs_kokkos.h
action pair_lj_gromacs_kokkos.cpp
action pair_lj_gromacs_kokkos.h
action pair_lj_sdk_kokkos.cpp pair_lj_sdk.cpp
action pair_lj_sdk_kokkos.h pair_lj_sdk.h
action pair_morse_kokkos.cpp
action pair_morse_kokkos.h
-action pair_reax_c_kokkos.cpp pair_reax_c.cpp
-action pair_reax_c_kokkos.h pair_reax_c.h
+action pair_reaxc_kokkos.cpp pair_reaxc.cpp
+action pair_reaxc_kokkos.h pair_reaxc.h
action pair_sw_kokkos.cpp pair_sw.cpp
action pair_sw_kokkos.h pair_sw.h
action pair_vashishta_kokkos.cpp pair_vashishta.cpp
action pair_vashishta_kokkos.h pair_vashishta.h
action pair_table_kokkos.cpp
action pair_table_kokkos.h
action pair_tersoff_kokkos.cpp pair_tersoff.cpp
action pair_tersoff_kokkos.h pair_tersoff.h
action pair_tersoff_mod_kokkos.cpp pair_tersoff_mod.cpp
action pair_tersoff_mod_kokkos.h pair_tersoff_mod.h
action pair_tersoff_zbl_kokkos.cpp pair_tersoff_zbl.cpp
action pair_tersoff_zbl_kokkos.h pair_tersoff_zbl.h
action pppm_kokkos.cpp pppm.cpp
action pppm_kokkos.h pppm.h
action region_block_kokkos.cpp
action region_block_kokkos.h
action verlet_kokkos.cpp
action verlet_kokkos.h
# edit 2 Makefile.package files to include/exclude package info
if (test $1 = 1) then
if (test -e ../Makefile.package) then
sed -i -e 's/[^ \t]*kokkos[^ \t]* //g' ../Makefile.package
sed -i -e 's/[^ \t]*KOKKOS[^ \t]* //g' ../Makefile.package
sed -i -e 's|^PKG_INC =[ \t]*|&-DLMP_KOKKOS |' ../Makefile.package
# sed -i -e 's|^PKG_PATH =[ \t]*|&-L..\/..\/lib\/kokkos\/core\/src |' ../Makefile.package
sed -i -e 's|^PKG_CPP_DEPENDS =[ \t]*|&$(KOKKOS_CPP_DEPENDS) |' ../Makefile.package
sed -i -e 's|^PKG_LIB =[ \t]*|&$(KOKKOS_LIBS) |' ../Makefile.package
sed -i -e 's|^PKG_LINK_DEPENDS =[ \t]*|&$(KOKKOS_LINK_DEPENDS) |' ../Makefile.package
sed -i -e 's|^PKG_SYSINC =[ \t]*|&$(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) |' ../Makefile.package
sed -i -e 's|^PKG_SYSLIB =[ \t]*|&$(KOKKOS_LDFLAGS) |' ../Makefile.package
# sed -i -e 's|^PKG_SYSPATH =[ \t]*|&$(kokkos_SYSPATH) |' ../Makefile.package
fi
if (test -e ../Makefile.package.settings) then
sed -i -e '/CXX\ =\ \$(CC)/d' ../Makefile.package.settings
sed -i -e '/^include.*kokkos.*$/d' ../Makefile.package.settings
# multiline form needed for BSD sed on Macs
sed -i -e '4 i \
CXX = $(CC)
' ../Makefile.package.settings
sed -i -e '5 i \
include ..\/..\/lib\/kokkos\/Makefile.kokkos
' ../Makefile.package.settings
fi
# comb/omp triggers a persistent bug in nvcc. deleting it.
rm -f ../*_comb_omp.*
elif (test $1 = 2) then
# comb/omp triggers a persistent bug in nvcc. deleting it.
rm -f ../*_comb_omp.*
elif (test $1 = 0) then
if (test -e ../Makefile.package) then
sed -i -e 's/[^ \t]*kokkos[^ \t]* //g' ../Makefile.package
sed -i -e 's/[^ \t]*KOKKOS[^ \t]* //g' ../Makefile.package
fi
if (test -e ../Makefile.package.settings) then
sed -i -e '/CXX\ =\ \$(CC)/d' ../Makefile.package.settings
sed -i -e '/^include.*kokkos.*$/d' ../Makefile.package.settings
fi
fi
diff --git a/src/KOKKOS/atom_vec_angle_kokkos.cpp b/src/KOKKOS/atom_vec_angle_kokkos.cpp
index 48fc3a352..34b868aad 100644
--- a/src/KOKKOS/atom_vec_angle_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_angle_kokkos.cpp
@@ -1,1982 +1,1982 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_angle_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecAngleKokkos::AtomVecAngleKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = angles_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
atom->molecule_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
buffer = NULL;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,
- "atom:special");
+ "atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,
- "atom:bond_type");
+ "atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,
- "atom:bond_atom");
+ "atom:bond_atom");
memory->grow_kokkos(atomKK->k_num_angle,atomKK->num_angle,nmax,"atom:num_angle");
memory->grow_kokkos(atomKK->k_angle_type,atomKK->angle_type,nmax,atomKK->angle_per_atom,
- "atom:angle_type");
+ "atom:angle_type");
memory->grow_kokkos(atomKK->k_angle_atom1,atomKK->angle_atom1,nmax,atomKK->angle_per_atom,
- "atom:angle_atom1");
+ "atom:angle_atom1");
memory->grow_kokkos(atomKK->k_angle_atom2,atomKK->angle_atom2,nmax,atomKK->angle_per_atom,
- "atom:angle_atom2");
+ "atom:angle_atom2");
memory->grow_kokkos(atomKK->k_angle_atom3,atomKK->angle_atom3,nmax,atomKK->angle_per_atom,
- "atom:angle_atom3");
+ "atom:angle_atom3");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
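// Usage pattern implied by the doc comment above: callers that may run out of
// space call grow(0) repeatedly, each call adding DELTA (10000) slots, while a
// caller that already knows the required size calls grow(n) once. A sketch,
// mirroring the loop in unpack_border_kokkos() further below:
//
//   while (first + n >= nmax) grow(0);   // chunked growth until the data fits
//   // or: grow(n_expected);             // exact allocation when the count is known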
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
num_angle = atomKK->num_angle;
d_num_angle = atomKK->k_num_angle.d_view;
h_num_angle = atomKK->k_num_angle.h_view;
angle_type = atomKK->angle_type;
d_angle_type = atomKK->k_angle_type.d_view;
h_angle_type = atomKK->k_angle_type.h_view;
angle_atom1 = atomKK->angle_atom1;
d_angle_atom1 = atomKK->k_angle_atom1.d_view;
h_angle_atom1 = atomKK->k_angle_atom1.h_view;
angle_atom2 = atomKK->angle_atom2;
d_angle_atom2 = atomKK->k_angle_atom2.d_view;
h_angle_atom2 = atomKK->k_angle_atom2.h_view;
angle_atom3 = atomKK->angle_atom3;
d_angle_atom3 = atomKK->k_angle_atom3.d_view;
h_angle_atom3 = atomKK->k_angle_atom3.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++)
h_special(j,k) = h_special(i,k);
h_num_angle(j) = h_num_angle(i);
for (k = 0; k < h_num_angle(j); k++) {
h_angle_type(j,k) = h_angle_type(i,k);
h_angle_atom1(j,k) = h_angle_atom1(i,k);
h_angle_atom2(j,k) = h_angle_atom2(i,k);
h_angle_atom3(j,k) = h_angle_atom3(i,k);
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAngleKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAngleKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()
- *buf.view<DeviceType>().dimension_1())/3;
+ *buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm_kokkos(const int &n,
- const DAT::tdual_int_2d &list,
- const int & iswap,
- const DAT::tdual_xfloat_2d &buf,
- const int &pbc_flag,
- const int* const pbc)
+ const DAT::tdual_int_2d &list,
+ const int & iswap,
+ const DAT::tdual_xfloat_2d &buf,
+ const int &pbc_flag,
+ const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
- return n*size_forward;
+ return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAngleKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAngleKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap,
const int nfirst, const int &pbc_flag,
const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,1,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,1,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,0,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPHostType,0,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,1,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,1,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,0,1>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAngleKokkos_PackCommSelf<LMPDeviceType,0,0>
f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecAngleKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecAngleKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecAngleKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecAngleKokkos_PackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecAngleKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
}
}
};
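// The d_ubuf(...) wrappers above (and in the unpack functor below) carry the
// integer tag/type/mask/molecule values through the double-typed border buffer
// bit-exactly, instead of round-tripping them through a double conversion that
// could lose precision for large 64-bit tagints. A minimal sketch of the idea,
// assuming a plain union rather than the LAMMPS ubuf/d_ubuf types:
//
//   union bits { double d; int64_t i; };
//   bits pack;    pack.i = (int64_t) tag;   double slot = pack.d;     // pack
//   bits unpack;  unpack.d = slot;          int64_t back = unpack.i;  // unpack, bit-exact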
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecAngleKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAngleKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecAngleKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAngleKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_molecule(j);
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_UnpackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecAngleKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
- _molecule(i+_first) = static_cast<tagint> (_buf(i,6));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,6)).i;
}
};
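// Note on the d_ubuf change above (hedged sketch): ubuf/d_ubuf are assumed to
// be LAMMPS' union helpers that alias a double with a 64-bit integer, roughly
//
//   union ubuf {
//     double d;
//     int64_t i;
//     ubuf(double arg) : d(arg) {}
//     ubuf(int64_t arg) : i(arg) {}
//   };
//
// so an integer packed as ubuf(tag).d and unpacked as (tagint) d_ubuf(buf).i
// travels through the double-typed comm buffer bit-for-bit. The replaced
// static_cast<> lines converted the value numerically instead, which can lose
// precision once tagint/imageint are 64-bit and exceed 2^53.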
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecAngleKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecAngleKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_int_1d_randomread _num_angle;
typename AT::t_int_2d_randomread _angle_type;
typename AT::t_tagint_2d_randomread _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_int_1d _num_anglew;
typename AT::t_int_2d _angle_typew;
typename AT::t_tagint_2d _angle_atom1w,_angle_atom2w,_angle_atom3w;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecAngleKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_num_anglew(atom->k_num_angle.view<DeviceType>()),
_angle_typew(atom->k_angle_type.view<DeviceType>()),
_angle_atom1w(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2w(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3w(atom->k_angle_atom3.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 molecule, 3 nspecial,
// maxspecial special, 1 num_bond, bond_per_atom bond_type, bond_per_atom bond_atom,
// 1 num_angle, angle_per_atom angle_type, angle_per_atom angle_atom1, angle_atom2,
// and angle_atom3
// 1 to store buffer length
elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
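// Worked count for the constant 17 (derived here, not in the original
// comment): 1 length slot + 3 x + 3 v + tag + type + mask + image + molecule
// + num_bond + num_angle + 3 nspecial = 17 doubles per atom, with the
// variable-length bond/angle/special data appended on top of that.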
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = _tag(i);
- _buf(mysend,m++) = _type(i);
- _buf(mysend,m++) = _mask(i);
- _buf(mysend,m++) = _image(i);
- _buf(mysend,m++) = _molecule(i);
- _buf(mysend,m++) = _num_bond(i);
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = _bond_type(i,k);
- _buf(mysend,m++) = _bond_atom(i,k);
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = _num_angle(i);
+ _buf(mysend,m++) = d_ubuf(_num_angle(i)).d;
for (k = 0; k < _num_angle(i); k++) {
- _buf(mysend,m++) = _angle_type(i,k);
- _buf(mysend,m++) = _angle_atom1(i,k);
- _buf(mysend,m++) = _angle_atom2(i,k);
- _buf(mysend,m++) = _angle_atom3(i,k);
+ _buf(mysend,m++) = d_ubuf(_angle_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom3(i,k)).d;
}
- _buf(mysend,m++) = _nspecial(i,0);
- _buf(mysend,m++) = _nspecial(i,1);
- _buf(mysend,m++) = _nspecial(i,2);
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = _special(i,k);
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_num_anglew(i) = _num_angle(j);
for (k = 0; k < _num_angle(j); k++) {
_angle_typew(i,k) = _angle_type(j,k);
_angle_atom1w(i,k) = _angle_atom1(j,k);
_angle_atom2w(i,k) = _angle_atom2(j,k);
_angle_atom3w(i,k) = _angle_atom3(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
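// Capacity check (sketch of the intent): the 2-D buffer is addressed as a
// flat run of doubles, so it needs room for nsend*elements values; when it
// is too small, its first dimension is grown so that
// newsize*dimension_1() >= nsend*elements.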
if(space == Host) {
AtomVecAngleKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecAngleKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(h_angle_type(i,k)).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAngleKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_int_1d _num_angle;
typename AT::t_int_2d _angle_type;
typename AT::t_tagint_2d _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecAngleKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
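// Slot 0 of each record stores the record length, so x,y,z sit in slots
// 1..3 and _buf(myrecv,_dim+1) is the coordinate along the exchange
// dimension; only atoms landing inside this proc's [lo,hi) slab are kept.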
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = _buf(myrecv,m++);
- _type(i) = _buf(myrecv,m++);
- _mask(i) = _buf(myrecv,m++);
- _image(i) = _buf(myrecv,m++);
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
- _molecule(i) = _buf(myrecv,m++);
- _num_bond(i) = _buf(myrecv,m++);
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = _buf(myrecv,m++);
- _bond_atom(i,k) = _buf(myrecv,m++);
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_angle(i) = _buf(myrecv,m++);
+ _num_angle(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_angle(i); k++) {
- _angle_type(i,k) = _buf(myrecv,m++);
- _angle_atom1(i,k) = _buf(myrecv,m++);
- _angle_atom2(i,k) = _buf(myrecv,m++);
- _angle_atom3(i,k) = _buf(myrecv,m++);
+ _angle_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = _buf(myrecv,m++);
- _nspecial(i,1) = _buf(myrecv,m++);
- _nspecial(i,2) = _buf(myrecv,m++);
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = _buf(myrecv,m++);
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 17+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom;
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecAngleKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecAngleKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
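/* worked breakdown of the fixed count used below (derived from pack_restart,
   not part of the original comment): 1 length slot + 3 x + tag + type + mask
   + image + 3 v + molecule + num_bond + num_angle = 14 values per atom, plus
   2 per bond and 4 per angle */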
int AtomVecAngleKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
for (i = 0; i < nlocal; i++)
n += 14 + 2*h_num_bond(i) + 4*h_num_angle(i);
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
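// MAX(t,-t) below is simply abs(t): bond/angle types may be stored as
// negative values (e.g. bonds turned off by delete_bonds) but, per the
// comment above, are written to the restart file as positive.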
for (int k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (int k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(MAX(h_angle_type(i,k),-h_angle_type(i,k))).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
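// The three image flags are packed into one imageint, each stored as an
// offset from IMGMAX in its own bit field (x in the low IMGBITS, y next,
// z on top), so the value above encodes image flags (0,0,0); pack_data()
// decodes them again with (image & IMGMASK) - IMGMAX and the shifted forms.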
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAngleKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2],
buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecAngleKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT, (tagint) (buf[0]));
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecAngleKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
if (atom->memcheck("num_angle")) bytes += memory->usage(num_angle,nmax);
if (atom->memcheck("angle_type"))
bytes += memory->usage(angle_type,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom1"))
bytes += memory->usage(angle_atom1,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom2"))
bytes += memory->usage(angle_atom2,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom3"))
bytes += memory->usage(angle_atom3,nmax,atom->angle_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPDeviceType>();
atomKK->k_angle_type.sync<LMPDeviceType>();
atomKK->k_angle_atom1.sync<LMPDeviceType>();
atomKK->k_angle_atom2.sync<LMPDeviceType>();
atomKK->k_angle_atom3.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPHostType>();
atomKK->k_angle_type.sync<LMPHostType>();
atomKK->k_angle_atom1.sync<LMPHostType>();
atomKK->k_angle_atom2.sync<LMPHostType>();
atomKK->k_angle_atom3.sync<LMPHostType>();
}
}
}
void AtomVecAngleKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAngleKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPDeviceType>();
atomKK->k_angle_type.modify<LMPDeviceType>();
atomKK->k_angle_atom1.modify<LMPDeviceType>();
atomKK->k_angle_atom2.modify<LMPDeviceType>();
atomKK->k_angle_atom3.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPHostType>();
atomKK->k_angle_type.modify<LMPHostType>();
atomKK->k_angle_atom1.modify<LMPHostType>();
atomKK->k_angle_atom2.modify<LMPHostType>();
atomKK->k_angle_atom3.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/atom_vec_atomic_kokkos.cpp b/src/KOKKOS/atom_vec_atomic_kokkos.cpp
index dc254e6a7..d040bd355 100644
--- a/src/KOKKOS/atom_vec_atomic_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_atomic_kokkos.cpp
@@ -1,1438 +1,1438 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_atomic_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecAtomicKokkos::AtomVecAtomicKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 0;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 6;
size_velocity = 3;
size_data_atom = 5;
size_data_vel = 4;
xcol_data = 3;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::copy(int i, int j, int delflag)
{
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAtomicKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAtomicKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
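// The nested branches below select one of the PackComm<DeviceType,PBC_FLAG,
// TRICLINIC> template instantiations, so the pbc and triclinic decisions are
// made once here rather than per atom inside the kernel.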
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecAtomicKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecAtomicKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list, const int & iswap,
const int nfirst, const int &pbc_flag, const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPHostType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecAtomicKokkos_PackCommSelf<LMPDeviceType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecAtomicKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecAtomicKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecAtomicKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0) {
sync(Host,F_MASK);
modified(Host,F_MASK);
}
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecAtomicKokkos_PackBorder {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_xfloat_2d _buf;
const typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
const typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
const typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
const typename ArrayTypes<DeviceType>::t_int_1d _type;
const typename ArrayTypes<DeviceType>::t_int_1d _mask;
X_FLOAT _dx,_dy,_dz;
AtomVecAtomicKokkos_PackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d &buf,
const typename ArrayTypes<DeviceType>::t_int_2d_const &list,
const int & iswap,
const typename ArrayTypes<DeviceType>::t_x_array &x,
const typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
const typename ArrayTypes<DeviceType>::t_int_1d &type,
const typename ArrayTypes<DeviceType>::t_int_1d &mask,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist, DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecAtomicKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAtomicKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecAtomicKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecAtomicKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*6;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
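// In pack_border_vel() above, when box deformation remaps velocities
// (deform_vremap set), ghost atoms sent across a moving periodic boundary also
// get their velocities shifted by the box deformation rate h_rate, but only
// for atoms in the deform group.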
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_UnpackBorder {
typedef DeviceType device_type;
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
typename ArrayTypes<DeviceType>::t_int_1d _type;
typename ArrayTypes<DeviceType>::t_int_1d _mask;
int _first;
AtomVecAtomicKokkos_UnpackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const &buf,
typename ArrayTypes<DeviceType>::t_x_array &x,
typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
typename ArrayTypes<DeviceType>::t_int_1d &type,
typename ArrayTypes<DeviceType>::t_int_1d &mask,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
// printf("%i %i %lf %lf %lf %i BORDER\n",_tag(i+_first),i+_first,_x(i+_first,0),_x(i+_first,1),_x(i+_first,2),_type(i+_first));
}
};
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
if(space==Host) {
struct AtomVecAtomicKokkos_UnpackBorder<LMPHostType> f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecAtomicKokkos_UnpackBorder<LMPDeviceType> f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
AtomVecAtomicKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
const size_t elements = 11;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
const int i = _sendlist(mysend);
_buf(mysend,0) = 11;
_buf(mysend,1) = _x(i,0);
_buf(mysend,2) = _x(i,1);
_buf(mysend,3) = _x(i,2);
_buf(mysend,4) = _v(i,0);
_buf(mysend,5) = _v(i,1);
_buf(mysend,6) = _v(i,2);
- _buf(mysend,7) = _tag[i];
- _buf(mysend,8) = _type[i];
- _buf(mysend,9) = _mask[i];
- _buf(mysend,10) = _image[i];
+ _buf(mysend,7) = d_ubuf(_tag[i]).d;
+ _buf(mysend,8) = d_ubuf(_type[i]).d;
+ _buf(mysend,9) = d_ubuf(_mask[i]).d;
+ _buf(mysend,10) = d_ubuf(_image[i]).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw[i] = _tag(j);
_typew[i] = _type(j);
_maskw[i] = _mask(j);
_imagew[i] = _image(j);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf, DAT::tdual_int_1d k_sendlist,DAT::tdual_int_1d k_copylist,ExecutionSpace space,int dim,X_FLOAT lo,X_FLOAT hi )
{
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*k_buf.view<LMPHostType>().dimension_1())/11) {
int newsize = nsend*11/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecAtomicKokkos_PackExchangeFunctor<LMPHostType> f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*11;
} else {
AtomVecAtomicKokkos_PackExchangeFunctor<LMPDeviceType> f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*11;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
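// pack_exchange() above starts writing at buf[1] and records the final length
// in buf[0], so the receiver (and any fixes appending extra data) can tell
// where one atom's message ends and the next begins.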
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecAtomicKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
AtomVecAtomicKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
const size_t elements = 11;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
_x(i,0) = _buf(myrecv,1);
_x(i,1) = _buf(myrecv,2);
_x(i,2) = _buf(myrecv,3);
_v(i,0) = _buf(myrecv,4);
_v(i,1) = _buf(myrecv,5);
_v(i,2) = _buf(myrecv,6);
- _tag[i] = _buf(myrecv,7);
- _type[i] = _buf(myrecv,8);
- _mask[i] = _buf(myrecv,9);
- _image[i] = _buf(myrecv,10);
+ _tag[i] = (tagint) d_ubuf(_buf(myrecv,7)).i;
+ _type[i] = (int) d_ubuf(_buf(myrecv,8)).i;
+ _mask[i] = (int) d_ubuf(_buf(myrecv,9)).i;
+ _image[i] = (imageint) d_ubuf(_buf(myrecv,10)).i;
}
}
};
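// Each received atom whose coordinate along _dim falls inside [_lo,_hi) belongs
// to this sub-domain; Kokkos::atomic_fetch_add reserves a unique local index
// for it so threads can unpack atoms concurrently without overwriting each other.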
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,ExecutionSpace space) {
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecAtomicKokkos_UnpackExchangeFunctor<LMPHostType> f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/11,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecAtomicKokkos_UnpackExchangeFunctor<LMPDeviceType> f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/11,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
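// k_count is a one-element dual view holding the running nlocal counter; on the
// device path it is copied to the device before the kernel and back to the host
// afterwards so the updated atom count can be returned.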
/* ---------------------------------------------------------------------- */
int AtomVecAtomicKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecAtomicKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 11 * nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecAtomicKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK );
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecAtomicKokkos::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK );
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
//if(nlocal>2) printf("typeA: %i %i\n",type[0],type[1]);
atomKK->modified(Host,ALL_MASK);
grow(0);
//if(nlocal>2) printf("typeB: %i %i\n",type[0],type[1]);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask[nlocal] = 1;
h_image[nlocal] = ((tagint) IMGMAX << IMG2BITS) |
((tagint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
atom->nlocal++;
}
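// The image word packs the three periodic image counters into one integer; each
// counter is stored offset by IMGMAX, so the expression above encodes image
// flags of (0,0,0) for a newly created atom.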
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::data_atom(double *coord, tagint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
h_tag[nlocal] = atoi(values[0]);
h_type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image[nlocal] = imagetmp;
h_mask[nlocal] = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
atomKK->modified(Host,ALL_MASK);
atom->nlocal++;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag[i];
buf[i][1] = h_type[i];
buf[i][2] = h_x(i,0);
buf[i][3] = h_x(i,1);
buf[i][4] = h_x(i,2);
buf[i][5] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][6] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
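// The last three columns recover the x, y and z image flags by masking and
// shifting the packed image word and subtracting the IMGMAX offset again.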
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecAtomicKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1],buf[i][2],buf[i][3],buf[i][4],
(int) buf[i][5],(int) buf[i][6],(int) buf[i][7]);
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecAtomicKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
}
}
/* ---------------------------------------------------------------------- */
void AtomVecAtomicKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
}
}
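// sync() and modified() above implement the Kokkos dual-view bookkeeping for
// the per-atom arrays: modified() flags a view as changed in the given
// execution space, and a later sync() copies it to the other space only if
// that flag is set.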
diff --git a/src/KOKKOS/atom_vec_bond_kokkos.cpp b/src/KOKKOS/atom_vec_bond_kokkos.cpp
index f10decac2..c46c49cb2 100644
--- a/src/KOKKOS/atom_vec_bond_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_bond_kokkos.cpp
@@ -1,1786 +1,1786 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_bond_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecBondKokkos::AtomVecBondKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecBondKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,"atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,"atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,"atom:bond_atom");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atomKK->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecBondKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecBondKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++) h_special(j,k) = h_special(i,k);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecBondKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecBondKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
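// PBC_FLAG and TRICLINIC are template parameters, so the periodic-shift branch
// is resolved at compile time; pack_comm_kokkos() below instantiates one of the
// four specializations and the inner loop runs without per-atom branching on
// the box style.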
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecBondKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecBondKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list, const int & iswap,
const int nfirst, const int &pbc_flag, const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPHostType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecBondKokkos_PackCommSelf<LMPDeviceType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
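// pack_comm_self() above handles swaps where sender and receiver are the same
// process: coordinates are written directly into the ghost-atom slots starting
// at nfirst instead of passing through a communication buffer.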
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecBondKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecBondKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecBondKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecBondKokkos_PackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecBondKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecBondKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecBondKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecBondKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecBondKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = ubuf(h_molecule(j)).d;
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_UnpackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecBondKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
- _molecule(i+_first) = static_cast<tagint> (_buf(i,6));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,6)).i;
}
};
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecBondKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecBondKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecBondKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 molecule, 3 nspecial,
// maxspecial special, 1 num_bond, bond_per_atom bond_type, bond_per_atom bond_atom,
// 1 to store buffer length
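// i.e. elements = 3 + 3 + 1 + 1 + 1 + 1 + 1 + 1 + 3 + 1 = 16 fixed slots,
// plus maxspecial + 2*bond_per_atom variable-length slots per atom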
elements = 16+atom->maxspecial+atom->bond_per_atom+atom->bond_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = _tag(i);
- _buf(mysend,m++) = _type(i);
- _buf(mysend,m++) = _mask(i);
- _buf(mysend,m++) = _image(i);
- _buf(mysend,m++) = _molecule(i);
- _buf(mysend,m++) = _num_bond(i);
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = _bond_type(i,k);
- _buf(mysend,m++) = _bond_atom(i,k);
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = _nspecial(i,0);
- _buf(mysend,m++) = _nspecial(i,1);
- _buf(mysend,m++) = _nspecial(i,2);
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = _special(i,k);
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 16+atomKK->maxspecial+atomKK->bond_per_atom+atomKK->bond_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecBondKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecBondKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecBondKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecBondKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 16+atom->maxspecial+atom->bond_per_atom+atom->bond_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
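// _buf(myrecv,_dim+1) is the position component along the exchange dimension;
// only atoms falling inside this proc's [lo,hi) slab are kept, and the atomic
// fetch-add reserves a unique local index while rows are unpacked in parallel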
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = _buf(myrecv,m++);
- _type(i) = _buf(myrecv,m++);
- _mask(i) = _buf(myrecv,m++);
- _image(i) = _buf(myrecv,m++);
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
- _molecule(i) = _buf(myrecv,m++);
- _num_bond(i) = _buf(myrecv,m++);
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = _buf(myrecv,m++);
- _bond_atom(i,k) = _buf(myrecv,m++);
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = _buf(myrecv,m++);
- _nspecial(i,1) = _buf(myrecv,m++);
- _nspecial(i,2) = _buf(myrecv,m++);
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = _buf(myrecv,m++);
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 16+atomKK->maxspecial+atomKK->bond_per_atom+atomKK->bond_per_atom;
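// each exchanged atom occupies one buffer row of width 'elements' (the pack side
// returns nsend*elements), so nrecv/elements below is the number of rows to scan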
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecBondKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
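// device path: the running atom count is seeded on the host, synced to the
// device where the functor increments it atomically, then synced back to
// return the new nlocal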
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecBondKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecBondKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
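// 13 values per atom: 1 (record length) + 3 (x) + 1 (tag) + 1 (type) + 1 (mask)
// + 1 (image) + 3 (v) + 1 (molecule) + 1 (num_bond), matching pack_restart()
// below, plus 2 values (type and partner tag) per bond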
for (i = 0; i < nlocal; i++)
n += 13 + 2*h_num_bond[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (int k = 0; k < h_num_bond(i); k++) {
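// MAX(b,-b) = |b|: negative bond types are written as positive,
// per the note in the header comment above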
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecBondKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecBondKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
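// IMGMAX is the zero offset of each packed bit field, so this encodes
// image flags (0,0,0) for a newly created atom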
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecBondKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atomKK->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
atomKK->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecBondKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_num_bond(nlocal) = 0;
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecBondKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecBondKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecBondKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2],
buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecBondKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT, (tagint) (buf[0]));
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecBondKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecBondKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/atom_vec_charge_kokkos.cpp b/src/KOKKOS/atom_vec_charge_kokkos.cpp
index f6952f127..856660d1e 100644
--- a/src/KOKKOS/atom_vec_charge_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_charge_kokkos.cpp
@@ -1,1562 +1,1562 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_charge_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecChargeKokkos::AtomVecChargeKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 0;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
atom->q_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_q,atomKK->q,nmax,"atom:q");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
q = atomKK->q;
d_q = atomKK->k_q.d_view;
h_q = atomKK->k_q.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::copy(int i, int j, int delflag)
{
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_q[j] = h_q[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
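// PBC_FLAG and TRICLINIC are compile-time template parameters; pack_comm_kokkos()
// below instantiates one of four kernel variants, so the periodic-shift logic adds
// no per-atom branching inside the parallel loop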
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecChargeKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecChargeKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPHostType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPHostType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPHostType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPHostType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,1,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,1,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,0,1> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackComm<LMPDeviceType,0,0> f(atomKK->k_x,buf,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
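// 'self' variant: when a swap stays on the same processor, positions are copied
// directly into the ghost slots of x (starting at index nfirst) instead of being
// staged through a communication buffer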
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecChargeKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecChargeKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list, const int & iswap,
- const int nfirst, const int &pbc_flag, const int* const pbc) {
+ const int nfirst, const int &pbc_flag, const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPHostType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,1,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,1,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,0,1> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecChargeKokkos_PackCommSelf<LMPDeviceType,0,0> f(atomKK->k_x,nfirst,list,iswap,
domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecChargeKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecChargeKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecChargeKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecChargeKokkos_PackBorder {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_xfloat_2d _buf;
const typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
const typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
const typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
const typename ArrayTypes<DeviceType>::t_int_1d _type;
const typename ArrayTypes<DeviceType>::t_int_1d _mask;
const typename ArrayTypes<DeviceType>::t_float_1d _q;
X_FLOAT _dx,_dy,_dz;
AtomVecChargeKokkos_PackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d &buf,
const typename ArrayTypes<DeviceType>::t_int_2d_const &list,
const int & iswap,
const typename ArrayTypes<DeviceType>::t_x_array &x,
const typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
const typename ArrayTypes<DeviceType>::t_int_1d &type,
const typename ArrayTypes<DeviceType>::t_int_1d &mask,
const typename ArrayTypes<DeviceType>::t_float_1d &q,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_q(q),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist, DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecChargeKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecChargeKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecChargeKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecChargeKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q[j];
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q[j];
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q[j];
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_q[j];
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_UnpackBorder {
typedef DeviceType device_type;
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_tagint_1d _tag;
typename ArrayTypes<DeviceType>::t_int_1d _type;
typename ArrayTypes<DeviceType>::t_int_1d _mask;
typename ArrayTypes<DeviceType>::t_float_1d _q;
int _first;
AtomVecChargeKokkos_UnpackBorder(
const typename ArrayTypes<DeviceType>::t_xfloat_2d_const &buf,
typename ArrayTypes<DeviceType>::t_x_array &x,
typename ArrayTypes<DeviceType>::t_tagint_1d &tag,
typename ArrayTypes<DeviceType>::t_int_1d &type,
typename ArrayTypes<DeviceType>::t_int_1d &mask,
typename ArrayTypes<DeviceType>::t_float_1d &q,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_q(q),_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
_q(i+_first) = _buf(i,6);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,ExecutionSpace space) {
if (first+n >= nmax) {
grow(first+n+100);
}
if(space==Host) {
struct AtomVecChargeKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_q,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecChargeKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_q,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK);
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) {
grow(0);
}
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q[i] = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q[i] = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_q[i] = buf[m++];
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_float_1d_randomread _q;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_float_1d _qw;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
AtomVecChargeKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_qw(atom->k_q.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
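// 12 values per atom: 1 (record length) + 3 (x) + 3 (v) + 1 (tag) + 1 (type)
// + 1 (mask) + 1 (image) + 1 (q)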
const size_t elements = 12;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
const int i = _sendlist(mysend);
_buf(mysend,0) = 12;
_buf(mysend,1) = _x(i,0);
_buf(mysend,2) = _x(i,1);
_buf(mysend,3) = _x(i,2);
_buf(mysend,4) = _v(i,0);
_buf(mysend,5) = _v(i,1);
_buf(mysend,6) = _v(i,2);
- _buf(mysend,7) = _tag[i];
- _buf(mysend,8) = _type[i];
- _buf(mysend,9) = _mask[i];
- _buf(mysend,10) = _image[i];
+ _buf(mysend,7) = d_ubuf(_tag[i]).d;
+ _buf(mysend,8) = d_ubuf(_type[i]).d;
+ _buf(mysend,9) = d_ubuf(_mask[i]).d;
+ _buf(mysend,10) = d_ubuf(_image[i]).d;
_buf(mysend,11) = _q[i];
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_qw(i) = _q(j);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,
X_FLOAT lo,X_FLOAT hi )
{
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*k_buf.view<LMPHostType>().dimension_1())/12) {
int newsize = nsend*12/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecChargeKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*12;
} else {
AtomVecChargeKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*12;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_exchange(int i, double *buf)
{
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_q[i];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecChargeKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_float_1d _q;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
AtomVecChargeKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
const size_t elements = 12;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
_x(i,0) = _buf(myrecv,1);
_x(i,1) = _buf(myrecv,2);
_x(i,2) = _buf(myrecv,3);
_v(i,0) = _buf(myrecv,4);
_v(i,1) = _buf(myrecv,5);
_v(i,2) = _buf(myrecv,6);
- _tag[i] = _buf(myrecv,7);
- _type[i] = _buf(myrecv,8);
- _mask[i] = _buf(myrecv,9);
- _image[i] = _buf(myrecv,10);
+ _tag[i] = (tagint) d_ubuf(_buf(myrecv,7)).i;
+ _type[i] = (int) d_ubuf(_buf(myrecv,8)).i;
+ _mask[i] = (int) d_ubuf(_buf(myrecv,9)).i;
+ _image[i] = (imageint) d_ubuf(_buf(myrecv,10)).i;
_q[i] = _buf(myrecv,11);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecChargeKokkos_UnpackExchangeFunctor<LMPHostType> f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/12,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecChargeKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/12,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_q[nlocal] = buf[m++];
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
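// 12 values per atom: 1 (record length) + 3 (x) + 1 (tag) + 1 (type) + 1 (mask)
// + 1 (image) + 3 (v) + 1 (q), matching pack_restart() below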
int n = 12 * nlocal;
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = h_q[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::unpack_restart(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_q[nlocal] = buf[m++];
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask[nlocal] = 1;
h_image[nlocal] = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_q[nlocal] = 0.0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
h_tag[nlocal] = atoi(values[0]);
h_type[nlocal] = atoi(values[1]);
if (type[nlocal] <= 0 || type[nlocal] > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_q[nlocal] = atof(values[2]);
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image[nlocal] = imagetmp;
h_mask[nlocal] = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
atomKK->modified(Host,ALL_MASK);
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_q[nlocal] = atof(values[0]);
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag[i];
buf[i][1] = h_type[i];
buf[i][2] = h_q[i];
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
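/* Illustrative sketch: an imageint packs three periodic image counters into
   one integer: bits [0,IMGBITS) hold ix, [IMGBITS,IMG2BITS) hold iy, and the
   remaining high bits hold iz, each stored with an offset of IMGMAX so that
   negative counters fit in unsigned bit fields.  Decoding mirrors the three
   lines above:

     int ix = (image & IMGMASK) - IMGMAX;
     int iy = (image >> IMGBITS & IMGMASK) - IMGMAX;
     int iz = (image >> IMG2BITS) - IMGMAX;

   and create_atom() earlier builds the all-zero value as
   ((imageint) IMGMAX << IMG2BITS) | ((imageint) IMGMAX << IMGBITS) | IMGMAX.
*/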
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_q[i];
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecChargeKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1],buf[i][2],buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecChargeKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," %-1.16e",buf[0]);
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecChargeKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("q")) bytes += memory->usage(q,nmax);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPHostType>();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecChargeKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPDeviceType>();
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPHostType>();
}
}
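/* Illustrative sketch: sync() and modified() implement the usual DualView
   discipline for every per-atom array selected by the bit mask: call
   modified(space,mask) after writing data in one memory space and
   sync(space,mask) before reading it in the other, e.g.

     atomKK->modified(Host, X_MASK | V_MASK);   // host code wrote x and v
     atomKK->sync(Device, X_MASK | V_MASK);     // device kernel reads them

   sync_overlapping_device() below is a variant that uses
   perform_async_copy() so the host-device transfers can overlap with other
   work instead of blocking. */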
void AtomVecChargeKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
}
}
diff --git a/src/KOKKOS/atom_vec_full_kokkos.cpp b/src/KOKKOS/atom_vec_full_kokkos.cpp
index 731168b6e..fa4cf18ae 100644
--- a/src/KOKKOS/atom_vec_full_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_full_kokkos.cpp
@@ -1,2490 +1,2444 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_full_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecFullKokkos::AtomVecFullKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = angles_allow = dihedrals_allow = impropers_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 8;
size_velocity = 3;
size_data_atom = 7;
size_data_vel = 4;
xcol_data = 5;
atom->molecule_flag = atom->q_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecFullKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_q,atomKK->q,nmax,"atom:q");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,
"atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,
"atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,
"atom:bond_atom");
memory->grow_kokkos(atomKK->k_num_angle,atomKK->num_angle,nmax,"atom:num_angle");
memory->grow_kokkos(atomKK->k_angle_type,atomKK->angle_type,nmax,atomKK->angle_per_atom,
"atom:angle_type");
memory->grow_kokkos(atomKK->k_angle_atom1,atomKK->angle_atom1,nmax,atomKK->angle_per_atom,
"atom:angle_atom1");
memory->grow_kokkos(atomKK->k_angle_atom2,atomKK->angle_atom2,nmax,atomKK->angle_per_atom,
"atom:angle_atom2");
memory->grow_kokkos(atomKK->k_angle_atom3,atomKK->angle_atom3,nmax,atomKK->angle_per_atom,
"atom:angle_atom3");
memory->grow_kokkos(atomKK->k_num_dihedral,atomKK->num_dihedral,nmax,"atom:num_dihedral");
memory->grow_kokkos(atomKK->k_dihedral_type,atomKK->dihedral_type,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_type");
memory->grow_kokkos(atomKK->k_dihedral_atom1,atomKK->dihedral_atom1,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom1");
memory->grow_kokkos(atomKK->k_dihedral_atom2,atomKK->dihedral_atom2,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom2");
memory->grow_kokkos(atomKK->k_dihedral_atom3,atomKK->dihedral_atom3,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom3");
memory->grow_kokkos(atomKK->k_dihedral_atom4,atomKK->dihedral_atom4,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom4");
memory->grow_kokkos(atomKK->k_num_improper,atomKK->num_improper,nmax,"atom:num_improper");
memory->grow_kokkos(atomKK->k_improper_type,atomKK->improper_type,nmax,
atomKK->improper_per_atom,"atom:improper_type");
memory->grow_kokkos(atomKK->k_improper_atom1,atomKK->improper_atom1,nmax,
atomKK->improper_per_atom,"atom:improper_atom1");
memory->grow_kokkos(atomKK->k_improper_atom2,atomKK->improper_atom2,nmax,
atomKK->improper_per_atom,"atom:improper_atom2");
memory->grow_kokkos(atomKK->k_improper_atom3,atomKK->improper_atom3,nmax,
atomKK->improper_per_atom,"atom:improper_atom3");
memory->grow_kokkos(atomKK->k_improper_atom4,atomKK->improper_atom4,nmax,
atomKK->improper_per_atom,"atom:improper_atom4");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
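/* Worked example: with DELTA = 10000, repeated grow(0) calls enlarge every
   per-atom array in 10000-atom increments (nmax = 10000, 20000, ...),
   while grow(n) with n > 0 jumps directly to n entries, e.g. when the atom
   count is already known from a data or restart file. */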
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecFullKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
q = atomKK->q;
d_q = atomKK->k_q.d_view;
h_q = atomKK->k_q.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
num_angle = atomKK->num_angle;
d_num_angle = atomKK->k_num_angle.d_view;
h_num_angle = atomKK->k_num_angle.h_view;
angle_type = atomKK->angle_type;
d_angle_type = atomKK->k_angle_type.d_view;
h_angle_type = atomKK->k_angle_type.h_view;
angle_atom1 = atomKK->angle_atom1;
d_angle_atom1 = atomKK->k_angle_atom1.d_view;
h_angle_atom1 = atomKK->k_angle_atom1.h_view;
angle_atom2 = atomKK->angle_atom2;
d_angle_atom2 = atomKK->k_angle_atom2.d_view;
h_angle_atom2 = atomKK->k_angle_atom2.h_view;
angle_atom3 = atomKK->angle_atom3;
d_angle_atom3 = atomKK->k_angle_atom3.d_view;
h_angle_atom3 = atomKK->k_angle_atom3.h_view;
num_dihedral = atomKK->num_dihedral;
d_num_dihedral = atomKK->k_num_dihedral.d_view;
h_num_dihedral = atomKK->k_num_dihedral.h_view;
dihedral_type = atomKK->dihedral_type;
d_dihedral_type = atomKK->k_dihedral_type.d_view;
h_dihedral_type = atomKK->k_dihedral_type.h_view;
dihedral_atom1 = atomKK->dihedral_atom1;
d_dihedral_atom1 = atomKK->k_dihedral_atom1.d_view;
h_dihedral_atom1 = atomKK->k_dihedral_atom1.h_view;
dihedral_atom2 = atomKK->dihedral_atom2;
d_dihedral_atom2 = atomKK->k_dihedral_atom2.d_view;
h_dihedral_atom2 = atomKK->k_dihedral_atom2.h_view;
dihedral_atom3 = atomKK->dihedral_atom3;
d_dihedral_atom3 = atomKK->k_dihedral_atom3.d_view;
h_dihedral_atom3 = atomKK->k_dihedral_atom3.h_view;
dihedral_atom4 = atomKK->dihedral_atom4;
d_dihedral_atom4 = atomKK->k_dihedral_atom4.d_view;
h_dihedral_atom4 = atomKK->k_dihedral_atom4.h_view;
num_improper = atomKK->num_improper;
d_num_improper = atomKK->k_num_improper.d_view;
h_num_improper = atomKK->k_num_improper.h_view;
improper_type = atomKK->improper_type;
d_improper_type = atomKK->k_improper_type.d_view;
h_improper_type = atomKK->k_improper_type.h_view;
improper_atom1 = atomKK->improper_atom1;
d_improper_atom1 = atomKK->k_improper_atom1.d_view;
h_improper_atom1 = atomKK->k_improper_atom1.h_view;
improper_atom2 = atomKK->improper_atom2;
d_improper_atom2 = atomKK->k_improper_atom2.d_view;
h_improper_atom2 = atomKK->k_improper_atom2.h_view;
improper_atom3 = atomKK->improper_atom3;
d_improper_atom3 = atomKK->k_improper_atom3.d_view;
h_improper_atom3 = atomKK->k_improper_atom3.h_view;
improper_atom4 = atomKK->improper_atom4;
d_improper_atom4 = atomKK->k_improper_atom4.d_view;
h_improper_atom4 = atomKK->k_improper_atom4.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecFullKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_q[j] = h_q[i];
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++)
h_special(j,k) = h_special(i,k);
h_num_angle(j) = h_num_angle(i);
for (k = 0; k < h_num_angle(j); k++) {
h_angle_type(j,k) = h_angle_type(i,k);
h_angle_atom1(j,k) = h_angle_atom1(i,k);
h_angle_atom2(j,k) = h_angle_atom2(i,k);
h_angle_atom3(j,k) = h_angle_atom3(i,k);
}
h_num_dihedral(j) = h_num_dihedral(i);
for (k = 0; k < h_num_dihedral(j); k++) {
h_dihedral_type(j,k) = h_dihedral_type(i,k);
h_dihedral_atom1(j,k) = h_dihedral_atom1(i,k);
h_dihedral_atom2(j,k) = h_dihedral_atom2(i,k);
h_dihedral_atom3(j,k) = h_dihedral_atom3(i,k);
h_dihedral_atom4(j,k) = h_dihedral_atom4(i,k);
}
h_num_improper(j) = h_num_improper(i);
for (k = 0; k < h_num_improper(j); k++) {
h_improper_type(j,k) = h_improper_type(i,k);
h_improper_atom1(j,k) = h_improper_atom1(i,k);
h_improper_atom2(j,k) = h_improper_atom2(i,k);
h_improper_atom3(j,k) = h_improper_atom3(i,k);
h_improper_atom4(j,k) = h_improper_atom4(i,k);
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecFullKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecFullKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()
*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPHostType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPHostType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPHostType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPHostType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackComm<LMPDeviceType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
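/* Illustrative sketch: pack_comm_kokkos() dispatches one of eight
   instantiations of the PackComm functor.  The execution space (host or
   device) is chosen at run time from commKK->forward_comm_on_host, while
   PBC_FLAG and TRICLINIC are compile-time template flags so the periodic
   shift arithmetic is compiled away when it is not needed.  For a triclinic
   box the shift applied to a packed coordinate is

     dx = pbc[0]*xprd + pbc[5]*xy + pbc[4]*xz;
     dy = pbc[1]*yprd + pbc[3]*yz;
     dz = pbc[2]*zprd;

   which reduces to dx = pbc[0]*xprd, etc., for an orthogonal box. */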
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecFullKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecFullKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),
_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap,
const int nfirst, const int &pbc_flag,
const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPHostType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecFullKokkos_PackCommSelf<LMPDeviceType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecFullKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecFullKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecFullKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG>
struct AtomVecFullKokkos_PackBorder {
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_float_1d _q;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecFullKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_float_1d &q,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_q(q),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = ubuf(_tag(j)).d;
- _buf(i,4) = ubuf(_type(j)).d;
- _buf(i,5) = ubuf(_mask(j)).d;
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
- _buf(i,7) = ubuf(_molecule(j)).d;
+ _buf(i,7) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = ubuf(_tag(j)).d;
- _buf(i,4) = ubuf(_type(j)).d;
- _buf(i,5) = ubuf(_mask(j)).d;
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
_buf(i,6) = _q(j);
- _buf(i,7) = ubuf(_molecule(j)).d;
+ _buf(i,7) = d_ubuf(_molecule(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecFullKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecFullKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecFullKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_q,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecFullKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_q,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
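/* Worked example: each border atom occupies size_border = 8 buffer entries
   here: 3 coordinates, tag, type, mask, charge and molecule ID, matching
   the indices 0-7 written by the PackBorder functor above, so the routine
   returns n*8 doubles.  The velocity variant additionally appends the
   3 velocity components per atom. */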
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_q(j);
buf[m++] = ubuf(h_molecule(j)).d;
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_UnpackBorder {
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_float_1d _q;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecFullKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_float_1d &q,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_q(q),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = (tagint) ubuf(_buf(i,3)).i;
- _type(i+_first) = (int) ubuf(_buf(i,4)).i;
- _mask(i+_first) = (int) ubuf(_buf(i,5)).i;
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
_q(i+_first) = _buf(i,6);
- _molecule(i+_first) = (tagint) ubuf(_buf(i,7)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,7)).i;
}
};
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecFullKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_q,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecFullKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_q,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q(i) = buf[m++];
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|Q_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_q(i) = buf[m++];
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_q(i) = buf[m++];
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_PackExchangeFunctor {
-
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_float_1d_randomread _q;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_int_1d_randomread _num_angle;
typename AT::t_int_2d_randomread _angle_type;
typename AT::t_tagint_2d_randomread _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d_randomread _num_dihedral;
typename AT::t_int_2d_randomread _dihedral_type;
typename AT::t_tagint_2d_randomread _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d_randomread _num_improper;
typename AT::t_int_2d_randomread _improper_type;
typename AT::t_tagint_2d_randomread _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_float_1d _qw;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_int_1d _num_anglew;
typename AT::t_int_2d _angle_typew;
typename AT::t_tagint_2d _angle_atom1w,_angle_atom2w,_angle_atom3w;
typename AT::t_int_1d _num_dihedralw;
typename AT::t_int_2d _dihedral_typew;
typename AT::t_tagint_2d _dihedral_atom1w,_dihedral_atom2w,
_dihedral_atom3w,_dihedral_atom4w;
typename AT::t_int_1d _num_improperw;
typename AT::t_int_2d _improper_typew;
typename AT::t_tagint_2d _improper_atom1w,_improper_atom2w,
_improper_atom3w,_improper_atom4w;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecFullKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_qw(atom->k_q.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_num_anglew(atom->k_num_angle.view<DeviceType>()),
_angle_typew(atom->k_angle_type.view<DeviceType>()),
_angle_atom1w(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2w(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3w(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedralw(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_typew(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1w(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2w(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3w(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4w(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improperw(atom->k_num_improper.view<DeviceType>()),
_improper_typew(atom->k_improper_type.view<DeviceType>()),
_improper_atom1w(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2w(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3w(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4w(atom->k_improper_atom4.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// per-atom buffer layout:
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 charge, 1 molecule,
// 3 nspecial, maxspecial special,
// 1 num_bond, bond_per_atom each of bond_type and bond_atom,
// 1 num_angle, angle_per_atom each of angle_type, angle_atom1, angle_atom2, angle_atom3,
// 1 num_dihedral, dihedral_per_atom each of dihedral_type, dihedral_atom1/2/3/4,
// 1 num_improper, improper_per_atom each of improper_type, improper_atom1/2/3/4,
// 1 leading entry to store the buffer length
elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
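// Worked example: the 20 fixed entries are 3 x + 3 v + tag + type + mask
// + image + q + molecule + 3 nspecial + num_bond + num_angle + num_dihedral
// + num_improper + 1 leading length word.  For a hypothetical system with
// maxspecial = 16, bond_per_atom = 4, angle_per_atom = 6,
// dihedral_per_atom = 8 and improper_per_atom = 2 this gives
// elements = 20 + 16 + 8 + 24 + 40 + 10 = 118 doubles per atom.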
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = ubuf(_tag(i)).d;
- _buf(mysend,m++) = ubuf(_type(i)).d;
- _buf(mysend,m++) = ubuf(_mask(i)).d;
- _buf(mysend,m++) = ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
_buf(mysend,m++) = _q(i);
- _buf(mysend,m++) = ubuf(_molecule(i)).d;
- _buf(mysend,m++) = ubuf(_num_bond(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = ubuf(_bond_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_bond_atom(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_num_angle(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_angle(i)).d;
for (k = 0; k < _num_angle(i); k++) {
- _buf(mysend,m++) = ubuf(_angle_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_angle_atom1(i,k)).d;
- _buf(mysend,m++) = ubuf(_angle_atom2(i,k)).d;
- _buf(mysend,m++) = ubuf(_angle_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom3(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_num_dihedral(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_dihedral(i)).d;
for (k = 0; k < _num_dihedral(i); k++) {
- _buf(mysend,m++) = ubuf(_dihedral_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom1(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom2(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom3(i,k)).d;
- _buf(mysend,m++) = ubuf(_dihedral_atom4(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom4(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_num_improper(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_improper(i)).d;
for (k = 0; k < _num_improper(i); k++) {
- _buf(mysend,m++) = ubuf(_improper_type(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom1(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom2(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom3(i,k)).d;
- _buf(mysend,m++) = ubuf(_improper_atom4(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom4(i,k)).d;
}
- _buf(mysend,m++) = ubuf(_nspecial(i,0)).d;
- _buf(mysend,m++) = ubuf(_nspecial(i,1)).d;
- _buf(mysend,m++) = ubuf(_nspecial(i,2)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = ubuf(_special(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_qw(i) = _q(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_num_anglew(i) = _num_angle(j);
for (k = 0; k < _num_angle(j); k++) {
_angle_typew(i,k) = _angle_type(j,k);
_angle_atom1w(i,k) = _angle_atom1(j,k);
_angle_atom2w(i,k) = _angle_atom2(j,k);
_angle_atom3w(i,k) = _angle_atom3(j,k);
}
_num_dihedralw(i) = _num_dihedral(j);
for (k = 0; k < _num_dihedral(j); k++) {
_dihedral_typew(i,k) = _dihedral_type(j,k);
_dihedral_atom1w(i,k) = _dihedral_atom1(j,k);
_dihedral_atom2w(i,k) = _dihedral_atom2(j,k);
_dihedral_atom3w(i,k) = _dihedral_atom3(j,k);
_dihedral_atom4w(i,k) = _dihedral_atom4(j,k);
}
_num_improperw(i) = _num_improper(j);
for (k = 0; k < _num_improper(j); k++) {
_improper_typew(i,k) = _improper_type(j,k);
_improper_atom1w(i,k) = _improper_atom1(j,k);
_improper_atom2w(i,k) = _improper_atom2(j,k);
_improper_atom3w(i,k) = _improper_atom3(j,k);
_improper_atom4w(i,k) = _improper_atom4(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecFullKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecFullKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
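  // slot 0 is filled at the end with the total length of this atom's record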
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_q(i);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(h_angle_type(i,k)).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(h_dihedral_type(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(h_improper_type(i,k)).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecFullKokkos_UnpackExchangeFunctor {
-
- union ubuf {
- double d;
- int64_t i;
- KOKKOS_INLINE_FUNCTION
- ubuf(double arg) : d(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int64_t arg) : i(arg) {}
- KOKKOS_INLINE_FUNCTION
- ubuf(int arg) : i(arg) {}
- };
-
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_float_1d _q;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_int_1d _num_angle;
typename AT::t_int_2d _angle_type;
typename AT::t_tagint_2d _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d _num_dihedral;
typename AT::t_int_2d _dihedral_type;
typename AT::t_tagint_2d _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d _num_improper;
typename AT::t_int_2d _improper_type;
typename AT::t_tagint_2d _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecFullKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_q(atom->k_q.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
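      // each incoming atom that falls inside this proc's slab along the exchange
      // dimension atomically claims the next local index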
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _type(i) = (int) ubuf(_buf(myrecv,m++)).i;
- _mask(i) = (int) ubuf(_buf(myrecv,m++)).i;
- _image(i) = (imageint) ubuf(_buf(myrecv,m++)).i;
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
_q(i) = _buf(myrecv,m++);
- _molecule(i) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _num_bond(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _bond_atom(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_angle(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _num_angle(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_angle(i); k++) {
- _angle_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _angle_atom1(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _angle_atom2(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _angle_atom3(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _angle_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_dihedral(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _num_dihedral(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_dihedral(i); k++) {
- _dihedral_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom1(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom2(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom3(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _dihedral_atom4(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _dihedral_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_improper(i) = (int) ubuf(_buf(myrecv,m++)).i;
+ _num_improper(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_improper(i); k++) {
- _improper_type(i,k) = (int) ubuf(_buf(myrecv,m++)).i;
- _improper_atom1(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _improper_atom2(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _improper_atom3(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
- _improper_atom4(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _improper_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = (int) ubuf(_buf(myrecv,m++)).i;
- _nspecial(i,1) = (int) ubuf(_buf(myrecv,m++)).i;
- _nspecial(i,2) = (int) ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = (tagint) ubuf(_buf(myrecv,m++)).i;
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 20+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
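  // each packed atom occupies a fixed stride of 'elements' doubles, so
  // nrecv/elements candidate entries are examined by the unpack functor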
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecFullKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
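    // device path: push the current nlocal to the device, let the functor
    // atomically advance it, then pull the final count back to the host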
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecFullKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_q(nlocal) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecFullKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
for (i = 0; i < nlocal; i++)
n += 17 + 2*num_bond[i] + 4*num_angle[i] +
5*num_dihedral[i] + 5*num_improper[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = h_q(i);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
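  // MAX(t,-t) = |t|: topology types may be stored as negative (see note above)
  // but are always written to the restart file as positive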
for (int k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (int k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(MAX(h_angle_type(i,k),-h_angle_type(i,k))).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (int k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(MAX(h_dihedral_type(i,k),-h_dihedral_type(i,k))).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (int k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(MAX(h_improper_type(i,k),-h_improper_type(i,k))).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecFullKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | Q_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_q(nlocal) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
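    // buf[0] holds the total record length; whatever follows the per-atom data
    // is stashed in atom->extra for fixes to unpack later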
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecFullKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
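  // default image flags are (0,0,0); IMGMAX is the zero offset of each packed field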
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_q(nlocal) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecFullKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_q(nlocal) = atof(values[3]);
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecFullKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_q(nlocal) = atof(values[1]);
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
return 2;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecFullKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_q(i);
buf[i][4] = h_x(i,0);
buf[i][5] = h_x(i,1);
buf[i][6] = h_x(i,2);
buf[i][7] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][9] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecFullKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
buf[1] = h_q(i);
return 2;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecFullKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2], buf[i][3],
buf[i][4],buf[i][5],buf[i][6],
(int) buf[i][7],(int) buf[i][8],(int) buf[i][9]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecFullKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT " %-1.16e",(tagint) ubuf(buf[0]).i,buf[1]);
return 2;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecFullKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("q")) bytes += memory->usage(q,nmax);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
if (atom->memcheck("num_angle")) bytes += memory->usage(num_angle,nmax);
if (atom->memcheck("angle_type"))
bytes += memory->usage(angle_type,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom1"))
bytes += memory->usage(angle_atom1,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom2"))
bytes += memory->usage(angle_atom2,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom3"))
bytes += memory->usage(angle_atom3,nmax,atom->angle_per_atom);
if (atom->memcheck("num_dihedral")) bytes += memory->usage(num_dihedral,nmax);
if (atom->memcheck("dihedral_type"))
bytes += memory->usage(dihedral_type,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom1"))
bytes += memory->usage(dihedral_atom1,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom2"))
bytes += memory->usage(dihedral_atom2,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom3"))
bytes += memory->usage(dihedral_atom3,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom4"))
bytes += memory->usage(dihedral_atom4,nmax,atom->dihedral_per_atom);
if (atom->memcheck("num_improper")) bytes += memory->usage(num_improper,nmax);
if (atom->memcheck("improper_type"))
bytes += memory->usage(improper_type,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom1"))
bytes += memory->usage(improper_atom1,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom2"))
bytes += memory->usage(improper_atom2,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom3"))
bytes += memory->usage(improper_atom3,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom4"))
bytes += memory->usage(improper_atom4,nmax,atom->improper_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::sync(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPDeviceType>();
atomKK->k_angle_type.sync<LMPDeviceType>();
atomKK->k_angle_atom1.sync<LMPDeviceType>();
atomKK->k_angle_atom2.sync<LMPDeviceType>();
atomKK->k_angle_atom3.sync<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPDeviceType>();
atomKK->k_dihedral_type.sync<LMPDeviceType>();
atomKK->k_dihedral_atom1.sync<LMPDeviceType>();
atomKK->k_dihedral_atom2.sync<LMPDeviceType>();
atomKK->k_dihedral_atom3.sync<LMPDeviceType>();
atomKK->k_dihedral_atom4.sync<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPDeviceType>();
atomKK->k_improper_type.sync<LMPDeviceType>();
atomKK->k_improper_atom1.sync<LMPDeviceType>();
atomKK->k_improper_atom2.sync<LMPDeviceType>();
atomKK->k_improper_atom3.sync<LMPDeviceType>();
atomKK->k_improper_atom4.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPHostType>();
atomKK->k_angle_type.sync<LMPHostType>();
atomKK->k_angle_atom1.sync<LMPHostType>();
atomKK->k_angle_atom2.sync<LMPHostType>();
atomKK->k_angle_atom3.sync<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPHostType>();
atomKK->k_dihedral_type.sync<LMPHostType>();
atomKK->k_dihedral_atom1.sync<LMPHostType>();
atomKK->k_dihedral_atom2.sync<LMPHostType>();
atomKK->k_dihedral_atom3.sync<LMPHostType>();
atomKK->k_dihedral_atom4.sync<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPHostType>();
atomKK->k_improper_type.sync<LMPHostType>();
atomKK->k_improper_atom1.sync<LMPHostType>();
atomKK->k_improper_atom2.sync<LMPHostType>();
atomKK->k_improper_atom3.sync<LMPHostType>();
atomKK->k_improper_atom4.sync<LMPHostType>();
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & Q_MASK) && atomKK->k_q.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_float_1d>(atomKK->k_q,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
if (atomKK->k_dihedral_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom4,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecFullKokkos::modified(ExecutionSpace space, unsigned int mask)
{
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPDeviceType>();
atomKK->k_angle_type.modify<LMPDeviceType>();
atomKK->k_angle_atom1.modify<LMPDeviceType>();
atomKK->k_angle_atom2.modify<LMPDeviceType>();
atomKK->k_angle_atom3.modify<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPDeviceType>();
atomKK->k_dihedral_type.modify<LMPDeviceType>();
atomKK->k_dihedral_atom1.modify<LMPDeviceType>();
atomKK->k_dihedral_atom2.modify<LMPDeviceType>();
atomKK->k_dihedral_atom3.modify<LMPDeviceType>();
atomKK->k_dihedral_atom4.modify<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPDeviceType>();
atomKK->k_improper_type.modify<LMPDeviceType>();
atomKK->k_improper_atom1.modify<LMPDeviceType>();
atomKK->k_improper_atom2.modify<LMPDeviceType>();
atomKK->k_improper_atom3.modify<LMPDeviceType>();
atomKK->k_improper_atom4.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & Q_MASK) atomKK->k_q.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPHostType>();
atomKK->k_angle_type.modify<LMPHostType>();
atomKK->k_angle_atom1.modify<LMPHostType>();
atomKK->k_angle_atom2.modify<LMPHostType>();
atomKK->k_angle_atom3.modify<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPHostType>();
atomKK->k_dihedral_type.modify<LMPHostType>();
atomKK->k_dihedral_atom1.modify<LMPHostType>();
atomKK->k_dihedral_atom2.modify<LMPHostType>();
atomKK->k_dihedral_atom3.modify<LMPHostType>();
atomKK->k_dihedral_atom4.modify<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPHostType>();
atomKK->k_improper_type.modify<LMPHostType>();
atomKK->k_improper_atom1.modify<LMPHostType>();
atomKK->k_improper_atom2.modify<LMPHostType>();
atomKK->k_improper_atom3.modify<LMPHostType>();
atomKK->k_improper_atom4.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/atom_vec_kokkos.h b/src/KOKKOS/atom_vec_kokkos.h
index 7ac66f162..7f593f235 100644
--- a/src/KOKKOS/atom_vec_kokkos.h
+++ b/src/KOKKOS/atom_vec_kokkos.h
@@ -1,155 +1,166 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_ATOM_VEC_KOKKOS_H
#define LMP_ATOM_VEC_KOKKOS_H
#include "atom_vec.h"
#include "kokkos_type.h"
#include <type_traits>
namespace LAMMPS_NS {
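+// bit-preserving double <-> integer converter for packing atom IDs and counts
+// into the double-valued communication buffers; its constructors are marked
+// KOKKOS_INLINE_FUNCTION so the same definition works in host code and inside
+// device kernels, replacing the per-functor ubuf copies in the atom_vec functors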
+union d_ubuf {
+ double d;
+ int64_t i;
+ KOKKOS_INLINE_FUNCTION
+ d_ubuf(double arg) : d(arg) {}
+ KOKKOS_INLINE_FUNCTION
+ d_ubuf(int64_t arg) : i(arg) {}
+ KOKKOS_INLINE_FUNCTION
+ d_ubuf(int arg) : i(arg) {}
+};
+
class AtomVecKokkos : public AtomVec {
public:
AtomVecKokkos(class LAMMPS *);
virtual ~AtomVecKokkos() {}
virtual void sync(ExecutionSpace space, unsigned int mask) = 0;
virtual void modified(ExecutionSpace space, unsigned int mask) = 0;
virtual void sync_overlapping_device(ExecutionSpace space, unsigned int mask) {};
virtual int
pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap, const int nfirst,
const int &pbc_flag, const int pbc[]) = 0;
//{return 0;}
virtual int
pack_comm_kokkos(const int &n, const DAT::tdual_int_2d &list,
const int & iswap, const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag, const int pbc[]) = 0;
//{return 0;}
virtual void
unpack_comm_kokkos(const int &n, const int &nfirst,
const DAT::tdual_xfloat_2d &buf) = 0;
virtual int
pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space) = 0;
//{return 0;};
virtual void
unpack_border_kokkos(const int &n, const int &nfirst,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) = 0;
virtual int
pack_exchange_kokkos(const int &nsend, DAT::tdual_xfloat_2d &buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space, int dim, X_FLOAT lo, X_FLOAT hi) = 0;
//{return 0;};
virtual int
unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf, int nrecv,
int nlocal, int dim, X_FLOAT lo, X_FLOAT hi,
ExecutionSpace space) = 0;
//{return 0;};
protected:
class CommKokkos *commKK;
size_t buffer_size;
void* buffer;
#ifdef KOKKOS_HAVE_CUDA
template<class ViewType>
Kokkos::View<typename ViewType::data_type,
typename ViewType::array_layout,
Kokkos::CudaHostPinnedSpace,
Kokkos::MemoryTraits<Kokkos::Unmanaged> >
create_async_copy(const ViewType& src) {
typedef Kokkos::View<typename ViewType::data_type,
typename ViewType::array_layout,
typename std::conditional<
std::is_same<typename ViewType::execution_space,LMPDeviceType>::value,
Kokkos::CudaHostPinnedSpace,typename ViewType::memory_space>::type,
Kokkos::MemoryTraits<Kokkos::Unmanaged> > mirror_type;
if (buffer_size == 0) {
buffer = Kokkos::kokkos_malloc<Kokkos::CudaHostPinnedSpace>(src.capacity());
buffer_size = src.capacity();
} else if (buffer_size < src.capacity()) {
buffer = Kokkos::kokkos_realloc<Kokkos::CudaHostPinnedSpace>(buffer,src.capacity());
buffer_size = src.capacity();
}
return mirror_type( buffer ,
src.dimension_0() ,
src.dimension_1() ,
src.dimension_2() ,
src.dimension_3() ,
src.dimension_4() ,
src.dimension_5() ,
src.dimension_6() ,
src.dimension_7() );
}
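  // stage a host<->device transfer of a dual view through the shared
  // CudaHostPinnedSpace scratch buffer, growing that buffer on demand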
template<class ViewType>
void perform_async_copy(const ViewType& src, unsigned int space) {
typedef Kokkos::View<typename ViewType::data_type,
typename ViewType::array_layout,
typename std::conditional<
std::is_same<typename ViewType::execution_space,LMPDeviceType>::value,
Kokkos::CudaHostPinnedSpace,typename ViewType::memory_space>::type,
Kokkos::MemoryTraits<Kokkos::Unmanaged> > mirror_type;
if (buffer_size == 0) {
buffer = Kokkos::kokkos_malloc<Kokkos::CudaHostPinnedSpace>(src.capacity()*sizeof(typename ViewType::value_type));
buffer_size = src.capacity();
} else if (buffer_size < src.capacity()) {
buffer = Kokkos::kokkos_realloc<Kokkos::CudaHostPinnedSpace>(buffer,src.capacity()*sizeof(typename ViewType::value_type));
buffer_size = src.capacity();
}
mirror_type tmp_view( (typename ViewType::value_type*)buffer ,
src.dimension_0() ,
src.dimension_1() ,
src.dimension_2() ,
src.dimension_3() ,
src.dimension_4() ,
src.dimension_5() ,
src.dimension_6() ,
src.dimension_7() );
if(space == Device) {
Kokkos::deep_copy(LMPHostType(),tmp_view,src.h_view),
Kokkos::deep_copy(LMPHostType(),src.d_view,tmp_view);
src.modified_device() = src.modified_host();
} else {
Kokkos::deep_copy(LMPHostType(),tmp_view,src.d_view),
Kokkos::deep_copy(LMPHostType(),src.h_view,tmp_view);
src.modified_device() = src.modified_host();
}
}
#else
template<class ViewType>
void perform_async_copy(ViewType& src, unsigned int space) {
if(space == Device)
src.template sync<LMPDeviceType>();
else
src.template sync<LMPHostType>();
}
#endif
};
}
#endif
/* ERROR/WARNING messages:
*/
diff --git a/src/KOKKOS/atom_vec_molecular_kokkos.cpp b/src/KOKKOS/atom_vec_molecular_kokkos.cpp
index 4fd811437..5c16ac151 100644
--- a/src/KOKKOS/atom_vec_molecular_kokkos.cpp
+++ b/src/KOKKOS/atom_vec_molecular_kokkos.cpp
@@ -1,2386 +1,2386 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <stdlib.h>
#include "atom_vec_molecular_kokkos.h"
#include "atom_kokkos.h"
#include "comm_kokkos.h"
#include "domain.h"
#include "modify.h"
#include "fix.h"
#include "atom_masks.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define DELTA 10000
/* ---------------------------------------------------------------------- */
AtomVecMolecularKokkos::AtomVecMolecularKokkos(LAMMPS *lmp) : AtomVecKokkos(lmp)
{
molecular = 1;
bonds_allow = angles_allow = dihedrals_allow = impropers_allow = 1;
mass_type = 1;
comm_x_only = comm_f_only = 1;
size_forward = 3;
size_reverse = 3;
size_border = 7;
size_velocity = 3;
size_data_atom = 6;
size_data_vel = 4;
xcol_data = 4;
atom->molecule_flag = 1;
k_count = DAT::tdual_int_1d("atom::k_count",1);
atomKK = (AtomKokkos *) atom;
commKK = (CommKokkos *) comm;
}
/* ----------------------------------------------------------------------
grow atom arrays
n = 0 grows arrays by DELTA
n > 0 allocates arrays to size n
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::grow(int n)
{
if (n == 0) nmax += DELTA;
else nmax = n;
atomKK->nmax = nmax;
if (nmax < 0 || nmax > MAXSMALLINT)
error->one(FLERR,"Per-processor system is too big");
sync(Device,ALL_MASK);
modified(Device,ALL_MASK);
memory->grow_kokkos(atomKK->k_tag,atomKK->tag,nmax,"atom:tag");
memory->grow_kokkos(atomKK->k_type,atomKK->type,nmax,"atom:type");
memory->grow_kokkos(atomKK->k_mask,atomKK->mask,nmax,"atom:mask");
memory->grow_kokkos(atomKK->k_image,atomKK->image,nmax,"atom:image");
memory->grow_kokkos(atomKK->k_x,atomKK->x,nmax,3,"atom:x");
memory->grow_kokkos(atomKK->k_v,atomKK->v,nmax,3,"atom:v");
memory->grow_kokkos(atomKK->k_f,atomKK->f,nmax,3,"atom:f");
memory->grow_kokkos(atomKK->k_molecule,atomKK->molecule,nmax,"atom:molecule");
memory->grow_kokkos(atomKK->k_nspecial,atomKK->nspecial,nmax,3,"atom:nspecial");
memory->grow_kokkos(atomKK->k_special,atomKK->special,nmax,atomKK->maxspecial,
"atom:special");
memory->grow_kokkos(atomKK->k_num_bond,atomKK->num_bond,nmax,"atom:num_bond");
memory->grow_kokkos(atomKK->k_bond_type,atomKK->bond_type,nmax,atomKK->bond_per_atom,
"atom:bond_type");
memory->grow_kokkos(atomKK->k_bond_atom,atomKK->bond_atom,nmax,atomKK->bond_per_atom,
"atom:bond_atom");
memory->grow_kokkos(atomKK->k_num_angle,atomKK->num_angle,nmax,"atom:num_angle");
memory->grow_kokkos(atomKK->k_angle_type,atomKK->angle_type,nmax,atomKK->angle_per_atom,
"atom:angle_type");
memory->grow_kokkos(atomKK->k_angle_atom1,atomKK->angle_atom1,nmax,atomKK->angle_per_atom,
"atom:angle_atom1");
memory->grow_kokkos(atomKK->k_angle_atom2,atomKK->angle_atom2,nmax,atomKK->angle_per_atom,
"atom:angle_atom2");
memory->grow_kokkos(atomKK->k_angle_atom3,atomKK->angle_atom3,nmax,atomKK->angle_per_atom,
"atom:angle_atom3");
memory->grow_kokkos(atomKK->k_num_dihedral,atomKK->num_dihedral,nmax,"atom:num_dihedral");
memory->grow_kokkos(atomKK->k_dihedral_type,atomKK->dihedral_type,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_type");
memory->grow_kokkos(atomKK->k_dihedral_atom1,atomKK->dihedral_atom1,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom1");
memory->grow_kokkos(atomKK->k_dihedral_atom2,atomKK->dihedral_atom2,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom2");
memory->grow_kokkos(atomKK->k_dihedral_atom3,atomKK->dihedral_atom3,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom3");
memory->grow_kokkos(atomKK->k_dihedral_atom4,atomKK->dihedral_atom4,nmax,
atomKK->dihedral_per_atom,"atom:dihedral_atom4");
memory->grow_kokkos(atomKK->k_num_improper,atomKK->num_improper,nmax,"atom:num_improper");
memory->grow_kokkos(atomKK->k_improper_type,atomKK->improper_type,nmax,
atomKK->improper_per_atom,"atom:improper_type");
memory->grow_kokkos(atomKK->k_improper_atom1,atomKK->improper_atom1,nmax,
atomKK->improper_per_atom,"atom:improper_atom1");
memory->grow_kokkos(atomKK->k_improper_atom2,atomKK->improper_atom2,nmax,
atomKK->improper_per_atom,"atom:improper_atom2");
memory->grow_kokkos(atomKK->k_improper_atom3,atomKK->improper_atom3,nmax,
atomKK->improper_per_atom,"atom:improper_atom3");
memory->grow_kokkos(atomKK->k_improper_atom4,atomKK->improper_atom4,nmax,
atomKK->improper_per_atom,"atom:improper_atom4");
grow_reset();
sync(Host,ALL_MASK);
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->grow_arrays(nmax);
}
/* ----------------------------------------------------------------------
reset local array ptrs
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::grow_reset()
{
tag = atomKK->tag;
d_tag = atomKK->k_tag.d_view;
h_tag = atomKK->k_tag.h_view;
type = atomKK->type;
d_type = atomKK->k_type.d_view;
h_type = atomKK->k_type.h_view;
mask = atomKK->mask;
d_mask = atomKK->k_mask.d_view;
h_mask = atomKK->k_mask.h_view;
image = atomKK->image;
d_image = atomKK->k_image.d_view;
h_image = atomKK->k_image.h_view;
x = atomKK->x;
d_x = atomKK->k_x.d_view;
h_x = atomKK->k_x.h_view;
v = atomKK->v;
d_v = atomKK->k_v.d_view;
h_v = atomKK->k_v.h_view;
f = atomKK->f;
d_f = atomKK->k_f.d_view;
h_f = atomKK->k_f.h_view;
molecule = atomKK->molecule;
d_molecule = atomKK->k_molecule.d_view;
h_molecule = atomKK->k_molecule.h_view;
nspecial = atomKK->nspecial;
d_nspecial = atomKK->k_nspecial.d_view;
h_nspecial = atomKK->k_nspecial.h_view;
special = atomKK->special;
d_special = atomKK->k_special.d_view;
h_special = atomKK->k_special.h_view;
num_bond = atomKK->num_bond;
d_num_bond = atomKK->k_num_bond.d_view;
h_num_bond = atomKK->k_num_bond.h_view;
bond_type = atomKK->bond_type;
d_bond_type = atomKK->k_bond_type.d_view;
h_bond_type = atomKK->k_bond_type.h_view;
bond_atom = atomKK->bond_atom;
d_bond_atom = atomKK->k_bond_atom.d_view;
h_bond_atom = atomKK->k_bond_atom.h_view;
num_angle = atomKK->num_angle;
d_num_angle = atomKK->k_num_angle.d_view;
h_num_angle = atomKK->k_num_angle.h_view;
angle_type = atomKK->angle_type;
d_angle_type = atomKK->k_angle_type.d_view;
h_angle_type = atomKK->k_angle_type.h_view;
angle_atom1 = atomKK->angle_atom1;
d_angle_atom1 = atomKK->k_angle_atom1.d_view;
h_angle_atom1 = atomKK->k_angle_atom1.h_view;
angle_atom2 = atomKK->angle_atom2;
d_angle_atom2 = atomKK->k_angle_atom2.d_view;
h_angle_atom2 = atomKK->k_angle_atom2.h_view;
angle_atom3 = atomKK->angle_atom3;
d_angle_atom3 = atomKK->k_angle_atom3.d_view;
h_angle_atom3 = atomKK->k_angle_atom3.h_view;
num_dihedral = atomKK->num_dihedral;
d_num_dihedral = atomKK->k_num_dihedral.d_view;
h_num_dihedral = atomKK->k_num_dihedral.h_view;
dihedral_type = atomKK->dihedral_type;
d_dihedral_type = atomKK->k_dihedral_type.d_view;
h_dihedral_type = atomKK->k_dihedral_type.h_view;
dihedral_atom1 = atomKK->dihedral_atom1;
d_dihedral_atom1 = atomKK->k_dihedral_atom1.d_view;
h_dihedral_atom1 = atomKK->k_dihedral_atom1.h_view;
dihedral_atom2 = atomKK->dihedral_atom2;
d_dihedral_atom2 = atomKK->k_dihedral_atom2.d_view;
h_dihedral_atom2 = atomKK->k_dihedral_atom2.h_view;
dihedral_atom3 = atomKK->dihedral_atom3;
d_dihedral_atom3 = atomKK->k_dihedral_atom3.d_view;
h_dihedral_atom3 = atomKK->k_dihedral_atom3.h_view;
dihedral_atom4 = atomKK->dihedral_atom4;
d_dihedral_atom4 = atomKK->k_dihedral_atom4.d_view;
h_dihedral_atom4 = atomKK->k_dihedral_atom4.h_view;
num_improper = atomKK->num_improper;
d_num_improper = atomKK->k_num_improper.d_view;
h_num_improper = atomKK->k_num_improper.h_view;
improper_type = atomKK->improper_type;
d_improper_type = atomKK->k_improper_type.d_view;
h_improper_type = atomKK->k_improper_type.h_view;
improper_atom1 = atomKK->improper_atom1;
d_improper_atom1 = atomKK->k_improper_atom1.d_view;
h_improper_atom1 = atomKK->k_improper_atom1.h_view;
improper_atom2 = atomKK->improper_atom2;
d_improper_atom2 = atomKK->k_improper_atom2.d_view;
h_improper_atom2 = atomKK->k_improper_atom2.h_view;
improper_atom3 = atomKK->improper_atom3;
d_improper_atom3 = atomKK->k_improper_atom3.d_view;
h_improper_atom3 = atomKK->k_improper_atom3.h_view;
improper_atom4 = atomKK->improper_atom4;
d_improper_atom4 = atomKK->k_improper_atom4.d_view;
h_improper_atom4 = atomKK->k_improper_atom4.h_view;
}
/* ----------------------------------------------------------------------
copy atom I info to atom J
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::copy(int i, int j, int delflag)
{
int k;
h_tag[j] = h_tag[i];
h_type[j] = h_type[i];
mask[j] = mask[i];
h_image[j] = h_image[i];
h_x(j,0) = h_x(i,0);
h_x(j,1) = h_x(i,1);
h_x(j,2) = h_x(i,2);
h_v(j,0) = h_v(i,0);
h_v(j,1) = h_v(i,1);
h_v(j,2) = h_v(i,2);
h_molecule(j) = h_molecule(i);
h_num_bond(j) = h_num_bond(i);
for (k = 0; k < h_num_bond(j); k++) {
h_bond_type(j,k) = h_bond_type(i,k);
h_bond_atom(j,k) = h_bond_atom(i,k);
}
h_nspecial(j,0) = h_nspecial(i,0);
h_nspecial(j,1) = h_nspecial(i,1);
h_nspecial(j,2) = h_nspecial(i,2);
for (k = 0; k < h_nspecial(j,2); k++)
h_special(j,k) = h_special(i,k);
h_num_angle(j) = h_num_angle(i);
for (k = 0; k < h_num_angle(j); k++) {
h_angle_type(j,k) = h_angle_type(i,k);
h_angle_atom1(j,k) = h_angle_atom1(i,k);
h_angle_atom2(j,k) = h_angle_atom2(i,k);
h_angle_atom3(j,k) = h_angle_atom3(i,k);
}
h_num_dihedral(j) = h_num_dihedral(i);
for (k = 0; k < h_num_dihedral(j); k++) {
h_dihedral_type(j,k) = h_dihedral_type(i,k);
h_dihedral_atom1(j,k) = h_dihedral_atom1(i,k);
h_dihedral_atom2(j,k) = h_dihedral_atom2(i,k);
h_dihedral_atom3(j,k) = h_dihedral_atom3(i,k);
h_dihedral_atom4(j,k) = h_dihedral_atom4(i,k);
}
h_num_improper(j) = h_num_improper(i);
for (k = 0; k < h_num_improper(j); k++) {
h_improper_type(j,k) = h_improper_type(i,k);
h_improper_atom1(j,k) = h_improper_atom1(i,k);
h_improper_atom2(j,k) = h_improper_atom2(i,k);
h_improper_atom3(j,k) = h_improper_atom3(i,k);
h_improper_atom4(j,k) = h_improper_atom4(i,k);
}
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
modify->fix[atom->extra_grow[iextra]]->copy_arrays(i,j,delflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecMolecularKokkos_PackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_um _buf;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecMolecularKokkos_PackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
const size_t maxsend = (buf.view<DeviceType>().dimension_0()
*buf.view<DeviceType>().dimension_1())/3;
const size_t elements = 3;
buffer_view<DeviceType>(_buf,buf,maxsend,elements);
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_buf(i,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_buf(i,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_buf(i,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm_kokkos(const int &n,
const DAT::tdual_int_2d &list,
const int & iswap,
const DAT::tdual_xfloat_2d &buf,
const int &pbc_flag,
const int* const pbc)
{
// Check whether to always run forward communication on the host
// Choose correct forward PackComm kernel
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPHostType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,1,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,1,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,0,1>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackComm<LMPDeviceType,0,0>
f(atomKK->k_x,buf,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*size_forward;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType,int PBC_FLAG,int TRICLINIC>
struct AtomVecMolecularKokkos_PackCommSelf {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array_randomread _x;
typename ArrayTypes<DeviceType>::t_x_array _xw;
int _nfirst;
typename ArrayTypes<DeviceType>::t_int_2d_const _list;
const int _iswap;
X_FLOAT _xprd,_yprd,_zprd,_xy,_xz,_yz;
X_FLOAT _pbc[6];
AtomVecMolecularKokkos_PackCommSelf(
const typename DAT::tdual_x_array &x,
const int &nfirst,
const typename DAT::tdual_int_2d &list,
const int & iswap,
const X_FLOAT &xprd, const X_FLOAT &yprd, const X_FLOAT &zprd,
const X_FLOAT &xy, const X_FLOAT &xz, const X_FLOAT &yz, const int* const pbc):
_x(x.view<DeviceType>()),_xw(x.view<DeviceType>()),_nfirst(nfirst),
_list(list.view<DeviceType>()),_iswap(iswap),
_xprd(xprd),_yprd(yprd),_zprd(zprd),
_xy(xy),_xz(xz),_yz(yz) {
_pbc[0] = pbc[0]; _pbc[1] = pbc[1]; _pbc[2] = pbc[2];
_pbc[3] = pbc[3]; _pbc[4] = pbc[4]; _pbc[5] = pbc[5];
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_xw(i+_nfirst,0) = _x(j,0);
_xw(i+_nfirst,1) = _x(j,1);
_xw(i+_nfirst,2) = _x(j,2);
} else {
if (TRICLINIC == 0) {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
} else {
_xw(i+_nfirst,0) = _x(j,0) + _pbc[0]*_xprd + _pbc[5]*_xy + _pbc[4]*_xz;
_xw(i+_nfirst,1) = _x(j,1) + _pbc[1]*_yprd + _pbc[3]*_yz;
_xw(i+_nfirst,2) = _x(j,2) + _pbc[2]*_zprd;
}
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm_self(const int &n, const DAT::tdual_int_2d &list,
const int & iswap,
const int nfirst, const int &pbc_flag,
const int* const pbc) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPHostType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
if(pbc_flag) {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,1,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,1,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
} else {
if(domain->triclinic) {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,0,1>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
} else {
struct AtomVecMolecularKokkos_PackCommSelf<LMPDeviceType,0,0>
f(atomKK->k_x,nfirst,list,iswap,domain->xprd,domain->yprd,domain->zprd,
domain->xy,domain->xz,domain->yz,pbc);
Kokkos::parallel_for(n,f);
}
}
LMPDeviceType::fence();
}
return n*3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecMolecularKokkos_UnpackComm {
typedef DeviceType device_type;
typename ArrayTypes<DeviceType>::t_x_array _x;
typename ArrayTypes<DeviceType>::t_xfloat_2d_const _buf;
int _first;
AtomVecMolecularKokkos_UnpackComm(
const typename DAT::tdual_x_array &x,
const typename DAT::tdual_xfloat_2d &buf,
const int& first):_x(x.view<DeviceType>()),_buf(buf.view<DeviceType>()),
_first(first) {};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
}
};
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_comm_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf ) {
if(commKK->forward_comm_on_host) {
sync(Host,X_MASK);
modified(Host,X_MASK);
struct AtomVecMolecularKokkos_UnpackComm<LMPHostType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
sync(Device,X_MASK);
modified(Device,X_MASK);
struct AtomVecMolecularKokkos_UnpackComm<LMPDeviceType> f(atomKK->k_x,buf,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_comm_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_comm_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_reverse(int n, int first, double *buf)
{
if(n > 0)
sync(Host,F_MASK);
int m = 0;
const int last = first + n;
for (int i = first; i < last; i++) {
buf[m++] = h_f(i,0);
buf[m++] = h_f(i,1);
buf[m++] = h_f(i,2);
}
return m;
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_reverse(int n, int *list, double *buf)
{
if(n > 0)
modified(Host,F_MASK);
int m = 0;
for (int i = 0; i < n; i++) {
const int j = list[i];
h_f(j,0) += buf[m++];
h_f(j,1) += buf[m++];
h_f(j,2) += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
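// Border-communication pack functor: in addition to the (possibly shifted)
// coordinates it packs tag, type, mask and molecule ID for each border atom.
// The integer fields are routed through the d_ubuf union so their bit patterns
// are carried unchanged inside the double-precision buffer.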
template<class DeviceType,int PBC_FLAG>
struct AtomVecMolecularKokkos_PackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_xfloat_2d _buf;
const typename AT::t_int_2d_const _list;
const int _iswap;
const typename AT::t_x_array_randomread _x;
const typename AT::t_tagint_1d _tag;
const typename AT::t_int_1d _type;
const typename AT::t_int_1d _mask;
const typename AT::t_tagint_1d _molecule;
X_FLOAT _dx,_dy,_dz;
AtomVecMolecularKokkos_PackBorder(
const typename AT::t_xfloat_2d &buf,
const typename AT::t_int_2d_const &list,
const int & iswap,
const typename AT::t_x_array &x,
const typename AT::t_tagint_1d &tag,
const typename AT::t_int_1d &type,
const typename AT::t_int_1d &mask,
const typename AT::t_tagint_1d &molecule,
const X_FLOAT &dx, const X_FLOAT &dy, const X_FLOAT &dz):
_buf(buf),_list(list),_iswap(iswap),
_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_dx(dx),_dy(dy),_dz(dz) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
const int j = _list(_iswap,i);
if (PBC_FLAG == 0) {
_buf(i,0) = _x(j,0);
_buf(i,1) = _x(j,1);
_buf(i,2) = _x(j,2);
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
} else {
_buf(i,0) = _x(j,0) + _dx;
_buf(i,1) = _x(j,1) + _dy;
_buf(i,2) = _x(j,2) + _dz;
- _buf(i,3) = _tag(j);
- _buf(i,4) = _type(j);
- _buf(i,5) = _mask(j);
- _buf(i,6) = _molecule(j);
+ _buf(i,3) = d_ubuf(_tag(j)).d;
+ _buf(i,4) = d_ubuf(_type(j)).d;
+ _buf(i,5) = d_ubuf(_mask(j)).d;
+ _buf(i,6) = d_ubuf(_molecule(j)).d;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border_kokkos(int n, DAT::tdual_int_2d k_sendlist,
DAT::tdual_xfloat_2d buf,int iswap,
int pbc_flag, int *pbc, ExecutionSpace space)
{
X_FLOAT dx,dy,dz;
if (pbc_flag != 0) {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if(space==Host) {
AtomVecMolecularKokkos_PackBorder<LMPHostType,1> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecMolecularKokkos_PackBorder<LMPDeviceType,1> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
} else {
dx = dy = dz = 0;
if(space==Host) {
AtomVecMolecularKokkos_PackBorder<LMPHostType,0> f(
buf.view<LMPHostType>(), k_sendlist.view<LMPHostType>(),
iswap,h_x,h_tag,h_type,h_mask,h_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
AtomVecMolecularKokkos_PackBorder<LMPDeviceType,0> f(
buf.view<LMPDeviceType>(), k_sendlist.view<LMPDeviceType>(),
iswap,d_x,d_tag,d_type,d_mask,d_molecule,dx,dy,dz);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
return n*size_border;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border_vel(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz,dvx,dvy,dvz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0);
buf[m++] = h_x(j,1);
buf[m++] = h_x(j,2);
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0];
dy = pbc[1];
dz = pbc[2];
}
if (!deform_vremap) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
} else {
dvx = pbc[0]*h_rate[0] + pbc[5]*h_rate[5] + pbc[4]*h_rate[4];
dvy = pbc[1]*h_rate[1] + pbc[3]*h_rate[3];
dvz = pbc[2]*h_rate[2];
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_x(j,0) + dx;
buf[m++] = h_x(j,1) + dy;
buf[m++] = h_x(j,2) + dz;
buf[m++] = ubuf(h_tag(j)).d;
buf[m++] = ubuf(h_type(j)).d;
buf[m++] = ubuf(h_mask(j)).d;
buf[m++] = ubuf(h_molecule(j)).d;
if (mask[i] & deform_groupbit) {
buf[m++] = h_v(j,0) + dvx;
buf[m++] = h_v(j,1) + dvy;
buf[m++] = h_v(j,2) + dvz;
} else {
buf[m++] = h_v(j,0);
buf[m++] = h_v(j,1);
buf[m++] = h_v(j,2);
}
}
}
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->pack_border(n,list,&buf[m]);
return m;
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_border_hybrid(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = h_molecule(j);
}
return m;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
struct AtomVecMolecularKokkos_UnpackBorder {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
const typename AT::t_xfloat_2d_const _buf;
typename AT::t_x_array _x;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_tagint_1d _molecule;
int _first;
AtomVecMolecularKokkos_UnpackBorder(
const typename AT::t_xfloat_2d_const &buf,
typename AT::t_x_array &x,
typename AT::t_tagint_1d &tag,
typename AT::t_int_1d &type,
typename AT::t_int_1d &mask,
typename AT::t_tagint_1d &molecule,
const int& first):
_buf(buf),_x(x),_tag(tag),_type(type),_mask(mask),_molecule(molecule),
_first(first){
};
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
_x(i+_first,0) = _buf(i,0);
_x(i+_first,1) = _buf(i,1);
_x(i+_first,2) = _buf(i,2);
- _tag(i+_first) = static_cast<tagint> (_buf(i,3));
- _type(i+_first) = static_cast<int> (_buf(i,4));
- _mask(i+_first) = static_cast<int> (_buf(i,5));
- _molecule(i+_first) = static_cast<tagint> (_buf(i,6));
+ _tag(i+_first) = (tagint) d_ubuf(_buf(i,3)).i;
+ _type(i+_first) = (int) d_ubuf(_buf(i,4)).i;
+ _mask(i+_first) = (int) d_ubuf(_buf(i,5)).i;
+ _molecule(i+_first) = (tagint) d_ubuf(_buf(i,6)).i;
}
};
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_border_kokkos(const int &n, const int &first,
const DAT::tdual_xfloat_2d &buf,
ExecutionSpace space) {
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
while (first+n >= nmax) grow(0);
modified(space,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
if(space==Host) {
struct AtomVecMolecularKokkos_UnpackBorder<LMPHostType>
f(buf.view<LMPHostType>(),h_x,h_tag,h_type,h_mask,h_molecule,first);
Kokkos::parallel_for(n,f);
LMPHostType::fence();
} else {
struct AtomVecMolecularKokkos_UnpackBorder<LMPDeviceType>
f(buf.view<LMPDeviceType>(),d_x,d_tag,d_type,d_mask,d_molecule,first);
Kokkos::parallel_for(n,f);
LMPDeviceType::fence();
}
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_border(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::unpack_border_vel(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
if (i == nmax) grow(0);
modified(Host,X_MASK|V_MASK|TAG_MASK|TYPE_MASK|MASK_MASK|MOLECULE_MASK);
h_x(i,0) = buf[m++];
h_x(i,1) = buf[m++];
h_x(i,2) = buf[m++];
h_tag(i) = (tagint) ubuf(buf[m++]).i;
h_type(i) = (int) ubuf(buf[m++]).i;
h_mask(i) = (int) ubuf(buf[m++]).i;
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
h_v(i,0) = buf[m++];
h_v(i,1) = buf[m++];
h_v(i,2) = buf[m++];
}
if (atom->nextra_border)
for (int iextra = 0; iextra < atom->nextra_border; iextra++)
m += modify->fix[atom->extra_border[iextra]]->
unpack_border(n,first,&buf[m]);
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_border_hybrid(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
h_molecule(i) = (tagint) ubuf(buf[m++]).i;
return m;
}
/* ---------------------------------------------------------------------- */
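// Exchange pack functor: serializes the complete per-atom state (position,
// velocity, IDs, image flag, and the full bond/angle/dihedral/improper and
// special-neighbor topology) for atoms migrating to another processor, and
// backfills each vacated local slot with the atom indexed by copylist.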
template<class DeviceType>
struct AtomVecMolecularKokkos_PackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array_randomread _x;
typename AT::t_v_array_randomread _v;
typename AT::t_tagint_1d_randomread _tag;
typename AT::t_int_1d_randomread _type;
typename AT::t_int_1d_randomread _mask;
typename AT::t_imageint_1d_randomread _image;
typename AT::t_tagint_1d_randomread _molecule;
typename AT::t_int_2d_randomread _nspecial;
typename AT::t_tagint_2d_randomread _special;
typename AT::t_int_1d_randomread _num_bond;
typename AT::t_int_2d_randomread _bond_type;
typename AT::t_tagint_2d_randomread _bond_atom;
typename AT::t_int_1d_randomread _num_angle;
typename AT::t_int_2d_randomread _angle_type;
typename AT::t_tagint_2d_randomread _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d_randomread _num_dihedral;
typename AT::t_int_2d_randomread _dihedral_type;
typename AT::t_tagint_2d_randomread _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d_randomread _num_improper;
typename AT::t_int_2d_randomread _improper_type;
typename AT::t_tagint_2d_randomread _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_x_array _xw;
typename AT::t_v_array _vw;
typename AT::t_tagint_1d _tagw;
typename AT::t_int_1d _typew;
typename AT::t_int_1d _maskw;
typename AT::t_imageint_1d _imagew;
typename AT::t_tagint_1d _moleculew;
typename AT::t_int_2d _nspecialw;
typename AT::t_tagint_2d _specialw;
typename AT::t_int_1d _num_bondw;
typename AT::t_int_2d _bond_typew;
typename AT::t_tagint_2d _bond_atomw;
typename AT::t_int_1d _num_anglew;
typename AT::t_int_2d _angle_typew;
typename AT::t_tagint_2d _angle_atom1w,_angle_atom2w,_angle_atom3w;
typename AT::t_int_1d _num_dihedralw;
typename AT::t_int_2d _dihedral_typew;
typename AT::t_tagint_2d _dihedral_atom1w,_dihedral_atom2w,
_dihedral_atom3w,_dihedral_atom4w;
typename AT::t_int_1d _num_improperw;
typename AT::t_int_2d _improper_typew;
typename AT::t_tagint_2d _improper_atom1w,_improper_atom2w,
_improper_atom3w,_improper_atom4w;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d_const _sendlist;
typename AT::t_int_1d_const _copylist;
int _nlocal,_dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecMolecularKokkos_PackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d sendlist,
typename AT::tdual_int_1d copylist,int nlocal, int dim,
X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_xw(atom->k_x.view<DeviceType>()),
_vw(atom->k_v.view<DeviceType>()),
_tagw(atom->k_tag.view<DeviceType>()),
_typew(atom->k_type.view<DeviceType>()),
_maskw(atom->k_mask.view<DeviceType>()),
_imagew(atom->k_image.view<DeviceType>()),
_moleculew(atom->k_molecule.view<DeviceType>()),
_nspecialw(atom->k_nspecial.view<DeviceType>()),
_specialw(atom->k_special.view<DeviceType>()),
_num_bondw(atom->k_num_bond.view<DeviceType>()),
_bond_typew(atom->k_bond_type.view<DeviceType>()),
_bond_atomw(atom->k_bond_atom.view<DeviceType>()),
_num_anglew(atom->k_num_angle.view<DeviceType>()),
_angle_typew(atom->k_angle_type.view<DeviceType>()),
_angle_atom1w(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2w(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3w(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedralw(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_typew(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1w(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2w(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3w(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4w(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improperw(atom->k_num_improper.view<DeviceType>()),
_improper_typew(atom->k_improper_type.view<DeviceType>()),
_improper_atom1w(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2w(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3w(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4w(atom->k_improper_atom4.view<DeviceType>()),
_sendlist(sendlist.template view<DeviceType>()),
_copylist(copylist.template view<DeviceType>()),
_nlocal(nlocal),_dim(dim),
_lo(lo),_hi(hi){
// 3 comp of x, 3 comp of v, 1 tag, 1 type, 1 mask, 1 image, 1 molecule, 3 nspecial,
// maxspecial special, 1 num_bond, bond_per_atom bond_type, bond_per_atom bond_atom,
// 1 num_angle, angle_per_atom angle_type, angle_per_atom angle_atom1, angle_atom2,
// and angle_atom3
// 1 num_dihedral, dihedral_per_atom dihedral_type, 4*dihedral_per_atom
// 1 num_improper, 5*improper_per_atom
// 1 to store buffer length
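// fixed part: 3(x) + 3(v) + 1(tag) + 1(type) + 1(mask) + 1(image) + 1(molecule)
// + 1(num_bond) + 1(num_angle) + 1(num_dihedral) + 1(num_improper) + 3(nspecial)
// + 1(length) = 19 doubles per atom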
elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &mysend) const {
int k;
const int i = _sendlist(mysend);
_buf(mysend,0) = elements;
int m = 1;
_buf(mysend,m++) = _x(i,0);
_buf(mysend,m++) = _x(i,1);
_buf(mysend,m++) = _x(i,2);
_buf(mysend,m++) = _v(i,0);
_buf(mysend,m++) = _v(i,1);
_buf(mysend,m++) = _v(i,2);
- _buf(mysend,m++) = _tag(i);
- _buf(mysend,m++) = _type(i);
- _buf(mysend,m++) = _mask(i);
- _buf(mysend,m++) = _image(i);
- _buf(mysend,m++) = _molecule(i);
- _buf(mysend,m++) = _num_bond(i);
+ _buf(mysend,m++) = d_ubuf(_tag(i)).d;
+ _buf(mysend,m++) = d_ubuf(_type(i)).d;
+ _buf(mysend,m++) = d_ubuf(_mask(i)).d;
+ _buf(mysend,m++) = d_ubuf(_image(i)).d;
+ _buf(mysend,m++) = d_ubuf(_molecule(i)).d;
+ _buf(mysend,m++) = d_ubuf(_num_bond(i)).d;
for (k = 0; k < _num_bond(i); k++) {
- _buf(mysend,m++) = _bond_type(i,k);
- _buf(mysend,m++) = _bond_atom(i,k);
+ _buf(mysend,m++) = d_ubuf(_bond_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_bond_atom(i,k)).d;
}
- _buf(mysend,m++) = _num_angle(i);
+ _buf(mysend,m++) = d_ubuf(_num_angle(i)).d;
for (k = 0; k < _num_angle(i); k++) {
- _buf(mysend,m++) = _angle_type(i,k);
- _buf(mysend,m++) = _angle_atom1(i,k);
- _buf(mysend,m++) = _angle_atom2(i,k);
- _buf(mysend,m++) = _angle_atom3(i,k);
+ _buf(mysend,m++) = d_ubuf(_angle_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_angle_atom3(i,k)).d;
}
- _buf(mysend,m++) = _num_dihedral(i);
+ _buf(mysend,m++) = d_ubuf(_num_dihedral(i)).d;
for (k = 0; k < _num_dihedral(i); k++) {
- _buf(mysend,m++) = _dihedral_type(i,k);
- _buf(mysend,m++) = _dihedral_atom1(i,k);
- _buf(mysend,m++) = _dihedral_atom2(i,k);
- _buf(mysend,m++) = _dihedral_atom3(i,k);
- _buf(mysend,m++) = _dihedral_atom4(i,k);
+ _buf(mysend,m++) = d_ubuf(_dihedral_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_dihedral_atom4(i,k)).d;
}
- _buf(mysend,m++) = _num_improper(i);
+ _buf(mysend,m++) = d_ubuf(_num_improper(i)).d;
for (k = 0; k < _num_improper(i); k++) {
- _buf(mysend,m++) = _improper_type(i,k);
- _buf(mysend,m++) = _improper_atom1(i,k);
- _buf(mysend,m++) = _improper_atom2(i,k);
- _buf(mysend,m++) = _improper_atom3(i,k);
- _buf(mysend,m++) = _improper_atom4(i,k);
+ _buf(mysend,m++) = d_ubuf(_improper_type(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom1(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom2(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom3(i,k)).d;
+ _buf(mysend,m++) = d_ubuf(_improper_atom4(i,k)).d;
}
- _buf(mysend,m++) = _nspecial(i,0);
- _buf(mysend,m++) = _nspecial(i,1);
- _buf(mysend,m++) = _nspecial(i,2);
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,0)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,1)).d;
+ _buf(mysend,m++) = d_ubuf(_nspecial(i,2)).d;
for (k = 0; k < _nspecial(i,2); k++)
- _buf(mysend,m++) = _special(i,k);
+ _buf(mysend,m++) = d_ubuf(_special(i,k)).d;
const int j = _copylist(mysend);
if(j>-1) {
_xw(i,0) = _x(j,0);
_xw(i,1) = _x(j,1);
_xw(i,2) = _x(j,2);
_vw(i,0) = _v(j,0);
_vw(i,1) = _v(j,1);
_vw(i,2) = _v(j,2);
_tagw(i) = _tag(j);
_typew(i) = _type(j);
_maskw(i) = _mask(j);
_imagew(i) = _image(j);
_moleculew(i) = _molecule(j);
_num_bondw(i) = _num_bond(j);
for (k = 0; k < _num_bond(j); k++) {
_bond_typew(i,k) = _bond_type(j,k);
_bond_atomw(i,k) = _bond_atom(j,k);
}
_num_anglew(i) = _num_angle(j);
for (k = 0; k < _num_angle(j); k++) {
_angle_typew(i,k) = _angle_type(j,k);
_angle_atom1w(i,k) = _angle_atom1(j,k);
_angle_atom2w(i,k) = _angle_atom2(j,k);
_angle_atom3w(i,k) = _angle_atom3(j,k);
}
_num_dihedralw(i) = _num_dihedral(j);
for (k = 0; k < _num_dihedral(j); k++) {
_dihedral_typew(i,k) = _dihedral_type(j,k);
_dihedral_atom1w(i,k) = _dihedral_atom1(j,k);
_dihedral_atom2w(i,k) = _dihedral_atom2(j,k);
_dihedral_atom3w(i,k) = _dihedral_atom3(j,k);
_dihedral_atom4w(i,k) = _dihedral_atom4(j,k);
}
_num_improperw(i) = _num_improper(j);
for (k = 0; k < _num_improper(j); k++) {
_improper_typew(i,k) = _improper_type(j,k);
_improper_atom1w(i,k) = _improper_atom1(j,k);
_improper_atom2w(i,k) = _improper_atom2(j,k);
_improper_atom3w(i,k) = _improper_atom3(j,k);
_improper_atom4w(i,k) = _improper_atom4(j,k);
}
_nspecialw(i,0) = _nspecial(j,0);
_nspecialw(i,1) = _nspecial(j,1);
_nspecialw(i,2) = _nspecial(j,2);
for (k = 0; k < _nspecial(j,2); k++)
_specialw(i,k) = _special(j,k);
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_exchange_kokkos(const int &nsend,DAT::tdual_xfloat_2d &k_buf,
DAT::tdual_int_1d k_sendlist,
DAT::tdual_int_1d k_copylist,
ExecutionSpace space,int dim,X_FLOAT lo,
X_FLOAT hi )
{
const int elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
if(nsend > (int) (k_buf.view<LMPHostType>().dimension_0()*
k_buf.view<LMPHostType>().dimension_1())/elements) {
int newsize = nsend*elements/k_buf.view<LMPHostType>().dimension_1()+1;
k_buf.resize(newsize,k_buf.view<LMPHostType>().dimension_1());
}
if(space == Host) {
AtomVecMolecularKokkos_PackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPHostType::fence();
return nsend*elements;
} else {
AtomVecMolecularKokkos_PackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_sendlist,k_copylist,atom->nlocal,dim,lo,hi);
Kokkos::parallel_for(nsend,f);
LMPDeviceType::fence();
return nsend*elements;
}
}
/* ---------------------------------------------------------------------- */
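// buf[0] is filled in last with the total number of doubles packed for this
// atom (including fix contributions), so the receiving code can advance through
// the exchange buffer one whole atom at a time.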
int AtomVecMolecularKokkos::pack_exchange(int i, double *buf)
{
int k;
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(h_bond_type(i,k)).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(h_angle_type(i,k)).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(h_dihedral_type(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(h_improper_type(i,k)).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
buf[m++] = ubuf(h_nspecial(i,0)).d;
buf[m++] = ubuf(h_nspecial(i,1)).d;
buf[m++] = ubuf(h_nspecial(i,2)).d;
for (k = 0; k < h_nspecial(i,2); k++)
buf[m++] = ubuf(h_special(i,k)).d;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->pack_exchange(i,&buf[m]);
buf[0] = m;
return m;
}
/* ---------------------------------------------------------------------- */
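// Exchange unpack functor: an incoming atom is kept only if its coordinate along
// the exchange dimension lies in [lo,hi); the local atom counter is advanced with
// Kokkos::atomic_fetch_add so unpacking can proceed safely in parallel.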
template<class DeviceType>
struct AtomVecMolecularKokkos_UnpackExchangeFunctor {
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typename AT::t_x_array _x;
typename AT::t_v_array _v;
typename AT::t_tagint_1d _tag;
typename AT::t_int_1d _type;
typename AT::t_int_1d _mask;
typename AT::t_imageint_1d _image;
typename AT::t_tagint_1d _molecule;
typename AT::t_int_2d _nspecial;
typename AT::t_tagint_2d _special;
typename AT::t_int_1d _num_bond;
typename AT::t_int_2d _bond_type;
typename AT::t_tagint_2d _bond_atom;
typename AT::t_int_1d _num_angle;
typename AT::t_int_2d _angle_type;
typename AT::t_tagint_2d _angle_atom1,_angle_atom2,_angle_atom3;
typename AT::t_int_1d _num_dihedral;
typename AT::t_int_2d _dihedral_type;
typename AT::t_tagint_2d _dihedral_atom1,_dihedral_atom2,
_dihedral_atom3,_dihedral_atom4;
typename AT::t_int_1d _num_improper;
typename AT::t_int_2d _improper_type;
typename AT::t_tagint_2d _improper_atom1,_improper_atom2,
_improper_atom3,_improper_atom4;
typename AT::t_xfloat_2d_um _buf;
typename AT::t_int_1d _nlocal;
int _dim;
X_FLOAT _lo,_hi;
size_t elements;
AtomVecMolecularKokkos_UnpackExchangeFunctor(
const AtomKokkos* atom,
const typename AT::tdual_xfloat_2d buf,
typename AT::tdual_int_1d nlocal,
int dim, X_FLOAT lo, X_FLOAT hi):
_x(atom->k_x.view<DeviceType>()),
_v(atom->k_v.view<DeviceType>()),
_tag(atom->k_tag.view<DeviceType>()),
_type(atom->k_type.view<DeviceType>()),
_mask(atom->k_mask.view<DeviceType>()),
_image(atom->k_image.view<DeviceType>()),
_molecule(atom->k_molecule.view<DeviceType>()),
_nspecial(atom->k_nspecial.view<DeviceType>()),
_special(atom->k_special.view<DeviceType>()),
_num_bond(atom->k_num_bond.view<DeviceType>()),
_bond_type(atom->k_bond_type.view<DeviceType>()),
_bond_atom(atom->k_bond_atom.view<DeviceType>()),
_num_angle(atom->k_num_angle.view<DeviceType>()),
_angle_type(atom->k_angle_type.view<DeviceType>()),
_angle_atom1(atom->k_angle_atom1.view<DeviceType>()),
_angle_atom2(atom->k_angle_atom2.view<DeviceType>()),
_angle_atom3(atom->k_angle_atom3.view<DeviceType>()),
_num_dihedral(atom->k_num_dihedral.view<DeviceType>()),
_dihedral_type(atom->k_dihedral_type.view<DeviceType>()),
_dihedral_atom1(atom->k_dihedral_atom1.view<DeviceType>()),
_dihedral_atom2(atom->k_dihedral_atom2.view<DeviceType>()),
_dihedral_atom3(atom->k_dihedral_atom3.view<DeviceType>()),
_dihedral_atom4(atom->k_dihedral_atom4.view<DeviceType>()),
_num_improper(atom->k_num_improper.view<DeviceType>()),
_improper_type(atom->k_improper_type.view<DeviceType>()),
_improper_atom1(atom->k_improper_atom1.view<DeviceType>()),
_improper_atom2(atom->k_improper_atom2.view<DeviceType>()),
_improper_atom3(atom->k_improper_atom3.view<DeviceType>()),
_improper_atom4(atom->k_improper_atom4.view<DeviceType>()),
_nlocal(nlocal.template view<DeviceType>()),_dim(dim),
_lo(lo),_hi(hi){
elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
const int maxsendlist = (buf.template view<DeviceType>().dimension_0()*
- buf.template view<DeviceType>().dimension_1())/elements;
+ buf.template view<DeviceType>().dimension_1())/elements;
buffer_view<DeviceType>(_buf,buf,maxsendlist,elements);
}
KOKKOS_INLINE_FUNCTION
void operator() (const int &myrecv) const {
X_FLOAT x = _buf(myrecv,_dim+1);
if (x >= _lo && x < _hi) {
int i = Kokkos::atomic_fetch_add(&_nlocal(0),1);
int m = 1;
_x(i,0) = _buf(myrecv,m++);
_x(i,1) = _buf(myrecv,m++);
_x(i,2) = _buf(myrecv,m++);
_v(i,0) = _buf(myrecv,m++);
_v(i,1) = _buf(myrecv,m++);
_v(i,2) = _buf(myrecv,m++);
- _tag(i) = _buf(myrecv,m++);
- _type(i) = _buf(myrecv,m++);
- _mask(i) = _buf(myrecv,m++);
- _image(i) = _buf(myrecv,m++);
+ _tag(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _type(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _mask(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _image(i) = (imageint) d_ubuf(_buf(myrecv,m++)).i;
- _molecule(i) = _buf(myrecv,m++);
- _num_bond(i) = _buf(myrecv,m++);
+ _molecule(i) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _num_bond(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
int k;
for (k = 0; k < _num_bond(i); k++) {
- _bond_type(i,k) = _buf(myrecv,m++);
- _bond_atom(i,k) = _buf(myrecv,m++);
+ _bond_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _bond_atom(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_angle(i) = _buf(myrecv,m++);
+ _num_angle(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_angle(i); k++) {
- _angle_type(i,k) = _buf(myrecv,m++);
- _angle_atom1(i,k) = _buf(myrecv,m++);
- _angle_atom2(i,k) = _buf(myrecv,m++);
- _angle_atom3(i,k) = _buf(myrecv,m++);
+ _angle_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _angle_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_dihedral(i) = _buf(myrecv,m++);
+ _num_dihedral(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _num_dihedral(i); k++) {
- _dihedral_type(i,k) = _buf(myrecv,m++);
- _dihedral_atom1(i,k) = _buf(myrecv,m++);
- _dihedral_atom2(i,k) = _buf(myrecv,m++);
- _dihedral_atom3(i,k) = _buf(myrecv,m++);
- _dihedral_atom4(i,k) = _buf(myrecv,m++);
+ _dihedral_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _dihedral_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _num_improper(i) = _buf(myrecv,m++);
- for (k = 0; k < _num_improper(i); k++) {
- _improper_type(i,k) = _buf(myrecv,m++);
- _improper_atom1(i,k) = _buf(myrecv,m++);
- _improper_atom2(i,k) = _buf(myrecv,m++);
- _improper_atom3(i,k) = _buf(myrecv,m++);
- _improper_atom4(i,k) = _buf(myrecv,m++);
+ _num_improper(i) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ for (k = 0; k < (int) _num_improper(i); k++) {
+ _improper_type(i,k) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom1(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom2(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom3(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
+ _improper_atom4(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
- _nspecial(i,0) = _buf(myrecv,m++);
- _nspecial(i,1) = _buf(myrecv,m++);
- _nspecial(i,2) = _buf(myrecv,m++);
+ _nspecial(i,0) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,1) = (int) d_ubuf(_buf(myrecv,m++)).i;
+ _nspecial(i,2) = (int) d_ubuf(_buf(myrecv,m++)).i;
for (k = 0; k < _nspecial(i,2); k++)
- _special(i,k) = _buf(myrecv,m++);
+ _special(i,k) = (tagint) d_ubuf(_buf(myrecv,m++)).i;
}
}
};
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_exchange_kokkos(DAT::tdual_xfloat_2d &k_buf,int nrecv,
int nlocal,int dim,X_FLOAT lo,X_FLOAT hi,
ExecutionSpace space) {
const size_t elements = 19+atom->maxspecial+2*atom->bond_per_atom+4*atom->angle_per_atom+
5*atom->dihedral_per_atom + 5*atom->improper_per_atom;
if(space == Host) {
k_count.h_view(0) = nlocal;
AtomVecMolecularKokkos_UnpackExchangeFunctor<LMPHostType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPHostType::fence();
return k_count.h_view(0);
} else {
k_count.h_view(0) = nlocal;
k_count.modify<LMPHostType>();
k_count.sync<LMPDeviceType>();
AtomVecMolecularKokkos_UnpackExchangeFunctor<LMPDeviceType>
f(atomKK,k_buf,k_count,dim,lo,hi);
Kokkos::parallel_for(nrecv/elements,f);
LMPDeviceType::fence();
k_count.modify<LMPDeviceType>();
k_count.sync<LMPHostType>();
return k_count.h_view(0);
}
}
/* ---------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_exchange(double *buf)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int k;
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,1) = (int) ubuf(buf[m++]).i;
h_nspecial(nlocal,2) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_nspecial(nlocal,2); k++)
h_special(nlocal,k) = (tagint) ubuf(buf[m++]).i;
if (atom->nextra_grow)
for (int iextra = 0; iextra < atom->nextra_grow; iextra++)
m += modify->fix[atom->extra_grow[iextra]]->
unpack_exchange(nlocal,&buf[m]);
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
size of restart data for all atoms owned by this proc
include extra data stored by fixes
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::size_restart()
{
int i;
int nlocal = atom->nlocal;
int n = 0;
for (i = 0; i < nlocal; i++)
n += 16 + 2*num_bond[i] + 4*num_angle[i] +
5*num_dihedral[i] + 5*num_improper[i];
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
for (i = 0; i < nlocal; i++)
n += modify->fix[atom->extra_restart[iextra]]->size_restart(i);
return n;
}
/* ----------------------------------------------------------------------
pack atom I's data for restart file including extra quantities
xyz must be 1st 3 values, so that read_restart can test on them
molecular types may be negative, but write as positive
------------------------------------------------------------------------- */
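// negative bond/angle/dihedral/improper types are written as MAX(t,-t), i.e. their absolute value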
int AtomVecMolecularKokkos::pack_restart(int i, double *buf)
{
sync(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
buf[m++] = h_x(i,0);
buf[m++] = h_x(i,1);
buf[m++] = h_x(i,2);
buf[m++] = ubuf(h_tag(i)).d;
buf[m++] = ubuf(h_type(i)).d;
buf[m++] = ubuf(h_mask(i)).d;
buf[m++] = ubuf(h_image(i)).d;
buf[m++] = h_v(i,0);
buf[m++] = h_v(i,1);
buf[m++] = h_v(i,2);
buf[m++] = ubuf(h_molecule(i)).d;
buf[m++] = ubuf(h_num_bond(i)).d;
for (int k = 0; k < h_num_bond(i); k++) {
buf[m++] = ubuf(MAX(h_bond_type(i,k),-h_bond_type(i,k))).d;
buf[m++] = ubuf(h_bond_atom(i,k)).d;
}
buf[m++] = ubuf(h_num_angle(i)).d;
for (int k = 0; k < h_num_angle(i); k++) {
buf[m++] = ubuf(MAX(h_angle_type(i,k),-h_angle_type(i,k))).d;
buf[m++] = ubuf(h_angle_atom1(i,k)).d;
buf[m++] = ubuf(h_angle_atom2(i,k)).d;
buf[m++] = ubuf(h_angle_atom3(i,k)).d;
}
buf[m++] = ubuf(h_num_dihedral(i)).d;
for (int k = 0; k < h_num_dihedral(i); k++) {
buf[m++] = ubuf(MAX(h_dihedral_type(i,k),-h_dihedral_type(i,k))).d;
buf[m++] = ubuf(h_dihedral_atom1(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom2(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom3(i,k)).d;
buf[m++] = ubuf(h_dihedral_atom4(i,k)).d;
}
buf[m++] = ubuf(h_num_improper(i)).d;
for (int k = 0; k < h_num_improper(i); k++) {
buf[m++] = ubuf(MAX(h_improper_type(i,k),-h_improper_type(i,k))).d;
buf[m++] = ubuf(h_improper_atom1(i,k)).d;
buf[m++] = ubuf(h_improper_atom2(i,k)).d;
buf[m++] = ubuf(h_improper_atom3(i,k)).d;
buf[m++] = ubuf(h_improper_atom4(i,k)).d;
}
if (atom->nextra_restart)
for (int iextra = 0; iextra < atom->nextra_restart; iextra++)
m += modify->fix[atom->extra_restart[iextra]]->pack_restart(i,&buf[m]);
buf[0] = m;
return m;
}
/* ----------------------------------------------------------------------
unpack data for one atom from restart file including extra quantities
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::unpack_restart(double *buf)
{
int k;
int nlocal = atom->nlocal;
if (nlocal == nmax) {
grow(0);
if (atom->nextra_store)
memory->grow(atom->extra,nmax,atom->nextra_store,"atom:extra");
}
modified(Host,X_MASK | V_MASK | TAG_MASK | TYPE_MASK |
MASK_MASK | IMAGE_MASK | MOLECULE_MASK | BOND_MASK |
ANGLE_MASK | DIHEDRAL_MASK | IMPROPER_MASK | SPECIAL_MASK);
int m = 1;
h_x(nlocal,0) = buf[m++];
h_x(nlocal,1) = buf[m++];
h_x(nlocal,2) = buf[m++];
h_tag(nlocal) = (tagint) ubuf(buf[m++]).i;
h_type(nlocal) = (int) ubuf(buf[m++]).i;
h_mask(nlocal) = (int) ubuf(buf[m++]).i;
h_image(nlocal) = (imageint) ubuf(buf[m++]).i;
h_v(nlocal,0) = buf[m++];
h_v(nlocal,1) = buf[m++];
h_v(nlocal,2) = buf[m++];
h_molecule(nlocal) = (tagint) ubuf(buf[m++]).i;
h_num_bond(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_bond(nlocal); k++) {
h_bond_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_bond_atom(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_angle(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_angle(nlocal); k++) {
h_angle_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_angle_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_angle_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_dihedral(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_dihedral(nlocal); k++) {
h_dihedral_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_dihedral_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_dihedral_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_num_improper(nlocal) = (int) ubuf(buf[m++]).i;
for (k = 0; k < h_num_improper(nlocal); k++) {
h_improper_type(nlocal,k) = (int) ubuf(buf[m++]).i;
h_improper_atom1(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom2(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom3(nlocal,k) = (tagint) ubuf(buf[m++]).i;
h_improper_atom4(nlocal,k) = (tagint) ubuf(buf[m++]).i;
}
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
double **extra = atom->extra;
if (atom->nextra_store) {
int size = static_cast<int> (buf[0]) - m;
for (int i = 0; i < size; i++) extra[nlocal][i] = buf[m++];
}
atom->nlocal++;
return m;
}
/* ----------------------------------------------------------------------
create one atom of itype at coord
set other values to defaults
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::create_atom(int itype, double *coord)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) {
atomKK->modified(Host,ALL_MASK);
grow(0);
}
atomKK->modified(Host,ALL_MASK);
tag[nlocal] = 0;
type[nlocal] = itype;
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_mask(nlocal) = 1;
h_image(nlocal) = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_molecule(nlocal) = 0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
h_nspecial(nlocal,0) = h_nspecial(nlocal,1) = h_nspecial(nlocal,2) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack one line from Atoms section of data file
initialize other atom quantities
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::data_atom(double *coord, imageint imagetmp,
char **values)
{
int nlocal = atom->nlocal;
if (nlocal == nmax) grow(0);
atomKK->modified(Host,ALL_MASK);
h_tag(nlocal) = atoi(values[0]);
h_molecule(nlocal) = atoi(values[1]);
h_type(nlocal) = atoi(values[2]);
if (h_type(nlocal) <= 0 || h_type(nlocal) > atom->ntypes)
error->one(FLERR,"Invalid atom type in Atoms section of data file");
h_x(nlocal,0) = coord[0];
h_x(nlocal,1) = coord[1];
h_x(nlocal,2) = coord[2];
h_image(nlocal) = imagetmp;
h_mask(nlocal) = 1;
h_v(nlocal,0) = 0.0;
h_v(nlocal,1) = 0.0;
h_v(nlocal,2) = 0.0;
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
atom->nlocal++;
}
/* ----------------------------------------------------------------------
unpack hybrid quantities from one line in Atoms section of data file
initialize other atom quantities for this sub-style
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::data_atom_hybrid(int nlocal, char **values)
{
h_molecule(nlocal) = atoi(values[0]);
h_num_bond(nlocal) = 0;
h_num_angle(nlocal) = 0;
h_num_dihedral(nlocal) = 0;
h_num_improper(nlocal) = 0;
return 1;
}
/* ----------------------------------------------------------------------
pack atom info for data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::pack_data(double **buf)
{
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
buf[i][0] = h_tag(i);
buf[i][1] = h_molecule(i);
buf[i][2] = h_type(i);
buf[i][3] = h_x(i,0);
buf[i][4] = h_x(i,1);
buf[i][5] = h_x(i,2);
buf[i][6] = (h_image[i] & IMGMASK) - IMGMAX;
buf[i][7] = (h_image[i] >> IMGBITS & IMGMASK) - IMGMAX;
buf[i][8] = (h_image[i] >> IMG2BITS) - IMGMAX;
}
}
/* ----------------------------------------------------------------------
pack hybrid atom info for data file
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::pack_data_hybrid(int i, double *buf)
{
buf[0] = h_molecule(i);
return 1;
}
/* ----------------------------------------------------------------------
write atom info to data file including 3 image flags
------------------------------------------------------------------------- */
void AtomVecMolecularKokkos::write_data(FILE *fp, int n, double **buf)
{
for (int i = 0; i < n; i++)
fprintf(fp,"%d %d %d %-1.16e %-1.16e %-1.16e %d %d %d\n",
(int) buf[i][0],(int) buf[i][1], (int) buf[i][2],
buf[i][3],buf[i][4],buf[i][5],
(int) buf[i][6],(int) buf[i][7],(int) buf[i][8]);
}
/* ----------------------------------------------------------------------
write hybrid atom info to data file
------------------------------------------------------------------------- */
int AtomVecMolecularKokkos::write_data_hybrid(FILE *fp, double *buf)
{
fprintf(fp," " TAGINT_FORMAT, (tagint) (buf[0]));
return 1;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint AtomVecMolecularKokkos::memory_usage()
{
bigint bytes = 0;
if (atom->memcheck("tag")) bytes += memory->usage(tag,nmax);
if (atom->memcheck("type")) bytes += memory->usage(type,nmax);
if (atom->memcheck("mask")) bytes += memory->usage(mask,nmax);
if (atom->memcheck("image")) bytes += memory->usage(image,nmax);
if (atom->memcheck("x")) bytes += memory->usage(x,nmax,3);
if (atom->memcheck("v")) bytes += memory->usage(v,nmax,3);
if (atom->memcheck("f")) bytes += memory->usage(f,nmax*commKK->nthreads,3);
if (atom->memcheck("molecule")) bytes += memory->usage(molecule,nmax);
if (atom->memcheck("nspecial")) bytes += memory->usage(nspecial,nmax,3);
if (atom->memcheck("special"))
bytes += memory->usage(special,nmax,atom->maxspecial);
if (atom->memcheck("num_bond")) bytes += memory->usage(num_bond,nmax);
if (atom->memcheck("bond_type"))
bytes += memory->usage(bond_type,nmax,atom->bond_per_atom);
if (atom->memcheck("bond_atom"))
bytes += memory->usage(bond_atom,nmax,atom->bond_per_atom);
if (atom->memcheck("num_angle")) bytes += memory->usage(num_angle,nmax);
if (atom->memcheck("angle_type"))
bytes += memory->usage(angle_type,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom1"))
bytes += memory->usage(angle_atom1,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom2"))
bytes += memory->usage(angle_atom2,nmax,atom->angle_per_atom);
if (atom->memcheck("angle_atom3"))
bytes += memory->usage(angle_atom3,nmax,atom->angle_per_atom);
if (atom->memcheck("num_dihedral")) bytes += memory->usage(num_dihedral,nmax);
if (atom->memcheck("dihedral_type"))
bytes += memory->usage(dihedral_type,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom1"))
bytes += memory->usage(dihedral_atom1,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom2"))
bytes += memory->usage(dihedral_atom2,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom3"))
bytes += memory->usage(dihedral_atom3,nmax,atom->dihedral_per_atom);
if (atom->memcheck("dihedral_atom4"))
bytes += memory->usage(dihedral_atom4,nmax,atom->dihedral_per_atom);
if (atom->memcheck("num_improper")) bytes += memory->usage(num_improper,nmax);
if (atom->memcheck("improper_type"))
bytes += memory->usage(improper_type,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom1"))
bytes += memory->usage(improper_atom1,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom2"))
bytes += memory->usage(improper_atom2,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom3"))
bytes += memory->usage(improper_atom3,nmax,atom->improper_per_atom);
if (atom->memcheck("improper_atom4"))
bytes += memory->usage(improper_atom4,nmax,atom->improper_per_atom);
return bytes;
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::sync(ExecutionSpace space, unsigned int mask)
{
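  // sync the per-atom arrays selected by the mask bits to the requested
  // execution space; each DualView copies its data only if it was last
  // modified in the other space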
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.sync<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPDeviceType>();
atomKK->k_special.sync<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPDeviceType>();
atomKK->k_bond_type.sync<LMPDeviceType>();
atomKK->k_bond_atom.sync<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPDeviceType>();
atomKK->k_angle_type.sync<LMPDeviceType>();
atomKK->k_angle_atom1.sync<LMPDeviceType>();
atomKK->k_angle_atom2.sync<LMPDeviceType>();
atomKK->k_angle_atom3.sync<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPDeviceType>();
atomKK->k_dihedral_type.sync<LMPDeviceType>();
atomKK->k_dihedral_atom1.sync<LMPDeviceType>();
atomKK->k_dihedral_atom2.sync<LMPDeviceType>();
atomKK->k_dihedral_atom3.sync<LMPDeviceType>();
atomKK->k_dihedral_atom4.sync<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPDeviceType>();
atomKK->k_improper_type.sync<LMPDeviceType>();
atomKK->k_improper_atom1.sync<LMPDeviceType>();
atomKK->k_improper_atom2.sync<LMPDeviceType>();
atomKK->k_improper_atom3.sync<LMPDeviceType>();
atomKK->k_improper_atom4.sync<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.sync<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.sync<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.sync<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.sync<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.sync<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.sync<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.sync<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.sync<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.sync<LMPHostType>();
atomKK->k_special.sync<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.sync<LMPHostType>();
atomKK->k_bond_type.sync<LMPHostType>();
atomKK->k_bond_atom.sync<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.sync<LMPHostType>();
atomKK->k_angle_type.sync<LMPHostType>();
atomKK->k_angle_atom1.sync<LMPHostType>();
atomKK->k_angle_atom2.sync<LMPHostType>();
atomKK->k_angle_atom3.sync<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.sync<LMPHostType>();
atomKK->k_dihedral_type.sync<LMPHostType>();
atomKK->k_dihedral_atom1.sync<LMPHostType>();
atomKK->k_dihedral_atom2.sync<LMPHostType>();
atomKK->k_dihedral_atom3.sync<LMPHostType>();
atomKK->k_dihedral_atom4.sync<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.sync<LMPHostType>();
atomKK->k_improper_type.sync<LMPHostType>();
atomKK->k_improper_atom1.sync<LMPHostType>();
atomKK->k_improper_atom2.sync<LMPHostType>();
atomKK->k_improper_atom3.sync<LMPHostType>();
atomKK->k_improper_atom4.sync<LMPHostType>();
}
}
}
void AtomVecMolecularKokkos::sync_overlapping_device(ExecutionSpace space, unsigned int mask)
{
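  // same selection logic as sync(), but only arrays that actually need
  // updating are copied, via perform_async_copy(), so the transfers can
  // overlap with other device work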
if (space == Device) {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPDeviceType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
} else {
if ((mask & X_MASK) && atomKK->k_x.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_x_array>(atomKK->k_x,space);
if ((mask & V_MASK) && atomKK->k_v.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_v_array>(atomKK->k_v,space);
if ((mask & F_MASK) && atomKK->k_f.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_f_array>(atomKK->k_f,space);
if ((mask & TAG_MASK) && atomKK->k_tag.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_tag,space);
if ((mask & TYPE_MASK) && atomKK->k_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_type,space);
if ((mask & MASK_MASK) && atomKK->k_mask.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_mask,space);
if ((mask & IMAGE_MASK) && atomKK->k_image.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_imageint_1d>(atomKK->k_image,space);
if ((mask & MOLECULE_MASK) && atomKK->k_molecule.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_1d>(atomKK->k_molecule,space);
if (mask & SPECIAL_MASK) {
if (atomKK->k_nspecial.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_nspecial,space);
if (atomKK->k_special.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_special,space);
}
if (mask & BOND_MASK) {
if (atomKK->k_num_bond.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_bond,space);
if (atomKK->k_bond_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_bond_type,space);
if (atomKK->k_bond_atom.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_bond_atom,space);
}
if (mask & ANGLE_MASK) {
if (atomKK->k_num_angle.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_angle,space);
if (atomKK->k_angle_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_angle_type,space);
if (atomKK->k_angle_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom1,space);
if (atomKK->k_angle_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom2,space);
if (atomKK->k_angle_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_angle_atom3,space);
}
if (mask & DIHEDRAL_MASK) {
if (atomKK->k_num_dihedral.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_dihedral,space);
if (atomKK->k_dihedral_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_dihedral_type,space);
if (atomKK->k_dihedral_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom1,space);
if (atomKK->k_dihedral_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom2,space);
if (atomKK->k_dihedral_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom3,space);
if (atomKK->k_dihedral_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_dihedral_atom4,space);
}
if (mask & IMPROPER_MASK) {
if (atomKK->k_num_improper.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_1d>(atomKK->k_num_improper,space);
if (atomKK->k_improper_type.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_int_2d>(atomKK->k_improper_type,space);
if (atomKK->k_improper_atom1.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom1,space);
if (atomKK->k_improper_atom2.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom2,space);
if (atomKK->k_improper_atom3.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom3,space);
if (atomKK->k_improper_atom4.need_sync<LMPHostType>())
perform_async_copy<DAT::tdual_tagint_2d>(atomKK->k_improper_atom4,space);
}
}
}
/* ---------------------------------------------------------------------- */
void AtomVecMolecularKokkos::modified(ExecutionSpace space, unsigned int mask)
{
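  // mark the per-atom arrays selected by the mask bits as modified in the
  // given execution space, so that later sync() calls trigger a copy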
if (space == Device) {
if (mask & X_MASK) atomKK->k_x.modify<LMPDeviceType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPDeviceType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPDeviceType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPDeviceType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPDeviceType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPDeviceType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPDeviceType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPDeviceType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPDeviceType>();
atomKK->k_special.modify<LMPDeviceType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPDeviceType>();
atomKK->k_bond_type.modify<LMPDeviceType>();
atomKK->k_bond_atom.modify<LMPDeviceType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPDeviceType>();
atomKK->k_angle_type.modify<LMPDeviceType>();
atomKK->k_angle_atom1.modify<LMPDeviceType>();
atomKK->k_angle_atom2.modify<LMPDeviceType>();
atomKK->k_angle_atom3.modify<LMPDeviceType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPDeviceType>();
atomKK->k_dihedral_type.modify<LMPDeviceType>();
atomKK->k_dihedral_atom1.modify<LMPDeviceType>();
atomKK->k_dihedral_atom2.modify<LMPDeviceType>();
atomKK->k_dihedral_atom3.modify<LMPDeviceType>();
atomKK->k_dihedral_atom4.modify<LMPDeviceType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPDeviceType>();
atomKK->k_improper_type.modify<LMPDeviceType>();
atomKK->k_improper_atom1.modify<LMPDeviceType>();
atomKK->k_improper_atom2.modify<LMPDeviceType>();
atomKK->k_improper_atom3.modify<LMPDeviceType>();
atomKK->k_improper_atom4.modify<LMPDeviceType>();
}
} else {
if (mask & X_MASK) atomKK->k_x.modify<LMPHostType>();
if (mask & V_MASK) atomKK->k_v.modify<LMPHostType>();
if (mask & F_MASK) atomKK->k_f.modify<LMPHostType>();
if (mask & TAG_MASK) atomKK->k_tag.modify<LMPHostType>();
if (mask & TYPE_MASK) atomKK->k_type.modify<LMPHostType>();
if (mask & MASK_MASK) atomKK->k_mask.modify<LMPHostType>();
if (mask & IMAGE_MASK) atomKK->k_image.modify<LMPHostType>();
if (mask & MOLECULE_MASK) atomKK->k_molecule.modify<LMPHostType>();
if (mask & SPECIAL_MASK) {
atomKK->k_nspecial.modify<LMPHostType>();
atomKK->k_special.modify<LMPHostType>();
}
if (mask & BOND_MASK) {
atomKK->k_num_bond.modify<LMPHostType>();
atomKK->k_bond_type.modify<LMPHostType>();
atomKK->k_bond_atom.modify<LMPHostType>();
}
if (mask & ANGLE_MASK) {
atomKK->k_num_angle.modify<LMPHostType>();
atomKK->k_angle_type.modify<LMPHostType>();
atomKK->k_angle_atom1.modify<LMPHostType>();
atomKK->k_angle_atom2.modify<LMPHostType>();
atomKK->k_angle_atom3.modify<LMPHostType>();
}
if (mask & DIHEDRAL_MASK) {
atomKK->k_num_dihedral.modify<LMPHostType>();
atomKK->k_dihedral_type.modify<LMPHostType>();
atomKK->k_dihedral_atom1.modify<LMPHostType>();
atomKK->k_dihedral_atom2.modify<LMPHostType>();
atomKK->k_dihedral_atom3.modify<LMPHostType>();
atomKK->k_dihedral_atom4.modify<LMPHostType>();
}
if (mask & IMPROPER_MASK) {
atomKK->k_num_improper.modify<LMPHostType>();
atomKK->k_improper_type.modify<LMPHostType>();
atomKK->k_improper_atom1.modify<LMPHostType>();
atomKK->k_improper_atom2.modify<LMPHostType>();
atomKK->k_improper_atom3.modify<LMPHostType>();
atomKK->k_improper_atom4.modify<LMPHostType>();
}
}
}
diff --git a/src/KOKKOS/fix_qeq_reax_kokkos.cpp b/src/KOKKOS/fix_qeq_reax_kokkos.cpp
index 3b8d5a85e..2e46b85fd 100644
--- a/src/KOKKOS/fix_qeq_reax_kokkos.cpp
+++ b/src/KOKKOS/fix_qeq_reax_kokkos.cpp
@@ -1,1231 +1,1232 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (SNL), Stan Moore (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fix_qeq_reax_kokkos.h"
#include "kokkos.h"
#include "atom.h"
#include "atom_masks.h"
#include "atom_kokkos.h"
#include "comm.h"
#include "force.h"
#include "group.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list_kokkos.h"
#include "neigh_request.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
using namespace LAMMPS_NS;
using namespace FixConst;
#define SMALL 0.0001
#define EV_TO_KCAL_PER_MOL 14.4
#define TEAMSIZE 128
/* ---------------------------------------------------------------------- */
template<class DeviceType>
-FixQEqReaxKokkos<DeviceType>::FixQEqReaxKokkos(LAMMPS *lmp, int narg, char **arg) :
+FixQEqReaxKokkos<DeviceType>::
+FixQEqReaxKokkos(LAMMPS *lmp, int narg, char **arg) :
FixQEqReax(lmp, narg, arg)
{
kokkosable = 1;
atomKK = (AtomKokkos *) atom;
execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
datamask_read = X_MASK | V_MASK | F_MASK | MASK_MASK | Q_MASK | TYPE_MASK;
datamask_modify = Q_MASK | X_MASK;
nmax = m_cap = 0;
allocated_flag = 0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
FixQEqReaxKokkos<DeviceType>::~FixQEqReaxKokkos()
{
if (copymode) return;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init()
{
atomKK->k_q.modify<LMPHostType>();
atomKK->k_q.sync<LMPDeviceType>();
FixQEqReax::init();
neighflag = lmp->kokkos->neighflag_qeq;
int irequest = neighbor->nrequest - 1;
neighbor->requests[irequest]->
kokkos_host = Kokkos::Impl::is_same<DeviceType,LMPHostType>::value &&
!Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
neighbor->requests[irequest]->
kokkos_device = Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
if (neighflag == FULL) {
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->half = 0;
} else { //if (neighflag == HALF || neighflag == HALFTHREAD)
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->full = 0;
neighbor->requests[irequest]->half = 1;
neighbor->requests[irequest]->ghost = 1;
}
int ntypes = atom->ntypes;
k_params = Kokkos::DualView<params_qeq*,Kokkos::LayoutRight,DeviceType>
("FixQEqReax::params",ntypes+1);
params = k_params.template view<DeviceType>();
for (n = 1; n <= ntypes; n++) {
k_params.h_view(n).chi = chi[n];
k_params.h_view(n).eta = eta[n];
k_params.h_view(n).gamma = gamma[n];
}
k_params.template modify<LMPHostType>();
cutsq = swb * swb;
init_shielding_k();
init_hist();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init_shielding_k()
{
int i,j;
int ntypes = atom->ntypes;
k_shield = DAT::tdual_ffloat_2d("qeq/kk:shield",ntypes+1,ntypes+1);
d_shield = k_shield.template view<DeviceType>();
for( i = 1; i <= ntypes; ++i )
for( j = 1; j <= ntypes; ++j )
k_shield.h_view(i,j) = pow( gamma[i] * gamma[j], -1.5 );
k_shield.template modify<LMPHostType>();
k_shield.template sync<DeviceType>();
k_tap = DAT::tdual_ffloat_1d("qeq/kk:tap",8);
d_tap = k_tap.template view<DeviceType>();
for (i = 0; i < 8; i++)
k_tap.h_view(i) = Tap[i];
k_tap.template modify<LMPHostType>();
k_tap.template sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::init_hist()
{
int i,j;
k_s_hist = DAT::tdual_ffloat_2d("qeq/kk:s_hist",atom->nmax,5);
d_s_hist = k_s_hist.template view<DeviceType>();
h_s_hist = k_s_hist.h_view;
k_t_hist = DAT::tdual_ffloat_2d("qeq/kk:t_hist",atom->nmax,5);
d_t_hist = k_t_hist.template view<DeviceType>();
h_t_hist = k_t_hist.h_view;
for( i = 0; i < atom->nmax; i++ )
for( j = 0; j < 5; j++ )
k_s_hist.h_view(i,j) = k_t_hist.h_view(i,j) = 0.0;
k_s_hist.template modify<LMPHostType>();
k_s_hist.template sync<DeviceType>();
k_t_hist.template modify<LMPHostType>();
k_t_hist.template sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::setup_pre_force(int vflag)
{
//neighbor->build_one(list);
pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::pre_force(int vflag)
{
if (update->ntimestep % nevery) return;
atomKK->sync(execution_space,datamask_read);
atomKK->modified(execution_space,datamask_modify);
x = atomKK->k_x.view<DeviceType>();
v = atomKK->k_v.view<DeviceType>();
f = atomKK->k_f.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
tag = atomKK->k_tag.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
mask = atomKK->k_mask.view<DeviceType>();
nlocal = atomKK->nlocal;
nall = atom->nlocal + atom->nghost;
newton_pair = force->newton_pair;
k_params.template sync<DeviceType>();
k_shield.template sync<DeviceType>();
k_tap.template sync<DeviceType>();
NeighListKokkos<DeviceType>* k_list = static_cast<NeighListKokkos<DeviceType>*>(list);
d_numneigh = k_list->d_numneigh;
d_neighbors = k_list->d_neighbors;
d_ilist = k_list->d_ilist;
inum = list->inum;
k_list->clean_copy();
//cleanup_copy();
copymode = 1;
int teamsize = TEAMSIZE;
// allocate
allocate_array();
// get max number of neighbors
if (!allocated_flag || update->ntimestep == neighbor->lastcall)
allocate_matrix();
// compute_H
FixQEqReaxKokkosComputeHFunctor<DeviceType> computeH_functor(this);
Kokkos::parallel_scan(inum,computeH_functor);
DeviceType::fence();
// init_matvec
FixQEqReaxKokkosMatVecFunctor<DeviceType> matvec_functor(this);
Kokkos::parallel_for(inum,matvec_functor);
DeviceType::fence();
// comm->forward_comm_fix(this); //Dist_vector( s );
pack_flag = 2;
k_s.template modify<DeviceType>();
k_s.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_s.template modify<LMPHostType>();
k_s.template sync<DeviceType>();
// comm->forward_comm_fix(this); //Dist_vector( t );
pack_flag = 3;
k_t.template modify<DeviceType>();
k_t.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_t.template modify<LMPHostType>();
k_t.template sync<DeviceType>();
// 1st cg solve over b_s, s
cg_solve1();
DeviceType::fence();
// 2nd cg solve over b_t, t
cg_solve2();
DeviceType::fence();
// calculate_Q();
calculate_q();
DeviceType::fence();
copymode = 0;
if (!allocated_flag)
allocated_flag = 1;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::num_neigh_item(int ii, int &maxneigh) const
{
const int i = d_ilist[ii];
maxneigh += d_numneigh[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::allocate_matrix()
{
int i,ii,m;
const int inum = list->inum;
nmax = atom->nmax;
// determine the total space for the H matrix
m_cap = 0;
FixQEqReaxKokkosNumNeighFunctor<DeviceType> neigh_functor(this);
Kokkos::parallel_reduce(inum,neigh_functor,m_cap);
d_firstnbr = typename AT::t_int_1d("qeq/kk:firstnbr",nmax);
d_numnbrs = typename AT::t_int_1d("qeq/kk:numnbrs",nmax);
d_jlist = typename AT::t_int_1d("qeq/kk:jlist",m_cap);
d_val = typename AT::t_ffloat_1d("qeq/kk:val",m_cap);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::allocate_array()
{
if (atom->nmax > nmax) {
nmax = atom->nmax;
k_o = DAT::tdual_ffloat_1d("qeq/kk:h_o",nmax);
d_o = k_o.template view<DeviceType>();
h_o = k_o.h_view;
d_Hdia_inv = typename AT::t_ffloat_1d("qeq/kk:h_Hdia_inv",nmax);
d_b_s = typename AT::t_ffloat_1d("qeq/kk:h_b_s",nmax);
d_b_t = typename AT::t_ffloat_1d("qeq/kk:h_b_t",nmax);
k_s = DAT::tdual_ffloat_1d("qeq/kk:h_s",nmax);
d_s = k_s.template view<DeviceType>();
h_s = k_s.h_view;
k_t = DAT::tdual_ffloat_1d("qeq/kk:h_t",nmax);
d_t = k_t.template view<DeviceType>();
h_t = k_t.h_view;
d_p = typename AT::t_ffloat_1d("qeq/kk:h_p",nmax);
d_r = typename AT::t_ffloat_1d("qeq/kk:h_r",nmax);
k_d = DAT::tdual_ffloat_1d("qeq/kk:h_d",nmax);
d_d = k_d.template view<DeviceType>();
h_d = k_d.h_view;
k_s_hist = DAT::tdual_ffloat_2d("qeq/kk:s_hist",nmax,5);
d_s_hist = k_s_hist.template view<DeviceType>();
h_s_hist = k_s_hist.h_view;
k_t_hist = DAT::tdual_ffloat_2d("qeq/kk:t_hist",nmax,5);
d_t_hist = k_t_hist.template view<DeviceType>();
h_t_hist = k_t_hist.h_view;
}
// init_storage
const int ignum = atom->nlocal + atom->nghost;
FixQEqReaxKokkosZeroFunctor<DeviceType> zero_functor(this);
Kokkos::parallel_for(ignum,zero_functor);
DeviceType::fence();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::zero_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_Hdia_inv[i] = 1.0 / params(itype).eta;
d_b_s[i] = -params(itype).chi;
d_b_t[i] = -1.0;
d_s[i] = 0.0;
d_t[i] = 0.0;
d_p[i] = 0.0;
d_o[i] = 0.0;
d_r[i] = 0.0;
d_d[i] = 0.0;
//for( int j = 0; j < 5; j++ )
//d_s_hist(i,j) = d_t_hist(i,j) = 0.0;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::compute_h_item(int ii, int &m_fill, const bool &final) const
{
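  // build row i of the sparse QEq matrix H; run through Kokkos::parallel_scan:
  // non-final passes only advance m_fill to obtain per-atom offsets, the final
  // pass fills d_firstnbr, d_jlist, and d_val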
const int i = d_ilist[ii];
int j,jj,jtype,flag;
if (mask[i] & groupbit) {
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
if (final)
d_firstnbr[i] = m_fill;
for (jj = 0; jj < jnum; jj++) {
j = d_neighbors(i,jj);
j &= NEIGHMASK;
jtype = type(j);
const X_FLOAT delx = x(j,0) - xtmp;
const X_FLOAT dely = x(j,1) - ytmp;
const X_FLOAT delz = x(j,2) - ztmp;
if (neighflag != FULL) {
const tagint jtag = tag(j);
flag = 0;
if (j < nlocal) flag = 1;
else if (itag < jtag) flag = 1;
else if (itag == jtag) {
if (delz > SMALL) flag = 1;
else if (fabs(delz) < SMALL) {
if (dely > SMALL) flag = 1;
else if (fabs(dely) < SMALL && delx > SMALL)
flag = 1;
}
}
if (!flag) continue;
}
const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
if (rsq > cutsq) continue;
if (final) {
const F_FLOAT r = sqrt(rsq);
d_jlist(m_fill) = j;
const F_FLOAT shldij = d_shield(itype,jtype);
d_val(m_fill) = calculate_H_k(r,shldij);
}
m_fill++;
}
if (final)
d_numnbrs[i] = m_fill - d_firstnbr[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::calculate_H_k(const F_FLOAT &r, const F_FLOAT &shld) const
{
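  // Horner evaluation of the degree-7 taper polynomial, divided by the
  // shielded distance (r^3 + shld)^(1/3) and scaled by EV_TO_KCAL_PER_MOL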
F_FLOAT taper, denom;
taper = d_tap[7] * r + d_tap[6];
taper = taper * r + d_tap[5];
taper = taper * r + d_tap[4];
taper = taper * r + d_tap[3];
taper = taper * r + d_tap[2];
taper = taper * r + d_tap[1];
taper = taper * r + d_tap[0];
denom = r * r * r + shld;
denom = pow(denom,0.3333333333333);
return taper * EV_TO_KCAL_PER_MOL / denom;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::mat_vec_item(int ii) const
{
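  // set the diagonal preconditioner and right-hand sides (b_s = -chi, b_t = -1)
  // and extrapolate initial guesses for s and t from the stored history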
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_Hdia_inv[i] = 1.0 / params(itype).eta;
d_b_s[i] = -params(itype).chi;
d_b_t[i] = -1.0;
d_t[i] = d_t_hist(i,2) + 3*(d_t_hist(i,0) - d_t_hist(i,1));
d_s[i] = 4*(d_s_hist(i,0)+d_s_hist(i,2))-(6*d_s_hist(i,1)+d_s_hist(i,3));
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cg_solve1()
// b = b_s, x = s;
{
const int inum = list->inum;
const int ignum = inum + list->gnum;
F_FLOAT tmp, sig_old, b_norm;
const int teamsize = TEAMSIZE;
// sparse_matvec( &H, x, q );
FixQEqReaxKokkosSparse12Functor<DeviceType> sparse12_functor(this);
Kokkos::parallel_for(inum,sparse12_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse13Functor<DeviceType,HALF> sparse13_functor(this);
Kokkos::parallel_for(inum,sparse13_functor);
} else {
FixQEqReaxKokkosSparse13Functor<DeviceType,HALFTHREAD> sparse13_functor(this);
Kokkos::parallel_for(inum,sparse13_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec1> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// vector_sum( r , 1., b, -1., q, nn );
// preconditioning: d[j] = r[j] * Hdia_inv[j];
// b_norm = parallel_norm( b, nn );
F_FLOAT my_norm = 0.0;
FixQEqReaxKokkosNorm1Functor<DeviceType> norm1_functor(this);
Kokkos::parallel_reduce(inum,norm1_functor,my_norm);
DeviceType::fence();
F_FLOAT norm_sqr = 0.0;
MPI_Allreduce( &my_norm, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
b_norm = sqrt(norm_sqr);
DeviceType::fence();
// sig_new = parallel_dot( r, d, nn);
F_FLOAT my_dot = 0.0;
FixQEqReaxKokkosDot1Functor<DeviceType> dot1_functor(this);
Kokkos::parallel_reduce(inum,dot1_functor,my_dot);
DeviceType::fence();
F_FLOAT dot_sqr = 0.0;
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
F_FLOAT sig_new = dot_sqr;
DeviceType::fence();
int loop;
const int loopmax = 200;
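  // Jacobi-preconditioned CG iteration: forward-communicate d to ghosts,
  // form q = H d, then alpha = sig_new / (d.q), s += alpha d, r -= alpha q,
  // p = Hdia_inv r, beta = sig_new/sig_old, d = p + beta d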
for (loop = 1; loop < loopmax && sqrt(sig_new)/b_norm > tolerance; loop++) {
// comm->forward_comm_fix(this); //Dist_vector( d );
pack_flag = 1;
k_d.template modify<DeviceType>();
k_d.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_d.template modify<LMPHostType>();
k_d.template sync<DeviceType>();
// sparse_matvec( &H, d, q );
FixQEqReaxKokkosSparse22Functor<DeviceType> sparse22_functor(this);
Kokkos::parallel_for(inum,sparse22_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALF> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
} else {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALFTHREAD> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec2> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// tmp = parallel_dot( d, q, nn);
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosDot2Functor<DeviceType> dot2_functor(this);
Kokkos::parallel_reduce(inum,dot2_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
tmp = dot_sqr;
alpha = sig_new / tmp;
sig_old = sig_new;
// vector_add( s, alpha, d, nn );
// vector_add( r, -alpha, q, nn );
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosPrecon1Functor<DeviceType> precon1_functor(this);
Kokkos::parallel_for(inum,precon1_functor);
DeviceType::fence();
// preconditioning: p[j] = r[j] * Hdia_inv[j];
// sig_new = parallel_dot( r, p, nn);
FixQEqReaxKokkosPreconFunctor<DeviceType> precon_functor(this);
Kokkos::parallel_reduce(inum,precon_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
sig_new = dot_sqr;
beta = sig_new / sig_old;
// vector_sum( d, 1., p, beta, d, nn );
FixQEqReaxKokkosVecSum2Functor<DeviceType> vecsum2_functor(this);
Kokkos::parallel_for(inum,vecsum2_functor);
DeviceType::fence();
}
if (loop >= loopmax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax cg_solve1 convergence failed after %d iterations "
"at " BIGINT_FORMAT " step: %f",loop,update->ntimestep,sqrt(sig_new)/b_norm);
error->warning(FLERR,str);
//error->all(FLERR,str);
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cg_solve2()
// b = b_t, x = t;
{
const int inum = list->inum;
const int ignum = inum + list->gnum;
F_FLOAT tmp, sig_old, b_norm;
const int teamsize = TEAMSIZE;
// sparse_matvec( &H, x, q );
FixQEqReaxKokkosSparse32Functor<DeviceType> sparse32_functor(this);
Kokkos::parallel_for(inum,sparse32_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse33Functor<DeviceType,HALF> sparse33_functor(this);
Kokkos::parallel_for(inum,sparse33_functor);
} else {
FixQEqReaxKokkosSparse33Functor<DeviceType,HALFTHREAD> sparse33_functor(this);
Kokkos::parallel_for(inum,sparse33_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec3> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// vector_sum( r , 1., b, -1., q, nn );
// preconditioning: d[j] = r[j] * Hdia_inv[j];
// b_norm = parallel_norm( b, nn );
F_FLOAT my_norm = 0.0;
FixQEqReaxKokkosNorm2Functor<DeviceType> norm2_functor(this);
Kokkos::parallel_reduce(inum,norm2_functor,my_norm);
DeviceType::fence();
F_FLOAT norm_sqr = 0.0;
MPI_Allreduce( &my_norm, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
b_norm = sqrt(norm_sqr);
DeviceType::fence();
// sig_new = parallel_dot( r, d, nn);
F_FLOAT my_dot = 0.0;
FixQEqReaxKokkosDot1Functor<DeviceType> dot1_functor(this);
Kokkos::parallel_reduce(inum,dot1_functor,my_dot);
DeviceType::fence();
F_FLOAT dot_sqr = 0.0;
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
F_FLOAT sig_new = dot_sqr;
DeviceType::fence();
int loop;
const int loopmax = 200;
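  // same Jacobi-preconditioned CG iteration as in cg_solve1, here applied
  // to the second linear system H t = b_t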
for (loop = 1; loop < loopmax && sqrt(sig_new)/b_norm > tolerance; loop++) {
// comm->forward_comm_fix(this); //Dist_vector( d );
pack_flag = 1;
k_d.template modify<DeviceType>();
k_d.template sync<LMPHostType>();
comm->forward_comm_fix(this);
k_d.template modify<LMPHostType>();
k_d.template sync<DeviceType>();
// sparse_matvec( &H, d, q );
FixQEqReaxKokkosSparse22Functor<DeviceType> sparse22_functor(this);
Kokkos::parallel_for(inum,sparse22_functor);
DeviceType::fence();
if (neighflag != FULL) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType,TagZeroQGhosts>(nlocal,nlocal+atom->nghost),*this);
DeviceType::fence();
if (neighflag == HALF) {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALF> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
} else {
FixQEqReaxKokkosSparse23Functor<DeviceType,HALFTHREAD> sparse23_functor(this);
Kokkos::parallel_for(inum,sparse23_functor);
}
} else {
Kokkos::parallel_for(Kokkos::TeamPolicy <DeviceType, TagSparseMatvec2> (inum, teamsize), *this);
}
DeviceType::fence();
if (neighflag != FULL) {
k_o.template modify<DeviceType>();
k_o.template sync<LMPHostType>();
comm->reverse_comm_fix(this); //Coll_vector( q );
k_o.template modify<LMPHostType>();
k_o.template sync<DeviceType>();
}
// tmp = parallel_dot( d, q, nn);
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosDot2Functor<DeviceType> dot2_functor(this);
Kokkos::parallel_reduce(inum,dot2_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
tmp = dot_sqr;
DeviceType::fence();
alpha = sig_new / tmp;
sig_old = sig_new;
// vector_add( t, alpha, d, nn );
// vector_add( r, -alpha, q, nn );
my_dot = dot_sqr = 0.0;
FixQEqReaxKokkosPrecon2Functor<DeviceType> precon2_functor(this);
Kokkos::parallel_for(inum,precon2_functor);
DeviceType::fence();
// preconditioning: p[j] = r[j] * Hdia_inv[j];
// sig_new = parallel_dot( r, p, nn);
FixQEqReaxKokkosPreconFunctor<DeviceType> precon_functor(this);
Kokkos::parallel_reduce(inum,precon_functor,my_dot);
DeviceType::fence();
MPI_Allreduce( &my_dot, &dot_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
sig_new = dot_sqr;
beta = sig_new / sig_old;
// vector_sum( d, 1., p, beta, d, nn );
FixQEqReaxKokkosVecSum2Functor<DeviceType> vecsum2_functor(this);
Kokkos::parallel_for(inum,vecsum2_functor);
DeviceType::fence();
}
if (loop >= loopmax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax cg_solve2 convergence failed after %d iterations "
"at " BIGINT_FORMAT " step: %f",loop,update->ntimestep,sqrt(sig_new)/b_norm);
error->warning(FLERR,str);
//error->all(FLERR,str);
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::calculate_q()
{
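  // combine the two CG solutions into the final charges,
  // q_i = s_i - (s_sum / t_sum) * t_i, shift the s/t history,
  // and forward-communicate the updated charges to ghost atoms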
F_FLOAT sum, sum_all;
const int inum = list->inum;
// s_sum = parallel_vector_acc( s, nn );
sum = sum_all = 0.0;
FixQEqReaxKokkosVecAcc1Functor<DeviceType> vecacc1_functor(this);
Kokkos::parallel_reduce(inum,vecacc1_functor,sum);
DeviceType::fence();
MPI_Allreduce(&sum, &sum_all, 1, MPI_DOUBLE, MPI_SUM, world );
const F_FLOAT s_sum = sum_all;
// t_sum = parallel_vector_acc( t, nn);
sum = sum_all = 0.0;
FixQEqReaxKokkosVecAcc2Functor<DeviceType> vecacc2_functor(this);
Kokkos::parallel_reduce(inum,vecacc2_functor,sum);
DeviceType::fence();
MPI_Allreduce(&sum, &sum_all, 1, MPI_DOUBLE, MPI_SUM, world );
const F_FLOAT t_sum = sum_all;
// u = s_sum / t_sum;
delta = s_sum/t_sum;
// q[i] = s[i] - u * t[i];
FixQEqReaxKokkosCalculateQFunctor<DeviceType> calculateQ_functor(this);
Kokkos::parallel_for(inum,calculateQ_functor);
DeviceType::fence();
pack_flag = 4;
//comm->forward_comm_fix( this ); //Dist_vector( atom->q );
atomKK->k_q.modify<DeviceType>();
atomKK->k_q.sync<LMPHostType>();
comm->forward_comm_fix(this);
atomKK->k_q.modify<LMPHostType>();
atomKK->k_q.sync<DeviceType>();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse12_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_o[i] = params(itype).eta * d_s[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse13_item(int ii) const
{
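  // row i of the H*s product with a half neighbor list: each stored element
  // (i,j) contributes to row i and symmetrically to row j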
// The d_o output array is atomic for the Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_s[j];
a_o[j] += d_val(jj) * d_s[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec1, const membertype1 &team) const
{
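  // team-parallel H*s product for a full neighbor list: the team reduces the
  // row-i contributions over its neighbor range, then a single thread adds
  // the partial sum into d_o[i]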
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_s[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp;});
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse22_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit) {
d_o[i] = params(itype).eta * d_d[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse23_item(int ii) const
{
// The d_o output array is atomic for the Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_d[j];
a_o[j] += d_val(jj) * d_d[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec2, const membertype2 &team) const
{
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_d[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp; });
}
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagZeroQGhosts, const int &i) const
{
if (mask[i] & groupbit)
d_o[i] = 0.0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse32_item(int ii) const
{
const int i = d_ilist[ii];
const int itype = type(i);
if (mask[i] & groupbit)
d_o[i] = params(itype).eta * d_t[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::sparse33_item(int ii) const
{
// The d_o output array is atomic for the Half/Thread neighbor style
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_o = d_o;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
F_FLOAT tmp = 0.0;
for(int jj = d_firstnbr[i]; jj < d_firstnbr[i] + d_numnbrs[i]; jj++) {
const int j = d_jlist(jj);
tmp += d_val(jj) * d_t[j];
a_o[j] += d_val(jj) * d_t[i];
}
a_o[i] += tmp;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::operator() (TagSparseMatvec3, const membertype3 &team) const
{
const int i = d_ilist[team.league_rank()];
if (mask[i] & groupbit) {
F_FLOAT doitmp;
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, d_firstnbr[i], d_firstnbr[i] + d_numnbrs[i]), [&] (const int &jj, F_FLOAT &doi) {
const int j = d_jlist(jj);
doi += d_val(jj) * d_t[j];
}, doitmp);
Kokkos::single(Kokkos::PerTeam(team), [&] () {d_o[i] += doitmp;});
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::vecsum2_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit)
d_d[i] = 1.0 * d_p[i] + beta * d_d[i];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::norm1_item(int ii) const
{
F_FLOAT tmp = 0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_r[i] = 1.0*d_b_s[i] + -1.0*d_o[i];
d_d[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_b_s[i] * d_b_s[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::norm2_item(int ii) const
{
F_FLOAT tmp = 0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_r[i] = 1.0*d_b_t[i] + -1.0*d_o[i];
d_d[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_b_t[i] * d_b_t[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::dot1_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit)
tmp = d_r[i] * d_d[i];
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::dot2_item(int ii) const
{
double tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
tmp = d_d[i] * d_o[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::precon1_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_s[i] += alpha * d_d[i];
d_r[i] += -alpha * d_o[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::precon2_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_t[i] += alpha * d_d[i];
d_r[i] += -alpha * d_o[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::precon_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
d_p[i] = d_r[i] * d_Hdia_inv[i];
tmp = d_r[i] * d_p[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::vecacc1_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit)
tmp = d_s[i];
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
double FixQEqReaxKokkos<DeviceType>::vecacc2_item(int ii) const
{
F_FLOAT tmp = 0.0;
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
tmp = d_t[i];
}
return tmp;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void FixQEqReaxKokkos<DeviceType>::calculate_q_item(int ii) const
{
const int i = d_ilist[ii];
if (mask[i] & groupbit) {
q(i) = d_s[i] - delta * d_t[i];
for (int k = 4; k > 0; --k) {
d_s_hist(i,k) = d_s_hist(i,k-1);
d_t_hist(i,k) = d_t_hist(i,k-1);
}
d_s_hist(i,0) = d_s[i];
d_t_hist(i,0) = d_t[i];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int FixQEqReaxKokkos<DeviceType>::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int m;
if( pack_flag == 1)
for(m = 0; m < n; m++) buf[m] = h_d[list[m]];
else if( pack_flag == 2 )
for(m = 0; m < n; m++) buf[m] = h_s[list[m]];
else if( pack_flag == 3 )
for(m = 0; m < n; m++) buf[m] = h_t[list[m]];
else if( pack_flag == 4 )
for(m = 0; m < n; m++) buf[m] = atom->q[list[m]];
return n;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::unpack_forward_comm(int n, int first, double *buf)
{
int i, m;
if( pack_flag == 1)
for(m = 0, i = first; m < n; m++, i++) h_d[i] = buf[m];
else if( pack_flag == 2)
for(m = 0, i = first; m < n; m++, i++) h_s[i] = buf[m];
else if( pack_flag == 3)
for(m = 0, i = first; m < n; m++, i++) h_t[i] = buf[m];
else if( pack_flag == 4)
for(m = 0, i = first; m < n; m++, i++) atom->q[i] = buf[m];
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int FixQEqReaxKokkos<DeviceType>::pack_reverse_comm(int n, int first, double *buf)
{
int i, m;
for(m = 0, i = first; m < n; m++, i++) {
buf[m] = h_o[i];
}
return n;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::unpack_reverse_comm(int n, int *list, double *buf)
{
for(int m = 0; m < n; m++) {
h_o[list[m]] += buf[m];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void FixQEqReaxKokkos<DeviceType>::cleanup_copy()
{
id = style = NULL;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
template<class DeviceType>
double FixQEqReaxKokkos<DeviceType>::memory_usage()
{
double bytes;
bytes = atom->nmax*5*2 * sizeof(F_FLOAT); // s_hist & t_hist
bytes += atom->nmax*8 * sizeof(F_FLOAT); // storage
bytes += n_cap*2 * sizeof(int); // matrix...
bytes += m_cap * sizeof(int);
bytes += m_cap * sizeof(F_FLOAT);
return bytes;
}
/* ---------------------------------------------------------------------- */
namespace LAMMPS_NS {
template class FixQEqReaxKokkos<LMPDeviceType>;
#ifdef KOKKOS_HAVE_CUDA
template class FixQEqReaxKokkos<LMPHostType>;
#endif
}
diff --git a/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp b/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp
index 7688d6745..e4fb9385a 100644
--- a/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp
+++ b/src/KOKKOS/fix_reaxc_bonds_kokkos.cpp
@@ -1,126 +1,126 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Stan Moore (Sandia)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_bonds_kokkos.h"
#include "atom.h"
#include "update.h"
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
#include "reaxc_types.h"
#include "reaxc_defs.h"
#include "atom_masks.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCBondsKokkos::FixReaxCBondsKokkos(LAMMPS *lmp, int narg, char **arg) :
FixReaxCBonds(lmp, narg, arg)
{
kokkosable = 1;
atomKK = (AtomKokkos *) atom;
datamask_read = EMPTY_MASK;
datamask_modify = EMPTY_MASK;
}
/* ---------------------------------------------------------------------- */
FixReaxCBondsKokkos::~FixReaxCBondsKokkos()
{
}
/* ---------------------------------------------------------------------- */
void FixReaxCBondsKokkos::init()
{
Pair *pair_kk = force->pair_match("reax/c/kk",1);
if (pair_kk == NULL) error->all(FLERR,"Cannot use fix reax/c/bonds without "
"pair_style reax/c/kk");
FixReaxCBonds::init();
}
/* ---------------------------------------------------------------------- */
void FixReaxCBondsKokkos::Output_ReaxC_Bonds(bigint ntimestep, FILE *fp)
{
int nbuf_local;
int nlocal_max, numbonds, numbonds_max;
double *buf;
DAT::tdual_ffloat_1d k_buf;
int nlocal = atom->nlocal;
int nlocal_tot = static_cast<int> (atom->natoms);
numbonds = 0;
if (reaxc->execution_space == Device)
((PairReaxCKokkos<LMPDeviceType>*) reaxc)->FindBond(numbonds);
else
((PairReaxCKokkos<LMPHostType>*) reaxc)->FindBond(numbonds);
// allocate a temporary buffer for the snapshot info
MPI_Allreduce(&numbonds,&numbonds_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nlocal,&nlocal_max,1,MPI_INT,MPI_MAX,world);
nbuf = 1+(numbonds_max*2+10)*nlocal_max;
memory->create_kokkos(k_buf,buf,nbuf,"reax/c/bonds:buf");
// Pass information to buffer
if (reaxc->execution_space == Device)
((PairReaxCKokkos<LMPDeviceType>*) reaxc)->PackBondBuffer(k_buf,nbuf_local);
else
((PairReaxCKokkos<LMPHostType>*) reaxc)->PackBondBuffer(k_buf,nbuf_local);
buf[0] = nlocal;
// Receive information from buffer for output
RecvBuffer(buf, nbuf, nbuf_local, nlocal_tot, numbonds_max);
memory->destroy_kokkos(k_buf,buf);
}
/* ---------------------------------------------------------------------- */
double FixReaxCBondsKokkos::memory_usage()
{
double bytes;
bytes = nbuf*sizeof(double);
// These are accounted for in PairReaxCKokkos:
//bytes += nmax*sizeof(int);
//bytes += 1.0*nmax*MAXREAXBOND*sizeof(double);
//bytes += 1.0*nmax*MAXREAXBOND*sizeof(int);
return bytes;
}
diff --git a/src/KOKKOS/fix_reaxc_species_kokkos.cpp b/src/KOKKOS/fix_reaxc_species_kokkos.cpp
index 17b42174c..ce84de30c 100644
--- a/src/KOKKOS/fix_reaxc_species_kokkos.cpp
+++ b/src/KOKKOS/fix_reaxc_species_kokkos.cpp
@@ -1,159 +1,159 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Stan Moore (Sandia)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <math.h>
#include "atom.h"
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_species_kokkos.h"
#include "domain.h"
#include "update.h"
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
#include "atom_masks.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCSpeciesKokkos::FixReaxCSpeciesKokkos(LAMMPS *lmp, int narg, char **arg) :
FixReaxCSpecies(lmp, narg, arg)
{
kokkosable = 1;
atomKK = (AtomKokkos *) atom;
// NOTE: Could improve performance if a Kokkos version of ComputeSpecAtom is added
datamask_read = X_MASK | V_MASK | Q_MASK | MASK_MASK;
datamask_modify = EMPTY_MASK;
}
/* ---------------------------------------------------------------------- */
FixReaxCSpeciesKokkos::~FixReaxCSpeciesKokkos()
{
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpeciesKokkos::init()
{
Pair* pair_kk = force->pair_match("reax/c/kk",1);
if (pair_kk == NULL) error->all(FLERR,"Cannot use fix reax/c/species/kk without "
"pair_style reax/c/kk");
FixReaxCSpecies::init();
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpeciesKokkos::FindMolecule()
{
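  // flood-fill molecule IDs: clusterID starts at the atom tag and is repeatedly
  // lowered to the minimum ID among atoms linked by a bond order above BOCut;
  // ghost data is forward-communicated each outer pass until no rank changes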
int i,j,ii,jj,inum,itype,jtype,loop,looptot;
int change,done,anychange;
int *mask = atom->mask;
double bo_tmp,bo_cut;
double **spec_atom = f_SPECBOND->array_atom;
inum = reaxc->list->inum;
typename ArrayTypes<LMPHostType>::t_int_1d ilist;
if (reaxc->execution_space == Host) {
NeighListKokkos<LMPHostType>* k_list = static_cast<NeighListKokkos<LMPHostType>*>(reaxc->list);
k_list->k_ilist.sync<LMPHostType>();
ilist = k_list->k_ilist.h_view;
} else {
NeighListKokkos<LMPDeviceType>* k_list = static_cast<NeighListKokkos<LMPDeviceType>*>(reaxc->list);
k_list->k_ilist.sync<LMPHostType>();
ilist = k_list->k_ilist.h_view;
}
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (mask[i] & groupbit) {
clusterID[i] = atom->tag[i];
x0[i].x = spec_atom[i][1];
x0[i].y = spec_atom[i][2];
x0[i].z = spec_atom[i][3];
}
else clusterID[i] = 0.0;
}
loop = 0;
while (1) {
comm->forward_comm_fix(this);
loop ++;
change = 0;
while (1) {
done = 1;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (!(mask[i] & groupbit)) continue;
itype = atom->type[i];
for (jj = 0; jj < MAXSPECBOND; jj++) {
j = reaxc->tmpid[i][jj];
if (j < i) continue;
if (!(mask[j] & groupbit)) continue;
if (clusterID[i] == clusterID[j] && PBCconnected[i] == PBCconnected[j]
&& x0[i].x == x0[j].x && x0[i].y == x0[j].y && x0[i].z == x0[j].z) continue;
jtype = atom->type[j];
bo_cut = BOCut[itype][jtype];
bo_tmp = spec_atom[i][jj+7];
if (bo_tmp > bo_cut) {
clusterID[i] = clusterID[j] = MIN(clusterID[i], clusterID[j]);
PBCconnected[i] = PBCconnected[j] = MAX(PBCconnected[i], PBCconnected[j]);
x0[i] = x0[j] = chAnchor(x0[i], x0[j]);
if ((fabs(spec_atom[i][1] - spec_atom[j][1]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][2] - spec_atom[j][2]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][3] - spec_atom[j][3]) > reaxc->control->bond_cut))
PBCconnected[i] = PBCconnected[j] = 1;
done = 0;
}
}
}
if (!done) change = 1;
if (done) break;
}
MPI_Allreduce(&change,&anychange,1,MPI_INT,MPI_MAX,world);
if (!anychange) break;
MPI_Allreduce(&loop,&looptot,1,MPI_INT,MPI_SUM,world);
if (looptot >= 400*nprocs) break;
}
-}
\ No newline at end of file
+}
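The FindMolecule() routine above assigns molecule membership by label propagation: each atom starts with its own tag as the cluster ID, and every bonded pair whose bond order exceeds BOCut repeatedly adopts the smaller of the two IDs (with forward_comm_fix() exchanging IDs across MPI ranks) until no ID changes anywhere. A minimal serial sketch of that idea, illustrative only and not part of LAMMPS, is:

// Minimal serial sketch (illustrative only) of the label-propagation idea in
// FindMolecule(): every atom starts with its own tag as cluster ID, and each
// bonded pair repeatedly adopts the smaller of its two IDs until nothing
// changes, so every connected set of atoms ends up sharing one ID.
#include <vector>
#include <algorithm>
#include <utility>

std::vector<double> label_propagate(int n, const std::vector<std::pair<int,int> > &bonds)
{
  std::vector<double> id(n);
  for (int i = 0; i < n; i++) id[i] = i + 1;     // analogous to atom->tag[i]
  bool changed = true;
  while (changed) {                              // analogous to the outer while (1) loop
    changed = false;
    for (size_t b = 0; b < bonds.size(); b++) {
      double m = std::min(id[bonds[b].first], id[bonds[b].second]);
      if (id[bonds[b].first] != m || id[bonds[b].second] != m) {
        id[bonds[b].first] = id[bonds[b].second] = m;
        changed = true;
      }
    }
  }
  return id;
}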
diff --git a/src/KOKKOS/modify_kokkos.cpp b/src/KOKKOS/modify_kokkos.cpp
index b4a89c8e3..c9242f211 100644
--- a/src/KOKKOS/modify_kokkos.cpp
+++ b/src/KOKKOS/modify_kokkos.cpp
@@ -1,721 +1,761 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "modify_kokkos.h"
#include "atom_kokkos.h"
#include "update.h"
#include "fix.h"
#include "compute.h"
#include "kokkos.h"
using namespace LAMMPS_NS;
#define BIG 1.0e20
/* ---------------------------------------------------------------------- */
ModifyKokkos::ModifyKokkos(LAMMPS *lmp) : Modify(lmp)
{
atomKK = (AtomKokkos *) atom;
}
/* ----------------------------------------------------------------------
setup for run, calls setup() of all fixes and computes
called from Verlet, RESPA, Min
------------------------------------------------------------------------- */
void ModifyKokkos::setup(int vflag)
{
// compute setup needs to come before fix setup
// b/c NH fixes need the DOF of temperature computes
for (int i = 0; i < ncompute; i++) compute[i]->setup();
if (update->whichflag == 1)
for (int i = 0; i < nfix; i++) {
atomKK->sync(fix[i]->execution_space,fix[i]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[i]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[i]->setup(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[i]->execution_space,fix[i]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < nfix; i++) {
atomKK->sync(fix[i]->execution_space,fix[i]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[i]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[i]->min_setup(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[i]->execution_space,fix[i]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_exchange call, only for fixes that define pre_exchange
called from Verlet, RESPA, Min, and WriteRestart with whichflag = 0
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_exchange()
{
if (update->whichflag <= 1)
for (int i = 0; i < n_pre_exchange; i++) {
atomKK->sync(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_exchange[i]]->setup_pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_exchange; i++) {
atomKK->sync(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_exchange[i]]->setup_pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_neighbor call, only for fixes that define pre_neighbor
called from Verlet, RESPA
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_neighbor()
{
if (update->whichflag == 1)
for (int i = 0; i < n_pre_neighbor; i++) {
atomKK->sync(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_neighbor[i]]->setup_pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_neighbor; i++) {
atomKK->sync(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_neighbor[i]]->setup_pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_force call, only for fixes that define pre_force
called from Verlet, RESPA, Min
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_force(int vflag)
{
if (update->whichflag == 1)
for (int i = 0; i < n_pre_force; i++) {
atomKK->sync(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force[i]]->setup_pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_force; i++) {
atomKK->sync(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_force[i]]->setup_pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup pre_reverse call, only for fixes that define pre_reverse
called from Verlet, RESPA, Min
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_reverse(int eflag, int vflag)
{
if (update->whichflag == 1)
for (int i = 0; i < n_pre_reverse; i++) {
atomKK->sync(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_reverse[i]]->setup_pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_modify);
}
else if (update->whichflag == 2)
for (int i = 0; i < n_min_pre_reverse; i++) {
atomKK->sync(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_reverse[i]]->setup_pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
1st half of integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::initial_integrate(int vflag)
{
for (int i = 0; i < n_initial_integrate; i++) {
atomKK->sync(fix[list_initial_integrate[i]]->execution_space,
fix[list_initial_integrate[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_initial_integrate[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_initial_integrate[i]]->initial_integrate(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_initial_integrate[i]]->execution_space,
fix[list_initial_integrate[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
post_integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_integrate()
{
for (int i = 0; i < n_post_integrate; i++) {
atomKK->sync(fix[list_post_integrate[i]]->execution_space,
fix[list_post_integrate[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_integrate[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_integrate[i]]->post_integrate();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_integrate[i]]->execution_space,
fix[list_post_integrate[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_exchange call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_exchange()
{
for (int i = 0; i < n_pre_exchange; i++) {
atomKK->sync(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_exchange[i]]->pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_exchange[i]]->execution_space,
fix[list_pre_exchange[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_neighbor call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_neighbor()
{
for (int i = 0; i < n_pre_neighbor; i++) {
atomKK->sync(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_neighbor[i]]->pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_neighbor[i]]->execution_space,
fix[list_pre_neighbor[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_force(int vflag)
{
for (int i = 0; i < n_pre_force; i++) {
atomKK->sync(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force[i]]->pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
pre_reverse call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_reverse(int eflag, int vflag)
{
for (int i = 0; i < n_pre_reverse; i++) {
atomKK->sync(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_reverse[i]]->pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_reverse[i]]->execution_space,
fix[list_pre_reverse[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
post_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_force(int vflag)
{
for (int i = 0; i < n_post_force; i++) {
atomKK->sync(fix[list_post_force[i]]->execution_space,
fix[list_post_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_force[i]]->post_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_force[i]]->execution_space,
fix[list_post_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
2nd half of integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::final_integrate()
{
for (int i = 0; i < n_final_integrate; i++) {
atomKK->sync(fix[list_final_integrate[i]]->execution_space,
fix[list_final_integrate[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_final_integrate[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_final_integrate[i]]->final_integrate();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_final_integrate[i]]->execution_space,
fix[list_final_integrate[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
end-of-timestep call, only for relevant fixes
only call fix->end_of_step() on timesteps that are multiples of nevery
------------------------------------------------------------------------- */
void ModifyKokkos::end_of_step()
{
for (int i = 0; i < n_end_of_step; i++)
if (update->ntimestep % end_of_step_every[i] == 0) {
atomKK->sync(fix[list_end_of_step[i]]->execution_space,
fix[list_end_of_step[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_end_of_step[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_end_of_step[i]]->end_of_step();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_end_of_step[i]]->execution_space,
fix[list_end_of_step[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
thermo energy call, only for relevant fixes
called by Thermo class
compute_scalar() is fix call to return energy
------------------------------------------------------------------------- */
double ModifyKokkos::thermo_energy()
{
double energy = 0.0;
for (int i = 0; i < n_thermo_energy; i++) {
atomKK->sync(fix[list_thermo_energy[i]]->execution_space,
fix[list_thermo_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_thermo_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
energy += fix[list_thermo_energy[i]]->compute_scalar();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_thermo_energy[i]]->execution_space,
fix[list_thermo_energy[i]]->datamask_modify);
}
return energy;
}
/* ----------------------------------------------------------------------
post_run call
------------------------------------------------------------------------- */
void ModifyKokkos::post_run()
{
for (int i = 0; i < nfix; i++) {
atomKK->sync(fix[i]->execution_space,
fix[i]->datamask_read);
fix[i]->post_run();
atomKK->modified(fix[i]->execution_space,
fix[i]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
setup rRESPA pre_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::setup_pre_force_respa(int vflag, int ilevel)
{
for (int i = 0; i < n_pre_force; i++) {
atomKK->sync(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force[i]]->setup_pre_force_respa(vflag,ilevel);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force[i]]->execution_space,
fix[list_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
1st half of rRESPA integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::initial_integrate_respa(int vflag, int ilevel, int iloop)
{
for (int i = 0; i < n_initial_integrate_respa; i++) {
atomKK->sync(fix[list_initial_integrate_respa[i]]->execution_space,
fix[list_initial_integrate_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_initial_integrate_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_initial_integrate_respa[i]]->
initial_integrate_respa(vflag,ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_initial_integrate_respa[i]]->execution_space,
fix[list_initial_integrate_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
rRESPA post_integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_integrate_respa(int ilevel, int iloop)
{
for (int i = 0; i < n_post_integrate_respa; i++) {
atomKK->sync(fix[list_post_integrate_respa[i]]->execution_space,
fix[list_post_integrate_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_integrate_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_integrate_respa[i]]->post_integrate_respa(ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_integrate_respa[i]]->execution_space,
fix[list_post_integrate_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
rRESPA pre_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::pre_force_respa(int vflag, int ilevel, int iloop)
{
for (int i = 0; i < n_pre_force_respa; i++) {
atomKK->sync(fix[list_pre_force_respa[i]]->execution_space,
fix[list_pre_force_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_pre_force_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_pre_force_respa[i]]->pre_force_respa(vflag,ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_pre_force_respa[i]]->execution_space,
fix[list_pre_force_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
rRESPA post_force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::post_force_respa(int vflag, int ilevel, int iloop)
{
for (int i = 0; i < n_post_force_respa; i++) {
atomKK->sync(fix[list_post_force_respa[i]]->execution_space,
fix[list_post_force_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_post_force_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_post_force_respa[i]]->post_force_respa(vflag,ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_post_force_respa[i]]->execution_space,
fix[list_post_force_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
2nd half of rRESPA integrate call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::final_integrate_respa(int ilevel, int iloop)
{
for (int i = 0; i < n_final_integrate_respa; i++) {
atomKK->sync(fix[list_final_integrate_respa[i]]->execution_space,
fix[list_final_integrate_respa[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_final_integrate_respa[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_final_integrate_respa[i]]->final_integrate_respa(ilevel,iloop);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_final_integrate_respa[i]]->execution_space,
fix[list_final_integrate_respa[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-exchange call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_exchange()
{
for (int i = 0; i < n_min_pre_exchange; i++) {
atomKK->sync(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_exchange[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_exchange[i]]->min_pre_exchange();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_exchange[i]]->execution_space,
fix[list_min_pre_exchange[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-neighbor call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_neighbor()
{
for (int i = 0; i < n_min_pre_neighbor; i++) {
atomKK->sync(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_neighbor[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_neighbor[i]]->min_pre_neighbor();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_neighbor[i]]->execution_space,
fix[list_min_pre_neighbor[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-force call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_force(int vflag)
{
for (int i = 0; i < n_min_pre_force; i++) {
atomKK->sync(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_force[i]]->min_pre_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_force[i]]->execution_space,
fix[list_min_pre_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer pre-reverse call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_pre_reverse(int eflag, int vflag)
{
for (int i = 0; i < n_min_pre_reverse; i++) {
atomKK->sync(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_pre_reverse[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_pre_reverse[i]]->min_pre_reverse(eflag,vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_pre_reverse[i]]->execution_space,
fix[list_min_pre_reverse[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer force adjustment call, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_post_force(int vflag)
{
for (int i = 0; i < n_min_post_force; i++) {
atomKK->sync(fix[list_min_post_force[i]]->execution_space,
fix[list_min_post_force[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_post_force[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_post_force[i]]->min_post_force(vflag);
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_post_force[i]]->execution_space,
fix[list_min_post_force[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
minimizer energy/force evaluation, only for relevant fixes
return energy and forces on extra degrees of freedom
------------------------------------------------------------------------- */
double ModifyKokkos::min_energy(double *fextra)
{
int ifix,index;
index = 0;
double eng = 0.0;
for (int i = 0; i < n_min_energy; i++) {
ifix = list_min_energy[i];
atomKK->sync(fix[ifix]->execution_space,fix[ifix]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[ifix]->kokkosable) lmp->kokkos->auto_sync = 1;
eng += fix[ifix]->min_energy(&fextra[index]);
index += fix[ifix]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[ifix]->execution_space,fix[ifix]->datamask_modify);
}
return eng;
}
/* ----------------------------------------------------------------------
store current state of extra dof, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_store()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_store();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
manage state of extra dof on a stack, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_clearstore()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_clearstore();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
void ModifyKokkos::min_pushstore()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_pushstore();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
void ModifyKokkos::min_popstore()
{
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[list_min_energy[i]]->min_popstore();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
displace extra dof along vector hextra, only for relevant fixes
------------------------------------------------------------------------- */
void ModifyKokkos::min_step(double alpha, double *hextra)
{
int ifix,index;
index = 0;
for (int i = 0; i < n_min_energy; i++) {
ifix = list_min_energy[i];
atomKK->sync(fix[ifix]->execution_space,fix[ifix]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[ifix]->kokkosable) lmp->kokkos->auto_sync = 1;
fix[ifix]->min_step(alpha,&hextra[index]);
index += fix[ifix]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[ifix]->execution_space,fix[ifix]->datamask_modify);
}
}
/* ----------------------------------------------------------------------
compute max allowed step size along vector hextra, only for relevant fixes
------------------------------------------------------------------------- */
double ModifyKokkos::max_alpha(double *hextra)
{
int ifix,index;
double alpha = BIG;
index = 0;
for (int i = 0; i < n_min_energy; i++) {
ifix = list_min_energy[i];
atomKK->sync(fix[ifix]->execution_space,fix[ifix]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[ifix]->kokkosable) lmp->kokkos->auto_sync = 1;
double alpha_one = fix[ifix]->max_alpha(&hextra[index]);
alpha = MIN(alpha,alpha_one);
index += fix[ifix]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[ifix]->execution_space,fix[ifix]->datamask_modify);
}
return alpha;
}
/* ----------------------------------------------------------------------
extract extra dof for minimization, only for relevant fixes
------------------------------------------------------------------------- */
int ModifyKokkos::min_dof()
{
int ndof = 0;
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
ndof += fix[list_min_energy[i]]->min_dof();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
return ndof;
}
/* ----------------------------------------------------------------------
reset reference state of fix, only for relevant fixes
------------------------------------------------------------------------- */
int ModifyKokkos::min_reset_ref()
{
int itmp,itmpall;
itmpall = 0;
for (int i = 0; i < n_min_energy; i++) {
atomKK->sync(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_read);
+ int prev_auto_sync = lmp->kokkos->auto_sync;
if (!fix[list_min_energy[i]]->kokkosable) lmp->kokkos->auto_sync = 1;
itmp = fix[list_min_energy[i]]->min_reset_ref();
- lmp->kokkos->auto_sync = 0;
+ lmp->kokkos->auto_sync = prev_auto_sync;
if (itmp) itmpall = 1;
atomKK->modified(fix[list_min_energy[i]]->execution_space,
fix[list_min_energy[i]]->datamask_modify);
}
return itmpall;
}
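Every hunk in the modify_kokkos.cpp diff above applies the same fix: instead of unconditionally resetting lmp->kokkos->auto_sync to 0 after calling a fix, the previous value is saved and restored, so a user-enabled auto_sync setting is no longer silently switched off whenever a non-Kokkos fix runs. A minimal sketch of the idiom follows; the AutoSyncGuard name is purely illustrative, and the patch itself uses an inline prev_auto_sync local rather than a guard class:

// Save the current flag, optionally force it on, and restore the saved value
// on scope exit -- the RAII equivalent of the prev_auto_sync pattern above.
struct AutoSyncGuard {
  int &flag;
  int saved;
  AutoSyncGuard(int &f, bool force_on) : flag(f), saved(f) { if (force_on) flag = 1; }
  ~AutoSyncGuard() { flag = saved; }
};

// Usage, mirroring one loop body from the diff (hypothetical rewrite):
//   AutoSyncGuard guard(lmp->kokkos->auto_sync, !fix[i]->kokkosable);
//   fix[i]->setup(vflag);
//   // guard restores the previous auto_sync value when it goes out of scope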
diff --git a/src/KOKKOS/pair_reax_c_kokkos.cpp b/src/KOKKOS/pair_reaxc_kokkos.cpp
similarity index 99%
rename from src/KOKKOS/pair_reax_c_kokkos.cpp
rename to src/KOKKOS/pair_reaxc_kokkos.cpp
index acf9c754c..59369b5e0 100644
--- a/src/KOKKOS/pair_reax_c_kokkos.cpp
+++ b/src/KOKKOS/pair_reaxc_kokkos.cpp
@@ -1,4196 +1,4196 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (SNL), Stan Moore (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
-#include "pair_reax_c_kokkos.h"
+#include "pair_reaxc_kokkos.h"
#include "kokkos.h"
#include "atom_kokkos.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_request.h"
#include "neigh_list_kokkos.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "math_const.h"
#include "math_special.h"
#include "memory.h"
#include "error.h"
#include "atom_masks.h"
#include "reaxc_defs.h"
#include "reaxc_lookup.h"
#include "reaxc_tool_box.h"
#define TEAMSIZE 128
/* ---------------------------------------------------------------------- */
namespace LAMMPS_NS {
using namespace MathConst;
using namespace MathSpecial;
template<class DeviceType>
PairReaxCKokkos<DeviceType>::PairReaxCKokkos(LAMMPS *lmp) : PairReaxC(lmp)
{
respa_enable = 0;
cut_nbsq = cut_hbsq = cut_bosq = 0.0;
atomKK = (AtomKokkos *) atom;
execution_space = ExecutionSpaceFromDevice<DeviceType>::space;
datamask_read = X_MASK | Q_MASK | F_MASK | TYPE_MASK | ENERGY_MASK | VIRIAL_MASK;
datamask_modify = F_MASK | ENERGY_MASK | VIRIAL_MASK;
k_resize_bo = DAT::tdual_int_scalar("pair:resize_bo");
d_resize_bo = k_resize_bo.view<DeviceType>();
k_resize_hb = DAT::tdual_int_scalar("pair:resize_hb");
d_resize_hb = k_resize_hb.view<DeviceType>();
nmax = 0;
maxbo = 1;
maxhb = 1;
k_error_flag = DAT::tdual_int_scalar("pair:error_flag");
k_nbuf_local = DAT::tdual_int_scalar("pair:nbuf_local");
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
PairReaxCKokkos<DeviceType>::~PairReaxCKokkos()
{
if (copymode) return;
memory->destroy_kokkos(k_eatom,eatom);
memory->destroy_kokkos(k_vatom,vatom);
memory->destroy_kokkos(k_tmpid,tmpid);
tmpid = NULL;
memory->destroy_kokkos(k_tmpbo,tmpbo);
tmpbo = NULL;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::allocate()
{
int n = atom->ntypes;
k_params_sing = Kokkos::DualView<params_sing*,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_sing",n+1);
paramssing = k_params_sing.d_view;
k_params_twbp = Kokkos::DualView<params_twbp**,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_twbp",n+1,n+1);
paramstwbp = k_params_twbp.d_view;
k_params_thbp = Kokkos::DualView<params_thbp***,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_thbp",n+1,n+1,n+1);
paramsthbp = k_params_thbp.d_view;
k_params_fbp = Kokkos::DualView<params_fbp****,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_fbp",n+1,n+1,n+1,n+1);
paramsfbp = k_params_fbp.d_view;
k_params_hbp = Kokkos::DualView<params_hbp***,typename DeviceType::array_layout,DeviceType>
("PairReaxC::params_hbp",n+1,n+1,n+1);
paramshbp = k_params_hbp.d_view;
k_tap = DAT::tdual_ffloat_1d("pair:tap",8);
d_tap = k_tap.d_view;
h_tap = k_tap.h_view;
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::init_style()
{
PairReaxC::init_style();
// irequest = neigh request made by parent class
neighflag = lmp->kokkos->neighflag;
int irequest = neighbor->nrequest - 1;
neighbor->requests[irequest]->
kokkos_host = Kokkos::Impl::is_same<DeviceType,LMPHostType>::value &&
!Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
neighbor->requests[irequest]->
kokkos_device = Kokkos::Impl::is_same<DeviceType,LMPDeviceType>::value;
if (neighflag == FULL) {
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->ghost = 1;
} else if (neighflag == HALF || neighflag == HALFTHREAD) {
neighbor->requests[irequest]->full = 0;
neighbor->requests[irequest]->half = 1;
neighbor->requests[irequest]->ghost = 1;
} else {
error->all(FLERR,"Cannot use chosen neighbor list style with reax/c/kk");
}
allocate();
setup();
init_md();
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::setup()
{
int i,j,k,m;
int n = atom->ntypes;
// general parameters
for (i = 0; i < 39; i ++)
gp[i] = system->reax_param.gp.l[i];
p_boc1 = gp[0];
p_boc2 = gp[1];
// vdw parameters
vdwflag = system->reax_param.gp.vdw_type;
lgflag = control->lgflag;
// atom, bond, angle, dihedral, H-bond specific parameters
two_body_parameters *twbp;
// valence angle (3-body) parameters
three_body_header *thbh;
three_body_parameters *thbp;
// torsion angle (4-body) parameters
four_body_header *fbh;
four_body_parameters *fbp;
// hydrogen bond parameters
hbond_parameters *hbp;
for (i = 1; i <= n; i++) {
// general
k_params_sing.h_view(i).mass = system->reax_param.sbp[map[i]].mass;
// polarization
k_params_sing.h_view(i).chi = system->reax_param.sbp[map[i]].chi;
k_params_sing.h_view(i).eta = system->reax_param.sbp[map[i]].eta;
// bond order
k_params_sing.h_view(i).r_s = system->reax_param.sbp[map[i]].r_s;
k_params_sing.h_view(i).r_pi = system->reax_param.sbp[map[i]].r_pi;
k_params_sing.h_view(i).r_pi2 = system->reax_param.sbp[map[i]].r_pi_pi;
k_params_sing.h_view(i).valency = system->reax_param.sbp[map[i]].valency;
k_params_sing.h_view(i).valency_val = system->reax_param.sbp[map[i]].valency_val;
k_params_sing.h_view(i).valency_boc = system->reax_param.sbp[map[i]].valency_boc;
k_params_sing.h_view(i).valency_e = system->reax_param.sbp[map[i]].valency_e;
k_params_sing.h_view(i).nlp_opt = system->reax_param.sbp[map[i]].nlp_opt;
// multibody
k_params_sing.h_view(i).p_lp2 = system->reax_param.sbp[map[i]].p_lp2;
k_params_sing.h_view(i).p_ovun2 = system->reax_param.sbp[map[i]].p_ovun2;
k_params_sing.h_view(i).p_ovun5 = system->reax_param.sbp[map[i]].p_ovun5;
// angular
k_params_sing.h_view(i).p_val3 = system->reax_param.sbp[map[i]].p_val3;
k_params_sing.h_view(i).p_val5 = system->reax_param.sbp[map[i]].p_val5;
// hydrogen bond
k_params_sing.h_view(i).p_hbond = system->reax_param.sbp[map[i]].p_hbond;
for (j = 1; j <= n; j++) {
twbp = &(system->reax_param.tbp[map[i]][map[j]]);
// vdW
k_params_twbp.h_view(i,j).gamma = twbp->gamma;
k_params_twbp.h_view(i,j).gamma_w = twbp->gamma_w;
k_params_twbp.h_view(i,j).alpha = twbp->alpha;
k_params_twbp.h_view(i,j).r_vdw = twbp->r_vdW;
k_params_twbp.h_view(i,j).epsilon = twbp->D;
k_params_twbp.h_view(i,j).acore = twbp->acore;
k_params_twbp.h_view(i,j).ecore = twbp->ecore;
k_params_twbp.h_view(i,j).rcore = twbp->rcore;
k_params_twbp.h_view(i,j).lgre = twbp->lgre;
k_params_twbp.h_view(i,j).lgcij = twbp->lgcij;
// bond order
k_params_twbp.h_view(i,j).r_s = twbp->r_s;
k_params_twbp.h_view(i,j).r_pi = twbp->r_p;
k_params_twbp.h_view(i,j).r_pi2 = twbp->r_pp;
k_params_twbp.h_view(i,j).p_bo1 = twbp->p_bo1;
k_params_twbp.h_view(i,j).p_bo2 = twbp->p_bo2;
k_params_twbp.h_view(i,j).p_bo3 = twbp->p_bo3;
k_params_twbp.h_view(i,j).p_bo4 = twbp->p_bo4;
k_params_twbp.h_view(i,j).p_bo5 = twbp->p_bo5;
k_params_twbp.h_view(i,j).p_bo6 = twbp->p_bo6;
k_params_twbp.h_view(i,j).p_boc3 = twbp->p_boc3;
k_params_twbp.h_view(i,j).p_boc4 = twbp->p_boc4;
k_params_twbp.h_view(i,j).p_boc5 = twbp->p_boc5;
k_params_twbp.h_view(i,j).ovc = twbp->ovc;
k_params_twbp.h_view(i,j).v13cor = twbp->v13cor;
// bond energy
k_params_twbp.h_view(i,j).p_be1 = twbp->p_be1;
k_params_twbp.h_view(i,j).p_be2 = twbp->p_be2;
k_params_twbp.h_view(i,j).De_s = twbp->De_s;
k_params_twbp.h_view(i,j).De_p = twbp->De_p;
k_params_twbp.h_view(i,j).De_pp = twbp->De_pp;
// multibody
k_params_twbp.h_view(i,j).p_ovun1 = twbp->p_ovun1;
for (k = 1; k <= n; k++) {
// Angular
thbh = &(system->reax_param.thbp[map[i]][map[j]][map[k]]);
thbp = &(thbh->prm[0]);
k_params_thbp.h_view(i,j,k).cnt = thbh->cnt;
k_params_thbp.h_view(i,j,k).theta_00 = thbp->theta_00;
k_params_thbp.h_view(i,j,k).p_val1 = thbp->p_val1;
k_params_thbp.h_view(i,j,k).p_val2 = thbp->p_val2;
k_params_thbp.h_view(i,j,k).p_val4 = thbp->p_val4;
k_params_thbp.h_view(i,j,k).p_val7 = thbp->p_val7;
k_params_thbp.h_view(i,j,k).p_pen1 = thbp->p_pen1;
k_params_thbp.h_view(i,j,k).p_coa1 = thbp->p_coa1;
// Hydrogen Bond
hbp = &(system->reax_param.hbp[map[i]][map[j]][map[k]]);
k_params_hbp.h_view(i,j,k).p_hb1 = hbp->p_hb1;
k_params_hbp.h_view(i,j,k).p_hb2 = hbp->p_hb2;
k_params_hbp.h_view(i,j,k).p_hb3 = hbp->p_hb3;
k_params_hbp.h_view(i,j,k).r0_hb = hbp->r0_hb;
for (m = 1; m <= n; m++) {
// Torsion
fbh = &(system->reax_param.fbp[map[i]][map[j]][map[k]][map[m]]);
fbp = &(fbh->prm[0]);
k_params_fbp.h_view(i,j,k,m).p_tor1 = fbp->p_tor1;
k_params_fbp.h_view(i,j,k,m).p_cot1 = fbp->p_cot1;
k_params_fbp.h_view(i,j,k,m).V1 = fbp->V1;
k_params_fbp.h_view(i,j,k,m).V2 = fbp->V2;
k_params_fbp.h_view(i,j,k,m).V3 = fbp->V3;
}
}
}
}
k_params_sing.template modify<LMPHostType>();
k_params_twbp.template modify<LMPHostType>();
k_params_thbp.template modify<LMPHostType>();
k_params_fbp.template modify<LMPHostType>();
k_params_hbp.template modify<LMPHostType>();
// cutoffs
cut_nbsq = control->nonb_cut * control->nonb_cut;
cut_hbsq = control->hbond_cut * control->hbond_cut;
cut_bosq = control->bond_cut * control->bond_cut;
// bond order cutoffs
bo_cut = 0.01 * gp[29];
thb_cut = control->thb_cut;
thb_cutsq = 0.000010; //thb_cut*thb_cut;
if (atom->nmax > nmax) {
nmax = atom->nmax;
allocate_array();
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::init_md()
{
// init_taper()
F_FLOAT d1, d7, swa, swa2, swa3, swb, swb2, swb3;
swa = control->nonb_low;
swb = control->nonb_cut;
if (fabs(swa) > 0.01 )
error->warning(FLERR,"Warning: non-zero lower Taper-radius cutoff");
if (swb < 0)
error->one(FLERR,"Negative upper Taper-radius cutoff");
else if (swb < 5) {
char str[128];
sprintf(str,"Warning: very low Taper-radius cutoff: %f\n", swb);
error->one(FLERR,str);
}
d1 = swb - swa;
d7 = powint(d1,7);
swa2 = swa * swa;
swa3 = swa * swa2;
swb2 = swb * swb;
swb3 = swb * swb2;
k_tap.h_view(7) = 20.0/d7;
k_tap.h_view(6) = -70.0 * (swa + swb) / d7;
k_tap.h_view(5) = 84.0 * (swa2 + 3.0*swa*swb + swb2) / d7;
k_tap.h_view(4) = -35.0 * (swa3 + 9.0*swa2*swb + 9.0*swa*swb2 + swb3 ) / d7;
k_tap.h_view(3) = 140.0 * (swa3*swb + 3.0*swa2*swb2 + swa*swb3 ) / d7;
k_tap.h_view(2) =-210.0 * (swa3*swb2 + swa2*swb3) / d7;
k_tap.h_view(1) = 140.0 * swa3 * swb3 / d7;
k_tap.h_view(0) = (-35.0*swa3*swb2*swb2 + 21.0*swa2*swb3*swb2 +
7.0*swa*swb3*swb3 + swb3*swb3*swb ) / d7;
k_tap.template modify<LMPHostType>();
k_tap.template sync<DeviceType>();
if ( control->tabulate ) {
int ntypes = atom->ntypes;
Init_Lookup_Tables();
k_LR = tdual_LR_lookup_table_kk_2d("lookup:LR",ntypes+1,ntypes+1);
d_LR = k_LR.d_view;
for (int i = 1; i <= ntypes; ++i) {
for (int j = i; j <= ntypes; ++j) {
int n = LR[i][j].n;
if (n == 0) continue;
k_LR.h_view(i,j).xmin = LR[i][j].xmin;
k_LR.h_view(i,j).xmax = LR[i][j].xmax;
k_LR.h_view(i,j).n = LR[i][j].n;
k_LR.h_view(i,j).dx = LR[i][j].dx;
k_LR.h_view(i,j).inv_dx = LR[i][j].inv_dx;
k_LR.h_view(i,j).a = LR[i][j].a;
k_LR.h_view(i,j).m = LR[i][j].m;
k_LR.h_view(i,j).c = LR[i][j].c;
tdual_LR_data_1d k_y = tdual_LR_data_1d("lookup:LR[i,j].y",n);
tdual_cubic_spline_coef_1d k_H = tdual_cubic_spline_coef_1d("lookup:LR[i,j].H",n);
tdual_cubic_spline_coef_1d k_vdW = tdual_cubic_spline_coef_1d("lookup:LR[i,j].vdW",n);
tdual_cubic_spline_coef_1d k_CEvd = tdual_cubic_spline_coef_1d("lookup:LR[i,j].CEvd",n);
tdual_cubic_spline_coef_1d k_ele = tdual_cubic_spline_coef_1d("lookup:LR[i,j].ele",n);
tdual_cubic_spline_coef_1d k_CEclmb = tdual_cubic_spline_coef_1d("lookup:LR[i,j].CEclmb",n);
k_LR.h_view(i,j).d_y = k_y.d_view;
k_LR.h_view(i,j).d_H = k_H.d_view;
k_LR.h_view(i,j).d_vdW = k_vdW.d_view;
k_LR.h_view(i,j).d_CEvd = k_CEvd.d_view;
k_LR.h_view(i,j).d_ele = k_ele.d_view;
k_LR.h_view(i,j).d_CEclmb = k_CEclmb.d_view;
for (int k = 0; k < n; k++) {
k_y.h_view(k) = LR[i][j].y[k];
k_H.h_view(k) = LR[i][j].H[k];
k_vdW.h_view(k) = LR[i][j].vdW[k];
k_CEvd.h_view(k) = LR[i][j].CEvd[k];
k_ele.h_view(k) = LR[i][j].ele[k];
k_CEclmb.h_view(k) = LR[i][j].CEclmb[k];
}
k_y.template modify<LMPHostType>();
k_H.template modify<LMPHostType>();
k_vdW.template modify<LMPHostType>();
k_CEvd.template modify<LMPHostType>();
k_ele.template modify<LMPHostType>();
k_CEclmb.template modify<LMPHostType>();
k_y.template sync<DeviceType>();
k_H.template sync<DeviceType>();
k_vdW.template sync<DeviceType>();
k_CEvd.template sync<DeviceType>();
k_ele.template sync<DeviceType>();
k_CEclmb.template sync<DeviceType>();
}
}
k_LR.template modify<LMPHostType>();
k_LR.template sync<DeviceType>();
Deallocate_Lookup_Tables();
}
}
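/* The k_tap coefficients set above are, assuming the standard ReaxFF taper
   form, the coefficients of the 7th-order switching polynomial
   Tap(r) = tap[7]*r^7 + ... + tap[1]*r + tap[0], which turns the nonbonded
   terms off smoothly between swa = control->nonb_low and
   swb = control->nonb_cut; LR_vdW_Coulomb() below evaluates Tap(r) and its
   derivative by Horner's rule. */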
/* ---------------------------------------------------------------------- */
template<class DeviceType>
int PairReaxCKokkos<DeviceType>::Init_Lookup_Tables()
{
int i, j, r;
int num_atom_types;
double dr;
double *h, *fh, *fvdw, *fele, *fCEvd, *fCEclmb;
double v0_vdw, v0_ele, vlast_vdw, vlast_ele;
/* initializations */
v0_vdw = 0;
v0_ele = 0;
vlast_vdw = 0;
vlast_ele = 0;
num_atom_types = atom->ntypes;
dr = control->nonb_cut / control->tabulate;
h = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:h", world );
fh = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fh", world );
fvdw = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fvdw", world );
fCEvd = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEvd", world );
fele = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fele", world );
fCEclmb = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEclmb", world );
LR = (LR_lookup_table**)
scalloc( num_atom_types+1, sizeof(LR_lookup_table*), "lookup:LR", world );
for( i = 0; i < num_atom_types+1; ++i )
LR[i] = (LR_lookup_table*)
scalloc( num_atom_types+1, sizeof(LR_lookup_table), "lookup:LR[i]", world );
for( i = 1; i <= num_atom_types; ++i ) {
for( j = i; j <= num_atom_types; ++j ) {
LR[i][j].xmin = 0;
LR[i][j].xmax = control->nonb_cut;
LR[i][j].n = control->tabulate + 2;
LR[i][j].dx = dr;
LR[i][j].inv_dx = control->tabulate / control->nonb_cut;
LR[i][j].y = (LR_data*)
smalloc( LR[i][j].n * sizeof(LR_data), "lookup:LR[i,j].y", world );
LR[i][j].H = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].H" ,
world );
LR[i][j].vdW = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].vdW",
world);
LR[i][j].CEvd = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].CEvd",
world);
LR[i][j].ele = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].ele",
world );
LR[i][j].CEclmb = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),
"lookup:LR[i,j].CEclmb", world );
for( r = 1; r <= control->tabulate; ++r ) {
LR_vdW_Coulomb(i, j, r * dr, &(LR[i][j].y[r]) );
h[r] = LR[i][j].dx;
fh[r] = LR[i][j].y[r].H;
fvdw[r] = LR[i][j].y[r].e_vdW;
fCEvd[r] = LR[i][j].y[r].CEvd;
fele[r] = LR[i][j].y[r].e_ele;
fCEclmb[r] = LR[i][j].y[r].CEclmb;
}
// init the start-end points
h[r] = LR[i][j].dx;
v0_vdw = LR[i][j].y[1].CEvd;
v0_ele = LR[i][j].y[1].CEclmb;
fh[r] = fh[r-1];
fvdw[r] = fvdw[r-1];
fCEvd[r] = fCEvd[r-1];
fele[r] = fele[r-1];
fCEclmb[r] = fCEclmb[r-1];
vlast_vdw = fCEvd[r-1];
vlast_ele = fele[r-1];
Natural_Cubic_Spline( &h[1], &fh[1],
&(LR[i][j].H[1]), control->tabulate+1, world );
Complete_Cubic_Spline( &h[1], &fvdw[1], v0_vdw, vlast_vdw,
&(LR[i][j].vdW[1]), control->tabulate+1,
world );
Natural_Cubic_Spline( &h[1], &fCEvd[1],
&(LR[i][j].CEvd[1]), control->tabulate+1,
world );
Complete_Cubic_Spline( &h[1], &fele[1], v0_ele, vlast_ele,
&(LR[i][j].ele[1]), control->tabulate+1,
world );
Natural_Cubic_Spline( &h[1], &fCEclmb[1],
&(LR[i][j].CEclmb[1]), control->tabulate+1,
world );
}// else{
// LR[i][j].n = 0;
//}//
}
free(h);
free(fh);
free(fvdw);
free(fCEvd);
free(fele);
free(fCEclmb);
return 1;
}
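/* Init_Lookup_Tables() above samples LR_vdW_Coulomb() on a uniform grid of
   control->tabulate points per type pair and fits cubic splines (complete
   splines where endpoint slopes are supplied, natural splines otherwise) to
   H, the van der Waals energy/force, and the Coulomb energy/force, so that
   compute() can later interpolate instead of re-evaluating the full
   expressions. */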
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::Deallocate_Lookup_Tables()
{
int i, j;
int ntypes;
ntypes = atom->ntypes;
for( i = 0; i < ntypes; ++i ) {
for( j = i; j < ntypes; ++j )
if( LR[i][j].n ) {
sfree( LR[i][j].y, "LR[i,j].y" );
sfree( LR[i][j].H, "LR[i,j].H" );
sfree( LR[i][j].vdW, "LR[i,j].vdW" );
sfree( LR[i][j].CEvd, "LR[i,j].CEvd" );
sfree( LR[i][j].ele, "LR[i,j].ele" );
sfree( LR[i][j].CEclmb, "LR[i,j].CEclmb" );
}
sfree( LR[i], "LR[i]" );
}
sfree( LR, "LR" );
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::LR_vdW_Coulomb( int i, int j, double r_ij, LR_data *lr )
{
double p_vdW1 = system->reax_param.gp.l[28];
double p_vdW1i = 1.0 / p_vdW1;
double powr_vdW1, powgi_vdW1;
double tmp, fn13, exp1, exp2;
double Tap, dTap, dfn13;
double dr3gamij_1, dr3gamij_3;
double e_core, de_core;
double e_lg, de_lg, r_ij5, r_ij6, re6;
two_body_parameters *twbp;
twbp = &(system->reax_param.tbp[map[i]][map[j]]);
e_core = 0;
de_core = 0;
e_lg = de_lg = 0.0;
/* calculate taper and its derivative */
Tap = k_tap.h_view[7] * r_ij + k_tap.h_view[6];
Tap = Tap * r_ij + k_tap.h_view[5];
Tap = Tap * r_ij + k_tap.h_view[4];
Tap = Tap * r_ij + k_tap.h_view[3];
Tap = Tap * r_ij + k_tap.h_view[2];
Tap = Tap * r_ij + k_tap.h_view[1];
Tap = Tap * r_ij + k_tap.h_view[0];
dTap = 7*k_tap.h_view[7] * r_ij + 6*k_tap.h_view[6];
dTap = dTap * r_ij + 5*k_tap.h_view[5];
dTap = dTap * r_ij + 4*k_tap.h_view[4];
dTap = dTap * r_ij + 3*k_tap.h_view[3];
dTap = dTap * r_ij + 2*k_tap.h_view[2];
dTap += k_tap.h_view[1]/r_ij;
/*vdWaals Calculations*/
if(system->reax_param.gp.vdw_type==1 || system->reax_param.gp.vdw_type==3)
{ // shielding
powr_vdW1 = pow(r_ij, p_vdW1);
powgi_vdW1 = pow( 1.0 / twbp->gamma_w, p_vdW1);
fn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i );
exp1 = exp( twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
dfn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i-1.0) * pow(r_ij, p_vdW1-2.0);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) * dfn13;
}
else{ // no shielding
exp1 = exp( twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) / r_ij;
}
if(system->reax_param.gp.vdw_type==2 || system->reax_param.gp.vdw_type==3)
{ // inner wall
e_core = twbp->ecore * exp(twbp->acore * (1.0-(r_ij/twbp->rcore)));
lr->e_vdW += Tap * e_core;
de_core = -(twbp->acore/twbp->rcore) * e_core;
lr->CEvd += dTap * e_core + Tap * de_core / r_ij;
// lg correction, only if lgvdw is yes
if (control->lgflag) {
r_ij5 = powint( r_ij, 5 );
r_ij6 = powint( r_ij, 6 );
re6 = powint( twbp->lgre, 6 );
e_lg = -(twbp->lgcij/( r_ij6 + re6 ));
lr->e_vdW += Tap * e_lg;
de_lg = -6.0 * e_lg * r_ij5 / ( r_ij6 + re6 ) ;
lr->CEvd += dTap * e_lg + Tap * de_lg/r_ij;
}
}
/* Coulomb calculations */
dr3gamij_1 = ( r_ij * r_ij * r_ij + twbp->gamma );
dr3gamij_3 = pow( dr3gamij_1 , 0.33333333333333 );
tmp = Tap / dr3gamij_3;
lr->H = EV_to_KCALpMOL * tmp;
lr->e_ele = C_ele * tmp;
lr->CEclmb = C_ele * ( dTap - Tap * r_ij / dr3gamij_1 ) / dr3gamij_3;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::compute(int eflag_in, int vflag_in)
{
copymode = 1;
bocnt = hbcnt = 0;
eflag = eflag_in;
vflag = vflag_in;
if (neighflag == FULL) no_virial_fdotr_compute = 1;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
atomKK->sync(execution_space,datamask_read);
k_params_sing.template sync<DeviceType>();
k_params_twbp.template sync<DeviceType>();
k_params_thbp.template sync<DeviceType>();
k_params_fbp.template sync<DeviceType>();
k_params_hbp.template sync<DeviceType>();
if (eflag || vflag) atomKK->modified(execution_space,datamask_modify);
else atomKK->modified(execution_space,F_MASK);
x = atomKK->k_x.view<DeviceType>();
f = atomKK->k_f.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
tag = atomKK->k_tag.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
nlocal = atomKK->nlocal;
nall = atom->nlocal + atom->nghost;
newton_pair = force->newton_pair;
const int inum = list->inum;
const int ignum = inum + list->gnum;
NeighListKokkos<DeviceType>* k_list = static_cast<NeighListKokkos<DeviceType>*>(list);
d_numneigh = k_list->d_numneigh;
d_neighbors = k_list->d_neighbors;
d_ilist = k_list->d_ilist;
k_list->clean_copy();
if (eflag_global) {
for (int i = 0; i < 14; i++)
pvector[i] = 0.0;
}
EV_FLOAT_REAX ev;
EV_FLOAT_REAX ev_all;
// Polarization (self)
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALF,0> >(0,inum),*this);
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputePolar<HALFTHREAD,0> >(0,inum),*this);
}
DeviceType::fence();
ev_all += ev;
pvector[13] = ev.ecoul;
// LJ + Coulomb
if (control->tabulate) {
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALF,0> >(0,inum),*this);
} else if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<HALFTHREAD,0> >(0,inum),*this);
} else if (neighflag == FULL) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<FULL,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTabulatedLJCoulomb<FULL,0> >(0,inum),*this);
}
} else {
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALF,0> >(0,inum),*this);
} else if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<HALFTHREAD,0> >(0,inum),*this);
} else if (neighflag == FULL) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<FULL,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeLJCoulomb<FULL,0> >(0,inum),*this);
}
}
DeviceType::fence();
ev_all += ev;
pvector[10] = ev.evdwl;
pvector[11] = ev.ecoul;
if (atom->nmax > nmax) {
nmax = atom->nmax;
allocate_array();
}
// Neighbor lists for bonds and hydrogen bonds:
// build, then grow maxbo/maxhb and rebuild if an overflow flag was set
int resize = 1;
while (resize) {
resize = 0;
k_resize_bo.h_view() = 0;
k_resize_bo.modify<LMPHostType>();
k_resize_bo.sync<DeviceType>();
k_resize_hb.h_view() = 0;
k_resize_hb.modify<LMPHostType>();
k_resize_hb.sync<DeviceType>();
// zero per-atom accumulators and list counters
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxZero>(0,nmax),*this);
DeviceType::fence();
if (neighflag == HALF)
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBuildListsHalf<HALF> >(0,ignum),*this);
else if (neighflag == HALFTHREAD)
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBuildListsHalf_LessAtomics<HALFTHREAD> >(0,ignum),*this);
else //(neighflag == FULL)
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBuildListsFull>(0,ignum),*this);
DeviceType::fence();
k_resize_bo.modify<DeviceType>();
k_resize_bo.sync<LMPHostType>();
int resize_bo = k_resize_bo.h_view();
if (resize_bo) maxbo++;
k_resize_hb.modify<DeviceType>();
k_resize_hb.sync<LMPHostType>();
int resize_hb = k_resize_hb.h_view();
if (resize_hb) maxhb++;
resize = resize_bo || resize_hb;
if (resize) allocate_array();
}
// Bond order
if (neighflag == HALF) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder1>(0,ignum),*this);
DeviceType::fence();
} else if (neighflag == HALFTHREAD) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder1_LessAtomics>(0,ignum),*this);
DeviceType::fence();
}
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder2>(0,ignum),*this);
DeviceType::fence();
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxBondOrder3>(0,ignum),*this);
DeviceType::fence();
// Bond energy
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] = ev.evdwl;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond1<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] = ev.evdwl;
}
// Multi-body corrections
if (neighflag == HALF) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti1<HALF,0> >(0,inum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti1<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeMulti2<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
pvector[2] = ev.ereax[0];
pvector[1] = ev.ereax[1]+ev.ereax[2];
pvector[3] = 0.0;
ev_all.evdwl += ev.ereax[0] + ev.ereax[1] + ev.ereax[2];
// Angular
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeAngular<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
pvector[4] = ev.ereax[3];
pvector[5] = ev.ereax[4];
pvector[6] = ev.ereax[5];
ev_all.evdwl += ev.ereax[3] + ev.ereax[4] + ev.ereax[5];
// Torsion
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeTorsion<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
pvector[8] = ev.ereax[6];
pvector[9] = ev.ereax[7];
ev_all.evdwl += ev.ereax[6] + ev.ereax[7];
// Hydrogen Bond
if (cut_hbsq > 0.0) {
if (neighflag == HALF) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALF,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALF,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
} else { //if (neighflag == HALFTHREAD) {
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALFTHREAD,1> >(0,inum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeHydrogen<HALFTHREAD,0> >(0,inum),*this);
DeviceType::fence();
ev_all += ev;
}
}
pvector[7] = ev.ereax[8];
ev_all.evdwl += ev.ereax[8];
// Bond force
if (neighflag == HALF) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxUpdateBond<HALF> >(0,ignum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALF,1> >(0,ignum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALF,0> >(0,ignum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] += ev.evdwl;
} else { //if (neighflag == HALFTHREAD) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxUpdateBond<HALFTHREAD> >(0,ignum),*this);
DeviceType::fence();
if (evflag)
Kokkos::parallel_reduce(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALFTHREAD,1> >(0,ignum),*this,ev);
else
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxComputeBond2<HALFTHREAD,0> >(0,ignum),*this);
DeviceType::fence();
ev_all += ev;
pvector[0] += ev.evdwl;
}
if (eflag_global) {
eng_vdwl += ev_all.evdwl;
eng_coul += ev_all.ecoul;
}
if (vflag_global) {
virial[0] += ev_all.v[0];
virial[1] += ev_all.v[1];
virial[2] += ev_all.v[2];
virial[3] += ev_all.v[3];
virial[4] += ev_all.v[4];
virial[5] += ev_all.v[5];
}
if (vflag_fdotr) pair_virial_fdotr_compute(this);
if (eflag_atom) {
k_eatom.template modify<DeviceType>();
k_eatom.template sync<LMPHostType>();
}
if (vflag_atom) {
k_vatom.template modify<DeviceType>();
k_vatom.template sync<LMPHostType>();
}
if (fixspecies_flag)
FindBondSpecies();
copymode = 0;
}
/* ---------------------------------------------------------------------- */
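// Self (polarization) energy: per-atom charge-dependent term
// epol = KCALpMOL_to_EV * (chi*q + 0.5*eta*q^2), tallied into ecoul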
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT qi = q(i);
const F_FLOAT chi = paramssing(itype).chi;
const F_FLOAT eta = paramssing(itype).eta;
const F_FLOAT epol = KCALpMOL_to_EV*(chi*qi+(eta/2.0)*qi*qi);
if (eflag) ev.ecoul += epol;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,epol,0.0,0.0,0.0,0.0);
if (eflag_atom) this->template e_tally_single<NEIGHFLAG>(ev,i,epol);
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputePolar<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
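// Pairwise van der Waals + Coulomb (non-tabulated path). The degree-7 taper
// polynomial Tap(r) and its derivative dTap are built up from d_tap[];
// vdwflag selects the shielded (1,3) and/or inner-wall (2,3) vdW terms and
// lgflag adds the long-range dispersion correction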
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
// The f array is atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
F_FLOAT powr_vdw, powgi_vdw, fn13, dfn13, exp1, exp2, etmp;
F_FLOAT evdwl, fvdwl;
evdwl = fvdwl = 0.0;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const F_FLOAT qi = q(i);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT fxtmp, fytmp, fztmp;
fxtmp = fytmp = fztmp = 0.0;
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const int jtype = type(j);
const tagint jtag = tag(j);
const F_FLOAT qj = q(j);
if (NEIGHFLAG != FULL) {
// skip half of the interactions
if (j >= nlocal) {
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
}
}
const X_FLOAT delx = x(j,0) - xtmp;
const X_FLOAT dely = x(j,1) - ytmp;
const X_FLOAT delz = x(j,2) - ztmp;
const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
if (rsq > cut_nbsq) continue;
const F_FLOAT rij = sqrt(rsq);
// LJ energy/force
F_FLOAT Tap = d_tap[7] * rij + d_tap[6];
Tap = Tap * rij + d_tap[5];
Tap = Tap * rij + d_tap[4];
Tap = Tap * rij + d_tap[3];
Tap = Tap * rij + d_tap[2];
Tap = Tap * rij + d_tap[1];
Tap = Tap * rij + d_tap[0];
F_FLOAT dTap = 7*d_tap[7] * rij + 6*d_tap[6];
dTap = dTap * rij + 5*d_tap[5];
dTap = dTap * rij + 4*d_tap[4];
dTap = dTap * rij + 3*d_tap[3];
dTap = dTap * rij + 2*d_tap[2];
dTap += d_tap[1]/rij;
const F_FLOAT gamma_w = paramstwbp(itype,jtype).gamma_w;
const F_FLOAT alpha = paramstwbp(itype,jtype).alpha;
const F_FLOAT r_vdw = paramstwbp(itype,jtype).r_vdw;
const F_FLOAT epsilon = paramstwbp(itype,jtype).epsilon;
// shielding
if (vdwflag == 1 || vdwflag == 3) {
powr_vdw = pow(rij,gp[28]);
powgi_vdw = pow(1.0/gamma_w,gp[28]);
fn13 = pow(powr_vdw+powgi_vdw,1.0/gp[28]);
exp1 = exp(alpha*(1.0-fn13/r_vdw));
exp2 = exp(0.5*alpha*(1.0-fn13/r_vdw));
dfn13 = pow(powr_vdw+powgi_vdw,1.0/gp[28]-1.0)*pow(rij,gp[28]-2.0);
etmp = epsilon*(exp1-2.0*exp2);
evdwl = Tap*etmp;
fvdwl = dTap*etmp-Tap*epsilon*(alpha/r_vdw)*(exp1-exp2)*dfn13;
} else {
exp1 = exp(alpha*(1.0-rij/r_vdw));
exp2 = exp(0.5*alpha*(1.0-rij/r_vdw));
etmp = epsilon*(exp1-2.0*exp2);
evdwl = Tap*etmp;
fvdwl = dTap*etmp-Tap*epsilon*(alpha/r_vdw)*(exp1-exp2)*rij;
}
// inner wall
if (vdwflag == 2 || vdwflag == 3) {
const F_FLOAT ecore = paramstwbp(itype,jtype).ecore;
const F_FLOAT acore = paramstwbp(itype,jtype).acore;
const F_FLOAT rcore = paramstwbp(itype,jtype).rcore;
const F_FLOAT e_core = ecore*exp(acore*(1.0-(rij/rcore)));
const F_FLOAT de_core = -(acore/rcore)*e_core;
evdwl += Tap*e_core;
fvdwl += dTap*e_core+Tap*de_core/rij;
if (lgflag) {
const F_FLOAT lgre = paramstwbp(itype,jtype).lgre;
const F_FLOAT lgcij = paramstwbp(itype,jtype).lgcij;
const F_FLOAT rij5 = rsq*rsq*rij;
const F_FLOAT rij6 = rij5*rij;
const F_FLOAT re6 = lgre*lgre*lgre*lgre*lgre*lgre;
const F_FLOAT elg = -lgcij/(rij6+re6);
const F_FLOAT delg = -6.0*elg*rij5/(rij6+re6);
evdwl += Tap*elg;
fvdwl += dTap*elg+Tap*delg/rij;
}
}
// Coulomb energy/force
const F_FLOAT shld = paramstwbp(itype,jtype).gamma;
const F_FLOAT denom1 = rij * rij * rij + shld;
const F_FLOAT denom3 = pow(denom1,0.3333333333333);
const F_FLOAT ecoul = C_ele * qi*qj*Tap/denom3;
const F_FLOAT fcoul = C_ele * qi*qj*(dTap-Tap*rij/denom1)/denom3;
const F_FLOAT ftotal = fvdwl + fcoul;
fxtmp += delx*ftotal;
fytmp += dely*ftotal;
fztmp += delz*ftotal;
if (NEIGHFLAG != FULL) {
a_f(j,0) -= delx*ftotal;
a_f(j,1) -= dely*ftotal;
a_f(j,2) -= delz*ftotal;
}
if (NEIGHFLAG == FULL) {
if (eflag) ev.evdwl += 0.5*evdwl;
if (eflag) ev.ecoul += 0.5*ecoul;
} else {
if (eflag) ev.evdwl += evdwl;
if (eflag) ev.ecoul += ecoul;
}
if (vflag_either || eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,evdwl+ecoul,-ftotal,delx,dely,delz);
}
a_f(i,0) += fxtmp;
a_f(i,1) += fytmp;
a_f(i,2) += fztmp;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
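// Tabulated van der Waals + Coulomb: energies and forces are interpolated
// with cubic splines from the precomputed lookup table d_LR(tmin,tmax)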
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
// The f array is atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const F_FLOAT qi = q(i);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT fxtmp, fytmp, fztmp;
fxtmp = fytmp = fztmp = 0.0;
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const int jtype = type(j);
const tagint jtag = tag(j);
const F_FLOAT qj = q(j);
if (NEIGHFLAG != FULL) {
// skip half of the interactions
if (j >= nlocal) {
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
}
}
const X_FLOAT delx = x(j,0) - xtmp;
const X_FLOAT dely = x(j,1) - ytmp;
const X_FLOAT delz = x(j,2) - ztmp;
const F_FLOAT rsq = delx*delx + dely*dely + delz*delz;
if (rsq > cut_nbsq) continue;
const F_FLOAT rij = sqrt(rsq);
const int tmin = MIN( itype, jtype );
const int tmax = MAX( itype, jtype );
const LR_lookup_table_kk t = d_LR(tmin,tmax);
/* Cubic Spline Interpolation */
int r = (int)(rij * t.inv_dx);
if( r == 0 ) ++r;
const F_FLOAT base = (double)(r+1) * t.dx;
const F_FLOAT dif = rij - base;
const cubic_spline_coef vdW = t.d_vdW[r];
const cubic_spline_coef ele = t.d_ele[r];
const cubic_spline_coef CEvd = t.d_CEvd[r];
const cubic_spline_coef CEclmb = t.d_CEclmb[r];
const F_FLOAT evdwl = ((vdW.d*dif + vdW.c)*dif + vdW.b)*dif +
vdW.a;
const F_FLOAT ecoul = (((ele.d*dif + ele.c)*dif + ele.b)*dif +
ele.a)*qi*qj;
const F_FLOAT fvdwl = ((CEvd.d*dif + CEvd.c)*dif + CEvd.b)*dif +
CEvd.a;
const F_FLOAT fcoul = (((CEclmb.d*dif+CEclmb.c)*dif+CEclmb.b)*dif +
CEclmb.a)*qi*qj;
const F_FLOAT ftotal = fvdwl + fcoul;
fxtmp += delx*ftotal;
fytmp += dely*ftotal;
fztmp += delz*ftotal;
if (NEIGHFLAG != FULL) {
a_f(j,0) -= delx*ftotal;
a_f(j,1) -= dely*ftotal;
a_f(j,2) -= delz*ftotal;
}
if (NEIGHFLAG == FULL) {
if (eflag) ev.evdwl += 0.5*evdwl;
if (eflag) ev.ecoul += 0.5*ecoul;
} else {
if (eflag) ev.evdwl += evdwl;
if (eflag) ev.ecoul += ecoul;
}
if (vflag_either || eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,evdwl+ecoul,-ftotal,delx,dely,delz);
}
a_f(i,0) += fxtmp;
a_f(i,1) += fytmp;
a_f(i,2) += fztmp;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
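// (Re)allocate per-atom and per-bond work arrays. Sizes are nmax atoms by
// maxbo bond-list slots (or maxhb hydrogen-bond slots); maxbo/maxhb are
// grown by the resize loop in compute() whenever a list overflows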
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::allocate_array()
{
if (cut_hbsq > 0.0) {
d_hb_first = typename AT::t_int_1d("reax/c/kk:hb_first",nmax);
d_hb_num = typename AT::t_int_1d("reax/c/kk:hb_num",nmax);
d_hb_list = typename AT::t_int_1d("reax/c/kk:hb_list",nmax*maxhb);
}
d_bo_first = typename AT::t_int_1d("reax/c/kk:bo_first",nmax);
d_bo_num = typename AT::t_int_1d("reax/c/kk:bo_num",nmax);
d_bo_list = typename AT::t_int_1d("reax/c/kk:bo_list",nmax*maxbo);
d_BO = typename AT::t_ffloat_2d_dl("reax/c/kk:BO",nmax,maxbo);
d_BO_s = typename AT::t_ffloat_2d_dl("reax/c/kk:BO_s",nmax,maxbo);
d_BO_pi = typename AT::t_ffloat_2d_dl("reax/c/kk:BO_pi",nmax,maxbo);
d_BO_pi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:BO_pi2",nmax,maxbo);
d_dln_BOp_pix = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pix",nmax,maxbo);
d_dln_BOp_piy = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_piy",nmax,maxbo);
d_dln_BOp_piz = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_piz",nmax,maxbo);
d_dln_BOp_pi2x = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pi2x",nmax,maxbo);
d_dln_BOp_pi2y = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pi2y",nmax,maxbo);
d_dln_BOp_pi2z = typename AT::t_ffloat_2d_dl("reax/c/kk:d_dln_BOp_pi2z",nmax,maxbo);
d_C1dbo = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C1dbo",nmax,maxbo);
d_C2dbo = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C2dbo",nmax,maxbo);
d_C3dbo = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C3dbo",nmax,maxbo);
d_C1dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C1dbopi",nmax,maxbo);
d_C2dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C2dbopi",nmax,maxbo);
d_C3dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C3dbopi",nmax,maxbo);
d_C4dbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C4dbopi",nmax,maxbo);
d_C1dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C1dbopi2",nmax,maxbo);
d_C2dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C2dbopi2",nmax,maxbo);
d_C3dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C3dbopi2",nmax,maxbo);
d_C4dbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:d_C4dbopi2",nmax,maxbo);
d_dBOpx = typename AT::t_ffloat_2d_dl("reax/c/kk:dBOpx",nmax,maxbo);
d_dBOpy = typename AT::t_ffloat_2d_dl("reax/c/kk:dBOpy",nmax,maxbo);
d_dBOpz = typename AT::t_ffloat_2d_dl("reax/c/kk:dBOpz",nmax,maxbo);
d_dDeltap_self = typename AT::t_ffloat_2d_dl("reax/c/kk:dDeltap_self",nmax,3);
d_Deltap_boc = typename AT::t_ffloat_1d("reax/c/kk:Deltap_boc",nmax);
d_Deltap = typename AT::t_ffloat_1d("reax/c/kk:Deltap",nmax);
d_total_bo = typename AT::t_ffloat_1d("reax/c/kk:total_bo",nmax);
d_Cdbo = typename AT::t_ffloat_2d_dl("reax/c/kk:Cdbo",nmax,3*maxbo);
d_Cdbopi = typename AT::t_ffloat_2d_dl("reax/c/kk:Cdbopi",nmax,3*maxbo);
d_Cdbopi2 = typename AT::t_ffloat_2d_dl("reax/c/kk:Cdbopi2",nmax,3*maxbo);
d_Delta = typename AT::t_ffloat_1d("reax/c/kk:Delta",nmax);
d_Delta_boc = typename AT::t_ffloat_1d("reax/c/kk:Delta_boc",nmax);
d_dDelta_lp = typename AT::t_ffloat_1d("reax/c/kk:dDelta_lp",nmax);
d_Delta_lp = typename AT::t_ffloat_1d("reax/c/kk:Delta_lp",nmax);
d_Delta_lp_temp = typename AT::t_ffloat_1d("reax/c/kk:Delta_lp_temp",nmax);
d_CdDelta = typename AT::t_ffloat_1d("reax/c/kk:CdDelta",nmax);
d_sum_ovun = typename AT::t_ffloat_2d_dl("reax/c/kk:sum_ovun",nmax,3);
// FixReaxCSpecies
if (fixspecies_flag) {
memory->destroy_kokkos(k_tmpid,tmpid);
memory->destroy_kokkos(k_tmpbo,tmpbo);
memory->create_kokkos(k_tmpid,tmpid,nmax,MAXSPECBOND,"pair:tmpid");
memory->create_kokkos(k_tmpbo,tmpbo,nmax,MAXSPECBOND,"pair:tmpbo");
}
// FixReaxCBonds
d_abo = typename AT::t_ffloat_2d("reax/c/kk:abo",nmax,maxbo);
d_neighid = typename AT::t_tagint_2d("reax/c/kk:neighid",nmax,maxbo);
d_numneigh_bonds = typename AT::t_int_1d("reax/c/kk:numneigh_bonds",nmax);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxZero, const int &n) const {
d_total_bo(n) = 0.0;
d_CdDelta(n) = 0.0;
if (neighflag != FULL) {
d_bo_num(n) = 0;
d_hb_num(n) = 0;
}
for (int j = 0; j < 3; j++)
d_dDeltap_self(n,j) = 0.0;
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxZeroEAtom, const int &i) const {
v_eatom(i) = 0.0;
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxZeroVAtom, const int &i) const {
v_vatom(i,0) = 0.0;
v_vatom(i,1) = 0.0;
v_vatom(i,2) = 0.0;
v_vatom(i,3) = 0.0;
v_vatom(i,4) = 0.0;
v_vatom(i,5) = 0.0;
}
/* ---------------------------------------------------------------------- */
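// Build bond-order and hydrogen-bond neighbor lists from a FULL neighbor
// list. Atom i owns slots [i*maxbo, i*maxbo+maxbo); if a slot index would
// overflow, the resize flag is set and the whole pass is redone. Uncorrected
// bond orders (BO_s, BO_pi, BO_pi2) and their derivatives are stored as the
// list is built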
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBuildListsFull, const int &ii) const {
if (d_resize_bo() || d_resize_hb())
return;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const int jnum = d_numneigh[i];
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3], dBOp_i[3], dln_BOp_pi_i[3], dln_BOp_pi2_i[3];
F_FLOAT total_bo = 0.0;
int j_index = i*maxbo;
d_bo_first[i] = j_index;
const int bo_first_i = j_index;
int ihb = -1;
int jhb = -1;
int hb_index = i*maxhb;
int hb_first_i;
if (cut_hbsq > 0.0) {
ihb = paramssing(itype).p_hbond;
if (ihb == 1) {
d_hb_first[i] = hb_index;
hb_first_i = hb_index;
}
}
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
double cutoffsq;
if(i < nlocal) cutoffsq = MAX(cut_bosq,cut_hbsq);
else cutoffsq = cut_bosq;
if (rsq > cutoffsq) continue;
const int jtype = type(j);
// hbond list
if (i < nlocal && cut_hbsq > 0.0 && (ihb == 1 || ihb == 2) && rsq <= cut_hbsq) {
jhb = paramssing(jtype).p_hbond;
if( ihb == 1 && jhb == 2) {
const int jj_index = hb_index - hb_first_i;
if (jj_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[hb_index] = j;
hb_index++;
}
}
// bond_list
const F_FLOAT rij = sqrt(rsq);
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
const int jj_index = j_index - bo_first_i;
if (jj_index >= maxbo) {
d_resize_bo() = 1;
return;
}
d_bo_list[j_index] = j;
// from BondOrder1
d_BO(i,jj_index) = BO;
d_BO_s(i,jj_index) = BO_s;
d_BO_pi(i,jj_index) = BO_pi;
d_BO_pi2(i,jj_index) = BO_pi2;
F_FLOAT Cln_BOp_s = p_bo2 * C12 / rij / rij;
F_FLOAT Cln_BOp_pi = p_bo4 * C34 / rij / rij;
F_FLOAT Cln_BOp_pi2 = p_bo6 * C56 / rij / rij;
if (nlocal == 0)
Cln_BOp_s = Cln_BOp_pi = Cln_BOp_pi2 = 0.0;
for (int d = 0; d < 3; d++) dln_BOp_pi_i[d] = -(BO_pi*Cln_BOp_pi)*delij[d];
for (int d = 0; d < 3; d++) dln_BOp_pi2_i[d] = -(BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) dBOp_i[d] = -(BO_s*Cln_BOp_s+BO_pi*Cln_BOp_pi+BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) d_dDeltap_self(i,d) += dBOp_i[d];
d_dln_BOp_pix(i,jj_index) = dln_BOp_pi_i[0];
d_dln_BOp_piy(i,jj_index) = dln_BOp_pi_i[1];
d_dln_BOp_piz(i,jj_index) = dln_BOp_pi_i[2];
d_dln_BOp_pi2x(i,jj_index) = dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(i,jj_index) = dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(i,jj_index) = dln_BOp_pi2_i[2];
d_dBOpx(i,jj_index) = dBOp_i[0];
d_dBOpy(i,jj_index) = dBOp_i[1];
d_dBOpz(i,jj_index) = dBOp_i[2];
d_BO(i,jj_index) -= bo_cut;
d_BO_s(i,jj_index) -= bo_cut;
total_bo += d_BO(i,jj_index);
j_index++;
}
d_bo_num[i] = j_index - d_bo_first[i];
if (cut_hbsq > 0.0 && ihb == 1) d_hb_num[i] = hb_index - d_hb_first[i];
d_total_bo[i] += total_bo;
const F_FLOAT val_i = paramssing(itype).valency;
d_Deltap[i] = d_total_bo[i] - val_i;
d_Deltap_boc[i] = d_total_bo[i] - paramssing(itype).valency_val;
}
/* ---------------------------------------------------------------------- */
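// Same list construction from a HALF/HALFTHREAD neighbor list: each pair is
// visited once, so list entries and the dDeltap_self/total_bo contributions
// are written for both i and j (the AtomicF trait makes those writes atomic
// for HALFTHREAD)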
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBuildListsHalf<NEIGHFLAG>, const int &ii) const {
if (d_resize_bo() || d_resize_hb())
return;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_dDeltap_self = d_dDeltap_self;
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_total_bo = d_total_bo;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3], dBOp_i[3], dln_BOp_pi_i[3], dln_BOp_pi2_i[3];
F_FLOAT total_bo = 0.0;
int j_index,i_index;
d_bo_first[i] = i*maxbo;
const int bo_first_i = d_bo_first[i];
int ihb = -1;
int jhb = -1;
int hb_first_i;
if (cut_hbsq > 0.0) {
ihb = paramssing(itype).p_hbond;
if (ihb == 1) {
d_hb_first[i] = i*maxhb;
hb_first_i = d_hb_first[i];
}
}
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const tagint jtag = tag(j);
d_bo_first[j] = j*maxbo;
d_hb_first[j] = j*maxhb;
const int jtype = type(j);
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
double cutoffsq;
if(i < nlocal) cutoffsq = MAX(cut_bosq,cut_hbsq);
else cutoffsq = cut_bosq;
if (rsq > cutoffsq) continue;
// hbond list
if (i < nlocal && cut_hbsq > 0.0 && (ihb == 1 || ihb == 2) && rsq <= cut_hbsq) {
jhb = paramssing(jtype).p_hbond;
if( ihb == 1 && jhb == 2) {
if (NEIGHFLAG == HALF) {
j_index = hb_first_i + d_hb_num[i];
d_hb_num[i]++;
} else {
j_index = hb_first_i + Kokkos::atomic_fetch_add(&d_hb_num[i],1);
}
const int jj_index = j_index - hb_first_i;
if (jj_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[j_index] = j;
} else if ( j < nlocal && ihb == 2 && jhb == 1) {
if (NEIGHFLAG == HALF) {
i_index = d_hb_first[j] + d_hb_num[j];
d_hb_num[j]++;
} else {
i_index = d_hb_first[j] + Kokkos::atomic_fetch_add(&d_hb_num[j],1);
}
const int ii_index = i_index - d_hb_first[j];
if (ii_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[i_index] = i;
}
}
// bond_list
const F_FLOAT rij = sqrt(rsq);
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
if (NEIGHFLAG == HALF) {
j_index = bo_first_i + d_bo_num[i];
i_index = d_bo_first[j] + d_bo_num[j];
d_bo_num[i]++;
d_bo_num[j]++;
} else {
j_index = bo_first_i + Kokkos::atomic_fetch_add(&d_bo_num[i],1);
i_index = d_bo_first[j] + Kokkos::atomic_fetch_add(&d_bo_num[j],1);
}
const int jj_index = j_index - bo_first_i;
const int ii_index = i_index - d_bo_first[j];
if (jj_index >= maxbo || ii_index >= maxbo) {
d_resize_bo() = 1;
return;
}
d_bo_list[j_index] = j;
d_bo_list[i_index] = i;
// from BondOrder1
d_BO(i,jj_index) = BO;
d_BO_s(i,jj_index) = BO_s;
d_BO_pi(i,jj_index) = BO_pi;
d_BO_pi2(i,jj_index) = BO_pi2;
d_BO(j,ii_index) = BO;
d_BO_s(j,ii_index) = BO_s;
d_BO_pi(j,ii_index) = BO_pi;
d_BO_pi2(j,ii_index) = BO_pi2;
F_FLOAT Cln_BOp_s = p_bo2 * C12 / rij / rij;
F_FLOAT Cln_BOp_pi = p_bo4 * C34 / rij / rij;
F_FLOAT Cln_BOp_pi2 = p_bo6 * C56 / rij / rij;
if (nlocal == 0)
Cln_BOp_s = Cln_BOp_pi = Cln_BOp_pi2 = 0.0;
for (int d = 0; d < 3; d++) dln_BOp_pi_i[d] = -(BO_pi*Cln_BOp_pi)*delij[d];
for (int d = 0; d < 3; d++) dln_BOp_pi2_i[d] = -(BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) dBOp_i[d] = -(BO_s*Cln_BOp_s+BO_pi*Cln_BOp_pi+BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) a_dDeltap_self(i,d) += dBOp_i[d];
for (int d = 0; d < 3; d++) a_dDeltap_self(j,d) += -dBOp_i[d];
d_dln_BOp_pix(i,jj_index) = dln_BOp_pi_i[0];
d_dln_BOp_piy(i,jj_index) = dln_BOp_pi_i[1];
d_dln_BOp_piz(i,jj_index) = dln_BOp_pi_i[2];
d_dln_BOp_pix(j,ii_index) = -dln_BOp_pi_i[0];
d_dln_BOp_piy(j,ii_index) = -dln_BOp_pi_i[1];
d_dln_BOp_piz(j,ii_index) = -dln_BOp_pi_i[2];
d_dln_BOp_pi2x(i,jj_index) = dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(i,jj_index) = dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(i,jj_index) = dln_BOp_pi2_i[2];
d_dln_BOp_pi2x(j,ii_index) = -dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(j,ii_index) = -dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(j,ii_index) = -dln_BOp_pi2_i[2];
d_dBOpx(i,jj_index) = dBOp_i[0];
d_dBOpy(i,jj_index) = dBOp_i[1];
d_dBOpz(i,jj_index) = dBOp_i[2];
d_dBOpx(j,ii_index) = -dBOp_i[0];
d_dBOpy(j,ii_index) = -dBOp_i[1];
d_dBOpz(j,ii_index) = -dBOp_i[2];
d_BO(i,jj_index) -= bo_cut;
d_BO(j,ii_index) -= bo_cut;
d_BO_s(i,jj_index) -= bo_cut;
d_BO_s(j,ii_index) -= bo_cut;
total_bo += d_BO(i,jj_index);
a_total_bo[j] += d_BO(j,ii_index);
}
a_total_bo[i] += total_bo;
}
/* ---------------------------------------------------------------------- */
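// Deltap = total uncorrected bond order minus valency, needed by the
// bond-order corrections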
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder1, const int &ii) const {
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT val_i = paramssing(itype).valency;
d_Deltap[i] = d_total_bo[i] - val_i;
d_Deltap_boc[i] = d_total_bo[i] - paramssing(itype).valency_val;
}
/* ---------------------------------------------------------------------- */
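// "LessAtomics" variant: only the bond/hbond list entries are built here;
// uncorrected bond orders are filled in afterwards by
// PairReaxBondOrder1_LessAtomics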
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBuildListsHalf_LessAtomics<NEIGHFLAG>, const int &ii) const {
if (d_resize_bo() || d_resize_hb())
return;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const int jnum = d_numneigh[i];
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3];
int j_index,i_index;
d_bo_first[i] = i*maxbo;
const int bo_first_i = d_bo_first[i];
int ihb = -1;
int jhb = -1;
int hb_first_i;
if (cut_hbsq > 0.0) {
ihb = paramssing(itype).p_hbond;
if (ihb == 1) {
d_hb_first[i] = i*maxhb;
hb_first_i = d_hb_first[i];
}
}
for (int jj = 0; jj < jnum; jj++) {
int j = d_neighbors(i,jj);
j &= NEIGHMASK;
const tagint jtag = tag(j);
d_bo_first[j] = j*maxbo;
d_hb_first[j] = j*maxhb;
const int jtype = type(j);
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
double cutoffsq;
if(i < nlocal) cutoffsq = MAX(cut_bosq,cut_hbsq);
else cutoffsq = cut_bosq;
if (rsq > cutoffsq) continue;
// hbond list
if (i < nlocal && cut_hbsq > 0.0 && (ihb == 1 || ihb == 2) && rsq <= cut_hbsq) {
jhb = paramssing(jtype).p_hbond;
if( ihb == 1 && jhb == 2) {
if (NEIGHFLAG == HALF) {
j_index = hb_first_i + d_hb_num[i];
d_hb_num[i]++;
} else {
j_index = hb_first_i + Kokkos::atomic_fetch_add(&d_hb_num[i],1);
}
const int jj_index = j_index - hb_first_i;
if (jj_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[j_index] = j;
} else if ( j < nlocal && ihb == 2 && jhb == 1) {
if (NEIGHFLAG == HALF) {
i_index = d_hb_first[j] + d_hb_num[j];
d_hb_num[j]++;
} else {
i_index = d_hb_first[j] + Kokkos::atomic_fetch_add(&d_hb_num[j],1);
}
const int ii_index = i_index - d_hb_first[j];
if (ii_index >= maxhb) {
d_resize_hb() = 1;
return;
}
d_hb_list[i_index] = i;
}
}
// bond_list
const F_FLOAT rij = sqrt(rsq);
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
if (NEIGHFLAG == HALF) {
j_index = bo_first_i + d_bo_num[i];
i_index = d_bo_first[j] + d_bo_num[j];
d_bo_num[i]++;
d_bo_num[j]++;
} else {
j_index = bo_first_i + Kokkos::atomic_fetch_add(&d_bo_num[i],1);
i_index = d_bo_first[j] + Kokkos::atomic_fetch_add(&d_bo_num[j],1);
}
const int jj_index = j_index - bo_first_i;
const int ii_index = i_index - d_bo_first[j];
if (jj_index >= maxbo || ii_index >= maxbo) {
d_resize_bo() = 1;
return;
}
d_bo_list[j_index] = j;
d_bo_list[i_index] = i;
}
}
/* ---------------------------------------------------------------------- */
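// Fill uncorrected bond orders and their derivatives for the LessAtomics path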
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder1_LessAtomics, const int &ii) const {
F_FLOAT C12, C34, C56, BO_s, BO_pi, BO_pi2, BO, delij[3], dBOp_i[3], dln_BOp_pi_i[3], dln_BOp_pi2_i[3];
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT total_bo = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int jtype = type(j);
const int j_index = jj - j_start;
// calculate uncorrected BO and total bond order
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
if (paramssing(itype).r_s > 0.0 && paramssing(jtype).r_s > 0.0) {
C12 = p_bo1*pow(rij/r_s,p_bo2);
BO_s = (1.0+bo_cut)*exp(C12);
}
else BO_s = C12 = 0.0;
if (paramssing(itype).r_pi > 0.0 && paramssing(jtype).r_pi > 0.0) {
C34 = p_bo3*pow(rij/r_pi,p_bo4);
BO_pi = exp(C34);
}
else BO_pi = C34 = 0.0;
if (paramssing(itype).r_pi2 > 0.0 && paramssing(jtype).r_pi2 > 0.0) {
C56 = p_bo5*pow(rij/r_pi2,p_bo6);
BO_pi2 = exp(C56);
}
else BO_pi2 = C56 = 0.0;
BO = BO_s + BO_pi + BO_pi2;
if (BO < bo_cut) continue;
d_BO(i,j_index) = BO;
d_BO_s(i,j_index) = BO_s;
d_BO_pi(i,j_index) = BO_pi;
d_BO_pi2(i,j_index) = BO_pi2;
F_FLOAT Cln_BOp_s = p_bo2 * C12 / rij / rij;
F_FLOAT Cln_BOp_pi = p_bo4 * C34 / rij / rij;
F_FLOAT Cln_BOp_pi2 = p_bo6 * C56 / rij / rij;
if (nlocal == 0)
Cln_BOp_s = Cln_BOp_pi = Cln_BOp_pi2 = 0.0;
for (int d = 0; d < 3; d++) dln_BOp_pi_i[d] = -(BO_pi*Cln_BOp_pi)*delij[d];
for (int d = 0; d < 3; d++) dln_BOp_pi2_i[d] = -(BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) dBOp_i[d] = -(BO_s*Cln_BOp_s+BO_pi*Cln_BOp_pi+BO_pi2*Cln_BOp_pi2)*delij[d];
for (int d = 0; d < 3; d++) d_dDeltap_self(i,d) += dBOp_i[d];
d_dln_BOp_pix(i,j_index) = dln_BOp_pi_i[0];
d_dln_BOp_piy(i,j_index) = dln_BOp_pi_i[1];
d_dln_BOp_piz(i,j_index) = dln_BOp_pi_i[2];
d_dln_BOp_pi2x(i,j_index) = dln_BOp_pi2_i[0];
d_dln_BOp_pi2y(i,j_index) = dln_BOp_pi2_i[1];
d_dln_BOp_pi2z(i,j_index) = dln_BOp_pi2_i[2];
d_dBOpx(i,j_index) = dBOp_i[0];
d_dBOpy(i,j_index) = dBOp_i[1];
d_dBOpz(i,j_index) = dBOp_i[2];
d_BO(i,j_index) -= bo_cut;
d_BO_s(i,j_index) -= bo_cut;
total_bo += d_BO(i,j_index);
}
d_total_bo[i] += total_bo;
const F_FLOAT val_i = paramssing(itype).valency;
d_Deltap[i] = d_total_bo[i] - val_i;
d_Deltap_boc[i] = d_total_bo[i] - paramssing(itype).valency_val;
}
/* ---------------------------------------------------------------------- */
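// Apply the ReaxFF bond-order corrections: overcoordination factor f1 (ovc)
// and the 1-3 correction factors f4*f5 (v13cor), plus the C1..C4
// coefficients needed later for the bond-order derivatives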
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder2, const int &ii) const {
F_FLOAT delij[3];
F_FLOAT exp_p1i, exp_p2i, exp_p1j, exp_p2j, f1, f2, f3, u1_ij, u1_ji, Cf1A_ij, Cf1B_ij, Cf1_ij, Cf1_ji;
F_FLOAT f4, f5, exp_f4, exp_f5, f4f5, Cf45_ij, Cf45_ji;
F_FLOAT A0_ij, A1_ij, A2_ij, A3_ij, A2_ji, A3_ji;
const int i = d_ilist[ii];
const int itype = type(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const F_FLOAT val_i = paramssing(itype).valency;
d_total_bo[i] = 0.0;
F_FLOAT total_bo = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int jtype = type(j);
const int j_index = jj - j_start;
const int i_index = maxbo+j_index;
// calculate corrected BO and total bond order
const F_FLOAT val_j = paramssing(jtype).valency;
const F_FLOAT ovc = paramstwbp(itype,jtype).ovc;
const F_FLOAT v13cor = paramstwbp(itype,jtype).v13cor;
const F_FLOAT p_boc3 = paramstwbp(itype,jtype).p_boc3;
const F_FLOAT p_boc4 = paramstwbp(itype,jtype).p_boc4;
const F_FLOAT p_boc5 = paramstwbp(itype,jtype).p_boc5;
if (ovc < 0.001 && v13cor < 0.001) {
d_C1dbo(i,j_index) = 1.0;
d_C2dbo(i,j_index) = 0.0;
d_C3dbo(i,j_index) = 0.0;
d_C1dbopi(i,j_index) = d_BO_pi(i,j_index);
d_C2dbopi(i,j_index) = 0.0;
d_C3dbopi(i,j_index) = 0.0;
d_C4dbopi(i,j_index) = 0.0;
d_C1dbopi2(i,j_index) = d_BO_pi(i,j_index);
d_C2dbopi2(i,j_index) = 0.0;
d_C3dbopi2(i,j_index) = 0.0;
d_C4dbopi2(i,j_index) = 0.0;
} else {
if (ovc >= 0.001) {
exp_p1i = exp(-p_boc1 * d_Deltap[i]);
exp_p2i = exp(-p_boc2 * d_Deltap[i]);
exp_p1j = exp(-p_boc1 * d_Deltap[j]);
exp_p2j = exp(-p_boc2 * d_Deltap[j]);
f2 = exp_p1i + exp_p1j;
f3 = -1.0/p_boc2*log(0.5*(exp_p2i+exp_p2j));
f1 = 0.5 * ((val_i + f2)/(val_i + f2 + f3) + (val_j + f2)/(val_j + f2 + f3));
u1_ij = val_i + f2 + f3;
u1_ji = val_j + f2 + f3;
Cf1A_ij = 0.5 * f3 * (1.0/(u1_ij*u1_ij)+1.0/(u1_ji*u1_ji));
Cf1B_ij = -0.5 * ((u1_ij - f3)/(u1_ij*u1_ij)+(u1_ji - f3)/(u1_ji*u1_ji));
Cf1_ij = 0.5 * (-p_boc1 * exp_p1i / u1_ij - ((val_i+f2) / (u1_ij*u1_ij)) *
(-p_boc1 * exp_p1i + exp_p2i / (exp_p2i + exp_p2j)) +
-p_boc1 * exp_p1i / u1_ji - ((val_j+f2) / (u1_ji*u1_ji)) *
(-p_boc1 * exp_p1i + exp_p2i / (exp_p2i + exp_p2j)));
Cf1_ji = -Cf1A_ij * p_boc1 * exp_p1j + Cf1B_ij * exp_p2j / ( exp_p2i + exp_p2j );
} else {
f1 = 1.0;
Cf1_ij = Cf1_ji = 0.0;
}
if (v13cor >= 0.001) {
exp_f4 =exp(-(p_boc4*(d_BO(i,j_index)*d_BO(i,j_index))-d_Deltap_boc[i])*p_boc3+p_boc5);
exp_f5 =exp(-(p_boc4*(d_BO(i,j_index)*d_BO(i,j_index))-d_Deltap_boc[j])*p_boc3+p_boc5);
f4 = 1. / (1. + exp_f4);
f5 = 1. / (1. + exp_f5);
f4f5 = f4 * f5;
Cf45_ij = -f4 * exp_f4;
Cf45_ji = -f5 * exp_f5;
} else {
f4 = f5 = f4f5 = 1.0;
Cf45_ij = Cf45_ji = 0.0;
}
A0_ij = f1 * f4f5;
A1_ij = -2 * p_boc3 * p_boc4 * d_BO(i,j_index) * (Cf45_ij + Cf45_ji);
A2_ij = Cf1_ij / f1 + p_boc3 * Cf45_ij;
A2_ji = Cf1_ji / f1 + p_boc3 * Cf45_ji;
A3_ij = A2_ij + Cf1_ij / f1;
A3_ji = A2_ji + Cf1_ji / f1;
d_BO(i,j_index) = d_BO(i,j_index) * A0_ij;
d_BO_pi(i,j_index) = d_BO_pi(i,j_index) * A0_ij * f1;
d_BO_pi2(i,j_index) = d_BO_pi2(i,j_index) * A0_ij * f1;
d_BO_s(i,j_index) = d_BO(i,j_index)-(d_BO_pi(i,j_index)+d_BO_pi2(i,j_index));
d_C1dbo(i,j_index) = A0_ij + d_BO(i,j_index) * A1_ij;
d_C2dbo(i,j_index) = d_BO(i,j_index) * A2_ij;
d_C3dbo(i,j_index) = d_BO(i,j_index) * A2_ji;
d_C1dbopi(i,j_index) = f1*f1*f4*f5;
d_C2dbopi(i,j_index) = d_BO_pi(i,j_index) * A1_ij;
d_C3dbopi(i,j_index) = d_BO_pi(i,j_index) * A3_ij;
d_C4dbopi(i,j_index) = d_BO_pi(i,j_index) * A3_ji;
d_C1dbopi2(i,j_index) = f1*f1*f4*f5;
d_C2dbopi2(i,j_index) = d_BO_pi2(i,j_index) * A1_ij;
d_C3dbopi2(i,j_index) = d_BO_pi2(i,j_index) * A3_ij;
d_C4dbopi2(i,j_index) = d_BO_pi2(i,j_index) * A3_ji;
}
if(d_BO(i,j_index) < 1e-10) d_BO(i,j_index) = 0.0;
if(d_BO_s(i,j_index) < 1e-10) d_BO_s(i,j_index) = 0.0;
if(d_BO_pi(i,j_index) < 1e-10) d_BO_pi(i,j_index) = 0.0;
if(d_BO_pi2(i,j_index) < 1e-10) d_BO_pi2(i,j_index) = 0.0;
total_bo += d_BO(i,j_index);
d_Cdbo(i,j_index) = 0.0;
d_Cdbopi(i,j_index) = 0.0;
d_Cdbopi2(i,j_index) = 0.0;
d_Cdbo(j,i_index) = 0.0;
d_Cdbopi(j,i_index) = 0.0;
d_Cdbopi2(j,i_index) = 0.0;
d_CdDelta[j] = 0.0;
}
d_CdDelta[i] = 0.0;
d_total_bo[i] += total_bo;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxBondOrder3, const int &ii) const {
// bottom (final) part of BO(): per-atom deltas and lone-pair counts
const int i = d_ilist[ii];
const int itype = type(i);
F_FLOAT nlp_temp;
d_Delta[i] = d_total_bo[i] - paramssing(itype).valency;
const F_FLOAT Delta_e = d_total_bo[i] - paramssing(itype).valency_e;
d_Delta_boc[i] = d_total_bo[i] - paramssing(itype).valency_boc;
const F_FLOAT vlpex = Delta_e - 2.0 * (int)(Delta_e/2.0);
const F_FLOAT explp1 = exp(-gp[15] * SQR(2.0 + vlpex));
const F_FLOAT nlp = explp1 - (int)(Delta_e / 2.0);
d_Delta_lp[i] = paramssing(itype).nlp_opt - nlp;
const F_FLOAT Clp = 2.0 * gp[15] * explp1 * (2.0 + vlpex);
d_dDelta_lp[i] = Clp;
if( paramssing(itype).mass > 21.0 ) {
nlp_temp = 0.5 * (paramssing(itype).valency_e - paramssing(itype).valency);
d_Delta_lp_temp[i] = paramssing(itype).nlp_opt - nlp_temp;
} else {
nlp_temp = nlp;
d_Delta_lp_temp[i] = paramssing(itype).nlp_opt - nlp_temp;
}
d_sum_ovun(i,1) = 0.0;
d_sum_ovun(i,2) = 0.0;
}
/* ---------------------------------------------------------------------- */
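// First multibody pass: per-atom sums over bonded neighbors
// (sum_ovun1, sum_ovun2) used by the over/undercoordination terms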
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeMulti1<NEIGHFLAG,EVFLAG>, const int &ii) const {
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT imass = paramssing(itype).mass;
F_FLOAT dfvl;
if (imass > 21.0) dfvl = 0.0;
else dfvl = 1.0;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT sum_ovun1 = 0.0;
F_FLOAT sum_ovun2 = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int jtype = type(j);
const int j_index = jj - j_start;
sum_ovun1 += paramstwbp(itype,jtype).p_ovun1 * paramstwbp(itype,jtype).De_s * d_BO(i,j_index);
sum_ovun2 += (d_Delta[j] - dfvl * d_Delta_lp_temp[j]) * (d_BO_pi(i,j_index) + d_BO_pi2(i,j_index));
}
d_sum_ovun(i,1) += sum_ovun1;
d_sum_ovun(i,2) += sum_ovun2;
}
/* ---------------------------------------------------------------------- */
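// Second multibody pass: lone-pair (ereax[0]), overcoordination (ereax[1])
// and undercoordination (ereax[2]) energies plus their bond-order derivatives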
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi = d_Cdbopi;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi2 = d_Cdbopi2;
const int i = d_ilist[ii];
const int itype = type(i);
const F_FLOAT imass = paramssing(itype).mass;
const F_FLOAT val_i = paramssing(itype).valency;
F_FLOAT dfvl;
if (imass > 21.0) dfvl = 0.0;
else dfvl = 1.0;
F_FLOAT e_lp, e_ov, e_un;
F_FLOAT CEover1, CEover2, CEover3, CEover4;
F_FLOAT CEunder1, CEunder2, CEunder3, CEunder4;
const F_FLOAT p_lp3 = gp[5];
const F_FLOAT p_ovun2 = paramssing(itype).p_ovun2;
const F_FLOAT p_ovun3 = gp[32];
const F_FLOAT p_ovun4 = gp[31];
const F_FLOAT p_ovun5 = paramssing(itype).p_ovun5;
const F_FLOAT p_ovun6 = gp[6];
const F_FLOAT p_ovun7 = gp[8];
const F_FLOAT p_ovun8 = gp[9];
// lone pair
const F_FLOAT p_lp2 = paramssing(itype).p_lp2;
const F_FLOAT expvd2 = exp( -75 * d_Delta_lp[i]);
const F_FLOAT inv_expvd2 = 1.0 / (1.0+expvd2);
int numbonds = d_bo_num[i];
e_lp = 0.0;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
e_lp = p_lp2 * d_Delta_lp[i] * inv_expvd2;
const F_FLOAT dElp = p_lp2 * inv_expvd2 + 75.0 * p_lp2 * d_Delta_lp[i] * expvd2 * inv_expvd2*inv_expvd2;
const F_FLOAT CElp = dElp * d_dDelta_lp[i];
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
a_CdDelta[i] += CElp;
if (eflag) ev.ereax[0] += e_lp;
//if (vflag_either || eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,e_lp,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,i,e_lp);
// over coordination
const F_FLOAT exp_ovun1 = p_ovun3 * exp( p_ovun4 * d_sum_ovun(i,2) );
const F_FLOAT inv_exp_ovun1 = 1.0 / (1 + exp_ovun1);
const F_FLOAT Delta_lpcorr = d_Delta[i] - (dfvl * d_Delta_lp_temp[i]) * inv_exp_ovun1;
const F_FLOAT exp_ovun2 = exp( p_ovun2 * Delta_lpcorr );
const F_FLOAT inv_exp_ovun2 = 1.0 / (1.0 + exp_ovun2);
const F_FLOAT DlpVi = 1.0 / (Delta_lpcorr + val_i + 1e-8);
CEover1 = Delta_lpcorr * DlpVi * inv_exp_ovun2;
e_ov = d_sum_ovun(i,1) * CEover1;
if (eflag) ev.ereax[1] += e_ov;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,e_ov,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,i,e_ov);
CEover2 = d_sum_ovun(i,1) * DlpVi * inv_exp_ovun2 *
(1.0 - Delta_lpcorr * ( DlpVi + p_ovun2 * exp_ovun2 * inv_exp_ovun2 ));
CEover3 = CEover2 * (1.0 - dfvl * d_dDelta_lp[i] * inv_exp_ovun1 );
CEover4 = CEover2 * (dfvl * d_Delta_lp_temp[i]) * p_ovun4 * exp_ovun1 * SQR(inv_exp_ovun1);
// under coordination
const F_FLOAT exp_ovun2n = 1.0 / exp_ovun2;
const F_FLOAT exp_ovun6 = exp( p_ovun6 * Delta_lpcorr );
const F_FLOAT exp_ovun8 = p_ovun7 * exp(p_ovun8 * d_sum_ovun(i,2));
const F_FLOAT inv_exp_ovun2n = 1.0 / (1.0 + exp_ovun2n);
const F_FLOAT inv_exp_ovun8 = 1.0 / (1.0 + exp_ovun8);
e_un = 0;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
e_un = -p_ovun5 * (1.0 - exp_ovun6) * inv_exp_ovun2n * inv_exp_ovun8;
if (eflag) ev.ereax[2] += e_un;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,i,e_un,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,i,e_un);
CEunder1 = inv_exp_ovun2n *
( p_ovun5 * p_ovun6 * exp_ovun6 * inv_exp_ovun8 + p_ovun2 * e_un * exp_ovun2n );
CEunder2 = -e_un * p_ovun8 * exp_ovun8 * inv_exp_ovun8;
CEunder3 = CEunder1 * (1.0 - dfvl * d_dDelta_lp[i] * inv_exp_ovun1);
CEunder4 = CEunder1 * (dfvl * d_Delta_lp_temp[i]) *
p_ovun4 * exp_ovun1 * inv_exp_ovun1 * inv_exp_ovun1 + CEunder2;
const F_FLOAT eng_tmp = e_lp + e_ov + e_un;
if (eflag_atom) this->template e_tally_single<NEIGHFLAG>(ev,i,eng_tmp);
// multibody forces
a_CdDelta[i] += CEover3;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
a_CdDelta[i] += CEunder3;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT CdDelta_i = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int jtype = type(j);
const F_FLOAT jmass = paramssing(jtype).mass;
const int j_index = jj - j_start;
const F_FLOAT De_s = paramstwbp(itype,jtype).De_s;
// multibody lone pair: correction for C2
if (p_lp3 > 0.001 && imass == 12.0 && jmass == 12.0) {
const F_FLOAT Di = d_Delta[i];
const F_FLOAT vov3 = d_BO(i,j_index) - Di - 0.040*pow(Di,4.0);
if (vov3 > 3.0) {
const F_FLOAT e_lph = p_lp3 * (vov3-3.0)*(vov3-3.0);
const F_FLOAT deahu2dbo = 2.0 * p_lp3 * (vov3 - 3.0);
const F_FLOAT deahu2dsbo = 2.0 * p_lp3 * (vov3 - 3.0) * (-1.0 - 0.16 * pow(Di,3.0));
d_Cdbo(i,j_index) += deahu2dbo;
CdDelta_i += deahu2dsbo;
if (eflag) ev.ereax[0] += e_lph;
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,e_lph);
}
}
// over/under coordination forces merged together
const F_FLOAT p_ovun1 = paramstwbp(itype,jtype).p_ovun1;
a_CdDelta[j] += (CEover4 + CEunder4) * (1.0 - dfvl * d_dDelta_lp[j]) * (d_BO_pi(i,j_index) + d_BO_pi2(i,j_index));
d_Cdbo(i,j_index) += CEover1 * p_ovun1 * De_s;
d_Cdbopi(i,j_index) += (CEover4 + CEunder4) * (d_Delta[j] - dfvl*d_Delta_lp_temp[j]);
d_Cdbopi2(i,j_index) += (CEover4 + CEunder4) * (d_Delta[j] - dfvl*d_Delta_lp_temp[j]);
}
a_CdDelta[i] += CdDelta_i;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
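// Valence-angle term: loops over bonded pairs (j,i,k) around each center i
// and accumulates angle (ereax[3]), penalty (ereax[4]) and three-body
// conjugation (ereax[5]) energies and the corresponding forces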
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
const int i = d_ilist[ii];
const int itype = type(i);
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
F_FLOAT temp, temp_bo_jt, pBOjt7;
F_FLOAT p_val1, p_val2, p_val3, p_val4, p_val5;
F_FLOAT p_val6, p_val7, p_val8, p_val9, p_val10;
F_FLOAT p_pen1, p_pen2, p_pen3, p_pen4;
F_FLOAT p_coa1, p_coa2, p_coa3, p_coa4;
F_FLOAT trm8, expval6, expval7, expval2theta, expval12theta, exp3ij, exp3jk;
F_FLOAT exp_pen2ij, exp_pen2jk, exp_pen3, exp_pen4, trm_pen34, exp_coa2;
F_FLOAT dSBO1, dSBO2, SBO, SBO2, CSBO2, SBOp, prod_SBO, vlpadj;
F_FLOAT CEval1, CEval2, CEval3, CEval4, CEval5, CEval6, CEval7, CEval8;
F_FLOAT CEpen1, CEpen2, CEpen3;
F_FLOAT e_ang, e_coa, e_pen;
F_FLOAT CEcoa1, CEcoa2, CEcoa3, CEcoa4, CEcoa5;
F_FLOAT Cf7ij, Cf7jk, Cf8j, Cf9j;
F_FLOAT f7_ij, f7_jk, f8_Dj, f9_Dj;
F_FLOAT Ctheta_0, theta_0, theta_00, theta, cos_theta, sin_theta;
F_FLOAT BOA_ij, BOA_ik, rij, bo_ij, bo_ik;
F_FLOAT dcos_theta_di[3], dcos_theta_dj[3], dcos_theta_dk[3];
F_FLOAT eng_tmp, fi_tmp[3], fj_tmp[3], fk_tmp[3];
F_FLOAT delij[3], delik[3], delji[3], delki[3];
p_val6 = gp[14];
p_val8 = gp[33];
p_val9 = gp[16];
p_val10 = gp[17];
p_pen2 = gp[19];
p_pen3 = gp[20];
p_pen4 = gp[21];
p_coa2 = gp[2];
p_coa3 = gp[38];
p_coa4 = gp[30];
p_val3 = paramssing(itype).p_val3;
p_val5 = paramssing(itype).p_val5;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
const F_FLOAT Delta_val = d_total_bo[i] - paramssing(itype).valency_val;
SBOp = 0.0, prod_SBO = 1.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int j_index = jj - j_start;
bo_ij = d_BO(i,j_index);
SBOp += (d_BO_pi(i,j_index) + d_BO_pi2(i,j_index));
temp = SQR(bo_ij);
temp *= temp;
temp *= temp;
prod_SBO *= exp( -temp );
}
const F_FLOAT Delta_e = d_total_bo[i] - paramssing(itype).valency_e;
const F_FLOAT vlpex = Delta_e - 2.0 * (int)(Delta_e/2.0);
const F_FLOAT explp1 = exp(-gp[15] * SQR(2.0 + vlpex));
const F_FLOAT nlp = explp1 - (int)(Delta_e / 2.0);
if( vlpex >= 0.0 ){
vlpadj = 0.0;
dSBO2 = prod_SBO - 1.0;
} else{
vlpadj = nlp;
dSBO2 = (prod_SBO - 1.0) * (1.0 - p_val8 * d_dDelta_lp[i]);
}
SBO = SBOp + (1.0 - prod_SBO) * (-d_Delta_boc[i] - p_val8 * vlpadj);
dSBO1 = -8.0 * prod_SBO * ( d_Delta_boc[i] + p_val8 * vlpadj );
if( SBO <= 0.0 ) {
SBO2 = 0.0;
CSBO2 = 0.0;
} else if( SBO > 0.0 && SBO <= 1.0 ) {
SBO2 = pow( SBO, p_val9 );
CSBO2 = p_val9 * pow( SBO, p_val9 - 1.0 );
} else if( SBO > 1.0 && SBO < 2.0 ) {
SBO2 = 2.0 - pow( 2.0-SBO, p_val9 );
CSBO2 = p_val9 * pow( 2.0 - SBO, p_val9 - 1.0 );
} else {
SBO2 = 2.0;
CSBO2 = 0.0;
}
expval6 = exp( p_val6 * d_Delta_boc[i] );
F_FLOAT CdDelta_i = 0.0;
F_FLOAT fitmp[3],fjtmp[3];
for (int j = 0; j < 3; j++) fitmp[j] = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int j_index = jj - j_start;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsqij = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
rij = sqrt(rsqij);
bo_ij = d_BO(i,j_index);
const int i_index = maxbo+j_index;
BOA_ij = bo_ij - thb_cut;
if (BOA_ij <= 0.0) continue;
if (i >= nlocal && j >= nlocal) continue;
const int jtype = type(j);
F_FLOAT CdDelta_j = 0.0;
for (int k = 0; k < 3; k++) fjtmp[k] = 0.0;
for (int kk = jj+1; kk < j_end; kk++ ) {
//for (int kk = j_start; kk < j_end; kk++ ) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
if (k == j) continue;
const int k_index = kk - j_start;
delik[0] = x(k,0) - xtmp;
delik[1] = x(k,1) - ytmp;
delik[2] = x(k,2) - ztmp;
const F_FLOAT rsqik = delik[0]*delik[0] + delik[1]*delik[1] + delik[2]*delik[2];
const F_FLOAT rik = sqrt(rsqik);
bo_ik = d_BO(i,k_index);
BOA_ik = bo_ik - thb_cut;
if (BOA_ik <= 0.0 || bo_ij <= thb_cut || bo_ik <= thb_cut || bo_ij * bo_ik <= thb_cutsq) continue;
const int ktype = type(k);
// theta and derivatives
cos_theta = (delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2])/(rij*rik);
if( cos_theta > 1.0 ) cos_theta = 1.0;
if( cos_theta < -1.0 ) cos_theta = -1.0;
theta = acos(cos_theta);
const F_FLOAT inv_dists = 1.0 / (rij * rik);
const F_FLOAT Cdot_inv3 = cos_theta * inv_dists * inv_dists;
for( int t = 0; t < 3; t++ ) {
dcos_theta_di[t] = -(delik[t] + delij[t]) * inv_dists + Cdot_inv3 * (rsqik * delij[t] + rsqij * delik[t]);
dcos_theta_dj[t] = delik[t] * inv_dists - Cdot_inv3 * rsqik * delij[t];
dcos_theta_dk[t] = delij[t] * inv_dists - Cdot_inv3 * rsqij * delik[t];
}
sin_theta = sin(theta);
if (sin_theta < 1.0e-5) sin_theta = 1.0e-5;
p_val1 = paramsthbp(jtype,itype,ktype).p_val1;
if (fabs(p_val1) <= 0.001) continue;
// ANGLE ENERGY
p_val1 = paramsthbp(jtype,itype,ktype).p_val1;
p_val2 = paramsthbp(jtype,itype,ktype).p_val2;
p_val4 = paramsthbp(jtype,itype,ktype).p_val4;
p_val7 = paramsthbp(jtype,itype,ktype).p_val7;
theta_00 = paramsthbp(jtype,itype,ktype).theta_00;
exp3ij = exp( -p_val3 * pow( BOA_ij, p_val4 ) );
f7_ij = 1.0 - exp3ij;
Cf7ij = p_val3 * p_val4 * pow( BOA_ij, p_val4 - 1.0 ) * exp3ij;
exp3jk = exp( -p_val3 * pow( BOA_ik, p_val4 ) );
f7_jk = 1.0 - exp3jk;
Cf7jk = p_val3 * p_val4 * pow( BOA_ik, p_val4 - 1.0 ) * exp3jk;
expval7 = exp( -p_val7 * d_Delta_boc[i] );
trm8 = 1.0 + expval6 + expval7;
f8_Dj = p_val5 - ( (p_val5 - 1.0) * (2.0 + expval6) / trm8 );
Cf8j = ((1.0 - p_val5) / (trm8*trm8)) *
(p_val6 * expval6 * trm8 - (2.0 + expval6) * ( p_val6*expval6 - p_val7*expval7));
theta_0 = 180.0 - theta_00 * (1.0 - exp(-p_val10 * (2.0 - SBO2)));
theta_0 = theta_0*constPI/180.0;
expval2theta = exp( -p_val2 * (theta_0-theta)*(theta_0-theta) );
if( p_val1 >= 0 )
expval12theta = p_val1 * (1.0 - expval2theta);
else // To avoid linear Me-H-Me angles (6/6/06)
expval12theta = p_val1 * -expval2theta;
CEval1 = Cf7ij * f7_jk * f8_Dj * expval12theta;
CEval2 = Cf7jk * f7_ij * f8_Dj * expval12theta;
CEval3 = Cf8j * f7_ij * f7_jk * expval12theta;
CEval4 = -2.0 * p_val1 * p_val2 * f7_ij * f7_jk * f8_Dj * expval2theta * (theta_0 - theta);
Ctheta_0 = p_val10 * theta_00*constPI/180.0 * exp( -p_val10 * (2.0 - SBO2) );
CEval5 = -CEval4 * Ctheta_0 * CSBO2;
CEval6 = CEval5 * dSBO1;
CEval7 = CEval5 * dSBO2;
CEval8 = -CEval4 / sin_theta;
e_ang = f7_ij * f7_jk * f8_Dj * expval12theta;
if (eflag) ev.ereax[3] += e_ang;
// Penalty energy
p_pen1 = paramsthbp(jtype,itype,ktype).p_pen1;
exp_pen2ij = exp( -p_pen2 * (BOA_ij - 2.0)*(BOA_ij - 2.0) );
exp_pen2jk = exp( -p_pen2 * (BOA_ik - 2.0)*(BOA_ik - 2.0) );
exp_pen3 = exp( -p_pen3 * d_Delta[i] );
exp_pen4 = exp( p_pen4 * d_Delta[i] );
trm_pen34 = 1.0 + exp_pen3 + exp_pen4;
f9_Dj = (2.0 + exp_pen3 ) / trm_pen34;
Cf9j = (-p_pen3 * exp_pen3 * trm_pen34 - (2.0 + exp_pen3) *
(-p_pen3 * exp_pen3 + p_pen4 * exp_pen4 ) )/(trm_pen34*trm_pen34);
e_pen = p_pen1 * f9_Dj * exp_pen2ij * exp_pen2jk;
if (eflag) ev.ereax[4] += e_pen;
CEpen1 = e_pen * Cf9j / f9_Dj;
temp = -2.0 * p_pen2 * e_pen;
CEpen2 = temp * (BOA_ij - 2.0);
CEpen3 = temp * (BOA_ik - 2.0);
// ConjAngle energy
p_coa1 = paramsthbp(jtype,itype,ktype).p_coa1;
exp_coa2 = exp( p_coa2 * Delta_val );
e_coa = p_coa1 / (1. + exp_coa2) *
exp( -p_coa3 * SQR(d_total_bo[j]-BOA_ij) ) *
exp( -p_coa3 * SQR(d_total_bo[k]-BOA_ik) ) *
exp( -p_coa4 * SQR(BOA_ij - 1.5) ) *
exp( -p_coa4 * SQR(BOA_ik - 1.5) );
CEcoa1 = -2 * p_coa4 * (BOA_ij - 1.5) * e_coa;
CEcoa2 = -2 * p_coa4 * (BOA_ik - 1.5) * e_coa;
CEcoa3 = -p_coa2 * exp_coa2 * e_coa / (1 + exp_coa2);
CEcoa4 = -2 * p_coa3 * (d_total_bo[j]-BOA_ij) * e_coa;
CEcoa5 = -2 * p_coa3 * (d_total_bo[k]-BOA_ik) * e_coa;
if (eflag) ev.ereax[5] += e_coa;
// Forces: bond-order, coordination, and angular (dcos_theta) contributions
a_Cdbo(i,j_index) += (CEval1 + CEpen2 + (CEcoa1 - CEcoa4));
a_Cdbo(j,i_index) += (CEval1 + CEpen2 + (CEcoa1 - CEcoa4));
a_Cdbo(i,k_index) += (CEval2 + CEpen3 + (CEcoa2 - CEcoa5));
a_Cdbo(k,i_index) += (CEval2 + CEpen3 + (CEcoa2 - CEcoa5));
CdDelta_i += ((CEval3 + CEval7) + CEpen1 + CEcoa3);
CdDelta_j += CEcoa4;
a_CdDelta[k] += CEcoa5;
for (int ll = j_start; ll < j_end; ll++) {
int l = d_bo_list[ll];
l &= NEIGHMASK;
const int l_index = ll - j_start;
temp_bo_jt = d_BO(i,l_index);
temp = temp_bo_jt * temp_bo_jt * temp_bo_jt;
pBOjt7 = temp * temp * temp_bo_jt;
a_Cdbo(i,l_index) += (CEval6 * pBOjt7);
d_Cdbopi(i,l_index) += CEval5;
d_Cdbopi2(i,l_index) += CEval5;
}
for (int d = 0; d < 3; d++) fi_tmp[d] = CEval8 * dcos_theta_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] = CEval8 * dcos_theta_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] = CEval8 * dcos_theta_dk[d];
for (int d = 0; d < 3; d++) fitmp[d] -= fi_tmp[d];
for (int d = 0; d < 3; d++) fjtmp[d] -= fj_tmp[d];
for (int d = 0; d < 3; d++) a_f(k,d) -= fk_tmp[d];
// energy/virial tally
if (EVFLAG) {
eng_tmp = e_ang + e_pen + e_coa;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,eng_tmp,0.0,0.0,0.0,0.0);
for (int d = 0; d < 3; d++) delki[d] = -1.0 * delik[d];
for (int d = 0; d < 3; d++) delji[d] = -1.0 * delij[d];
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,eng_tmp);
if (vflag_either) this->template v_tally3<NEIGHFLAG>(ev,i,j,k,fj_tmp,fk_tmp,delji,delki);
}
}
a_CdDelta[j] += CdDelta_j;
for (int d = 0; d < 3; d++) a_f(j,d) += fjtmp[d];
}
a_CdDelta[i] += CdDelta_i;
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
Kokkos::View<F_FLOAT*, typename DAT::t_float_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
// in reaxc_torsion_angles: j = i, k = j, i = k;
F_FLOAT Delta_i, Delta_j, bo_ij, bo_ik, bo_jl, BOA_ij, BOA_ik, BOA_jl;
F_FLOAT p_tor1, p_cot1, V1, V2, V3;
F_FLOAT exp_tor2_ij, exp_tor2_ik, exp_tor2_jl, exp_tor1, exp_tor3_DiDj, exp_tor4_DiDj, exp_tor34_inv;
F_FLOAT exp_cot2_ij, exp_cot2_ik, exp_cot2_jl, fn10, f11_DiDj, dfn11, fn12;
F_FLOAT theta_ijk, theta_jil, sin_ijk, sin_jil, cos_ijk, cos_jil, tan_ijk_i, tan_jil_i;
F_FLOAT cos_omega, cos2omega, cos3omega;
F_FLOAT CV, cmn, CEtors1, CEtors2, CEtors3, CEtors4;
F_FLOAT CEtors5, CEtors6, CEtors7, CEtors8, CEtors9;
F_FLOAT Cconj, CEconj1, CEconj2, CEconj3, CEconj4, CEconj5, CEconj6;
F_FLOAT e_tor, e_con, eng_tmp;
F_FLOAT delij[3], delik[3], deljl[3], dellk[3], delil[3], delkl[3];
F_FLOAT fi_tmp[3], fj_tmp[3], fk_tmp[3], fl_tmp[3];
F_FLOAT dcos_omega_di[3], dcos_omega_dj[3], dcos_omega_dk[3], dcos_omega_dl[3];
F_FLOAT dcos_ijk_di[3], dcos_ijk_dj[3], dcos_ijk_dk[3], dcos_jil_di[3], dcos_jil_dj[3], dcos_jil_dk[3];
F_FLOAT p_tor2 = gp[23];
F_FLOAT p_tor3 = gp[24];
F_FLOAT p_tor4 = gp[25];
F_FLOAT p_cot2 = gp[27];
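// p_tor2..p_tor4 and p_cot2 are global ReaxFF parameters taken from the general parameter array gp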
const int i = d_ilist[ii];
const int itype = type(i);
const tagint itag = tag(i);
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
Delta_i = d_Delta_boc[i];
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT fitmp[3], fjtmp[3], fktmp[3];
for(int j = 0; j < 3; j++) fitmp[j] = 0.0;
F_FLOAT CdDelta_i = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
const int jtype = type(j);
const int j_index = jj - j_start;
// skip half of the i-j pairs (by tag parity, then by coordinates) so each central bond's torsions are computed once
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
bo_ij = d_BO(i,j_index);
if (bo_ij < thb_cut) continue;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsqij = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsqij);
BOA_ij = bo_ij - thb_cut;
Delta_j = d_Delta_boc[j];
exp_tor2_ij = exp( -p_tor2 * BOA_ij );
exp_cot2_ij = exp( -p_cot2 * SQR(BOA_ij - 1.5) );
exp_tor3_DiDj = exp( -p_tor3 * (Delta_i + Delta_j) );
exp_tor4_DiDj = exp( p_tor4 * (Delta_i + Delta_j) );
exp_tor34_inv = 1.0 / (1.0 + exp_tor3_DiDj + exp_tor4_DiDj);
f11_DiDj = (2.0 + exp_tor3_DiDj) * exp_tor34_inv;
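// f11(Delta_i,Delta_j): over/under-coordination correction entering the pi-bond term of the torsion energy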
const int l_start = d_bo_first[j];
const int l_end = l_start + d_bo_num[j];
for(int k = 0; k < 3; k++) fjtmp[k] = 0.0;
F_FLOAT CdDelta_j = 0.0;
for (int kk = j_start; kk < j_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
if (k == j) continue;
const int ktype = type(k);
const int k_index = kk - j_start;
bo_ik = d_BO(i,k_index);
if (bo_ik < thb_cut) continue;
BOA_ik = bo_ik - thb_cut;
for (int d = 0; d < 3; d ++) delik[d] = x(k,d) - x(i,d);
const F_FLOAT rsqik = delik[0]*delik[0] + delik[1]*delik[1] + delik[2]*delik[2];
const F_FLOAT rik = sqrt(rsqik);
cos_ijk = (delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2])/(rij*rik);
if( cos_ijk > 1.0 ) cos_ijk = 1.0;
if( cos_ijk < -1.0 ) cos_ijk = -1.0;
theta_ijk = acos(cos_ijk);
// dcos_ijk
const F_FLOAT inv_dists = 1.0 / (rij * rik);
const F_FLOAT cos_ijk_tmp = cos_ijk / ((rij*rik)*(rij*rik));
for( int d = 0; d < 3; d++ ) {
dcos_ijk_di[d] = -(delik[d] + delij[d]) * inv_dists + cos_ijk_tmp * (rsqik * delij[d] + rsqij * delik[d]);
dcos_ijk_dj[d] = delik[d] * inv_dists - cos_ijk_tmp * rsqik * delij[d];
dcos_ijk_dk[d] = delij[d] * inv_dists - cos_ijk_tmp * rsqij * delik[d];
}
sin_ijk = sin( theta_ijk );
if( sin_ijk >= 0 && sin_ijk <= 1e-10 )
tan_ijk_i = cos_ijk / 1e-10;
else if( sin_ijk <= 0 && sin_ijk >= -1e-10 )
tan_ijk_i = -cos_ijk / 1e-10;
else tan_ijk_i = cos_ijk / sin_ijk;
exp_tor2_ik = exp( -p_tor2 * BOA_ik );
exp_cot2_ik = exp( -p_cot2 * SQR(BOA_ik -1.5) );
for(int l = 0; l < 3; l++) fktmp[l] = 0.0;
for (int ll = l_start; ll < l_end; ll++) {
int l = d_bo_list[ll];
l &= NEIGHMASK;
if (l == i) continue;
const int ltype = type(l);
const int l_index = ll - l_start;
bo_jl = d_BO(j,l_index);
if (l == k || bo_jl < thb_cut || bo_ij*bo_ik*bo_jl < thb_cut) continue;
for (int d = 0; d < 3; d ++) deljl[d] = x(l,d) - x(j,d);
const F_FLOAT rsqjl = deljl[0]*deljl[0] + deljl[1]*deljl[1] + deljl[2]*deljl[2];
const F_FLOAT rjl = sqrt(rsqjl);
BOA_jl = bo_jl - thb_cut;
cos_jil = -(delij[0]*deljl[0]+delij[1]*deljl[1]+delij[2]*deljl[2])/(rij*rjl);
if( cos_jil > 1.0 ) cos_jil = 1.0;
if( cos_jil < -1.0 ) cos_jil = -1.0;
theta_jil = acos(cos_jil);
// dcos_jil
const F_FLOAT inv_distjl = 1.0 / (rij * rjl);
const F_FLOAT inv_distjl3 = pow( inv_distjl, 3.0 );
const F_FLOAT cos_jil_tmp = cos_jil / ((rij*rjl)*(rij*rjl));
for( int d = 0; d < 3; d++ ) {
dcos_jil_di[d] = deljl[d] * inv_distjl - cos_jil_tmp * rsqjl * -delij[d];
dcos_jil_dj[d] = (-deljl[d] + delij[d]) * inv_distjl - cos_jil_tmp * (rsqjl * delij[d] + rsqij * -deljl[d]);
dcos_jil_dk[d] = -delij[d] * inv_distjl - cos_jil_tmp * rsqij * deljl[d];
}
sin_jil = sin( theta_jil );
if( sin_jil >= 0 && sin_jil <= 1e-10 )
tan_jil_i = cos_jil / 1e-10;
else if( sin_jil <= 0 && sin_jil >= -1e-10 )
tan_jil_i = -cos_jil / 1e-10;
else tan_jil_i = cos_jil / sin_jil;
for (int d = 0; d < 3; d ++) dellk[d] = x(k,d) - x(l,d);
const F_FLOAT rsqlk = dellk[0]*dellk[0] + dellk[1]*dellk[1] + dellk[2]*dellk[2];
const F_FLOAT rlk = sqrt(rsqlk);
F_FLOAT unnorm_cos_omega, unnorm_sin_omega, omega;
F_FLOAT htra, htrb, htrc, hthd, hthe, hnra, hnrc, hnhd, hnhe;
F_FLOAT arg, poem, tel;
F_FLOAT cross_ij_jl[3];
// omega
F_FLOAT dot_ij_jk = -(delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2]);
F_FLOAT dot_ij_lj = delij[0]*deljl[0]+delij[1]*deljl[1]+delij[2]*deljl[2];
F_FLOAT dot_ik_jl = delik[0]*deljl[0]+delik[1]*deljl[1]+delik[2]*deljl[2];
unnorm_cos_omega = dot_ij_jk * dot_ij_lj + rsqij * dot_ik_jl;
cross_ij_jl[0] = delij[1]*deljl[2] - delij[2]*deljl[1];
cross_ij_jl[1] = delij[2]*deljl[0] - delij[0]*deljl[2];
cross_ij_jl[2] = delij[0]*deljl[1] - delij[1]*deljl[0];
unnorm_sin_omega = -rij*(delik[0]*cross_ij_jl[0]+delik[1]*cross_ij_jl[1]+delik[2]*cross_ij_jl[2]);
omega = atan2( unnorm_sin_omega, unnorm_cos_omega );
htra = rik + cos_ijk * ( rjl * cos_jil - rij );
htrb = rij - rik * cos_ijk - rjl * cos_jil;
htrc = rjl + cos_jil * ( rik * cos_ijk - rij );
hthd = rik * sin_ijk * ( rij - rjl * cos_jil );
hthe = rjl * sin_jil * ( rij - rik * cos_ijk );
hnra = rjl * sin_ijk * sin_jil;
hnrc = rik * sin_ijk * sin_jil;
hnhd = rik * rjl * cos_ijk * sin_jil;
hnhe = rik * rjl * sin_ijk * cos_jil;
poem = 2.0 * rik * rjl * sin_ijk * sin_jil;
if( poem < 1e-20 ) poem = 1e-20;
tel = SQR(rik) + SQR(rij) + SQR(rjl) - SQR(rlk) -
2.0 * (rik * rij * cos_ijk - rik * rjl * cos_ijk * cos_jil + rij * rjl * cos_jil);
arg = tel / poem;
if( arg > 1.0 ) arg = 1.0;
if( arg < -1.0 ) arg = -1.0;
F_FLOAT sin_ijk_rnd = sin_ijk;
F_FLOAT sin_jil_rnd = sin_jil;
if( sin_ijk >= 0 && sin_ijk <= 1e-10 ) sin_ijk_rnd = 1e-10;
else if( sin_ijk <= 0 && sin_ijk >= -1e-10 ) sin_ijk_rnd = -1e-10;
if( sin_jil >= 0 && sin_jil <= 1e-10 ) sin_jil_rnd = 1e-10;
else if( sin_jil <= 0 && sin_jil >= -1e-10 ) sin_jil_rnd = -1e-10;
// dcos_omega_dk (dcos_omega_di in the original reaxc_torsion_angles indexing, see mapping note above)
for (int d = 0; d < 3; d++) dcos_omega_dk[d] = ((htra-arg*hnra)/rik) * delik[d] - dellk[d];
for (int d = 0; d < 3; d++) dcos_omega_dk[d] += (hthd-arg*hnhd)/sin_ijk_rnd * -dcos_ijk_dk[d];
for (int d = 0; d < 3; d++) dcos_omega_dk[d] *= 2.0/poem;
// dcos_omega_di (dcos_omega_dj in reaxc indexing)
for (int d = 0; d < 3; d++) dcos_omega_di[d] = -((htra-arg*hnra)/rik) * delik[d] - htrb/rij * delij[d];
for (int d = 0; d < 3; d++) dcos_omega_di[d] += -(hthd-arg*hnhd)/sin_ijk_rnd * dcos_ijk_di[d];
for (int d = 0; d < 3; d++) dcos_omega_di[d] += -(hthe-arg*hnhe)/sin_jil_rnd * dcos_jil_di[d];
for (int d = 0; d < 3; d++) dcos_omega_di[d] *= 2.0/poem;
// dcos_omega_dj (dcos_omega_dk in reaxc indexing)
for (int d = 0; d < 3; d++) dcos_omega_dj[d] = -((htrc-arg*hnrc)/rjl) * deljl[d] + htrb/rij * delij[d];
for (int d = 0; d < 3; d++) dcos_omega_dj[d] += -(hthd-arg*hnhd)/sin_ijk_rnd * dcos_ijk_dj[d];
for (int d = 0; d < 3; d++) dcos_omega_dj[d] += -(hthe-arg*hnhe)/sin_jil_rnd * dcos_jil_dj[d];
for (int d = 0; d < 3; d++) dcos_omega_dj[d] *= 2.0/poem;
// dcos_omega_dl
for (int d = 0; d < 3; d++) dcos_omega_dl[d] = ((htrc-arg*hnrc)/rjl) * deljl[d] + dellk[d];
for (int d = 0; d < 3; d++) dcos_omega_dl[d] += (hthe-arg*hnhe)/sin_jil_rnd * -dcos_jil_dk[d];
for (int d = 0; d < 3; d++) dcos_omega_dl[d] *= 2.0/poem;
cos_omega = cos( omega );
cos2omega = cos( 2. * omega );
cos3omega = cos( 3. * omega );
// torsion energy
p_tor1 = paramsfbp(ktype,itype,jtype,ltype).p_tor1;
p_cot1 = paramsfbp(ktype,itype,jtype,ltype).p_cot1;
V1 = paramsfbp(ktype,itype,jtype,ltype).V1;
V2 = paramsfbp(ktype,itype,jtype,ltype).V2;
V3 = paramsfbp(ktype,itype,jtype,ltype).V3;
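// torsion energy: bond-order taper fn10 times sin(theta_ijk)*sin(theta_jil) times a three-term cosine series in omega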
exp_tor1 = exp(p_tor1 * SQR(2.0 - d_BO_pi(i,j_index) - f11_DiDj));
exp_tor2_jl = exp(-p_tor2 * BOA_jl);
exp_cot2_jl = exp(-p_cot2 * SQR(BOA_jl - 1.5) );
fn10 = (1.0 - exp_tor2_ik) * (1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jl);
CV = 0.5 * (V1 * (1.0 + cos_omega) + V2 * exp_tor1 * (1.0 - cos2omega) + V3 * (1.0 + cos3omega) );
e_tor = fn10 * sin_ijk * sin_jil * CV;
if (eflag) ev.ereax[6] += e_tor;
dfn11 = (-p_tor3 * exp_tor3_DiDj + (p_tor3 * exp_tor3_DiDj - p_tor4 * exp_tor4_DiDj) *
(2.0 + exp_tor3_DiDj) * exp_tor34_inv) * exp_tor34_inv;
CEtors1 = sin_ijk * sin_jil * CV;
CEtors2 = -fn10 * 2.0 * p_tor1 * V2 * exp_tor1 * (2.0 - d_BO_pi(i,j_index) - f11_DiDj) *
(1.0 - SQR(cos_omega)) * sin_ijk * sin_jil;
CEtors3 = CEtors2 * dfn11;
CEtors4 = CEtors1 * p_tor2 * exp_tor2_ik * (1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jl);
CEtors5 = CEtors1 * p_tor2 * (1.0 - exp_tor2_ik) * exp_tor2_ij * (1.0 - exp_tor2_jl);
CEtors6 = CEtors1 * p_tor2 * (1.0 - exp_tor2_ik) * (1.0 - exp_tor2_ij) * exp_tor2_jl;
cmn = -fn10 * CV;
CEtors7 = cmn * sin_jil * tan_ijk_i;
CEtors8 = cmn * sin_ijk * tan_jil_i;
CEtors9 = fn10 * sin_ijk * sin_jil *
(0.5 * V1 - 2.0 * V2 * exp_tor1 * cos_omega + 1.5 * V3 * (cos2omega + 2.0 * SQR(cos_omega)));
// 4-body conjugation energy
fn12 = exp_cot2_ik * exp_cot2_ij * exp_cot2_jl;
e_con = p_cot1 * fn12 * (1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jil);
if (eflag) ev.ereax[7] += e_con;
Cconj = -2.0 * fn12 * p_cot1 * p_cot2 * (1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jil);
CEconj1 = Cconj * (BOA_ik - 1.5e0);
CEconj2 = Cconj * (BOA_ij - 1.5e0);
CEconj3 = Cconj * (BOA_jl - 1.5e0);
CEconj4 = -p_cot1 * fn12 * (SQR(cos_omega) - 1.0) * sin_jil * tan_ijk_i;
CEconj5 = -p_cot1 * fn12 * (SQR(cos_omega) - 1.0) * sin_ijk * tan_jil_i;
CEconj6 = 2.0 * p_cot1 * fn12 * cos_omega * sin_ijk * sin_jil;
// forces
// contribution to bond order
d_Cdbopi(i,j_index) += CEtors2;
CdDelta_i += CEtors3;
CdDelta_j += CEtors3;
a_Cdbo(i,k_index) += CEtors4 + CEconj1;
a_Cdbo(i,j_index) += CEtors5 + CEconj2;
a_Cdbo(j,l_index) += CEtors6 + CEconj3; // trouble
// dcos_theta_ijk
const F_FLOAT coeff74 = CEtors7 + CEconj4;
for (int d = 0; d < 3; d++) fi_tmp[d] = (coeff74) * dcos_ijk_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] = (coeff74) * dcos_ijk_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] = (coeff74) * dcos_ijk_dk[d];
const F_FLOAT coeff85 = CEtors8 + CEconj5;
// dcos_theta_jil
for (int d = 0; d < 3; d++) fi_tmp[d] += (coeff85) * dcos_jil_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] += (coeff85) * dcos_jil_dj[d];
for (int d = 0; d < 3; d++) fl_tmp[d] = (coeff85) * dcos_jil_dk[d];
// dcos_omega
const F_FLOAT coeff96 = CEtors9 + CEconj6;
for (int d = 0; d < 3; d++) fi_tmp[d] += (coeff96) * dcos_omega_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] += (coeff96) * dcos_omega_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] += (coeff96) * dcos_omega_dk[d];
for (int d = 0; d < 3; d++) fl_tmp[d] += (coeff96) * dcos_omega_dl[d];
// total forces
for (int d = 0; d < 3; d++) fitmp[d] -= fi_tmp[d];
for (int d = 0; d < 3; d++) fjtmp[d] -= fj_tmp[d];
for (int d = 0; d < 3; d++) fktmp[d] -= fk_tmp[d];
for (int d = 0; d < 3; d++) a_f(l,d) -= fl_tmp[d];
// per-atom energy/virial tally
if (EVFLAG) {
eng_tmp = e_tor + e_con;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,eng_tmp,0.0,0.0,0.0,0.0);
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,eng_tmp);
if (vflag_either) {
for (int d = 0; d < 3; d ++) delil[d] = x(l,d) - x(i,d);
for (int d = 0; d < 3; d ++) delkl[d] = x(l,d) - x(k,d);
this->template v_tally4<NEIGHFLAG>(ev,k,i,j,l,fk_tmp,fi_tmp,fj_tmp,delkl,delil,deljl);
}
}
}
for (int d = 0; d < 3; d++) a_f(k,d) += fktmp[d];
}
a_CdDelta[j] += CdDelta_j;
for (int d = 0; d < 3; d++) a_f(j,d) += fjtmp[d];
}
a_CdDelta[i] += CdDelta_i;
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
int hblist[MAX_BONDS];
F_FLOAT theta, cos_theta, sin_xhz4, cos_xhz1, sin_theta2;
F_FLOAT e_hb, exp_hb2, exp_hb3, CEhb1, CEhb2, CEhb3;
F_FLOAT dcos_theta_di[3], dcos_theta_dj[3], dcos_theta_dk[3];
// tally variables
F_FLOAT fi_tmp[3], fj_tmp[3], fk_tmp[3], delij[3], delji[3], delik[3], delki[3];
for (int d = 0; d < 3; d++) fi_tmp[d] = fj_tmp[d] = fk_tmp[d] = 0.0;
const int i = d_ilist[ii];
const int itype = type(i);
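// hydrogen bonds are built around central atoms with p_hbond == 1 (i.e. donor hydrogens); all other types return immediately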
if( paramssing(itype).p_hbond != 1 ) return;
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const tagint itag = tag(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
const int k_start = d_hb_first[i];
const int k_end = k_start + d_hb_num[i];
int top = 0;
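// collect bonded neighbors of i that can accept a hydrogen bond (p_hbond == 2) with bond order above HB_THRESHOLD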
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT bo_ij = d_BO(i,j_index);
if( paramssing(jtype).p_hbond == 2 && bo_ij >= HB_THRESHOLD ) {
hblist[top] = jj;
top ++;
}
}
F_FLOAT fitmp[3];
for (int d = 0; d < 3; d++) fitmp[d] = 0.0;
for (int kk = k_start; kk < k_end; kk++) {
int k = d_hb_list[kk];
k &= NEIGHMASK;
const tagint ktag = tag(k);
const int ktype = type(k);
delik[0] = x(k,0) - xtmp;
delik[1] = x(k,1) - ytmp;
delik[2] = x(k,2) - ztmp;
const F_FLOAT rsqik = delik[0]*delik[0] + delik[1]*delik[1] + delik[2]*delik[2];
const F_FLOAT rik = sqrt(rsqik);
for (int itr = 0; itr < top; itr++) {
const int jj = hblist[itr];
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
if (jtag == ktag) continue;
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT bo_ij = d_BO(i,j_index);
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsqij = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsqij);
// theta and derivatives
cos_theta = (delij[0]*delik[0]+delij[1]*delik[1]+delij[2]*delik[2])/(rij*rik);
if( cos_theta > 1.0 ) cos_theta = 1.0;
if( cos_theta < -1.0 ) cos_theta = -1.0;
theta = acos(cos_theta);
const F_FLOAT inv_dists = 1.0 / (rij * rik);
const F_FLOAT Cdot_inv3 = cos_theta * inv_dists * inv_dists;
for( int d = 0; d < 3; d++ ) {
dcos_theta_di[d] = -(delik[d] + delij[d]) * inv_dists + Cdot_inv3 * (rsqik * delij[d] + rsqij * delik[d]);
dcos_theta_dj[d] = delik[d] * inv_dists - Cdot_inv3 * rsqik * delij[d];
dcos_theta_dk[d] = delij[d] * inv_dists - Cdot_inv3 * rsqij * delik[d];
}
// hydrogen bond energy
const F_FLOAT p_hb1 = paramshbp(jtype,itype,ktype).p_hb1;
const F_FLOAT p_hb2 = paramshbp(jtype,itype,ktype).p_hb2;
const F_FLOAT p_hb3 = paramshbp(jtype,itype,ktype).p_hb3;
const F_FLOAT r0_hb = paramshbp(jtype,itype,ktype).r0_hb;
sin_theta2 = sin(theta/2.0);
sin_xhz4 = SQR(sin_theta2);
sin_xhz4 *= sin_xhz4;
cos_xhz1 = (1.0 - cos_theta);
exp_hb2 = exp(-p_hb2 * bo_ij);
exp_hb3 = exp(-p_hb3 * (r0_hb/rik + rik/r0_hb - 2.0));
e_hb = p_hb1 * (1.0 - exp_hb2) * exp_hb3 * sin_xhz4;
if (eflag) ev.ereax[8] += e_hb;
// hydrogen bond forces
CEhb1 = p_hb1 * p_hb2 * exp_hb2 * exp_hb3 * sin_xhz4;
CEhb2 = -p_hb1/2.0 * (1.0 - exp_hb2) * exp_hb3 * cos_xhz1;
CEhb3 = -p_hb3 * (-r0_hb/SQR(rik) + 1.0/r0_hb) * e_hb;
d_Cdbo(i,j_index) += CEhb1; // dbo term
// dcos terms
for (int d = 0; d < 3; d++) fi_tmp[d] = CEhb2 * dcos_theta_di[d];
for (int d = 0; d < 3; d++) fj_tmp[d] = CEhb2 * dcos_theta_dj[d];
for (int d = 0; d < 3; d++) fk_tmp[d] = CEhb2 * dcos_theta_dk[d];
// dr terms
for (int d = 0; d < 3; d++) fi_tmp[d] -= CEhb3/rik * delik[d];
for (int d = 0; d < 3; d++) fk_tmp[d] += CEhb3/rik * delik[d];
for (int d = 0; d < 3; d++) fitmp[d] -= fi_tmp[d];
for (int d = 0; d < 3; d++) a_f(j,d) -= fj_tmp[d];
for (int d = 0; d < 3; d++) a_f(k,d) -= fk_tmp[d];
for (int d = 0; d < 3; d++) delki[d] = -1.0 * delik[d];
for (int d = 0; d < 3; d++) delji[d] = -1.0 * delij[d];
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,e_hb);
if (vflag_either) this->template v_tally3<NEIGHFLAG>(ev,i,j,k,fj_tmp,fk_tmp,delji,delki);
}
}
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxUpdateBond<NEIGHFLAG>, const int &ii) const {
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbo = d_Cdbo;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi = d_Cdbopi;
Kokkos::View<F_FLOAT**, typename DAT::t_ffloat_2d_dl::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_Cdbopi2 = d_Cdbopi2;
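// fold atom i's bond-order derivative accumulators (Cdbo, Cdbopi, Cdbopi2) into neighbor j's matching half-list entries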
const int i = d_ilist[ii];
const tagint itag = tag(i);
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
const int j_index = jj - j_start;
const F_FLOAT Cdbo_i = d_Cdbo(i,j_index);
const F_FLOAT Cdbopi_i = d_Cdbopi(i,j_index);
const F_FLOAT Cdbopi2_i = d_Cdbopi2(i,j_index);
const int k_start = d_bo_first[j];
const int k_end = k_start + d_bo_num[j];
for (int kk = k_start; kk < k_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
if (k != i) continue;
const int k_index = kk - k_start;
int flag = 0;
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) flag = 1;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) flag = 1;
}
if (flag) {
a_Cdbo(j,k_index) += Cdbo_i;
a_Cdbopi(j,k_index) += Cdbopi_i;
a_Cdbopi2(j,k_index) += Cdbopi2_i;
}
}
}
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
Kokkos::View<F_FLOAT*, typename DAT::t_ffloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_CdDelta = d_CdDelta;
F_FLOAT delij[3];
F_FLOAT p_be1, p_be2, De_s, De_p, De_pp, pow_BOs_be2, exp_be12, CEbo, ebond;
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const F_FLOAT imass = paramssing(itype).mass;
const F_FLOAT val_i = paramssing(itype).valency;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT CdDelta_i = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT jmass = paramssing(jtype).mass;
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int k_start = d_bo_first[j];
const int k_end = k_start + d_bo_num[j];
const F_FLOAT p_bo1 = paramstwbp(itype,jtype).p_bo1;
const F_FLOAT p_bo2 = paramstwbp(itype,jtype).p_bo2;
const F_FLOAT p_bo3 = paramstwbp(itype,jtype).p_bo3;
const F_FLOAT p_bo4 = paramstwbp(itype,jtype).p_bo4;
const F_FLOAT p_bo5 = paramstwbp(itype,jtype).p_bo5;
const F_FLOAT p_bo6 = paramstwbp(itype,jtype).p_bo6;
const F_FLOAT r_s = paramstwbp(itype,jtype).r_s;
const F_FLOAT r_pi = paramstwbp(itype,jtype).r_pi;
const F_FLOAT r_pi2 = paramstwbp(itype,jtype).r_pi2;
// bond energy (nlocal only)
p_be1 = paramstwbp(itype,jtype).p_be1;
p_be2 = paramstwbp(itype,jtype).p_be2;
De_s = paramstwbp(itype,jtype).De_s;
De_p = paramstwbp(itype,jtype).De_p;
De_pp = paramstwbp(itype,jtype).De_pp;
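// ReaxFF bond energy: ebond = -De_s*BO_s*exp[p_be1*(1 - BO_s^p_be2)] - De_p*BO_pi - De_pp*BO_pi2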
const F_FLOAT BO_i = d_BO(i,j_index);
const F_FLOAT BO_s_i = d_BO_s(i,j_index);
const F_FLOAT BO_pi_i = d_BO_pi(i,j_index);
const F_FLOAT BO_pi2_i = d_BO_pi2(i,j_index);
pow_BOs_be2 = pow(BO_s_i,p_be2);
exp_be12 = exp(p_be1*(1.0-pow_BOs_be2));
CEbo = -De_s*exp_be12*(1.0-p_be1*p_be2*pow_BOs_be2);
ebond = -De_s*BO_s_i*exp_be12
-De_p*BO_pi_i
-De_pp*BO_pi2_i;
if (eflag) ev.evdwl += ebond;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,ebond,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,ebond);
// calculate derivatives of Bond Orders
d_Cdbo(i,j_index) += CEbo;
d_Cdbopi(i,j_index) -= (CEbo + De_p);
d_Cdbopi2(i,j_index) -= (CEbo + De_pp);
// Stabilisation of terminal triple bonds
F_FLOAT estriph = 0.0;
if( BO_i >= 1.00 ) {
if( gp[37] == 2 || (imass == 12.0000 && jmass == 15.9990) ||
(jmass == 12.0000 && imass == 15.9990) ) {
const F_FLOAT exphu = exp(-gp[7] * SQR(BO_i - 2.50) );
const F_FLOAT exphua1 = exp(-gp[3] * (d_total_bo[i]-BO_i));
const F_FLOAT exphub1 = exp(-gp[3] * (d_total_bo[j]-BO_i));
const F_FLOAT exphuov = exp(gp[4] * (d_Delta[i] + d_Delta[j]));
const F_FLOAT hulpov = 1.0 / (1.0 + 25.0 * exphuov);
estriph = gp[10] * exphu * hulpov * (exphua1 + exphub1);
if (eflag) ev.evdwl += estriph;
//if (eflag_atom) this->template ev_tally<NEIGHFLAG>(ev,i,j,estriph,0.0,0.0,0.0,0.0);
//if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,estriph);
const F_FLOAT decobdbo = gp[10] * exphu * hulpov * (exphua1 + exphub1) *
( gp[3] - 2.0 * gp[7] * (BO_i-2.50) );
const F_FLOAT decobdboua = -gp[10] * exphu * hulpov *
(gp[3]*exphua1 + 25.0*gp[4]*exphuov*hulpov*(exphua1+exphub1));
const F_FLOAT decobdboub = -gp[10] * exphu * hulpov *
(gp[3]*exphub1 + 25.0*gp[4]*exphuov*hulpov*(exphua1+exphub1));
d_Cdbo(i,j_index) += decobdbo;
CdDelta_i += decobdboua;
a_CdDelta[j] += decobdboub;
}
}
const F_FLOAT eng_tmp = ebond + estriph;
if (eflag_atom) this->template e_tally<NEIGHFLAG>(ev,i,j,eng_tmp);
}
a_CdDelta[i] += CdDelta_i;
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int &ii, EV_FLOAT_REAX& ev) const {
Kokkos::View<F_FLOAT*[3], typename DAT::t_f_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_f = f;
F_FLOAT delij[3], delik[3], deljk[3], tmpvec[3];
F_FLOAT dBOp_i[3], dBOp_k[3], dln_BOp_pi[3], dln_BOp_pi2[3];
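// convert the accumulated bond-order derivatives (Cdbo, Cdbopi, Cdbopi2, CdDelta) into Cartesian forces, analogous to Add_dBond_to_Forces in the serial reax/c code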
const int i = d_ilist[ii];
const X_FLOAT xtmp = x(i,0);
const X_FLOAT ytmp = x(i,1);
const X_FLOAT ztmp = x(i,2);
const int itype = type(i);
const tagint itag = tag(i);
const F_FLOAT imass = paramssing(itype).mass;
const F_FLOAT val_i = paramssing(itype).valency;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
F_FLOAT CdDelta_i = d_CdDelta[i];
F_FLOAT fitmp[3];
for (int j = 0; j < 3; j++) fitmp[j] = 0.0;
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag(j);
if (itag > jtag) {
if ((itag+jtag) % 2 == 0) continue;
} else if (itag < jtag) {
if ((itag+jtag) % 2 == 1) continue;
} else {
if (x(j,2) < ztmp) continue;
if (x(j,2) == ztmp && x(j,1) < ytmp) continue;
if (x(j,2) == ztmp && x(j,1) == ytmp && x(j,0) < xtmp) continue;
}
const int jtype = type(j);
const int j_index = jj - j_start;
const F_FLOAT jmass = paramssing(jtype).mass;
F_FLOAT CdDelta_j = d_CdDelta[j];
delij[0] = x(j,0) - xtmp;
delij[1] = x(j,1) - ytmp;
delij[2] = x(j,2) - ztmp;
const F_FLOAT rsq = delij[0]*delij[0] + delij[1]*delij[1] + delij[2]*delij[2];
const F_FLOAT rij = sqrt(rsq);
const int k_start = d_bo_first[j];
const int k_end = k_start + d_bo_num[j];
F_FLOAT coef_C1dbo, coef_C2dbo, coef_C3dbo, coef_C1dbopi, coef_C2dbopi, coef_C3dbopi, coef_C4dbopi;
F_FLOAT coef_C1dbopi2, coef_C2dbopi2, coef_C3dbopi2, coef_C4dbopi2, coef_C1dDelta, coef_C2dDelta, coef_C3dDelta;
coef_C1dbo = coef_C2dbo = coef_C3dbo = 0.0;
coef_C1dbopi = coef_C2dbopi = coef_C3dbopi = coef_C4dbopi = 0.0;
coef_C1dbopi2 = coef_C2dbopi2 = coef_C3dbopi2 = coef_C4dbopi2 = 0.0;
coef_C1dDelta = coef_C2dDelta = coef_C3dDelta = 0.0;
// total forces on i, j, k (nlocal + nghost, from Add_dBond_to_Forces)
const F_FLOAT Cdbo_ij = d_Cdbo(i,j_index);
coef_C1dbo = d_C1dbo(i,j_index) * (Cdbo_ij);
coef_C2dbo = d_C2dbo(i,j_index) * (Cdbo_ij);
coef_C3dbo = d_C3dbo(i,j_index) * (Cdbo_ij);
const F_FLOAT Cdbopi_ij = d_Cdbopi(i,j_index);
coef_C1dbopi = d_C1dbopi(i,j_index) * (Cdbopi_ij);
coef_C2dbopi = d_C2dbopi(i,j_index) * (Cdbopi_ij);
coef_C3dbopi = d_C3dbopi(i,j_index) * (Cdbopi_ij);
coef_C4dbopi = d_C4dbopi(i,j_index) * (Cdbopi_ij);
const F_FLOAT Cdbopi2_ij = d_Cdbopi2(i,j_index);
coef_C1dbopi2 = d_C1dbopi2(i,j_index) * (Cdbopi2_ij);
coef_C2dbopi2 = d_C2dbopi2(i,j_index) * (Cdbopi2_ij);
coef_C3dbopi2 = d_C3dbopi2(i,j_index) * (Cdbopi2_ij);
coef_C4dbopi2 = d_C4dbopi2(i,j_index) * (Cdbopi2_ij);
const F_FLOAT coeff_CdDelta_ij = CdDelta_i + CdDelta_j;
coef_C1dDelta = d_C1dbo(i,j_index) * (coeff_CdDelta_ij);
coef_C2dDelta = d_C2dbo(i,j_index) * (coeff_CdDelta_ij);
coef_C3dDelta = d_C3dbo(i,j_index) * (coeff_CdDelta_ij);
F_FLOAT temp[3];
dln_BOp_pi[0] = d_dln_BOp_pix(i,j_index);
dln_BOp_pi[1] = d_dln_BOp_piy(i,j_index);
dln_BOp_pi[2] = d_dln_BOp_piz(i,j_index);
dln_BOp_pi2[0] = d_dln_BOp_pi2x(i,j_index);
dln_BOp_pi2[1] = d_dln_BOp_pi2y(i,j_index);
dln_BOp_pi2[2] = d_dln_BOp_pi2z(i,j_index);
dBOp_i[0] = d_dBOpx(i,j_index);
dBOp_i[1] = d_dBOpy(i,j_index);
dBOp_i[2] = d_dBOpz(i,j_index);
// forces on i
for (int d = 0; d < 3; d++) temp[d] = coef_C1dbo * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dbo * d_dDeltap_self(i,d);
for (int d = 0; d < 3; d++) temp[d] += coef_C1dDelta * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dDelta * d_dDeltap_self(i,d);
for (int d = 0; d < 3; d++) temp[d] += coef_C1dbopi * dln_BOp_pi[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dbopi * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dbopi * d_dDeltap_self(i,d);
for (int d = 0; d < 3; d++) temp[d] += coef_C1dbopi2 * dln_BOp_pi2[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C2dbopi2 * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dbopi2 * d_dDeltap_self(i,d);
if (EVFLAG)
if (vflag_either) this->template v_tally<NEIGHFLAG>(ev,i,temp,delij);
fitmp[0] -= temp[0];
fitmp[1] -= temp[1];
fitmp[2] -= temp[2];
// forces on j
for (int d = 0; d < 3; d++) temp[d] = -coef_C1dbo * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dbo * d_dDeltap_self(j,d);
for (int d = 0; d < 3; d++) temp[d] -= coef_C1dDelta * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C3dDelta * d_dDeltap_self(j,d);
for (int d = 0; d < 3; d++) temp[d] -= coef_C1dbopi * dln_BOp_pi[d];
for (int d = 0; d < 3; d++) temp[d] -= coef_C2dbopi * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C4dbopi * d_dDeltap_self(j,d);
for (int d = 0; d < 3; d++) temp[d] -= coef_C1dbopi2 * dln_BOp_pi2[d];
for (int d = 0; d < 3; d++) temp[d] -= coef_C2dbopi2 * dBOp_i[d];
for (int d = 0; d < 3; d++) temp[d] += coef_C4dbopi2 * d_dDeltap_self(j,d);
a_f(j,0) -= temp[0];
a_f(j,1) -= temp[1];
a_f(j,2) -= temp[2];
if (EVFLAG)
if (vflag_either) {
for (int d = 0; d < 3; d++) tmpvec[d] = -delij[d];
this->template v_tally<NEIGHFLAG>(ev,j,temp,tmpvec);
}
// forces on k: i neighbor
for (int kk = j_start; kk < j_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
const int k_index = kk - j_start;
dBOp_k[0] = d_dBOpx(i,k_index);
dBOp_k[1] = d_dBOpy(i,k_index);
dBOp_k[2] = d_dBOpz(i,k_index);
const F_FLOAT coef_all = -coef_C2dbo - coef_C2dDelta - coef_C3dbopi - coef_C3dbopi2;
for (int d = 0; d < 3; d++) temp[d] = coef_all * dBOp_k[d];
a_f(k,0) -= temp[0];
a_f(k,1) -= temp[1];
a_f(k,2) -= temp[2];
if (EVFLAG)
if (vflag_either) {
delik[0] = x(k,0) - xtmp;
delik[1] = x(k,1) - ytmp;
delik[2] = x(k,2) - ztmp;
for (int d = 0; d < 3; d++) tmpvec[d] = x(j,d) - x(k,d) - delik[d];
this->template v_tally<NEIGHFLAG>(ev,k,temp,tmpvec);
}
}
// forces on k: j neighbor
for (int kk = k_start; kk < k_end; kk++) {
int k = d_bo_list[kk];
k &= NEIGHMASK;
const int k_index = kk - k_start;
dBOp_k[0] = d_dBOpx(j,k_index);
dBOp_k[1] = d_dBOpy(j,k_index);
dBOp_k[2] = d_dBOpz(j,k_index);
const F_FLOAT coef_all = -coef_C3dbo - coef_C3dDelta - coef_C4dbopi - coef_C4dbopi2;
for (int d = 0; d < 3; d++) temp[d] = coef_all * dBOp_k[d];
a_f(k,0) -= temp[0];
a_f(k,1) -= temp[1];
a_f(k,2) -= temp[2];
if (EVFLAG) {
if (vflag_either) {
for (int d = 0; d < 3; d++) deljk[d] = x(k,d) - x(j,d);
for (int d = 0; d < 3; d++) tmpvec[d] = x(i,d) - x(k,d) - deljk[d];
this->template v_tally<NEIGHFLAG>(ev,k,temp,tmpvec);
}
}
}
}
for (int d = 0; d < 3; d++) a_f(i,d) += fitmp[d];
}
template<class DeviceType>
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int &ii) const {
EV_FLOAT_REAX ev;
this->template operator()<NEIGHFLAG,EVFLAG>(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>(), ii, ev);
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::ev_tally(EV_FLOAT_REAX &ev, const int &i, const int &j,
const F_FLOAT &epair, const F_FLOAT &fpair, const F_FLOAT &delx,
const F_FLOAT &dely, const F_FLOAT &delz) const
{
const int VFLAG = vflag_either;
// The eatom and vatom arrays are atomic for Half/Thread neighbor style
Kokkos::View<E_FLOAT*, typename DAT::t_efloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_eatom = v_eatom;
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
if (eflag_atom) {
const E_FLOAT epairhalf = 0.5 * epair;
a_eatom[i] += epairhalf;
if (NEIGHFLAG != FULL) a_eatom[j] += epairhalf;
}
if (VFLAG) {
const E_FLOAT v0 = delx*delx*fpair;
const E_FLOAT v1 = dely*dely*fpair;
const E_FLOAT v2 = delz*delz*fpair;
const E_FLOAT v3 = delx*dely*fpair;
const E_FLOAT v4 = delx*delz*fpair;
const E_FLOAT v5 = dely*delz*fpair;
if (vflag_global) {
if (NEIGHFLAG != FULL) {
ev.v[0] += v0;
ev.v[1] += v1;
ev.v[2] += v2;
ev.v[3] += v3;
ev.v[4] += v4;
ev.v[5] += v5;
} else {
ev.v[0] += 0.5*v0;
ev.v[1] += 0.5*v1;
ev.v[2] += 0.5*v2;
ev.v[3] += 0.5*v3;
ev.v[4] += 0.5*v4;
ev.v[5] += 0.5*v5;
}
}
if (vflag_atom) {
a_vatom(i,0) += 0.5*v0;
a_vatom(i,1) += 0.5*v1;
a_vatom(i,2) += 0.5*v2;
a_vatom(i,3) += 0.5*v3;
a_vatom(i,4) += 0.5*v4;
a_vatom(i,5) += 0.5*v5;
if (NEIGHFLAG != FULL) {
a_vatom(j,0) += 0.5*v0;
a_vatom(j,1) += 0.5*v1;
a_vatom(j,2) += 0.5*v2;
a_vatom(j,3) += 0.5*v3;
a_vatom(j,4) += 0.5*v4;
a_vatom(j,5) += 0.5*v5;
}
}
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::e_tally(EV_FLOAT_REAX &ev, const int &i, const int &j,
const F_FLOAT &epair) const
{
// The eatom array is atomic for Half/Thread neighbor style
if (eflag_atom) {
Kokkos::View<E_FLOAT*, typename DAT::t_efloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_eatom = v_eatom;
const E_FLOAT epairhalf = 0.5 * epair;
a_eatom[i] += epairhalf;
a_eatom[j] += epairhalf;
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::e_tally_single(EV_FLOAT_REAX &ev, const int &i,
const F_FLOAT &epair) const
{
// The eatom array is atomic for Half/Thread neighbor style
Kokkos::View<E_FLOAT*, typename DAT::t_efloat_1d::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_eatom = v_eatom;
a_eatom[i] += epair;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally(EV_FLOAT_REAX &ev, const int &i,
F_FLOAT *fi, F_FLOAT *drij) const
{
F_FLOAT v[6];
v[0] = 0.5*drij[0]*fi[0];
v[1] = 0.5*drij[1]*fi[1];
v[2] = 0.5*drij[2]*fi[2];
v[3] = 0.5*drij[0]*fi[1];
v[4] = 0.5*drij[0]*fi[2];
v[5] = 0.5*drij[1]*fi[2];
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
a_vatom(i,0) += v[0]; a_vatom(i,1) += v[1]; a_vatom(i,2) += v[2];
a_vatom(i,3) += v[3]; a_vatom(i,4) += v[4]; a_vatom(i,5) += v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally3(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drij, F_FLOAT *drik) const
{
// The eatom and vatom arrays are atomic for Half/Thread neighbor style
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
F_FLOAT v[6];
v[0] = drij[0]*fj[0] + drik[0]*fk[0];
v[1] = drij[1]*fj[1] + drik[1]*fk[1];
v[2] = drij[2]*fj[2] + drik[2]*fk[2];
v[3] = drij[0]*fj[1] + drik[0]*fk[1];
v[4] = drij[0]*fj[2] + drik[0]*fk[2];
v[5] = drij[1]*fj[2] + drik[1]*fk[2];
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
a_vatom(i,0) += THIRD * v[0]; a_vatom(i,1) += THIRD * v[1]; a_vatom(i,2) += THIRD * v[2];
a_vatom(i,3) += THIRD * v[3]; a_vatom(i,4) += THIRD * v[4]; a_vatom(i,5) += THIRD * v[5];
a_vatom(j,0) += THIRD * v[0]; a_vatom(j,1) += THIRD * v[1]; a_vatom(j,2) += THIRD * v[2];
a_vatom(j,3) += THIRD * v[3]; a_vatom(j,4) += THIRD * v[4]; a_vatom(j,5) += THIRD * v[5];
a_vatom(k,0) += THIRD * v[0]; a_vatom(k,1) += THIRD * v[1]; a_vatom(k,2) += THIRD * v[2];
a_vatom(k,3) += THIRD * v[3]; a_vatom(k,4) += THIRD * v[4]; a_vatom(k,5) += THIRD * v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally4(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
const int &l, F_FLOAT *fi, F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *dril, F_FLOAT *drjl, F_FLOAT *drkl) const
{
// The vatom array is atomic for Half/Thread neighbor style
F_FLOAT v[6];
v[0] = dril[0]*fi[0] + drjl[0]*fj[0] + drkl[0]*fk[0];
v[1] = dril[1]*fi[1] + drjl[1]*fj[1] + drkl[1]*fk[1];
v[2] = dril[2]*fi[2] + drjl[2]*fj[2] + drkl[2]*fk[2];
v[3] = dril[0]*fi[1] + drjl[0]*fj[1] + drkl[0]*fk[1];
v[4] = dril[0]*fi[2] + drjl[0]*fj[2] + drkl[0]*fk[2];
v[5] = dril[1]*fi[2] + drjl[1]*fj[2] + drkl[1]*fk[2];
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
Kokkos::View<F_FLOAT*[6], typename DAT::t_virial_array::array_layout,DeviceType,Kokkos::MemoryTraits<AtomicF<NEIGHFLAG>::value> > a_vatom = v_vatom;
a_vatom(i,0) += 0.25 * v[0]; a_vatom(i,1) += 0.25 * v[1]; a_vatom(i,2) += 0.25 * v[2];
a_vatom(i,3) += 0.25 * v[3]; a_vatom(i,4) += 0.25 * v[4]; a_vatom(i,5) += 0.25 * v[5];
a_vatom(j,0) += 0.25 * v[0]; a_vatom(j,1) += 0.25 * v[1]; a_vatom(j,2) += 0.25 * v[2];
a_vatom(j,3) += 0.25 * v[3]; a_vatom(j,4) += 0.25 * v[4]; a_vatom(j,5) += 0.25 * v[5];
a_vatom(k,0) += 0.25 * v[0]; a_vatom(k,1) += 0.25 * v[1]; a_vatom(k,2) += 0.25 * v[2];
a_vatom(k,3) += 0.25 * v[3]; a_vatom(k,4) += 0.25 * v[4]; a_vatom(k,5) += 0.25 * v[5];
a_vatom(l,0) += 0.25 * v[0]; a_vatom(l,1) += 0.25 * v[1]; a_vatom(l,2) += 0.25 * v[2];
a_vatom(l,3) += 0.25 * v[3]; a_vatom(l,4) += 0.25 * v[4]; a_vatom(l,5) += 0.25 * v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::v_tally3_atom(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drji, F_FLOAT *drjk) const
{
F_FLOAT v[6];
v[0] = THIRD * (drji[0]*fj[0] + drjk[0]*fk[0]);
v[1] = THIRD * (drji[1]*fj[1] + drjk[1]*fk[1]);
v[2] = THIRD * (drji[2]*fj[2] + drjk[2]*fk[2]);
v[3] = THIRD * (drji[0]*fj[1] + drjk[0]*fk[1]);
v[4] = THIRD * (drji[0]*fj[2] + drjk[0]*fk[2]);
v[5] = THIRD * (drji[1]*fj[2] + drjk[1]*fk[2]);
if (vflag_global) {
ev.v[0] += v[0];
ev.v[1] += v[1];
ev.v[2] += v[2];
ev.v[3] += v[3];
ev.v[4] += v[4];
ev.v[5] += v[5];
}
if (vflag_atom) {
d_vatom(i,0) += v[0]; d_vatom(i,1) += v[1]; d_vatom(i,2) += v[2];
d_vatom(i,3) += v[3]; d_vatom(i,4) += v[4]; d_vatom(i,5) += v[5];
}
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void *PairReaxCKokkos<DeviceType>::extract(const char *str, int &dim)
{
dim = 1;
if (strcmp(str,"chi") == 0 && chi) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) chi[i] = system->reax_param.sbp[map[i]].chi;
else chi[i] = 0.0;
return (void *) chi;
}
if (strcmp(str,"eta") == 0 && eta) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) eta[i] = system->reax_param.sbp[map[i]].eta;
else eta[i] = 0.0;
return (void *) eta;
}
if (strcmp(str,"gamma") == 0 && gamma) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) gamma[i] = system->reax_param.sbp[map[i]].gamma;
else gamma[i] = 0.0;
return (void *) gamma;
}
return NULL;
}
/* ----------------------------------------------------------------------
setup for energy, virial computation
see integrate::ev_set() for values of eflag (0-3) and vflag (0-6)
------------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::ev_setup(int eflag, int vflag)
{
int i;
evflag = 1;
eflag_either = eflag;
eflag_global = eflag % 2;
eflag_atom = eflag / 2;
vflag_either = vflag;
vflag_global = vflag % 4;
vflag_atom = vflag / 4;
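// low bits of eflag/vflag request global tallies, higher bits per-atom tallies (see ev_set note above)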
// reallocate per-atom arrays if necessary
if (eflag_atom && atom->nmax > maxeatom) {
maxeatom = atom->nmax;
memory->destroy_kokkos(k_eatom,eatom);
memory->create_kokkos(k_eatom,eatom,maxeatom,"pair:eatom");
v_eatom = k_eatom.view<DeviceType>();
}
if (vflag_atom && atom->nmax > maxvatom) {
maxvatom = atom->nmax;
memory->destroy_kokkos(k_vatom,vatom);
memory->create_kokkos(k_vatom,vatom,maxvatom,6,"pair:vatom");
v_vatom = k_vatom.view<DeviceType>();
}
// zero accumulators
if (eflag_global) eng_vdwl = eng_coul = 0.0;
if (vflag_global) for (i = 0; i < 6; i++) virial[i] = 0.0;
if (eflag_atom) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxZeroEAtom>(0,maxeatom),*this);
DeviceType::fence();
}
if (vflag_atom) {
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxZeroVAtom>(0,maxvatom),*this);
DeviceType::fence();
}
// if vflag_global = 2 and pair::compute() calls virial_fdotr_compute()
// compute global virial via (F dot r) instead of via pairwise summation
// unset other flags as appropriate
if (vflag_global == 2 && no_virial_fdotr_compute == 0) {
vflag_fdotr = 1;
vflag_global = 0;
if (vflag_atom == 0) vflag_either = 0;
if (vflag_either == 0 && eflag_either == 0) evflag = 0;
} else vflag_fdotr = 0;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
double PairReaxCKokkos<DeviceType>::memory_usage()
{
double bytes = 0.0;
if (cut_hbsq > 0.0) {
bytes += nmax*3*sizeof(int);
bytes += maxhb*nmax*sizeof(int);
}
bytes += nmax*2*sizeof(int);
bytes += maxbo*nmax*sizeof(int);
bytes += nmax*17*sizeof(F_FLOAT);
bytes += maxbo*nmax*34*sizeof(F_FLOAT);
// FixReaxCSpecies
if (fixspecies_flag) {
bytes += MAXSPECBOND*nmax*sizeof(tagint);
bytes += MAXSPECBOND*nmax*sizeof(F_FLOAT);
}
// FixReaxCBonds
bytes += maxbo*nmax*sizeof(tagint);
bytes += maxbo*nmax*sizeof(F_FLOAT);
bytes += nmax*sizeof(int);
return bytes;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::FindBond(int &numbonds)
{
copymode = 1;
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxFindBondZero>(0,nmax),*this);
DeviceType::fence();
bo_cut_bond = control->bg_cut;
atomKK->sync(execution_space,TAG_MASK);
tag = atomKK->k_tag.view<DeviceType>();
const int inum = list->inum;
NeighListKokkos<DeviceType>* k_list = static_cast<NeighListKokkos<DeviceType>*>(list);
d_ilist = k_list->d_ilist;
k_list->clean_copy();
numbonds = 0;
PairReaxCKokkosFindBondFunctor<DeviceType> find_bond_functor(this);
Kokkos::parallel_reduce(inum,find_bond_functor,numbonds);
DeviceType::fence();
copymode = 0;
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxFindBondZero, const int &i) const {
d_numneigh_bonds[i] = 0;
for (int j = 0; j < maxbo; j++) {
d_neighid(i,j) = 0;
d_abo(i,j) = 0.0;
}
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::calculate_find_bond_item(int ii, int &numbonds) const
{
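// record neighbors of i bonded above bo_cut_bond; the reduction variable numbonds tracks the largest per-atom bond count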
const int i = d_ilist[ii];
int nj = 0;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
const tagint jtag = tag[j];
const int j_index = jj - j_start;
double bo_tmp = d_BO(i,j_index);
if (bo_tmp > bo_cut_bond) {
d_neighid(i,nj) = jtag;
d_abo(i,nj) = bo_tmp;
nj++;
}
}
d_numneigh_bonds[i] = nj;
if (nj > numbonds) numbonds = nj;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::PackBondBuffer(DAT::tdual_ffloat_1d k_buf, int &nbuf_local)
{
d_buf = k_buf.view<DeviceType>();
k_params_sing.template sync<DeviceType>();
atomKK->sync(execution_space,TAG_MASK|TYPE_MASK|Q_MASK|MOLECULE_MASK);
tag = atomKK->k_tag.view<DeviceType>();
type = atomKK->k_type.view<DeviceType>();
q = atomKK->k_q.view<DeviceType>();
if (atom->molecule)
molecule = atomKK->k_molecule.view<DeviceType>();
copymode = 1;
nlocal = atomKK->nlocal;
PairReaxCKokkosPackBondBufferFunctor<DeviceType> pack_bond_buffer_functor(this);
Kokkos::parallel_scan(nlocal,pack_bond_buffer_functor);
DeviceType::fence();
copymode = 0;
k_buf.modify<DeviceType>();
k_nbuf_local.modify<DeviceType>();
k_buf.sync<LMPHostType>();
k_nbuf_local.sync<LMPHostType>();
nbuf_local = k_nbuf_local.h_view();
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::pack_bond_buffer_item(int i, int &j, const bool &final) const
{
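// parallel_scan functor: j carries the running buffer offset; values are written only on the final pass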
if (i == 0)
j += 2;
if (final) {
d_buf[j-1] = tag[i];
d_buf[j+0] = type[i];
d_buf[j+1] = d_total_bo[i];
d_buf[j+2] = paramssing(type[i]).nlp_opt - d_Delta_lp[i];
d_buf[j+3] = q[i];
d_buf[j+4] = d_numneigh_bonds[i];
}
const int numbonds = d_numneigh_bonds[i];
if (final) {
for (int k = 5; k < 5+numbonds; k++) {
d_buf[j+k] = d_neighid(i,k-5);
}
}
j += (5+numbonds);
if (final) {
if (!molecule.data()) d_buf[j] = 0.0;
else d_buf[j] = molecule[i];
}
j++;
if (final) {
for (int k = 0; k < numbonds; k++) {
d_buf[j+k] = d_abo(i,k);
}
}
j += (1+numbonds);
if (final && i == nlocal-1)
k_nbuf_local.view<DeviceType>()() = j - 1;
}
/* ---------------------------------------------------------------------- */
template<class DeviceType>
void PairReaxCKokkos<DeviceType>::FindBondSpecies()
{
copymode = 1;
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxFindBondSpeciesZero>(0,nmax),*this);
DeviceType::fence();
nlocal = atomKK->nlocal;
Kokkos::parallel_for(Kokkos::RangePolicy<DeviceType, PairReaxFindBondSpecies>(0,nlocal),*this);
DeviceType::fence();
copymode = 0;
// NOTE: Could improve performance if a Kokkos version of ComputeSpecAtom is added
k_tmpbo.modify<DeviceType>();
k_tmpid.modify<DeviceType>();
k_error_flag.modify<DeviceType>();
k_tmpbo.sync<LMPHostType>();
k_tmpid.sync<LMPHostType>();
k_error_flag.sync<LMPHostType>();
if (k_error_flag.h_view())
error->all(FLERR,"Increase MAXSPECBOND in reaxc_defs.h");
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxFindBondSpeciesZero, const int &i) const {
for (int j = 0; j < MAXSPECBOND; j++) {
k_tmpbo.view<DeviceType>()(i,j) = 0.0;
k_tmpid.view<DeviceType>()(i,j) = 0;
}
}
template<class DeviceType>
KOKKOS_INLINE_FUNCTION
void PairReaxCKokkos<DeviceType>::operator()(PairReaxFindBondSpecies, const int &i) const {
int nj = 0;
const int j_start = d_bo_first[i];
const int j_end = j_start + d_bo_num[i];
for (int jj = j_start; jj < j_end; jj++) {
int j = d_bo_list[jj];
j &= NEIGHMASK;
if (j < i) continue;
const int j_index = jj - j_start;
double bo_tmp = d_BO(i,j_index);
if (bo_tmp >= 0.10 ) { // Why is this a hardcoded value?
k_tmpid.view<DeviceType>()(i,nj) = j;
k_tmpbo.view<DeviceType>()(i,nj) = bo_tmp;
nj++;
if (nj > MAXSPECBOND) k_error_flag.view<DeviceType>()() = 1;
}
}
}
template class PairReaxCKokkos<LMPDeviceType>;
#ifdef KOKKOS_HAVE_CUDA
template class PairReaxCKokkos<LMPHostType>;
#endif
}
diff --git a/src/KOKKOS/pair_reax_c_kokkos.h b/src/KOKKOS/pair_reaxc_kokkos.h
similarity index 99%
rename from src/KOKKOS/pair_reax_c_kokkos.h
rename to src/KOKKOS/pair_reaxc_kokkos.h
index 8a0c08b66..59c4d196d 100644
--- a/src/KOKKOS/pair_reax_c_kokkos.h
+++ b/src/KOKKOS/pair_reaxc_kokkos.h
@@ -1,498 +1,498 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(reax/c/kk,PairReaxCKokkos<LMPDeviceType>)
PairStyle(reax/c/kk/device,PairReaxCKokkos<LMPDeviceType>)
PairStyle(reax/c/kk/host,PairReaxCKokkos<LMPHostType>)
#else
#ifndef LMP_PAIR_REAXC_KOKKOS_H
#define LMP_PAIR_REAXC_KOKKOS_H
#include <stdio.h>
#include "pair_kokkos.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "neigh_list_kokkos.h"
#include "reaxc_types.h"
#define C_ele 332.06371
#define SMALL 0.0001
#define KCALpMOL_to_EV 23.02
#define HB_THRESHOLD 1e-2 // 0.01
#define MAX_BONDS 30
#define SQR(x) ((x)*(x))
namespace LAMMPS_NS {
typedef Kokkos::DualView<LR_data*,Kokkos::LayoutRight,LMPDeviceType> tdual_LR_data_1d;
typedef typename tdual_LR_data_1d::t_dev t_LR_data_1d;
typedef Kokkos::DualView<cubic_spline_coef*,Kokkos::LayoutRight,LMPDeviceType> tdual_cubic_spline_coef_1d;
typedef typename tdual_cubic_spline_coef_1d::t_dev t_cubic_spline_coef_1d;
struct LR_lookup_table_kk
{
double xmin, xmax;
int n;
double dx, inv_dx;
double a;
double m;
double c;
t_LR_data_1d d_y;
t_cubic_spline_coef_1d d_H;
t_cubic_spline_coef_1d d_vdW, d_CEvd;
t_cubic_spline_coef_1d d_ele, d_CEclmb;
};
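// empty tag structs used to dispatch the different Kokkos kernels of this pair style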
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputePolar{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeLJCoulomb{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeTabulatedLJCoulomb{};
struct PairReaxBuildListsFull{};
template<int NEIGHFLAG>
struct PairReaxBuildListsHalf{};
template<int NEIGHFLAG>
struct PairReaxBuildListsHalf_LessAtomics{};
struct PairReaxZero{};
struct PairReaxZeroEAtom{};
struct PairReaxZeroVAtom{};
struct PairReaxBondOrder1{};
struct PairReaxBondOrder1_LessAtomics{};
struct PairReaxBondOrder2{};
struct PairReaxBondOrder3{};
template<int NEIGHFLAG>
struct PairReaxUpdateBond{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeBond1{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeBond2{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeMulti1{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeMulti2{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeAngular{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeTorsion{};
template<int NEIGHFLAG, int EVFLAG>
struct PairReaxComputeHydrogen{};
struct PairReaxFindBondZero{};
struct PairReaxFindBondSpeciesZero{};
struct PairReaxFindBondSpecies{};
template<class DeviceType>
class PairReaxCKokkos : public PairReaxC {
public:
enum {EnabledNeighFlags=FULL|HALF|HALFTHREAD};
enum {COUL_FLAG=1};
typedef DeviceType device_type;
typedef ArrayTypes<DeviceType> AT;
typedef EV_FLOAT_REAX value_type;
PairReaxCKokkos(class LAMMPS *);
virtual ~PairReaxCKokkos();
void ev_setup(int, int);
void compute(int, int);
void *extract(const char *, int &);
void init_style();
double memory_usage();
void FindBond(int &);
void PackBondBuffer(DAT::tdual_ffloat_1d, int &);
void FindBondSpecies();
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputePolar<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeLJCoulomb<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTabulatedLJCoulomb<NEIGHFLAG,EVFLAG>, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBuildListsFull, const int&) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBuildListsHalf<NEIGHFLAG>, const int&) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBuildListsHalf_LessAtomics<NEIGHFLAG>, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxZero, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxZeroEAtom, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxZeroVAtom, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder1, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder1_LessAtomics, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder2, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxBondOrder3, const int&) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxUpdateBond<NEIGHFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond1<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeBond2<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeMulti1<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeMulti2<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeAngular<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeTorsion<NEIGHFLAG,EVFLAG>, const int&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int&, EV_FLOAT_REAX&) const;
template<int NEIGHFLAG, int EVFLAG>
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxComputeHydrogen<NEIGHFLAG,EVFLAG>, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxFindBondZero, const int&) const;
KOKKOS_INLINE_FUNCTION
void calculate_find_bond_item(int, int&) const;
KOKKOS_INLINE_FUNCTION
void pack_bond_buffer_item(int, int&, const bool&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxFindBondSpeciesZero, const int&) const;
KOKKOS_INLINE_FUNCTION
void operator()(PairReaxFindBondSpecies, const int&) const;
struct params_sing{
KOKKOS_INLINE_FUNCTION
params_sing(){mass=0;chi=0;eta=0;r_s=0;r_pi=0;r_pi2=0;valency=0;valency_val=0;valency_e=0;valency_boc=0;nlp_opt=0;
p_lp2=0;p_ovun2=0;p_ovun5=0;p_val3=0;p_val5=0;p_hbond=0;};
KOKKOS_INLINE_FUNCTION
params_sing(int i){mass=0;chi=0;eta=0;r_s=0;r_pi=0;r_pi2=0;valency=0;valency_val=0;valency_e=0;valency_boc=0;nlp_opt=0;
p_lp2=0;p_ovun2=0;p_ovun5=0;p_val3=0;p_val5=0;p_hbond=0;};
F_FLOAT mass,chi,eta,r_s,r_pi,r_pi2,valency,valency_val,valency_e,valency_boc,nlp_opt,
p_lp2,p_ovun2,p_ovun5, p_val3, p_val5, p_hbond;
};
struct params_twbp{
KOKKOS_INLINE_FUNCTION
params_twbp(){gamma=0;gamma_w=0;alpha=0;r_vdw=0;epsilon=0;acore=0;ecore=0;rcore=0;lgre=0;lgcij=0;
r_s=0;r_pi=0;r_pi2=0;p_bo1=0;p_bo2=0;p_bo3=0;p_bo4=0;p_bo5=0;p_bo6=0;ovc=0;v13cor=0;
p_boc3=0;p_boc4=0;p_boc5=0;p_be1=0;p_be2=0;De_s=0;De_p=0;De_pp=0;
p_ovun1=0;};
KOKKOS_INLINE_FUNCTION
params_twbp(int i){gamma=0;gamma_w=0;alpha=0;r_vdw=0;epsilon=0;acore=0;ecore=0;rcore=0;lgre=0;lgcij=0;
r_s=0;r_pi=0;r_pi2=0;p_bo1=0;p_bo2=0;p_bo3=0;p_bo4=0;p_bo5=0;p_bo6=0;ovc=0;v13cor=0;
p_boc3=0;p_boc4=0;p_boc5=0;p_be1=0;p_be2=0;De_s=0;De_p=0;De_pp=0;
p_ovun1=0;};
F_FLOAT gamma,gamma_w,alpha,r_vdw,epsilon,acore,ecore,rcore,lgre,lgcij,
r_s,r_pi,r_pi2,p_bo1,p_bo2,p_bo3,p_bo4,p_bo5,p_bo6,ovc,v13cor,
p_boc3,p_boc4,p_boc5,p_be1,p_be2,De_s,De_p,De_pp,
p_ovun1;
};
struct params_thbp{
KOKKOS_INLINE_FUNCTION
params_thbp(){cnt=0;theta_00=0;p_val1=0;p_val2=0;p_val4=0;p_val7=0;p_pen1=0;p_coa1=0;};
KOKKOS_INLINE_FUNCTION
params_thbp(int i){cnt=0;theta_00=0;p_val1=0;p_val2=0;p_val4=0;p_val7=0;p_pen1=0;p_coa1=0;};
F_FLOAT cnt, theta_00, p_val1, p_val2, p_val4, p_val7, p_pen1, p_coa1;
};
struct params_fbp{
KOKKOS_INLINE_FUNCTION
params_fbp(){p_tor1=0;p_cot1=0;V1=0;V2=0;V3=0;};
KOKKOS_INLINE_FUNCTION
params_fbp(int i){p_tor1=0;p_cot1=0;V1=0;V2=0;V3=0;};
F_FLOAT p_tor1, p_cot1, V1, V2, V3;
};
struct params_hbp{
KOKKOS_INLINE_FUNCTION
params_hbp(){p_hb1=0;p_hb2=0;p_hb3=0;r0_hb=0;};
KOKKOS_INLINE_FUNCTION
params_hbp(int i){p_hb1=0;p_hb2=0;p_hb3=0;r0_hb=0;};
F_FLOAT p_hb1, p_hb2, p_hb3, r0_hb;
};
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void ev_tally(EV_FLOAT_REAX &ev, const int &i, const int &j, const F_FLOAT &epair, const F_FLOAT &fpair, const F_FLOAT &delx,
const F_FLOAT &dely, const F_FLOAT &delz) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void e_tally(EV_FLOAT_REAX &ev, const int &i, const int &j, const F_FLOAT &epair) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void e_tally_single(EV_FLOAT_REAX &ev, const int &i, const F_FLOAT &epair) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void v_tally(EV_FLOAT_REAX &ev, const int &i, F_FLOAT *fi, F_FLOAT *drij) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void v_tally3(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drij, F_FLOAT *drik) const;
KOKKOS_INLINE_FUNCTION
void v_tally3_atom(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k,
F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *drji, F_FLOAT *drjk) const;
template<int NEIGHFLAG>
KOKKOS_INLINE_FUNCTION
void v_tally4(EV_FLOAT_REAX &ev, const int &i, const int &j, const int &k, const int &l,
F_FLOAT *fi, F_FLOAT *fj, F_FLOAT *fk, F_FLOAT *dril, F_FLOAT *drjl, F_FLOAT *drkl) const;
protected:
void cleanup_copy();
void allocate();
void allocate_array();
void setup();
void init_md();
int Init_Lookup_Tables();
void Deallocate_Lookup_Tables();
void LR_vdW_Coulomb( int i, int j, double r_ij, LR_data *lr );
typedef Kokkos::DualView<int*,DeviceType> tdual_int_1d;
Kokkos::DualView<params_sing*,typename DeviceType::array_layout,DeviceType> k_params_sing;
typename Kokkos::DualView<params_sing*,typename DeviceType::array_layout,DeviceType>::t_dev_const paramssing;
typedef Kokkos::DualView<int**,DeviceType> tdual_int_2d;
Kokkos::DualView<params_twbp**,typename DeviceType::array_layout,DeviceType> k_params_twbp;
typename Kokkos::DualView<params_twbp**,typename DeviceType::array_layout,DeviceType>::t_dev_const paramstwbp;
typedef Kokkos::DualView<int***,DeviceType> tdual_int_3d;
Kokkos::DualView<params_thbp***,typename DeviceType::array_layout,DeviceType> k_params_thbp;
typename Kokkos::DualView<params_thbp***,typename DeviceType::array_layout,DeviceType>::t_dev_const paramsthbp;
Kokkos::DualView<params_hbp***,typename DeviceType::array_layout,DeviceType> k_params_hbp;
typename Kokkos::DualView<params_hbp***,typename DeviceType::array_layout,DeviceType>::t_dev_const paramshbp;
typedef Kokkos::DualView<int****,DeviceType> tdual_int_4d;
Kokkos::DualView<params_fbp****,typename DeviceType::array_layout,DeviceType> k_params_fbp;
typename Kokkos::DualView<params_fbp****,typename DeviceType::array_layout,DeviceType>::t_dev_const paramsfbp;
typename AT::t_x_array_randomread x;
typename AT::t_f_array f;
typename AT::t_int_1d_randomread type;
typename AT::t_tagint_1d_randomread tag;
typename AT::t_float_1d_randomread q;
typename AT::t_tagint_1d_randomread molecule;
DAT::tdual_efloat_1d k_eatom;
typename AT::t_efloat_1d v_eatom;
DAT::tdual_virial_array k_vatom;
typename ArrayTypes<DeviceType>::t_virial_array d_vatom;
typename AT::t_virial_array v_vatom;
HAT::t_virial_array h_vatom;
DAT::tdual_float_1d k_tap;
DAT::t_float_1d d_tap;
HAT::t_float_1d h_tap;
typename AT::t_float_1d d_bo_rij, d_hb_rsq, d_Deltap, d_Deltap_boc, d_total_bo;
typename AT::t_float_1d d_Delta, d_Delta_boc, d_Delta_lp, d_dDelta_lp, d_Delta_lp_temp, d_CdDelta;
typename AT::t_ffloat_2d_dl d_BO, d_BO_s, d_BO_pi, d_BO_pi2, d_dBOp;
typename AT::t_ffloat_2d_dl d_dln_BOp_pix, d_dln_BOp_piy, d_dln_BOp_piz;
typename AT::t_ffloat_2d_dl d_dln_BOp_pi2x, d_dln_BOp_pi2y, d_dln_BOp_pi2z;
typename AT::t_ffloat_2d_dl d_C1dbo, d_C2dbo, d_C3dbo;
typename AT::t_ffloat_2d_dl d_C1dbopi, d_C2dbopi, d_C3dbopi, d_C4dbopi;
typename AT::t_ffloat_2d_dl d_C1dbopi2, d_C2dbopi2, d_C3dbopi2, d_C4dbopi2;
typename AT::t_ffloat_2d_dl d_Cdbo, d_Cdbopi, d_Cdbopi2, d_dDeltap_self;
typedef Kokkos::DualView<F_FLOAT**[7],typename DeviceType::array_layout,DeviceType> tdual_ffloat_2d_n7;
typedef typename tdual_ffloat_2d_n7::t_dev_const_randomread t_ffloat_2d_n7_randomread;
typedef typename tdual_ffloat_2d_n7::t_host t_host_ffloat_2d_n7;
typename AT::t_neighbors_2d d_neighbors;
typename AT::t_int_1d_randomread d_ilist;
typename AT::t_int_1d_randomread d_numneigh;
typename AT::t_int_1d d_bo_first, d_bo_num, d_bo_list, d_hb_first, d_hb_num, d_hb_list;
DAT::tdual_int_scalar k_resize_bo, k_resize_hb;
typename AT::t_int_scalar d_resize_bo, d_resize_hb;
typename AT::t_ffloat_2d_dl d_sum_ovun;
typename AT::t_ffloat_2d_dl d_dBOpx, d_dBOpy, d_dBOpz;
int neighflag,newton_pair, maxnumneigh, maxhb, maxbo;
int nlocal,nall,eflag,vflag;
F_FLOAT cut_nbsq, cut_hbsq, cut_bosq, bo_cut, thb_cut, thb_cutsq;
F_FLOAT bo_cut_bond;
int vdwflag, lgflag;
F_FLOAT gp[39], p_boc1, p_boc2;
friend void pair_virial_fdotr_compute<PairReaxCKokkos>(PairReaxCKokkos*);
int bocnt,hbcnt;
typedef Kokkos::DualView<LR_lookup_table_kk**,LMPDeviceType::array_layout,DeviceType> tdual_LR_lookup_table_kk_2d;
typedef typename tdual_LR_lookup_table_kk_2d::t_dev t_LR_lookup_table_kk_2d;
tdual_LR_lookup_table_kk_2d k_LR;
t_LR_lookup_table_kk_2d d_LR;
DAT::tdual_int_2d k_tmpid;
DAT::tdual_ffloat_2d k_tmpbo;
DAT::tdual_int_scalar k_error_flag;
typename AT::t_int_1d d_numneigh_bonds;
typename AT::t_tagint_2d d_neighid;
typename AT::t_ffloat_2d d_abo;
typename AT::t_ffloat_1d d_buf;
DAT::tdual_int_scalar k_nbuf_local;
};
template <class DeviceType>
struct PairReaxCKokkosFindBondFunctor {
typedef DeviceType device_type;
typedef int value_type;
PairReaxCKokkos<DeviceType> c;
PairReaxCKokkosFindBondFunctor(PairReaxCKokkos<DeviceType>* c_ptr):c(*c_ptr) {};
KOKKOS_INLINE_FUNCTION
void join(volatile int &dst,
const volatile int &src) const {
dst = MAX(dst,src);
}
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, int &numbonds) const {
c.calculate_find_bond_item(ii,numbonds);
}
};
template <class DeviceType>
struct PairReaxCKokkosPackBondBufferFunctor {
typedef DeviceType device_type;
typedef int value_type;
PairReaxCKokkos<DeviceType> c;
PairReaxCKokkosPackBondBufferFunctor(PairReaxCKokkos<DeviceType>* c_ptr):c(*c_ptr) {};
KOKKOS_INLINE_FUNCTION
void operator()(const int ii, int &j, const bool &final) const {
c.pack_bond_buffer_item(ii,j,final);
}
};
}
#endif
#endif
/* ERROR/WARNING messages:
*/
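Editorial note (hedged, not part of the patch above): the block of empty structs and templated operator() overloads in PairReaxCKokkos follows the standard Kokkos work-tag dispatch pattern. Each empty struct is a compile-time tag, and launching a kernel with an execution policy templated on that tag makes Kokkos invoke the matching operator() overload. The minimal, self-contained sketch below uses hypothetical names (TagZero, TagUpdate, DemoFunctor); the EVFLAG overloads that additionally take an EV_FLOAT_REAX& are launched the same way, but through Kokkos::parallel_reduce so the energy/virial accumulator is reduced across threads.
#include <Kokkos_Core.hpp>
struct TagZero {};    // plays the role of e.g. PairReaxZero
struct TagUpdate {};  // plays the role of e.g. PairReaxBondOrder1
struct DemoFunctor {
  Kokkos::View<double*> x;
  DemoFunctor(Kokkos::View<double*> x_) : x(x_) {}
  KOKKOS_INLINE_FUNCTION
  void operator()(TagZero, const int i) const { x(i) = 0.0; }    // "zero" pass
  KOKKOS_INLINE_FUNCTION
  void operator()(TagUpdate, const int i) const { x(i) += 1.0; } // "update" pass
};
int main(int argc, char **argv) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::View<double*> x("x", 100);
    DemoFunctor f(x);
    // The work tag in the policy selects the overload, which is how
    // PairReaxCKokkos::compute() launches one kernel per tag struct.
    Kokkos::parallel_for(Kokkos::RangePolicy<TagZero>(0, 100), f);
    Kokkos::parallel_for(Kokkos::RangePolicy<TagUpdate>(0, 100), f);
  }
  Kokkos::finalize();
  return 0;
}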
diff --git a/src/KOKKOS/verlet_kokkos.cpp b/src/KOKKOS/verlet_kokkos.cpp
index 53b404237..e4a3f857d 100644
--- a/src/KOKKOS/verlet_kokkos.cpp
+++ b/src/KOKKOS/verlet_kokkos.cpp
@@ -1,624 +1,627 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "verlet_kokkos.h"
#include "neighbor.h"
#include "domain.h"
#include "comm.h"
#include "atom.h"
#include "atom_kokkos.h"
#include "atom_masks.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "dihedral.h"
#include "improper.h"
#include "kspace.h"
#include "output.h"
#include "update.h"
#include "modify.h"
#include "compute.h"
#include "fix.h"
#include "timer.h"
#include "memory.h"
#include "error.h"
#include <ctime>
using namespace LAMMPS_NS;
template<class ViewA, class ViewB>
struct ForceAdder {
ViewA a;
ViewB b;
ForceAdder(const ViewA& a_, const ViewB& b_):a(a_),b(b_) {}
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
a(i,0) += b(i,0);
a(i,1) += b(i,1);
a(i,2) += b(i,2);
}
};
/* ---------------------------------------------------------------------- */
VerletKokkos::VerletKokkos(LAMMPS *lmp, int narg, char **arg) :
Verlet(lmp, narg, arg)
{
atomKK = (AtomKokkos *) atom;
}
/* ----------------------------------------------------------------------
setup before run
------------------------------------------------------------------------- */
-void VerletKokkos::setup()
+void VerletKokkos::setup(int flag)
{
if (comm->me == 0 && screen) {
fprintf(screen,"Setting up Verlet run ...\n");
- fprintf(screen," Unit style : %s\n", update->unit_style);
- fprintf(screen," Current step : " BIGINT_FORMAT "\n", update->ntimestep);
- fprintf(screen," Time step : %g\n", update->dt);
- timer->print_timeout(screen);
+ if (flag) {
+ fprintf(screen," Unit style : %s\n", update->unit_style);
+ fprintf(screen," Current step : " BIGINT_FORMAT "\n",
+ update->ntimestep);
+ fprintf(screen," Time step : %g\n", update->dt);
+ timer->print_timeout(screen);
+ }
}
update->setupflag = 1;
lmp->kokkos->auto_sync = 0;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
atomKK->setup();
modify->setup_pre_exchange();
// debug
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->x2lamda(atomKK->nlocal);
domain->pbc();
atomKK->sync(Host,ALL_MASK);
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
if (atomKK->sortfreq > 0) atomKK->sort();
comm->borders();
if (triclinic) domain->lamda2x(atomKK->nlocal+atomKK->nghost);
atomKK->sync(Host,ALL_MASK);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
atomKK->modified(Host,ALL_MASK);
neighbor->build();
neighbor->ncalls = 0;
// compute all forces
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) {
atomKK->sync(force->pair->execution_space,force->pair->datamask_read);
force->pair->compute(eflag,vflag);
atomKK->modified(force->pair->execution_space,force->pair->datamask_modify);
timer->stamp(Timer::PAIR);
}
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atomKK->molecular) {
if (force->bond) {
atomKK->sync(force->bond->execution_space,force->bond->datamask_read);
force->bond->compute(eflag,vflag);
atomKK->modified(force->bond->execution_space,force->bond->datamask_modify);
}
if (force->angle) {
atomKK->sync(force->angle->execution_space,force->angle->datamask_read);
force->angle->compute(eflag,vflag);
atomKK->modified(force->angle->execution_space,force->angle->datamask_modify);
}
if (force->dihedral) {
atomKK->sync(force->dihedral->execution_space,force->dihedral->datamask_read);
force->dihedral->compute(eflag,vflag);
atomKK->modified(force->dihedral->execution_space,force->dihedral->datamask_modify);
}
if (force->improper) {
atomKK->sync(force->improper->execution_space,force->improper->datamask_read);
force->improper->compute(eflag,vflag);
atomKK->modified(force->improper->execution_space,force->improper->datamask_modify);
}
timer->stamp(Timer::BOND);
}
if(force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) {
atomKK->sync(force->kspace->execution_space,force->kspace->datamask_read);
force->kspace->compute(eflag,vflag);
atomKK->modified(force->kspace->execution_space,force->kspace->datamask_modify);
timer->stamp(Timer::KSPACE);
} else force->kspace->compute_dummy(eflag,vflag);
}
if (force->newton) comm->reverse_comm();
modify->setup(vflag);
- output->setup();
+ output->setup(flag);
lmp->kokkos->auto_sync = 1;
update->setupflag = 1;
}
/* ----------------------------------------------------------------------
setup without output
flag = 0 = just force calculation
flag = 1 = reneighbor and force calculation
------------------------------------------------------------------------- */
void VerletKokkos::setup_minimal(int flag)
{
update->setupflag = 1;
lmp->kokkos->auto_sync = 0;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
if (flag) {
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
modify->setup_pre_exchange();
// debug
atomKK->sync(Host,ALL_MASK);
atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->x2lamda(atomKK->nlocal);
domain->pbc();
atomKK->sync(Host,ALL_MASK);
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
comm->borders();
if (triclinic) domain->lamda2x(atomKK->nlocal+atomKK->nghost);
atomKK->sync(Host,ALL_MASK);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
atomKK->modified(Host,ALL_MASK);
neighbor->build();
neighbor->ncalls = 0;
}
// compute all forces
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) {
atomKK->sync(force->pair->execution_space,force->pair->datamask_read);
force->pair->compute(eflag,vflag);
atomKK->modified(force->pair->execution_space,force->pair->datamask_modify);
timer->stamp(Timer::PAIR);
}
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atomKK->molecular) {
if (force->bond) {
atomKK->sync(force->bond->execution_space,force->bond->datamask_read);
force->bond->compute(eflag,vflag);
atomKK->modified(force->bond->execution_space,force->bond->datamask_modify);
}
if (force->angle) {
atomKK->sync(force->angle->execution_space,force->angle->datamask_read);
force->angle->compute(eflag,vflag);
atomKK->modified(force->angle->execution_space,force->angle->datamask_modify);
}
if (force->dihedral) {
atomKK->sync(force->dihedral->execution_space,force->dihedral->datamask_read);
force->dihedral->compute(eflag,vflag);
atomKK->modified(force->dihedral->execution_space,force->dihedral->datamask_modify);
}
if (force->improper) {
atomKK->sync(force->improper->execution_space,force->improper->datamask_read);
force->improper->compute(eflag,vflag);
atomKK->modified(force->improper->execution_space,force->improper->datamask_modify);
}
timer->stamp(Timer::BOND);
}
if(force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) {
atomKK->sync(force->kspace->execution_space,force->kspace->datamask_read);
force->kspace->compute(eflag,vflag);
atomKK->modified(force->kspace->execution_space,force->kspace->datamask_modify);
timer->stamp(Timer::KSPACE);
} else force->kspace->compute_dummy(eflag,vflag);
}
if (force->newton) comm->reverse_comm();
modify->setup(vflag);
lmp->kokkos->auto_sync = 1;
update->setupflag = 0;
}
/* ----------------------------------------------------------------------
run for N steps
------------------------------------------------------------------------- */
void VerletKokkos::run(int n)
{
bigint ntimestep;
int nflag,sortflag;
int n_post_integrate = modify->n_post_integrate;
int n_pre_exchange = modify->n_pre_exchange;
int n_pre_neighbor = modify->n_pre_neighbor;
int n_pre_force = modify->n_pre_force;
int n_post_force = modify->n_post_force;
int n_end_of_step = modify->n_end_of_step;
lmp->kokkos->auto_sync = 0;
if (atomKK->sortfreq > 0) sortflag = 1;
else sortflag = 0;
f_merge_copy = DAT::t_f_array("VerletKokkos::f_merge_copy",atomKK->k_f.dimension_0());
static double time = 0.0;
atomKK->sync(Device,ALL_MASK);
Kokkos::Impl::Timer ktimer;
timer->init_timeout();
for (int i = 0; i < n; i++) {
if (timer->check_timeout(i)) {
update->nsteps = i;
break;
}
ntimestep = ++update->ntimestep;
ev_set(ntimestep);
// initial time integration
ktimer.reset();
timer->stamp();
modify->initial_integrate(vflag);
time += ktimer.seconds();
if (n_post_integrate) modify->post_integrate();
timer->stamp(Timer::MODIFY);
// regular communication vs neighbor list rebuild
nflag = neighbor->decide();
if (nflag == 0) {
timer->stamp();
comm->forward_comm();
timer->stamp(Timer::COMM);
} else {
// added debug
//atomKK->sync(Host,ALL_MASK);
//atomKK->modified(Host,ALL_MASK);
if (n_pre_exchange) {
timer->stamp();
modify->pre_exchange();
timer->stamp(Timer::MODIFY);
}
// debug
//atomKK->sync(Host,ALL_MASK);
//atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->x2lamda(atomKK->nlocal);
domain->pbc();
if (domain->box_change) {
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
}
timer->stamp();
// added debug
//atomKK->sync(Device,ALL_MASK);
//atomKK->modified(Device,ALL_MASK);
comm->exchange();
if (sortflag && ntimestep >= atomKK->nextsort) atomKK->sort();
comm->borders();
// added debug
//atomKK->sync(Host,ALL_MASK);
//atomKK->modified(Host,ALL_MASK);
if (triclinic) domain->lamda2x(atomKK->nlocal+atomKK->nghost);
timer->stamp(Timer::COMM);
if (n_pre_neighbor) {
modify->pre_neighbor();
timer->stamp(Timer::MODIFY);
}
neighbor->build();
timer->stamp(Timer::NEIGH);
}
// force computations
// important for pair to come before bonded contributions
// since some bonded potentials tally pairwise energy/virial
// and Pair:ev_tally() needs to be called before any tallying
force_clear();
timer->stamp();
if (n_pre_force) {
modify->pre_force(vflag);
timer->stamp(Timer::MODIFY);
}
bool execute_on_host = false;
unsigned int datamask_read_device = 0;
unsigned int datamask_modify_device = 0;
unsigned int datamask_read_host = 0;
if ( pair_compute_flag ) {
if (force->pair->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->pair->datamask_read;
datamask_modify_device |= force->pair->datamask_modify;
} else {
datamask_read_device |= force->pair->datamask_read;
datamask_modify_device |= force->pair->datamask_modify;
}
}
if ( atomKK->molecular && force->bond ) {
if (force->bond->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->bond->datamask_read;
datamask_modify_device |= force->bond->datamask_modify;
} else {
datamask_read_device |= force->bond->datamask_read;
datamask_modify_device |= force->bond->datamask_modify;
}
}
if ( atomKK->molecular && force->angle ) {
if (force->angle->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->angle->datamask_read;
datamask_modify_device |= force->angle->datamask_modify;
} else {
datamask_read_device |= force->angle->datamask_read;
datamask_modify_device |= force->angle->datamask_modify;
}
}
if ( atomKK->molecular && force->dihedral ) {
if (force->dihedral->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->dihedral->datamask_read;
datamask_modify_device |= force->dihedral->datamask_modify;
} else {
datamask_read_device |= force->dihedral->datamask_read;
datamask_modify_device |= force->dihedral->datamask_modify;
}
}
if ( atomKK->molecular && force->improper ) {
if (force->improper->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->improper->datamask_read;
datamask_modify_device |= force->improper->datamask_modify;
} else {
datamask_read_device |= force->improper->datamask_read;
datamask_modify_device |= force->improper->datamask_modify;
}
}
if ( kspace_compute_flag ) {
if (force->kspace->execution_space==Host) {
execute_on_host = true;
datamask_read_host |= force->kspace->datamask_read;
datamask_modify_device |= force->kspace->datamask_modify;
} else {
datamask_read_device |= force->kspace->datamask_read;
datamask_modify_device |= force->kspace->datamask_modify;
}
}
if (pair_compute_flag) {
atomKK->sync(force->pair->execution_space,force->pair->datamask_read);
atomKK->sync(force->pair->execution_space,~(~force->pair->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
Kokkos::Impl::Timer ktimer;
force->pair->compute(eflag,vflag);
atomKK->modified(force->pair->execution_space,force->pair->datamask_modify);
atomKK->modified(force->pair->execution_space,~(~force->pair->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
timer->stamp(Timer::PAIR);
}
if(execute_on_host) {
if(pair_compute_flag && force->pair->datamask_modify!=(F_MASK | ENERGY_MASK | VIRIAL_MASK))
Kokkos::fence();
atomKK->sync_overlapping_device(Host,~(~datamask_read_host|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
if(pair_compute_flag && force->pair->execution_space!=Host) {
Kokkos::deep_copy(LMPHostType(),atomKK->k_f.h_view,0.0);
}
}
if (atomKK->molecular) {
if (force->bond) {
atomKK->sync(force->bond->execution_space,~(~force->bond->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->bond->compute(eflag,vflag);
atomKK->modified(force->bond->execution_space,~(~force->bond->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
if (force->angle) {
atomKK->sync(force->angle->execution_space,~(~force->angle->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->angle->compute(eflag,vflag);
atomKK->modified(force->angle->execution_space,~(~force->angle->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
if (force->dihedral) {
atomKK->sync(force->dihedral->execution_space,~(~force->dihedral->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->dihedral->compute(eflag,vflag);
atomKK->modified(force->dihedral->execution_space,~(~force->dihedral->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
if (force->improper) {
atomKK->sync(force->improper->execution_space,~(~force->improper->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->improper->compute(eflag,vflag);
atomKK->modified(force->improper->execution_space,~(~force->improper->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
}
timer->stamp(Timer::BOND);
}
if (kspace_compute_flag) {
atomKK->sync(force->kspace->execution_space,~(~force->kspace->datamask_read|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
force->kspace->compute(eflag,vflag);
atomKK->modified(force->kspace->execution_space,~(~force->kspace->datamask_modify|(F_MASK | ENERGY_MASK | VIRIAL_MASK)));
timer->stamp(Timer::KSPACE);
}
if(execute_on_host && !std::is_same<LMPHostType,LMPDeviceType>::value) {
if(f_merge_copy.dimension_0()<atomKK->k_f.dimension_0()) {
f_merge_copy = DAT::t_f_array("VerletKokkos::f_merge_copy",atomKK->k_f.dimension_0());
}
f = atomKK->k_f.d_view;
Kokkos::deep_copy(LMPHostType(),f_merge_copy,atomKK->k_f.h_view);
Kokkos::parallel_for(atomKK->k_f.dimension_0(),
ForceAdder<DAT::t_f_array,DAT::t_f_array>(atomKK->k_f.d_view,f_merge_copy));
atomKK->k_f.modified_host() = 0; // special case
atomKK->k_f.modify<LMPDeviceType>();
}
// reverse communication of forces
if (force->newton) comm->reverse_comm();
timer->stamp(Timer::COMM);
// force modifications, final time integration, diagnostics
if (n_post_force) modify->post_force(vflag);
modify->final_integrate();
if (n_end_of_step) modify->end_of_step();
timer->stamp(Timer::MODIFY);
// all output
if (ntimestep == output->next) {
atomKK->sync(Host,ALL_MASK);
timer->stamp();
output->write(ntimestep);
timer->stamp(Timer::OUTPUT);
}
}
atomKK->sync(Host,ALL_MASK);
lmp->kokkos->auto_sync = 1;
}
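/* Editorial note (hedged, not part of the patch): when any force style
   executes on the Host while the rest run on the Device (execute_on_host ==
   true and LMPHostType != LMPDeviceType above), the host-side force
   contributions in atomKK->k_f.h_view are copied into f_merge_copy and then
   summed into the device force view by the ForceAdder parallel_for, after
   which only the device copy is marked as modified. This is also why
   f_merge_copy is sized against atomKK->k_f.dimension_0() at the start of
   run(). */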
/* ----------------------------------------------------------------------
clear force on own & ghost atoms
clear other arrays as needed
------------------------------------------------------------------------- */
void VerletKokkos::force_clear()
{
int i;
if (external_force_clear) return;
// clear force on all particles
// if either newton flag is set, also include ghosts
// when using threads always clear all forces.
if (neighbor->includegroup == 0) {
int nall;
if (force->newton) nall = atomKK->nlocal + atomKK->nghost;
else nall = atomKK->nlocal;
size_t nbytes = sizeof(double) * nall;
if (nbytes) {
if (atomKK->k_f.modified_host() > atomKK->k_f.modified_device()) {
memset_kokkos(atomKK->k_f.view<LMPHostType>());
atomKK->modified(Host,F_MASK);
atomKK->sync(Device,F_MASK);
} else {
memset_kokkos(atomKK->k_f.view<LMPDeviceType>());
atomKK->modified(Device,F_MASK);
}
if (torqueflag) memset(&(atomKK->torque[0][0]),0,3*nbytes);
}
// neighbor includegroup flag is set
// clear force only on initial nfirst particles
// if either newton flag is set, also include ghosts
} else {
int nall = atomKK->nfirst;
if (atomKK->k_f.modified_host() > atomKK->k_f.modified_device()) {
memset_kokkos(atomKK->k_f.view<LMPHostType>());
atomKK->modified(Host,F_MASK);
} else {
memset_kokkos(atomKK->k_f.view<LMPDeviceType>());
atomKK->modified(Device,F_MASK);
}
if (torqueflag) {
double **torque = atomKK->torque;
for (i = 0; i < nall; i++) {
torque[i][0] = 0.0;
torque[i][1] = 0.0;
torque[i][2] = 0.0;
}
}
if (force->newton) {
nall = atomKK->nlocal + atomKK->nghost;
if (torqueflag) {
double **torque = atomKK->torque;
for (i = atomKK->nlocal; i < nall; i++) {
torque[i][0] = 0.0;
torque[i][1] = 0.0;
torque[i][2] = 0.0;
}
}
}
}
}
diff --git a/src/KOKKOS/verlet_kokkos.h b/src/KOKKOS/verlet_kokkos.h
index 03a938332..645523920 100644
--- a/src/KOKKOS/verlet_kokkos.h
+++ b/src/KOKKOS/verlet_kokkos.h
@@ -1,57 +1,57 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef INTEGRATE_CLASS
IntegrateStyle(verlet/kk,VerletKokkos)
#else
#ifndef LMP_VERLET_KOKKOS_H
#define LMP_VERLET_KOKKOS_H
#include "verlet.h"
#include "kokkos_type.h"
namespace LAMMPS_NS {
class VerletKokkos : public Verlet {
public:
VerletKokkos(class LAMMPS *, int, char **);
~VerletKokkos() {}
- void setup();
+ void setup(int flag=1);
void setup_minimal(int);
void run(int);
KOKKOS_INLINE_FUNCTION
void operator() (const int& i) const {
f(i,0) += f_merge_copy(i,0);
f(i,1) += f_merge_copy(i,1);
f(i,2) += f_merge_copy(i,2);
}
protected:
DAT::t_f_array f_merge_copy,f;
void force_clear();
};
}
#endif
#endif
/* ERROR/WARNING messages:
*/
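Editorial note (hedged): the virtual signature is widened to setup(int flag=1)
to match the base-class change made elsewhere in this patch; the default
argument keeps existing integrate->setup() call sites compiling unchanged,
while a caller can pass 0 to skip the detailed setup printout and forward the
flag to output->setup(flag), as in the verlet_kokkos.cpp hunk above. A
hypothetical call site:
  integrate->setup();   // flag == 1: print unit style/step/dt, full output setup
  integrate->setup(0);  // quiet setup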
diff --git a/src/KSPACE/ewald_disp.cpp b/src/KSPACE/ewald_disp.cpp
index 467a748d0..85e3da921 100644
--- a/src/KSPACE/ewald_disp.cpp
+++ b/src/KSPACE/ewald_disp.cpp
@@ -1,1509 +1,1510 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Pieter in 't Veld (SNL), Stan Moore (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "ewald_disp.h"
#include "math_vector.h"
#include "math_const.h"
#include "math_special.h"
#include "atom.h"
#include "comm.h"
#include "force.h"
#include "pair.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include "update.h"
using namespace LAMMPS_NS;
using namespace MathConst;
using namespace MathSpecial;
#define SMALL 0.00001
enum{GEOMETRIC,ARITHMETIC,SIXTHPOWER}; // same as in pair.h
//#define DEBUG
/* ---------------------------------------------------------------------- */
EwaldDisp::EwaldDisp(LAMMPS *lmp, int narg, char **arg) : KSpace(lmp, narg, arg),
kenergy(NULL), kvirial(NULL), energy_self_peratom(NULL), virial_self_peratom(NULL),
ekr_local(NULL), hvec(NULL), kvec(NULL), B(NULL), cek_local(NULL), cek_global(NULL)
{
if (narg!=1) error->all(FLERR,"Illegal kspace_style ewald/n command");
ewaldflag = dispersionflag = dipoleflag = 1;
accuracy_relative = fabs(force->numeric(FLERR,arg[0]));
memset(function, 0, EWALD_NFUNCS*sizeof(int));
kenergy = kvirial = NULL;
cek_local = cek_global = NULL;
ekr_local = NULL;
hvec = NULL;
kvec = NULL;
B = NULL;
first_output = 0;
energy_self_peratom = NULL;
virial_self_peratom = NULL;
nmax = 0;
q2 = 0;
b2 = 0;
M2 = 0;
}
/* ---------------------------------------------------------------------- */
EwaldDisp::~EwaldDisp()
{
deallocate();
deallocate_peratom();
delete [] ekr_local;
delete [] B;
}
/* --------------------------------------------------------------------- */
void EwaldDisp::init()
{
nkvec = nkvec_max = nevec = nevec_max = 0;
nfunctions = nsums = sums = 0;
nbox = -1;
bytes = 0.0;
if (!comm->me) {
if (screen) fprintf(screen,"EwaldDisp initialization ...\n");
if (logfile) fprintf(logfile,"EwaldDisp initialization ...\n");
}
triclinic_check();
if (domain->dimension == 2)
error->all(FLERR,"Cannot use EwaldDisp with 2d simulation");
if (slabflag == 0 && domain->nonperiodic > 0)
error->all(FLERR,"Cannot use nonperiodic boundaries with EwaldDisp");
if (slabflag == 1) {
if (domain->xperiodic != 1 || domain->yperiodic != 1 ||
domain->boundary[2][0] != 1 || domain->boundary[2][1] != 1)
error->all(FLERR,"Incorrect boundaries with slab EwaldDisp");
}
scale = 1.0;
mumurd2e = force->qqrd2e;
dielectric = force->dielectric;
int tmp;
Pair *pair = force->pair;
int *ptr = pair ? (int *) pair->extract("ewald_order",tmp) : NULL;
double *cutoff = pair ? (double *) pair->extract("cut_coul",tmp) : NULL;
if (!(ptr||cutoff))
error->all(FLERR,"KSpace style is incompatible with Pair style");
int ewald_order = ptr ? *((int *) ptr) : 1<<1;
int ewald_mix = ptr ? *((int *) pair->extract("ewald_mix",tmp)) : GEOMETRIC;
memset(function, 0, EWALD_NFUNCS*sizeof(int));
for (int i=0; i<=EWALD_NORDER; ++i) // transcribe order
if (ewald_order&(1<<i)) { // from pair_style
int n[] = EWALD_NSUMS, k = 0;
switch (i) {
case 1:
k = 0; break;
case 3:
k = 3; break;
case 6:
if (ewald_mix==GEOMETRIC) { k = 1; break; }
else if (ewald_mix==ARITHMETIC) { k = 2; break; }
error->all(FLERR,
"Unsupported mixing rule in kspace_style ewald/disp");
default:
error->all(FLERR,"Unsupported order in kspace_style ewald/disp");
}
nfunctions += function[k] = 1;
nsums += n[k];
}
- if (!gewaldflag) g_ewald = 0.0;
+ if (!gewaldflag) g_ewald = g_ewald_6 = 1.0;
pair->init(); // so B is defined
init_coeffs();
init_coeff_sums();
if (function[0]) qsum_qsq();
else qsqsum = qsum = 0.0;
natoms_original = atom->natoms;
+ if (!gewaldflag) g_ewald = g_ewald_6 = 0.0;
// turn off coulombic if no charge
if (function[0] && qsqsum == 0.0) {
function[0] = 0;
nfunctions -= 1;
nsums -= 1;
}
double bsbsum = 0.0;
M2 = 0.0;
if (function[1]) bsbsum = sum[1].x2;
if (function[2]) bsbsum = sum[2].x2;
if (function[3]) M2 = sum[9].x2;
if (function[3] && strcmp(update->unit_style,"electron") == 0)
error->all(FLERR,"Cannot (yet) use 'electron' units with dipoles");
if (qsqsum == 0.0 && bsbsum == 0.0 && M2 == 0.0)
error->all(FLERR,"Cannot use Ewald/disp solver "
"on system with no charge, dipole, or LJ particles");
if (fabs(qsum) > SMALL && comm->me == 0) {
char str[128];
sprintf(str,"System is not charge neutral, net charge = %g",qsum);
error->warning(FLERR,str);
}
if (!function[1] && !function[2]) dispersionflag = 0;
if (!function[3]) dipoleflag = 0;
pair_check();
// set accuracy (force units) from accuracy_relative or accuracy_absolute
if (accuracy_absolute >= 0.0) accuracy = accuracy_absolute;
else accuracy = accuracy_relative * two_charge_force;
// setup K-space resolution
q2 = qsqsum * force->qqrd2e;
M2 *= mumurd2e;
b2 = bsbsum; //Are these units right?
bigint natoms = atom->natoms;
if (!gewaldflag) {
if (function[0]) {
g_ewald = accuracy*sqrt(natoms*(*cutoff)*shape_det(domain->h)) / (2.0*q2);
if (g_ewald >= 1.0) g_ewald = (1.35 - 0.15*log(accuracy))/(*cutoff);
else g_ewald = sqrt(-log(g_ewald)) / (*cutoff);
} else if (function[3]) {
//Try Newton Solver
//Use old method to get guess
g_ewald = (1.35 - 0.15*log(accuracy))/ *cutoff;
double g_ewald_new =
NewtonSolve(g_ewald,(*cutoff),natoms,shape_det(domain->h),M2);
if (g_ewald_new > 0.0) g_ewald = g_ewald_new;
else error->warning(FLERR,"Ewald/disp Newton solver failed, "
"using old method to estimate g_ewald");
} else if (function[1] || function[2]) {
//Try Newton Solver
//Use old method to get guess
g_ewald = (1.35 - 0.15*log(accuracy))/ *cutoff;
double g_ewald_new =
NewtonSolve(g_ewald,(*cutoff),natoms,shape_det(domain->h),b2);
if (g_ewald_new > 0.0) g_ewald = g_ewald_new;
else error->warning(FLERR,"Ewald/disp Newton solver failed, "
"using old method to estimate g_ewald");
}
}
if (!comm->me) {
- if (screen) fprintf(screen, " G vector = %g\n", g_ewald);
- if (logfile) fprintf(logfile, " G vector = %g\n", g_ewald);
+ if (screen) fprintf(screen, " G vector = %g, accuracy = %g\n", g_ewald,accuracy);
+ if (logfile) fprintf(logfile, " G vector = %g accuracy = %g\n", g_ewald,accuracy);
}
g_ewald_6 = g_ewald;
deallocate_peratom();
peratom_allocate_flag = 0;
}
/* ----------------------------------------------------------------------
adjust EwaldDisp coeffs, called initially and whenever volume has changed
------------------------------------------------------------------------- */
void EwaldDisp::setup()
{
volume = shape_det(domain->h)*slab_volfactor;
memcpy(unit, domain->h_inv, sizeof(shape));
shape_scalar_mult(unit, 2.0*MY_PI);
unit[2] /= slab_volfactor;
// int nbox_old = nbox, nkvec_old = nkvec;
if (accuracy >= 1) {
nbox = 0;
error->all(FLERR,"KSpace accuracy too low");
}
bigint natoms = atom->natoms;
double err;
int kxmax = 1;
int kymax = 1;
int kzmax = 1;
err = rms(kxmax,domain->h[0],natoms,q2,b2,M2);
while (err > accuracy) {
kxmax++;
err = rms(kxmax,domain->h[0],natoms,q2,b2,M2);
}
err = rms(kymax,domain->h[1],natoms,q2,b2,M2);
while (err > accuracy) {
kymax++;
err = rms(kymax,domain->h[1],natoms,q2,b2,M2);
}
err = rms(kzmax,domain->h[2]*slab_volfactor,natoms,q2,b2,M2);
while (err > accuracy) {
kzmax++;
err = rms(kzmax,domain->h[2]*slab_volfactor,natoms,q2,b2,M2);
}
nbox = MAX(kxmax,kymax);
nbox = MAX(nbox,kzmax);
double gsqxmx = unit[0]*unit[0]*kxmax*kxmax;
double gsqymx = unit[1]*unit[1]*kymax*kymax;
double gsqzmx = unit[2]*unit[2]*kzmax*kzmax;
gsqmx = MAX(gsqxmx,gsqymx);
gsqmx = MAX(gsqmx,gsqzmx);
gsqmx *= 1.00001;
reallocate();
coefficients();
init_coeffs();
init_coeff_sums();
init_self();
if (!(first_output||comm->me)) {
first_output = 1;
if (screen) fprintf(screen,
" vectors: nbox = %d, nkvec = %d\n", nbox, nkvec);
if (logfile) fprintf(logfile,
" vectors: nbox = %d, nkvec = %d\n", nbox, nkvec);
}
}
/* ----------------------------------------------------------------------
compute RMS accuracy for a dimension
------------------------------------------------------------------------- */
double EwaldDisp::rms(int km, double prd, bigint natoms,
double q2, double b2, double M2)
{
double value = 0.0;
// Coulombic
double g2 = g_ewald*g_ewald;
value += 2.0*q2*g_ewald/prd *
sqrt(1.0/(MY_PI*km*natoms)) *
exp(-MY_PI*MY_PI*km*km/(g2*prd*prd));
// Lennard-Jones
double g7 = g2*g2*g2*g_ewald;
value += 4.0*b2*g7/3.0 *
sqrt(1.0/(MY_PI*natoms)) *
(exp(-MY_PI*MY_PI*km*km/(g2*prd*prd)) *
(MY_PI*km/(g_ewald*prd) + 1));
// dipole
value += 8.0*MY_PI*M2/volume*g_ewald *
sqrt(2.0*MY_PI*km*km*km/(15.0*natoms)) *
exp(-pow(MY_PI*km/(g_ewald*prd),2.0));
return value;
}
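/* Editorial note (hedged, not part of the original source): in the code's own
   symbols (km = largest k index in this dimension, prd = box edge length,
   natoms = N), the Coulombic contribution computed above is
     2*q2*g_ewald/prd * sqrt(1/(pi*km*N)) * exp(-pi^2*km^2/(g_ewald^2*prd^2)),
   and the Lennard-Jones and dipole terms carry the same Gaussian damping
   factor with their own prefactors (the LJ term with an extra
   pi*km/(g_ewald*prd) + 1 factor). setup() simply increments kxmax, kymax and
   kzmax until this estimate falls below the requested accuracy. */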
void EwaldDisp::reallocate()
{
int ix, iy, iz;
int nkvec_max = nkvec;
vector h;
nkvec = 0;
int *kflag = new int[(nbox+1)*(2*nbox+1)*(2*nbox+1)];
int *flag = kflag;
for (ix=0; ix<=nbox; ++ix)
for (iy=-nbox; iy<=nbox; ++iy)
for (iz=-nbox; iz<=nbox; ++iz)
if (!(ix||iy||iz)) *(flag++) = 0;
else if ((!ix)&&(iy<0)) *(flag++) = 0;
else if ((!(ix||iy))&&(iz<0)) *(flag++) = 0; // use symmetry
else {
h[0] = unit[0]*ix;
h[1] = unit[5]*ix+unit[1]*iy;
h[2] = unit[4]*ix+unit[3]*iy+unit[2]*iz;
if ((*(flag++) = h[0]*h[0]+h[1]*h[1]+h[2]*h[2]<=gsqmx)) ++nkvec;
}
if (nkvec>nkvec_max) {
deallocate(); // free memory
hvec = new hvector[nkvec]; // hvec
bytes += (nkvec-nkvec_max)*sizeof(hvector);
kvec = new kvector[nkvec]; // kvec
bytes += (nkvec-nkvec_max)*sizeof(kvector);
kenergy = new double[nkvec*nfunctions]; // kenergy
bytes += (nkvec-nkvec_max)*nfunctions*sizeof(double);
kvirial = new double[6*nkvec*nfunctions]; // kvirial
bytes += 6*(nkvec-nkvec_max)*nfunctions*sizeof(double);
cek_local = new complex[nkvec*nsums]; // cek_local
bytes += (nkvec-nkvec_max)*nsums*sizeof(complex);
cek_global = new complex[nkvec*nsums]; // cek_global
bytes += (nkvec-nkvec_max)*nsums*sizeof(complex);
nkvec_max = nkvec;
}
flag = kflag; // create index and
kvector *k = kvec; // wave vectors
hvector *hi = hvec;
for (ix=0; ix<=nbox; ++ix)
for (iy=-nbox; iy<=nbox; ++iy)
for (iz=-nbox; iz<=nbox; ++iz)
if (*(flag++)) {
hi->x = unit[0]*ix;
hi->y = unit[5]*ix+unit[1]*iy;
(hi++)->z = unit[4]*ix+unit[3]*iy+unit[2]*iz;
k->x = ix+nbox; k->y = iy+nbox; (k++)->z = iz+nbox; }
delete [] kflag;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::reallocate_atoms()
{
if (eflag_atom || vflag_atom)
if (atom->nmax > nmax) {
deallocate_peratom();
allocate_peratom();
nmax = atom->nmax;
}
if ((nevec = atom->nmax*(2*nbox+1))<=nevec_max) return;
delete [] ekr_local;
ekr_local = new cvector[nevec];
bytes += (nevec-nevec_max)*sizeof(cvector);
nevec_max = nevec;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::allocate_peratom()
{
memory->create(energy_self_peratom,
atom->nmax,EWALD_NFUNCS,"ewald/n:energy_self_peratom");
memory->create(virial_self_peratom,
atom->nmax,EWALD_NFUNCS,"ewald/n:virial_self_peratom");
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::deallocate_peratom() // free memory
{
if (energy_self_peratom) {
memory->destroy(energy_self_peratom);
energy_self_peratom = NULL;
}
if (virial_self_peratom) {
memory->destroy(virial_self_peratom);
virial_self_peratom = NULL;
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::deallocate() // free memory
{
delete [] hvec; hvec = NULL;
delete [] kvec; kvec = NULL;
delete [] kenergy; kenergy = NULL;
delete [] kvirial; kvirial = NULL;
delete [] cek_local; cek_local = NULL;
delete [] cek_global; cek_global = NULL;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::coefficients()
{
vector h;
hvector *hi = hvec, *nh;
double eta2 = 0.25/(g_ewald*g_ewald);
double b1, b2, expb2, h1, h2, c1, c2;
double *ke = kenergy, *kv = kvirial;
int func0 = function[0], func12 = function[1]||function[2],
func3 = function[3];
for (nh = (hi = hvec)+nkvec; hi<nh; ++hi) { // wave vectors
memcpy(h, hi, sizeof(vector));
expb2 = exp(-(b2 = (h2 = vec_dot(h, h))*eta2));
if (func0) { // qi*qj/r coeffs
*(ke++) = c1 = expb2/h2;
*(kv++) = c1-(c2 = 2.0*c1*(1.0+b2)/h2)*h[0]*h[0];
*(kv++) = c1-c2*h[1]*h[1]; // lammps convention
*(kv++) = c1-c2*h[2]*h[2]; // instead of voigt
*(kv++) = -c2*h[1]*h[0];
*(kv++) = -c2*h[2]*h[0];
*(kv++) = -c2*h[2]*h[1];
}
if (func12) { // -Bij/r^6 coeffs
b1 = sqrt(b2); // minus sign folded
h1 = sqrt(h2); // into constants
*(ke++) = c1 = -h1*h2*((c2=MY_PIS*erfc(b1))+(0.5/b2-1.0)*expb2/b1);
*(kv++) = c1-(c2 = 3.0*h1*(c2-expb2/b1))*h[0]*h[0];
*(kv++) = c1-c2*h[1]*h[1]; // lammps convention
*(kv++) = c1-c2*h[2]*h[2]; // instead of voigt
*(kv++) = -c2*h[1]*h[0];
*(kv++) = -c2*h[2]*h[0];
*(kv++) = -c2*h[2]*h[1];
}
if (func3) { // dipole coeffs
*(ke++) = c1 = expb2/h2;
*(kv++) = c1-(c2 = 2.0*c1*(1.0+b2)/h2)*h[0]*h[0];
*(kv++) = c1-c2*h[1]*h[1]; // lammps convention
*(kv++) = c1-c2*h[2]*h[2]; // instead of voigt
*(kv++) = -c2*h[1]*h[0];
*(kv++) = -c2*h[2]*h[0];
*(kv++) = -c2*h[2]*h[1];
}
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_coeffs()
{
int tmp;
int n = atom->ntypes;
if (function[1]) { // geometric 1/r^6
double **b = (double **) force->pair->extract("B",tmp);
delete [] B;
B = new double[n+1];
B[0] = 0.0;
bytes += (n+1)*sizeof(double);
for (int i=1; i<=n; ++i) B[i] = sqrt(fabs(b[i][i]));
}
if (function[2]) { // arithmetic 1/r^6
double **epsilon = (double **) force->pair->extract("epsilon",tmp);
double **sigma = (double **) force->pair->extract("sigma",tmp);
delete [] B; // avoid leaking the prior allocation when init_coeffs() is called again from setup()
double eps_i, sigma_i, sigma_n, *bi = B = new double[7*n+7];
double c[7] = {
1.0, sqrt(6.0), sqrt(15.0), sqrt(20.0), sqrt(15.0), sqrt(6.0), 1.0};
if (!(epsilon&&sigma))
error->all(
FLERR,"Epsilon or sigma reference not set by pair style in ewald/n");
for (int j=0; j<7; ++j)
*(bi++) = 0.0;
for (int i=1; i<=n; ++i) {
eps_i = sqrt(epsilon[i][i]);
sigma_i = sigma[i][i];
sigma_n = 1.0;
for (int j=0; j<7; ++j) {
*(bi++) = sigma_n*eps_i*c[j]; sigma_n *= sigma_i;
}
}
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_coeff_sums()
{
if (sums) return; // calculated only once
sums = 1;
Sum sum_local[EWALD_MAX_NSUMS];
memset(sum_local, 0, EWALD_MAX_NSUMS*sizeof(Sum));
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(Sum));
// now perform qsum and qsq via parent qsum_qsq()
sum_local[0].x = 0.0;
sum_local[0].x2 = 0.0;
//if (function[0]) { // 1/r
// double *q = atom->q, *qn = q+atom->nlocal;
// for (double *i=q; i<qn; ++i) {
// sum_local[0].x += i[0]; sum_local[0].x2 += i[0]*i[0]; }
//}
if (function[1]) { // geometric 1/r^6
int *type = atom->type, *ntype = type+atom->nlocal;
for (int *i=type; i<ntype; ++i) {
sum_local[1].x += B[i[0]]; sum_local[1].x2 += B[i[0]]*B[i[0]]; }
}
if (function[2]) { // arithmetic 1/r^6
double *bi;
int *type = atom->type, *ntype = type+atom->nlocal;
for (int *i=type; i<ntype; ++i) {
bi = B+7*i[0];
sum_local[2].x2 += bi[0]*bi[6];
for (int k=2; k<9; ++k) sum_local[k].x += *(bi++);
}
}
if (function[3]&&atom->mu) { // dipole
double *mu = atom->mu[0], *nmu = mu+4*atom->nlocal;
for (double *i = mu; i < nmu; i += 4)
sum_local[9].x2 += i[3]*i[3];
}
MPI_Allreduce(sum_local, sum, 2*EWALD_MAX_NSUMS, MPI_DOUBLE, MPI_SUM, world);
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_self()
{
double g1 = g_ewald, g2 = g1*g1, g3 = g1*g2;
const double qscale = force->qqrd2e * scale;
memset(energy_self, 0, EWALD_NFUNCS*sizeof(double)); // self energy
memset(virial_self, 0, EWALD_NFUNCS*sizeof(double));
if (function[0]) { // 1/r
virial_self[0] = -0.5*MY_PI*qscale/(g2*volume)*qsum*qsum;
energy_self[0] = qsqsum*qscale*g1/MY_PIS-virial_self[0];
}
if (function[1]) { // geometric 1/r^6
virial_self[1] = MY_PI*MY_PIS*g3/(6.0*volume)*sum[1].x*sum[1].x;
energy_self[1] = -sum[1].x2*g3*g3/12.0+virial_self[1];
}
if (function[2]) { // arithmetic 1/r^6
virial_self[2] = MY_PI*MY_PIS*g3/(48.0*volume)*(sum[2].x*sum[8].x+
sum[3].x*sum[7].x+sum[4].x*sum[6].x+0.5*sum[5].x*sum[5].x);
energy_self[2] = -sum[2].x2*g3*g3/3.0+virial_self[2];
}
if (function[3]) { // dipole
virial_self[3] = 0; // in surface
energy_self[3] = sum[9].x2*mumurd2e*2.0*g3/3.0/MY_PIS-virial_self[3];
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::init_self_peratom()
{
if (!(vflag_atom || eflag_atom)) return;
double g1 = g_ewald, g2 = g1*g1, g3 = g1*g2;
const double qscale = force->qqrd2e * scale;
double *energy = energy_self_peratom[0];
double *virial = virial_self_peratom[0];
int nlocal = atom->nlocal;
memset(energy, 0, EWALD_NFUNCS*nlocal*sizeof(double));
memset(virial, 0, EWALD_NFUNCS*nlocal*sizeof(double));
if (function[0]) { // 1/r
double *ei = energy;
double *vi = virial;
double ce = qscale*g1/MY_PIS;
double cv = -0.5*MY_PI*qscale/(g2*volume);
double *qi = atom->q, *qn = qi + nlocal;
for (; qi < qn; qi++, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
double q = *qi;
*vi = cv*q*qsum;
*ei = ce*q*q-vi[0];
}
}
if (function[1]) { // geometric 1/r^6
double *ei = energy+1;
double *vi = virial+1;
double ce = -g3*g3/12.0;
double cv = MY_PI*MY_PIS*g3/(6.0*volume);
int *typei = atom->type, *typen = typei + atom->nlocal;
for (; typei < typen; typei++, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
double b = B[*typei];
*vi = cv*b*sum[1].x;
*ei = ce*b*b+vi[0];
}
}
if (function[2]) { // arithmetic 1/r^6
double *bi;
double *ei = energy+2;
double *vi = virial+2;
double ce = -g3*g3/3.0;
double cv = 0.5*MY_PI*MY_PIS*g3/(48.0*volume);
int *typei = atom->type, *typen = typei + atom->nlocal;
for (; typei < typen; typei++, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
bi = B+7*typei[0]+7;
for (int k=2; k<9; ++k) *vi += cv*sum[k].x*(--bi)[0];
/* PJV 20120225:
should this be this instead? above implies an inverse dependence
seems to be the above way in original; i recall having tested
arithmetic mixing in the conception phase, but an extra test would
be prudent (pattern repeats in multiple functions below)
bi = B+7*typei[0];
for (int k=2; k<9; ++k) *vi += cv*sum[k].x*(bi++)[0];
*/
*ei = ce*bi[0]*bi[6]+vi[0];
}
}
if (function[3]&&atom->mu) { // dipole
double *ei = energy+3;
double *vi = virial+3;
double *imu = atom->mu[0], *nmu = imu+4*atom->nlocal;
double ce = mumurd2e*2.0*g3/3.0/MY_PIS;
for (; imu < nmu; imu += 4, vi += EWALD_NFUNCS, ei += EWALD_NFUNCS) {
*vi = 0; // in surface
*ei = ce*imu[3]*imu[3]-vi[0];
}
}
}
/* ----------------------------------------------------------------------
compute the EwaldDisp long-range force, energy, virial
------------------------------------------------------------------------- */
void EwaldDisp::compute(int eflag, int vflag)
{
if (!nbox) return;
// set energy/virial flags
// invoke allocate_peratom() if needed for first time
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = eflag_global = vflag_global = eflag_atom = vflag_atom = 0;
if (!peratom_allocate_flag && (eflag_atom || vflag_atom)) {
allocate_peratom();
peratom_allocate_flag = 1;
nmax = atom->nmax;
}
reallocate_atoms();
init_self_peratom();
compute_ek();
compute_force();
//compute_surface(); // assume conducting metal (tinfoil) boundary conditions
// update qsum and qsqsum, if atom count has changed and energy needed
if ((eflag_global || eflag_atom) && atom->natoms != natoms_original) {
if (function[0]) qsum_qsq();
natoms_original = atom->natoms;
}
compute_energy();
compute_energy_peratom();
compute_virial();
compute_virial_dipole();
compute_virial_peratom();
if (slabflag) compute_slabcorr();
}
void EwaldDisp::compute_ek()
{
cvector *ekr = ekr_local;
int lbytes = (2*nbox+1)*sizeof(cvector);
hvector *h = NULL;
kvector *k, *nk = kvec+nkvec;
cvector *z = new cvector[2*nbox+1];
cvector z1, *zx, *zy, *zz, *zn = z+2*nbox;
complex *cek, zxyz, zxy = COMPLEX_NULL, cx = COMPLEX_NULL;
vector mui;
double *x = atom->x[0], *xn = x+3*atom->nlocal, *q = atom->q, qi = 0.0;
double bi = 0.0, ci[7];
double *mu = atom->mu ? atom->mu[0] : NULL;
int i, kx, ky, n = nkvec*nsums, *type = atom->type, tri = domain->triclinic;
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(cek_local, 0, n*sizeof(complex)); // reset sums
while (x<xn) {
zx = (zy = (zz = z+nbox)+1)-2;
C_SET(zz->x, 1, 0); C_SET(zz->y, 1, 0); C_SET(zz->z, 1, 0); // z[0]
if (tri) { // triclinic z[1]
C_ANGLE(z1.x, unit[0]*x[0]+unit[5]*x[1]+unit[4]*x[2]);
C_ANGLE(z1.y, unit[1]*x[1]+unit[3]*x[2]);
C_ANGLE(z1.z, x[2]*unit[2]); x += 3;
}
else { // orthogonal z[1]
C_ANGLE(z1.x, *(x++)*unit[0]);
C_ANGLE(z1.y, *(x++)*unit[1]);
C_ANGLE(z1.z, *(x++)*unit[2]);
}
for (; zz<zn; --zx, ++zy, ++zz) { // set up z[k]=e^(ik.r)
C_RMULT(zy->x, zz->x, z1.x); // 3D k-vector
C_RMULT(zy->y, zz->y, z1.y); C_CONJ(zx->y, zy->y);
C_RMULT(zy->z, zz->z, z1.z); C_CONJ(zx->z, zy->z);
}
kx = ky = -1;
cek = cek_local;
if (func[0]) qi = *(q++);
if (func[1]) bi = B[*type];
if (func[2]) memcpy(ci, B+7*type[0], 7*sizeof(double));
if (func[3]) {
memcpy(mui, mu, sizeof(vector));
mu += 4;
h = hvec;
}
for (k=kvec; k<nk; ++k) { // compute rho(k)
if (ky!=k->y) { // based on order in
if (kx!=k->x) cx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, cx);
}
C_RMULT(zxyz, z[k->z].z, zxy);
if (func[0]) {
cek->re += zxyz.re*qi; (cek++)->im += zxyz.im*qi;
}
if (func[1]) {
cek->re += zxyz.re*bi; (cek++)->im += zxyz.im*bi;
}
if (func[2]) for (i=0; i<7; ++i) {
cek->re += zxyz.re*ci[i]; (cek++)->im += zxyz.im*ci[i];
}
if (func[3]) {
register double muk = mui[0]*h->x+mui[1]*h->y+mui[2]*h->z; ++h;
cek->re += zxyz.re*muk; (cek++)->im += zxyz.im*muk;
}
}
ekr = (cvector *) ((char *) memcpy(ekr, z, lbytes)+lbytes);
++type;
}
MPI_Allreduce(cek_local, cek_global, 2*n, MPI_DOUBLE, MPI_SUM, world);
delete [] z;
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_force()
{
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector sum[EWALD_MAX_NSUMS], mui = COMPLEX_NULL;
complex *cek, zc, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *f = atom->f[0], *fn = f+3*atom->nlocal, *q = atom->q, *t = NULL;
double *mu = atom->mu ? atom->mu[0] : NULL;
const double qscale = force->qqrd2e * scale;
double *ke, c[EWALD_NFUNCS] = {
8.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(12.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 8.0*MY_PI*mumurd2e/volume};
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
int func[EWALD_NFUNCS];
if (atom->torque) t = atom->torque[0];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(vector)); // fj = -dE/dr =
for (; f<fn; f+=3) { // -i*qj*fac*
k = kvec; // Sum[conj(d)-d]
kx = ky = -1; // d = k*conj(ekj)*ek
ke = kenergy;
cek = cek_global;
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(vector));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
if (func[0]) { // 1/r
register double im = *(ke++)*(zc.im*cek->re+cek->im*zc.re);
if (func[3]) cek_coul = cek;
++cek;
sum[0][0] += h->x*im; sum[0][1] += h->y*im; sum[0][2] += h->z*im;
}
if (func[1]) { // geometric 1/r^6
register double im = *(ke++)*(zc.im*cek->re+cek->im*zc.re); ++cek;
sum[1][0] += h->x*im; sum[1][1] += h->y*im; sum[1][2] += h->z*im;
}
if (func[2]) { // arithmetic 1/r^6
register double im, c = *(ke++);
for (i=2; i<9; ++i) {
im = c*(zc.im*cek->re+cek->im*zc.re); ++cek;
sum[i][0] += h->x*im; sum[i][1] += h->y*im; sum[i][2] += h->z*im;
}
}
if (func[3]) { // dipole
register double im = *(ke)*(zc.im*cek->re+
cek->im*zc.re)*(mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
register double im2 = *(ke)*(zc.re*cek->re-
cek->im*zc.im);
sum[9][0] += h->x*im; sum[9][1] += h->y*im; sum[9][2] += h->z*im;
t[0] += -mui[1]*h->z*im2 + mui[2]*h->y*im2; // torque
t[1] += -mui[2]*h->x*im2 + mui[0]*h->z*im2;
t[2] += -mui[0]*h->y*im2 + mui[1]*h->x*im2;
if (func[0]) { // charge-dipole
register double qi = *(q)*c[0];
im = - *(ke)*(zc.re*cek_coul->re -
cek_coul->im*zc.im)*(mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
im += *(ke)*(zc.re*cek->re - cek->im*zc.im)*qi;
sum[9][0] += h->x*im; sum[9][1] += h->y*im; sum[9][2] += h->z*im;
im2 = *(ke)*(zc.re*cek_coul->im + cek_coul->re*zc.im);
im2 += -*(ke)*(zc.re*cek->im - cek->im*zc.re);
t[0] += -mui[1]*h->z*im2 + mui[2]*h->y*im2; // torque
t[1] += -mui[2]*h->x*im2 + mui[0]*h->z*im2;
t[2] += -mui[0]*h->y*im2 + mui[1]*h->x*im2;
}
++cek;
ke++;
}
}
if (func[0]) { // 1/r
register double qi = *(q++)*c[0];
f[0] -= sum[0][0]*qi; f[1] -= sum[0][1]*qi; f[2] -= sum[0][2]*qi;
}
if (func[1]) { // geometric 1/r^6
register double bi = B[*type]*c[1];
f[0] -= sum[1][0]*bi; f[1] -= sum[1][1]*bi; f[2] -= sum[1][2]*bi;
}
if (func[2]) { // arithmetic 1/r^6
register double *bi = B+7*type[0]+7;
for (i=2; i<9; ++i) {
register double c2 = (--bi)[0]*c[2];
f[0] -= sum[i][0]*c2; f[1] -= sum[i][1]*c2; f[2] -= sum[i][2]*c2;
}
}
if (func[3]) { // dipole
f[0] -= sum[9][0]; f[1] -= sum[9][1]; f[2] -= sum[9][2];
}
z = (cvector *) ((char *) z+lbytes);
++type;
t += 3;
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_surface()
{
// assume conducting metal (tinfoil) boundary conditions, so this function is
// not called because dielectric at the boundary --> infinity, which makes all
// the terms here zero.
if (!function[3]) return;
if (!atom->mu) return;
vector sum_local = VECTOR_NULL, sum_total;
memset(sum_local, 0, sizeof(vector));
double *i, *n, *mu = atom->mu[0];
for (n = (i = mu) + 4*atom->nlocal; i < n; ++i) {
sum_local[0] += (i++)[0];
sum_local[1] += (i++)[0];
sum_local[2] += (i++)[0];
}
MPI_Allreduce(sum_local, sum_total, 3, MPI_DOUBLE, MPI_SUM, world);
virial_self[3] =
mumurd2e*(2.0*MY_PI*vec_dot(sum_total,sum_total)/(2.0*dielectric+1)/volume);
energy_self[3] -= virial_self[3];
if (!(vflag_atom || eflag_atom)) return;
double *ei = energy_self_peratom[0]+3;
double *vi = virial_self_peratom[0]+3;
double cv = 2.0*mumurd2e*MY_PI/(2.0*dielectric+1)/volume;
for (i = mu; i < n; i += 4, ei += EWALD_NFUNCS, vi += EWALD_NFUNCS) {
*vi = cv*(i[0]*sum_total[0]+i[1]*sum_total[1]+i[2]*sum_total[2]);
*ei -= *vi;
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_energy()
{
energy = 0.0;
if (!eflag_global) return;
complex *cek = cek_global;
complex *cek_coul;
double *ke = kenergy;
const double qscale = force->qqrd2e * scale;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
double sum[EWALD_NFUNCS];
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(sum, 0, EWALD_NFUNCS*sizeof(double)); // reset sums
for (int k=0; k<nkvec; ++k) { // sum over k vectors
if (func[0]) { // 1/r
sum[0] += *(ke++)*(cek->re*cek->re+cek->im*cek->im);
if (func[3]) cek_coul = cek;
++cek;
}
if (func[1]) { // geometric 1/r^6
sum[1] += *(ke++)*(cek->re*cek->re+cek->im*cek->im); ++cek; }
if (func[2]) { // arithmetic 1/r^6
register double r =
(cek[0].re*cek[6].re+cek[0].im*cek[6].im)+
(cek[1].re*cek[5].re+cek[1].im*cek[5].im)+
(cek[2].re*cek[4].re+cek[2].im*cek[4].im)+
0.5*(cek[3].re*cek[3].re+cek[3].im*cek[3].im); cek += 7;
sum[2] += *(ke++)*r;
}
if (func[3]) { // dipole
sum[3] += *(ke)*(cek->re*cek->re+cek->im*cek->im);
if (func[0]) { // charge-dipole
sum[3] += *(ke)*2.0*(cek->re*cek_coul->im - cek->im*cek_coul->re);
}
ke++;
++cek;
}
}
for (int k=0; k<EWALD_NFUNCS; ++k) energy += c[k]*sum[k]-energy_self[k];
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_energy_peratom()
{
if (!eflag_atom) return;
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector mui = VECTOR_NULL;
double sum[EWALD_MAX_NSUMS];
complex *cek, zc = COMPLEX_NULL, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *q = atom->q;
double *eatomj = eatom;
double *mu = atom->mu ? atom->mu[0] : NULL;
const double qscale = force->qqrd2e * scale;
double *ke = kenergy;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
for (int j = 0; j < atom->nlocal; j++, ++eatomj) {
k = kvec;
kx = ky = -1;
ke = kenergy;
cek = cek_global;
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(double));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
if (func[0]) { // 1/r
sum[0] += *(ke++)*(cek->re*zc.re - cek->im*zc.im);
if (func[3]) cek_coul = cek;
++cek;
}
if (func[1]) { // geometric 1/r^6
sum[1] += *(ke++)*(cek->re*zc.re - cek->im*zc.im); ++cek; }
if (func[2]) { // arithmetic 1/r^6
register double im, c = *(ke++);
for (i=2; i<9; ++i) {
im = c*(cek->re*zc.re - cek->im*zc.im); ++cek;
sum[i] += im;
}
}
if (func[3]) { // dipole
double muk = (mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
sum[9] += *(ke)*(cek->re*zc.re - cek->im*zc.im)*muk;
if (func[0]) { // charge-dipole
register double qj = *(q)*c[0];
sum[9] += *(ke)*(cek_coul->im*zc.re + cek_coul->re*zc.im)*muk;
sum[9] -= *(ke)*(cek->re*zc.im + cek->im*zc.re)*qj;
}
++cek;
ke++;
}
}
if (func[0]) { // 1/r
register double qj = *(q++)*c[0];
*eatomj += sum[0]*qj - energy_self_peratom[j][0];
}
if (func[1]) { // geometric 1/r^6
register double bj = B[*type]*c[1];
*eatomj += sum[1]*bj - energy_self_peratom[j][1];
}
if (func[2]) { // arithmetic 1/r^6
register double *bj = B+7*type[0]+7;
for (i=2; i<9; ++i) {
register double c2 = (--bj)[0]*c[2];
*eatomj += 0.5*sum[i]*c2;
}
*eatomj -= energy_self_peratom[j][2];
}
if (func[3]) { // dipole
*eatomj += sum[9] - energy_self_peratom[j][3];
}
z = (cvector *) ((char *) z+lbytes);
++type;
}
}
/* ---------------------------------------------------------------------- */
#define swap(a, b) { register double t = a; a = b; b = t; }
void EwaldDisp::compute_virial()
{
memset(virial, 0, sizeof(shape));
if (!vflag_global) return;
complex *cek = cek_global;
complex *cek_coul;
double *kv = kvirial;
const double qscale = force->qqrd2e * scale;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
shape sum[EWALD_NFUNCS];
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(sum, 0, EWALD_NFUNCS*sizeof(shape));
for (int k=0; k<nkvec; ++k) { // sum over k vectors
if (func[0]) { // 1/r
register double r = cek->re*cek->re+cek->im*cek->im;
if (func[3]) cek_coul = cek;
++cek;
sum[0][0] += *(kv++)*r; sum[0][1] += *(kv++)*r; sum[0][2] += *(kv++)*r;
sum[0][3] += *(kv++)*r; sum[0][4] += *(kv++)*r; sum[0][5] += *(kv++)*r;
}
if (func[1]) { // geometric 1/r^6
register double r = cek->re*cek->re+cek->im*cek->im; ++cek;
sum[1][0] += *(kv++)*r; sum[1][1] += *(kv++)*r; sum[1][2] += *(kv++)*r;
sum[1][3] += *(kv++)*r; sum[1][4] += *(kv++)*r; sum[1][5] += *(kv++)*r;
}
if (func[2]) { // arithmetic 1/r^6
register double r =
(cek[0].re*cek[6].re+cek[0].im*cek[6].im)+
(cek[1].re*cek[5].re+cek[1].im*cek[5].im)+
(cek[2].re*cek[4].re+cek[2].im*cek[4].im)+
0.5*(cek[3].re*cek[3].re+cek[3].im*cek[3].im); cek += 7;
sum[2][0] += *(kv++)*r; sum[2][1] += *(kv++)*r; sum[2][2] += *(kv++)*r;
sum[2][3] += *(kv++)*r; sum[2][4] += *(kv++)*r; sum[2][5] += *(kv++)*r;
}
if (func[3]) {
register double r = cek->re*cek->re+cek->im*cek->im;
sum[3][0] += *(kv++)*r; sum[3][1] += *(kv++)*r; sum[3][2] += *(kv++)*r;
sum[3][3] += *(kv++)*r; sum[3][4] += *(kv++)*r; sum[3][5] += *(kv++)*r;
if (func[0]) { // charge-dipole
kv -= 6;
register double r = 2.0*(cek->re*cek_coul->im - cek->im*cek_coul->re);
sum[3][0] += *(kv++)*r; sum[3][1] += *(kv++)*r; sum[3][2] += *(kv++)*r;
sum[3][3] += *(kv++)*r; sum[3][4] += *(kv++)*r; sum[3][5] += *(kv++)*r;
}
++cek;
}
}
for (int k=0; k<EWALD_NFUNCS; ++k)
if (func[k]) {
shape self = {virial_self[k], virial_self[k], virial_self[k], 0, 0, 0};
shape_scalar_mult(sum[k], c[k]);
shape_add(virial, sum[k]);
shape_subtr(virial, self);
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_virial_dipole()
{
if (!function[3]) return;
if (!vflag_atom && !vflag_global) return;
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector mui = COMPLEX_NULL;
double sum[6];
double sum_total[6];
complex *cek, zc, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *mu = atom->mu ? atom->mu[0] : NULL;
double *vatomj = NULL;
if (vflag_atom && vatom) vatomj = vatom[0];
const double qscale = force->qqrd2e * scale;
double *ke, c[EWALD_NFUNCS] = {
8.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(12.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 8.0*MY_PI*mumurd2e/volume};
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
memset(&sum[0], 0, 6*sizeof(double));
memset(&sum_total[0], 0, 6*sizeof(double));
for (int j = 0; j < atom->nlocal; j++) {
k = kvec;
kx = ky = -1;
ke = kenergy;
cek = cek_global;
memset(&sum[0], 0, 6*sizeof(double));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
double im = 0.0;
if (func[0]) { // 1/r
ke++;
if (func[3]) cek_coul = cek;
++cek;
}
if (func[1]) { // geometric 1/r^6
ke++;
++cek;
}
if (func[2]) { // arithmetic 1/r^6
ke++;
for (i=2; i<9; ++i) {
++cek;
}
}
if (func[3]) { // dipole
im = *(ke)*(zc.re*cek->re - cek->im*zc.im);
if (func[0]) { // charge-dipole
im += *(ke)*(zc.im*cek_coul->re + cek_coul->im*zc.re);
}
sum[0] -= mui[0]*h->x*im;
sum[1] -= mui[1]*h->y*im;
sum[2] -= mui[2]*h->z*im;
sum[3] -= mui[0]*h->y*im;
sum[4] -= mui[0]*h->z*im;
sum[5] -= mui[1]*h->z*im;
++cek;
ke++;
}
}
if (vflag_global)
for (int n = 0; n < 6; n++)
sum_total[n] -= sum[n];
if (vflag_atom)
for (int n = 0; n < 6; n++)
vatomj[n] -= sum[n];
z = (cvector *) ((char *) z+lbytes);
++type;
if (vflag_atom) vatomj += 6;
}
if (vflag_global) {
MPI_Allreduce(&sum_total[0],&sum[0],6,MPI_DOUBLE,MPI_SUM,world);
for (int n = 0; n < 6; n++)
virial[n] += sum[n];
}
}
/* ---------------------------------------------------------------------- */
void EwaldDisp::compute_virial_peratom()
{
if (!vflag_atom) return;
kvector *k;
hvector *h, *nh;
cvector *z = ekr_local;
vector mui = VECTOR_NULL;
complex *cek, zc = COMPLEX_NULL, zx = COMPLEX_NULL, zxy = COMPLEX_NULL;
complex *cek_coul;
double *kv;
double *q = atom->q;
double *vatomj = vatom ? vatom[0] : NULL;
double *mu = atom->mu ? atom->mu[0] : NULL;
const double qscale = force->qqrd2e * scale;
double c[EWALD_NFUNCS] = {
4.0*MY_PI*qscale/volume, 2.0*MY_PI*MY_PIS/(24.0*volume),
2.0*MY_PI*MY_PIS/(192.0*volume), 4.0*MY_PI*mumurd2e/volume};
shape sum[EWALD_MAX_NSUMS];
int func[EWALD_NFUNCS];
memcpy(func, function, EWALD_NFUNCS*sizeof(int));
int i, kx, ky, lbytes = (2*nbox+1)*sizeof(cvector), *type = atom->type;
for (int j = 0; j < atom->nlocal; j++) {
k = kvec;
kx = ky = -1;
kv = kvirial;
cek = cek_global;
memset(sum, 0, EWALD_MAX_NSUMS*sizeof(shape));
if (func[3]) {
register double di = c[3];
mui[0] = di*(mu++)[0]; mui[1] = di*(mu++)[0]; mui[2] = di*(mu++)[0];
mu++;
}
for (nh = (h = hvec)+nkvec; h<nh; ++h, ++k) {
if (ky!=k->y) { // based on order in
if (kx!=k->x) zx = z[kx = k->x].x; // reallocate
C_RMULT(zxy, z[ky = k->y].y, zx);
}
C_CRMULT(zc, z[k->z].z, zxy);
if (func[0]) { // 1/r
if (func[3]) cek_coul = cek;
register double r = cek->re*zc.re - cek->im*zc.im; ++cek;
sum[0][0] += *(kv++)*r;
sum[0][1] += *(kv++)*r;
sum[0][2] += *(kv++)*r;
sum[0][3] += *(kv++)*r;
sum[0][4] += *(kv++)*r;
sum[0][5] += *(kv++)*r;
}
if (func[1]) { // geometric 1/r^6
register double r = cek->re*zc.re - cek->im*zc.im; ++cek;
sum[1][0] += *(kv++)*r;
sum[1][1] += *(kv++)*r;
sum[1][2] += *(kv++)*r;
sum[1][3] += *(kv++)*r;
sum[1][4] += *(kv++)*r;
sum[1][5] += *(kv++)*r;
}
if (func[2]) { // arithmetic 1/r^6
register double r;
for (i=2; i<9; ++i) {
r = cek->re*zc.re - cek->im*zc.im; ++cek;
sum[i][0] += *(kv++)*r;
sum[i][1] += *(kv++)*r;
sum[i][2] += *(kv++)*r;
sum[i][3] += *(kv++)*r;
sum[i][4] += *(kv++)*r;
sum[i][5] += *(kv++)*r;
kv -= 6;
}
kv += 6;
}
if (func[3]) { // dipole
double muk = (mui[0]*h->x+mui[1]*h->y+mui[2]*h->z);
register double
r = (cek->re*zc.re - cek->im*zc.im)*muk;
sum[9][0] += *(kv++)*r;
sum[9][1] += *(kv++)*r;
sum[9][2] += *(kv++)*r;
sum[9][3] += *(kv++)*r;
sum[9][4] += *(kv++)*r;
sum[9][5] += *(kv++)*r;
if (func[0]) { // charge-dipole
kv -= 6;
register double qj = *(q)*c[0];
r = (cek_coul->im*zc.re + cek_coul->re*zc.im)*muk;
r += -(cek->re*zc.im + cek->im*zc.re)*qj;
sum[9][0] += *(kv++)*r; sum[9][1] += *(kv++)*r; sum[9][2] += *(kv++)*r;
sum[9][3] += *(kv++)*r; sum[9][4] += *(kv++)*r; sum[9][5] += *(kv++)*r;
}
++cek;
}
}
if (func[0]) { // 1/r
register double qi = *(q++)*c[0];
for (int n = 0; n < 6; n++) vatomj[n] += sum[0][n]*qi;
}
if (func[1]) { // geometric 1/r^6
register double bi = B[*type]*c[1];
for (int n = 0; n < 6; n++) vatomj[n] += sum[1][n]*bi;
}
if (func[2]) { // arithmetic 1/r^6
register double *bj = B+7*type[0]+7;
for (i=2; i<9; ++i) {
register double c2 = (--bj)[0]*c[2];
for (int n = 0; n < 6; n++) vatomj[n] += 0.5*sum[i][n]*c2;
}
}
if (func[3]) { // dipole
for (int n = 0; n < 6; n++) vatomj[n] += sum[9][n];
}
for (int k=0; k<EWALD_NFUNCS; ++k) {
if (func[k]) {
for (int n = 0; n < 3; n++) vatomj[n] -= virial_self_peratom[j][k];
}
}
z = (cvector *) ((char *) z+lbytes);
++type;
vatomj += 6;
}
}
/* ----------------------------------------------------------------------
Slab-geometry correction term to dampen inter-slab interactions between
periodically repeating slabs. Yields a good approximation to 2D Ewald if
adequate empty space is left between repeating slabs (J. Chem. Phys.
111, 3155). Slabs are defined here to be parallel to the xy plane. Also
extended to non-neutral systems (J. Chem. Phys. 131, 094107).
------------------------------------------------------------------------- */
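// the correction energy computed below (e_slabcorr) is
//   E_slab = 2*pi/V * (M_z^2 - q_tot*sum_i q_i*z_i^2 - q_tot^2*L_z^2/12)
// with M_z the total z dipole moment, q_tot the net charge, and L_z = zprd;
// the force correction on atom i follows as -dE_slab/dz_i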
void EwaldDisp::compute_slabcorr()
{
// compute local contribution to global dipole moment
double *q = atom->q;
double **x = atom->x;
double zprd = domain->zprd;
int nlocal = atom->nlocal;
double dipole = 0.0;
for (int i = 0; i < nlocal; i++) dipole += q[i]*x[i][2];
if (function[3] && atom->mu) {
double **mu = atom->mu;
for (int i = 0; i < nlocal; i++) dipole += mu[i][2];
}
// sum local contributions to get global dipole moment
double dipole_all;
MPI_Allreduce(&dipole,&dipole_all,1,MPI_DOUBLE,MPI_SUM,world);
// need to make non-neutral systems and/or
// per-atom energy translationally invariant
double dipole_r2 = 0.0;
if (eflag_atom || fabs(qsum) > SMALL) {
if (function[3] && atom->mu)
error->all(FLERR,"Cannot (yet) use kspace slab correction with "
"long-range dipoles and non-neutral systems or per-atom energy");
for (int i = 0; i < nlocal; i++)
dipole_r2 += q[i]*x[i][2]*x[i][2];
// sum local contributions
double tmp;
MPI_Allreduce(&dipole_r2,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
dipole_r2 = tmp;
}
// compute corrections
const double e_slabcorr = MY_2PI*(dipole_all*dipole_all -
qsum*dipole_r2 - qsum*qsum*zprd*zprd/12.0)/volume;
const double qscale = force->qqrd2e * scale;
if (eflag_global) energy += qscale * e_slabcorr;
// per-atom energy
if (eflag_atom) {
double efact = qscale * MY_2PI/volume;
for (int i = 0; i < nlocal; i++)
eatom[i] += efact * q[i]*(x[i][2]*dipole_all - 0.5*(dipole_r2 +
qsum*x[i][2]*x[i][2]) - qsum*zprd*zprd/12.0);
}
// add on force corrections
double ffact = qscale * (-4.0*MY_PI/volume);
double **f = atom->f;
for (int i = 0; i < nlocal; i++)
f[i][2] += ffact * q[i]*(dipole_all - qsum*x[i][2]);
// add on torque corrections
if (function[3] && atom->mu && atom->torque) {
double **mu = atom->mu;
double **torque = atom->torque;
for (int i = 0; i < nlocal; i++) {
torque[i][0] += ffact * dipole_all * mu[i][1];
torque[i][1] += -ffact * dipole_all * mu[i][0];
}
}
}
/* ----------------------------------------------------------------------
Newton solver used to find g_ewald for LJ systems
------------------------------------------------------------------------- */
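// standard Newton iteration: x <- x - f(x)/f'(x), converged when |dx| < tol;
// returns -1 if the solver fails (x < 0, x is NaN, or maxit iterations exceeded)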
double EwaldDisp::NewtonSolve(double x, double Rc,
bigint natoms, double vol, double b2)
{
double dx,tol;
int maxit;
maxit = 10000; //Maximum number of iterations
tol = 0.00001; //Convergence tolerance
//Begin algorithm
for (int i = 0; i < maxit; i++) {
dx = f(x,Rc,natoms,vol,b2) / derivf(x,Rc,natoms,vol,b2);
x = x - dx; //Update x
if (fabs(dx) < tol) return x;
if (x < 0 || x != x) // solver failed
return -1;
}
return -1;
}
/* ----------------------------------------------------------------------
Calculate f(x)
------------------------------------------------------------------------- */
double EwaldDisp::f(double x, double Rc, bigint natoms, double vol, double b2)
{
double a = Rc*x;
double f = 0.0;
if (function[3]) { // dipole
double rg2 = a*a;
double rg4 = rg2*rg2;
double rg6 = rg4*rg2;
double Cc = 4.0*rg4 + 6.0*rg2 + 3.0;
double Dc = 8.0*rg6 + 20.0*rg4 + 30.0*rg2 + 15.0;
f = (b2/(sqrt(vol*powint(x,4)*powint(Rc,9)*natoms)) *
sqrt(13.0/6.0*Cc*Cc + 2.0/15.0*Dc*Dc - 13.0/15.0*Cc*Dc) *
exp(-rg2)) - accuracy;
} else if (function[1] || function[2]) { // LJ
f = (4.0*MY_PI*b2*powint(x,4)/vol/sqrt((double)natoms)*erfc(a) *
(6.0*powint(a,-5) + 6.0*powint(a,-3) + 3.0/a + a) - accuracy);
}
return f;
}
/* ----------------------------------------------------------------------
Calculate numerical derivative f'(x)
------------------------------------------------------------------------- */
double EwaldDisp::derivf(double x, double Rc,
bigint natoms, double vol, double b2)
{
double h = 0.000001; //Derivative step-size
return (f(x + h,Rc,natoms,vol,b2) - f(x,Rc,natoms,vol,b2)) / h;
}
diff --git a/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp b/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp
index 11c7a147e..6e17a9bbd 100644
--- a/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp
+++ b/src/KSPACE/pair_lj_charmmfsw_coul_long.cpp
@@ -1,1078 +1,1078 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Paul Crozier (SNL)
The lj-fsw sections (force-switched) were provided by
Robert Meissner and Lucio Colombi Ciacchi of
Bremen University, Bremen, Germany, with
additional assistance from Robert A. Latour, Clemson University
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pair_lj_charmmfsw_coul_long.h"
#include "atom.h"
#include "comm.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "integrate.h"
#include "respa.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
#define EWALD_F 1.12837917
#define EWALD_P 0.3275911
#define A1 0.254829592
#define A2 -0.284496736
#define A3 1.421413741
#define A4 -1.453152027
#define A5 1.061405429
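// EWALD_P and A1-A5 are the coefficients of the polynomial fit to erfc()
// (the Abramowitz & Stegun 7.1.26 approximation) used in the real-space
// Coulomb terms below; EWALD_F = 2/sqrt(pi) enters its derivative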
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulLong::PairLJCharmmfswCoulLong(LAMMPS *lmp) : Pair(lmp)
{
respa_enable = 1;
ewaldflag = pppmflag = 1;
ftable = NULL;
implicit = 0;
mix_flag = ARITHMETIC;
writedata = 1;
+
+ // short-range/long-range flag accessed by DihedralCharmmfsw
+
+ dihedflag = 1;
}
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulLong::~PairLJCharmmfswCoulLong()
{
if (!copymode) {
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(epsilon);
memory->destroy(sigma);
memory->destroy(eps14);
memory->destroy(sigma14);
memory->destroy(lj1);
memory->destroy(lj2);
memory->destroy(lj3);
memory->destroy(lj4);
memory->destroy(lj14_1);
memory->destroy(lj14_2);
memory->destroy(lj14_3);
memory->destroy(lj14_4);
}
if (ftable) free_tables();
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute(int eflag, int vflag)
{
int i,j,ii,jj,inum,jnum,itype,jtype,itable;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,evdwl,evdwl12,evdwl6,ecoul,fpair;
double fraction,table;
double r,rinv,r2inv,r3inv,r6inv,rsq,forcecoul,forcelj,factor_coul,factor_lj;
double grij,expm2,prefactor,t,erfc;
double switch1;
int *ilist,*jlist,*numneigh,**firstneigh;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_bothsq) {
r2inv = 1.0/rsq;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
r = sqrt(rsq);
grij = g_ewald * r;
expm2 = exp(-grij*grij);
t = 1.0 / (1.0 + EWALD_P*grij);
erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
prefactor = qqrd2e * qtmp*q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*prefactor;
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
fraction = (rsq_lookup.f - rtable[itable]) * drtable[itable];
table = ftable[itable] + fraction*dftable[itable];
forcecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ctable[itable] + fraction*dctable[itable];
prefactor = qtmp*q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
if (eflag) {
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq)
ecoul = prefactor*erfc;
else {
table = etable[itable] + fraction*detable[itable];
ecoul = qtmp*q[j] * table;
}
if (factor_coul < 1.0) ecoul -= (1.0-factor_coul)*prefactor;
} else ecoul = 0.0;
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
r = sqrt(rsq);
rinv = 1.0/r;
r3inv = rinv*rinv*rinv;
evdwl12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
evdwl6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
evdwl = evdwl12 + evdwl6;
} else {
evdwl12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
evdwl6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
evdwl = evdwl12 + evdwl6;
}
evdwl *= factor_lj;
} else evdwl = 0.0;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,delx,dely,delz);
}
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute_inner()
{
int i,j,ii,jj,inum,jnum,itype,jtype;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,fpair;
double rsq,r2inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double rsw;
int *ilist,*jlist,*numneigh,**firstneigh;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = listinner->inum;
ilist = listinner->ilist;
numneigh = listinner->numneigh;
firstneigh = listinner->firstneigh;
double cut_out_on = cut_respa[0];
double cut_out_off = cut_respa[1];
double cut_out_diff = cut_out_off - cut_out_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_out_off_sq) {
r2inv = 1.0/rsq;
forcecoul = qqrd2e * qtmp*q[j]*sqrt(r2inv);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*forcecoul;
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
if (rsq > cut_out_on_sq) {
rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw-3.0);
}
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute_middle()
{
int i,j,ii,jj,inum,jnum,itype,jtype;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,fpair;
double rsq,r2inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double switch1;
double rsw;
int *ilist,*jlist,*numneigh,**firstneigh;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = listmiddle->inum;
ilist = listmiddle->ilist;
numneigh = listmiddle->numneigh;
firstneigh = listmiddle->firstneigh;
double cut_in_off = cut_respa[0];
double cut_in_on = cut_respa[1];
double cut_out_on = cut_respa[2];
double cut_out_off = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_out_diff = cut_out_off - cut_out_on;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
double cut_out_on_sq = cut_out_on*cut_out_on;
double cut_out_off_sq = cut_out_off*cut_out_off;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_out_off_sq && rsq > cut_in_off_sq) {
r2inv = 1.0/rsq;
forcecoul = qqrd2e * qtmp*q[j]*sqrt(r2inv);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*forcecoul;
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
if (rsq < cut_in_on_sq) {
rsw = (sqrt(rsq) - cut_in_off)/cut_in_diff;
fpair *= rsw*rsw*(3.0 - 2.0*rsw);
}
if (rsq > cut_out_on_sq) {
rsw = (sqrt(rsq) - cut_out_on)/cut_out_diff;
fpair *= 1.0 + rsw*rsw*(2.0*rsw - 3.0);
}
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
}
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::compute_outer(int eflag, int vflag)
{
int i,j,ii,jj,inum,jnum,itype,jtype,itable;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,evdwl,evdwl6,evdwl12,ecoul,fpair;
double fraction,table;
double r,rinv,r2inv,r3inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double grij,expm2,prefactor,t,erfc;
double switch1;
double rsw;
int *ilist,*jlist,*numneigh,**firstneigh;
double rsq;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = 0;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = listouter->inum;
ilist = listouter->ilist;
numneigh = listouter->numneigh;
firstneigh = listouter->firstneigh;
double cut_in_off = cut_respa[2];
double cut_in_on = cut_respa[3];
double cut_in_diff = cut_in_on - cut_in_off;
double cut_in_off_sq = cut_in_off*cut_in_off;
double cut_in_on_sq = cut_in_on*cut_in_on;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
jtype = type[j];
if (rsq < cut_bothsq) {
r2inv = 1.0/rsq;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
r = sqrt(rsq);
grij = g_ewald * r;
expm2 = exp(-grij*grij);
t = 1.0 / (1.0 + EWALD_P*grij);
erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
prefactor = qqrd2e * qtmp*q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2 - 1.0);
if (rsq > cut_in_off_sq) {
if (rsq < cut_in_on_sq) {
rsw = (r - cut_in_off)/cut_in_diff;
forcecoul += prefactor*rsw*rsw*(3.0 - 2.0*rsw);
if (factor_coul < 1.0)
forcecoul -=
(1.0-factor_coul)*prefactor*rsw*rsw*(3.0 - 2.0*rsw);
} else {
forcecoul += prefactor;
if (factor_coul < 1.0)
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
fraction = (rsq_lookup.f - rtable[itable]) * drtable[itable];
table = ftable[itable] + fraction*dftable[itable];
forcecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ctable[itable] + fraction*dctable[itable];
prefactor = qtmp*q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq < cut_ljsq && rsq > cut_in_off_sq) {
r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
if (rsq < cut_in_on_sq) {
rsw = (sqrt(rsq) - cut_in_off)/cut_in_diff;
forcelj *= rsw*rsw*(3.0 - 2.0*rsw);
}
} else forcelj = 0.0;
fpair = (forcecoul + forcelj) * r2inv;
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
if (eflag) {
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
ecoul = prefactor*erfc;
if (factor_coul < 1.0) ecoul -= (1.0-factor_coul)*prefactor;
} else {
table = etable[itable] + fraction*detable[itable];
ecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ptable[itable] + fraction*dptable[itable];
prefactor = qtmp*q[j] * table;
ecoul -= (1.0-factor_coul)*prefactor;
}
}
} else ecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
evdwl = r6inv*(lj3[itype][jtype]*r6inv-lj4[itype][jtype]);
if (rsq > cut_lj_innersq) {
rinv = 1.0/r;
r3inv = rinv*rinv*rinv;
evdwl12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
evdwl6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
evdwl = evdwl12 + evdwl6;
} else {
evdwl12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
evdwl6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
evdwl = evdwl12 + evdwl6;
}
evdwl *= factor_lj;
} else evdwl = 0.0;
}
if (vflag) {
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*prefactor;
} else {
table = vtable[itable] + fraction*dvtable[itable];
forcecoul = qtmp*q[j] * table;
if (factor_coul < 1.0) {
table = ptable[itable] + fraction*dptable[itable];
prefactor = qtmp*q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq <= cut_in_off_sq) {
r6inv = r2inv*r2inv*r2inv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else if (rsq <= cut_in_on_sq) {
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
}
fpair = (forcecoul + factor_lj*forcelj) * r2inv;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,delx,dely,delz);
}
}
}
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(epsilon,n+1,n+1,"pair:epsilon");
memory->create(sigma,n+1,n+1,"pair:sigma");
memory->create(eps14,n+1,n+1,"pair:eps14");
memory->create(sigma14,n+1,n+1,"pair:sigma14");
memory->create(lj1,n+1,n+1,"pair:lj1");
memory->create(lj2,n+1,n+1,"pair:lj2");
memory->create(lj3,n+1,n+1,"pair:lj3");
memory->create(lj4,n+1,n+1,"pair:lj4");
memory->create(lj14_1,n+1,n+1,"pair:lj14_1");
memory->create(lj14_2,n+1,n+1,"pair:lj14_2");
memory->create(lj14_3,n+1,n+1,"pair:lj14_3");
memory->create(lj14_4,n+1,n+1,"pair:lj14_4");
}
/* ----------------------------------------------------------------------
global settings
unlike other pair styles,
there are no individual pair settings that these override
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::settings(int narg, char **arg)
{
if (narg != 2 && narg != 3) error->all(FLERR,"Illegal pair_style command");
cut_lj_inner = force->numeric(FLERR,arg[0]);
cut_lj = force->numeric(FLERR,arg[1]);
if (narg == 2) cut_coul = cut_lj;
else cut_coul = force->numeric(FLERR,arg[2]);
-
- // indicates pair_style being used for dihedral_charmm
-
- dihedflag = 1;
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::coeff(int narg, char **arg)
{
if (narg != 4 && narg != 6) error->all(FLERR,"Illegal pair_coeff command");
if (!allocated) allocate();
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);
double epsilon_one = force->numeric(FLERR,arg[2]);
double sigma_one = force->numeric(FLERR,arg[3]);
double eps14_one = epsilon_one;
double sigma14_one = sigma_one;
if (narg == 6) {
eps14_one = force->numeric(FLERR,arg[4]);
sigma14_one = force->numeric(FLERR,arg[5]);
}
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
epsilon[i][j] = epsilon_one;
sigma[i][j] = sigma_one;
eps14[i][j] = eps14_one;
sigma14[i][j] = sigma14_one;
setflag[i][j] = 1;
count++;
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::init_style()
{
if (!atom->q_flag)
error->all(FLERR,
"Pair style lj/charmmfsw/coul/long requires atom attribute q");
// request regular or rRESPA neighbor lists
int irequest;
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
int respa = 0;
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
if (respa == 0) irequest = neighbor->request(this,instance_me);
else if (respa == 1) {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
} else {
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respainner = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 2;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respamiddle = 1;
irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->id = 3;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->respaouter = 1;
}
} else irequest = neighbor->request(this,instance_me);
// require cut_lj_inner < cut_lj
if (cut_lj_inner >= cut_lj)
error->all(FLERR,"Pair inner cutoff >= Pair outer cutoff");
cut_lj_innersq = cut_lj_inner * cut_lj_inner;
cut_ljsq = cut_lj * cut_lj;
cut_ljinv = 1.0/cut_lj;
cut_lj_innerinv = 1.0/cut_lj_inner;
cut_lj3 = cut_lj * cut_lj * cut_lj;
cut_lj3inv = cut_ljinv * cut_ljinv * cut_ljinv;
cut_lj_inner3inv = cut_lj_innerinv * cut_lj_innerinv * cut_lj_innerinv;
cut_lj_inner3 = cut_lj_inner * cut_lj_inner * cut_lj_inner;
cut_lj6 = cut_ljsq * cut_ljsq * cut_ljsq;
cut_lj6inv = cut_lj3inv * cut_lj3inv;
cut_lj_inner6inv = cut_lj_inner3inv * cut_lj_inner3inv;
cut_lj_inner6 = cut_lj_innersq * cut_lj_innersq * cut_lj_innersq;
cut_coulsq = cut_coul * cut_coul;
cut_bothsq = MAX(cut_ljsq,cut_coulsq);
denom_lj = (cut_ljsq-cut_lj_innersq) * (cut_ljsq-cut_lj_innersq) *
(cut_ljsq-cut_lj_innersq);
denom_lj12 = 1.0/(cut_lj6 - cut_lj_inner6);
denom_lj6 = 1.0/(cut_lj3 - cut_lj_inner3);
// set & error check interior rRESPA cutoffs
if (strstr(update->integrate_style,"respa") &&
((Respa *) update->integrate)->level_inner >= 0) {
cut_respa = ((Respa *) update->integrate)->cutoff;
if (MIN(cut_lj,cut_coul) < cut_respa[3])
error->all(FLERR,"Pair cutoff < Respa interior cutoff");
if (cut_lj_inner < cut_respa[1])
error->all(FLERR,"Pair inner cutoff < Respa interior cutoff");
} else cut_respa = NULL;
// ensure use of KSpace long-range solver, set g_ewald
if (force->kspace == NULL)
error->all(FLERR,"Pair style requires a KSpace style");
g_ewald = force->kspace->g_ewald;
// setup force tables
if (ncoultablebits) init_tables(cut_coul,cut_respa);
}
/* ----------------------------------------------------------------------
neighbor callback to inform pair style of neighbor list to use
regular or rRESPA
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::init_list(int id, NeighList *ptr)
{
if (id == 0) list = ptr;
else if (id == 1) listinner = ptr;
else if (id == 2) listmiddle = ptr;
else if (id == 3) listouter = ptr;
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairLJCharmmfswCoulLong::init_one(int i, int j)
{
if (setflag[i][j] == 0) {
epsilon[i][j] = mix_energy(epsilon[i][i],epsilon[j][j],
sigma[i][i],sigma[j][j]);
sigma[i][j] = mix_distance(sigma[i][i],sigma[j][j]);
eps14[i][j] = mix_energy(eps14[i][i],eps14[j][j],
sigma14[i][i],sigma14[j][j]);
sigma14[i][j] = mix_distance(sigma14[i][i],sigma14[j][j]);
}
double cut = MAX(cut_lj,cut_coul);
lj1[i][j] = 48.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj2[i][j] = 24.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj3[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj4[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj14_1[i][j] = 48.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_2[i][j] = 24.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj14_3[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_4[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj1[j][i] = lj1[i][j];
lj2[j][i] = lj2[i][j];
lj3[j][i] = lj3[i][j];
lj4[j][i] = lj4[i][j];
lj14_1[j][i] = lj14_1[i][j];
lj14_2[j][i] = lj14_2[i][j];
lj14_3[j][i] = lj14_3[i][j];
lj14_4[j][i] = lj14_4[i][j];
return cut;
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_restart(FILE *fp)
{
write_restart_settings(fp);
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
fwrite(&setflag[i][j],sizeof(int),1,fp);
if (setflag[i][j]) {
fwrite(&epsilon[i][j],sizeof(double),1,fp);
fwrite(&sigma[i][j],sizeof(double),1,fp);
fwrite(&eps14[i][j],sizeof(double),1,fp);
fwrite(&sigma14[i][j],sizeof(double),1,fp);
}
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::read_restart(FILE *fp)
{
read_restart_settings(fp);
allocate();
int i,j;
int me = comm->me;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
if (me == 0) fread(&setflag[i][j],sizeof(int),1,fp);
MPI_Bcast(&setflag[i][j],1,MPI_INT,0,world);
if (setflag[i][j]) {
if (me == 0) {
fread(&epsilon[i][j],sizeof(double),1,fp);
fread(&sigma[i][j],sizeof(double),1,fp);
fread(&eps14[i][j],sizeof(double),1,fp);
fread(&sigma14[i][j],sizeof(double),1,fp);
}
MPI_Bcast(&epsilon[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&eps14[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma14[i][j],1,MPI_DOUBLE,0,world);
}
}
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_restart_settings(FILE *fp)
{
fwrite(&cut_lj_inner,sizeof(double),1,fp);
fwrite(&cut_lj,sizeof(double),1,fp);
fwrite(&cut_coul,sizeof(double),1,fp);
fwrite(&offset_flag,sizeof(int),1,fp);
fwrite(&mix_flag,sizeof(int),1,fp);
fwrite(&ncoultablebits,sizeof(int),1,fp);
fwrite(&tabinner,sizeof(double),1,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::read_restart_settings(FILE *fp)
{
if (comm->me == 0) {
fread(&cut_lj_inner,sizeof(double),1,fp);
fread(&cut_lj,sizeof(double),1,fp);
fread(&cut_coul,sizeof(double),1,fp);
fread(&offset_flag,sizeof(int),1,fp);
fread(&mix_flag,sizeof(int),1,fp);
fread(&ncoultablebits,sizeof(int),1,fp);
fread(&tabinner,sizeof(double),1,fp);
}
MPI_Bcast(&cut_lj_inner,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_lj,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_coul,1,MPI_DOUBLE,0,world);
MPI_Bcast(&offset_flag,1,MPI_INT,0,world);
MPI_Bcast(&mix_flag,1,MPI_INT,0,world);
MPI_Bcast(&ncoultablebits,1,MPI_INT,0,world);
MPI_Bcast(&tabinner,1,MPI_DOUBLE,0,world);
}
/* ----------------------------------------------------------------------
proc 0 writes to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_data(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
fprintf(fp,"%d %g %g %g %g\n",
i,epsilon[i][i],sigma[i][i],eps14[i][i],sigma14[i][i]);
}
/* ----------------------------------------------------------------------
proc 0 writes all pairs to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulLong::write_data_all(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
fprintf(fp,"%d %d %g %g %g %g\n",i,j,
epsilon[i][j],sigma[i][j],eps14[i][j],sigma14[i][j]);
}
/* ---------------------------------------------------------------------- */
double PairLJCharmmfswCoulLong::single(int i, int j, int itype, int jtype,
double rsq,
double factor_coul, double factor_lj,
double &fforce)
{
double r,rinv,r2inv,r3inv,r6inv,grij,expm2,t,erfc,prefactor;
double switch1,fraction,table,forcecoul,forcelj,phicoul,philj,philj12,philj6;
int itable;
r = sqrt(rsq);
rinv = 1.0/r;
r2inv = 1.0/rsq;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq) {
r = sqrt(rsq);
grij = g_ewald * r;
expm2 = exp(-grij*grij);
t = 1.0 / (1.0 + EWALD_P*grij);
erfc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * expm2;
prefactor = force->qqrd2e * atom->q[i]*atom->q[j]/r;
forcecoul = prefactor * (erfc + EWALD_F*grij*expm2);
if (factor_coul < 1.0) forcecoul -= (1.0-factor_coul)*prefactor;
} else {
union_int_float_t rsq_lookup;
rsq_lookup.f = rsq;
itable = rsq_lookup.i & ncoulmask;
itable >>= ncoulshiftbits;
fraction = (rsq_lookup.f - rtable[itable]) * drtable[itable];
table = ftable[itable] + fraction*dftable[itable];
forcecoul = atom->q[i]*atom->q[j] * table;
if (factor_coul < 1.0) {
table = ctable[itable] + fraction*dctable[itable];
prefactor = atom->q[i]*atom->q[j] * table;
forcecoul -= (1.0-factor_coul)*prefactor;
}
}
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r3inv = rinv*rinv*rinv;
r6inv = r3inv*r3inv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fforce = (forcecoul + factor_lj*forcelj) * r2inv;
double eng = 0.0;
if (rsq < cut_coulsq) {
if (!ncoultablebits || rsq <= tabinnersq)
phicoul = prefactor*erfc;
else {
table = etable[itable] + fraction*detable[itable];
phicoul = atom->q[i]*atom->q[j] * table;
}
if (factor_coul < 1.0) phicoul -= (1.0-factor_coul)*prefactor;
eng += phicoul;
}
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
philj12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
philj6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
philj = philj12 + philj6;
} else {
philj12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
philj6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
philj = philj12 + philj6;
}
eng += factor_lj*philj;
}
return eng;
}
/* ---------------------------------------------------------------------- */
void *PairLJCharmmfswCoulLong::extract(const char *str, int &dim)
{
dim = 2;
if (strcmp(str,"lj14_1") == 0) return (void *) lj14_1;
if (strcmp(str,"lj14_2") == 0) return (void *) lj14_2;
if (strcmp(str,"lj14_3") == 0) return (void *) lj14_3;
if (strcmp(str,"lj14_4") == 0) return (void *) lj14_4;
dim = 0;
if (strcmp(str,"implicit") == 0) return (void *) &implicit;
// info extracted by dihedral_charmmfsw
if (strcmp(str,"cut_coul") == 0) return (void *) &cut_coul;
if (strcmp(str,"cut_lj_inner") == 0) return (void *) &cut_lj_inner;
if (strcmp(str,"cut_lj") == 0) return (void *) &cut_lj;
if (strcmp(str,"dihedflag") == 0) return (void *) &dihedflag;
return NULL;
}
diff --git a/src/KSPACE/pppm_disp.cpp b/src/KSPACE/pppm_disp.cpp
index 5d6c2042b..b31d42a81 100644
--- a/src/KSPACE/pppm_disp.cpp
+++ b/src/KSPACE/pppm_disp.cpp
@@ -1,8256 +1,8256 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Rolf Isele-Holder (Aachen University)
Paul Crozier (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "pppm_disp.h"
#include "math_const.h"
#include "atom.h"
#include "comm.h"
#include "gridcomm.h"
#include "neighbor.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "domain.h"
#include "fft3d_wrap.h"
#include "remap_wrap.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
#define MAXORDER 7
#define OFFSET 16384
#define SMALL 0.00001
#define LARGE 10000.0
#define EPS_HOC 1.0e-7
enum{GEOMETRIC,ARITHMETIC,SIXTHPOWER};
enum{REVERSE_RHO, REVERSE_RHO_G, REVERSE_RHO_A, REVERSE_RHO_NONE};
enum{FORWARD_IK, FORWARD_AD, FORWARD_IK_PERATOM, FORWARD_AD_PERATOM,
FORWARD_IK_G, FORWARD_AD_G, FORWARD_IK_PERATOM_G, FORWARD_AD_PERATOM_G,
FORWARD_IK_A, FORWARD_AD_A, FORWARD_IK_PERATOM_A, FORWARD_AD_PERATOM_A,
FORWARD_IK_NONE, FORWARD_AD_NONE, FORWARD_IK_PERATOM_NONE, FORWARD_AD_PERATOM_NONE};
#ifdef FFT_SINGLE
#define ZEROF 0.0f
#define ONEF 1.0f
#else
#define ZEROF 0.0
#define ONEF 1.0
#endif
/* ---------------------------------------------------------------------- */
PPPMDisp::PPPMDisp(LAMMPS *lmp, int narg, char **arg) : KSpace(lmp, narg, arg),
factors(NULL), csumi(NULL), cii(NULL), B(NULL), density_brick(NULL), vdx_brick(NULL),
vdy_brick(NULL), vdz_brick(NULL), density_fft(NULL), u_brick(NULL), v0_brick(NULL),
v1_brick(NULL), v2_brick(NULL), v3_brick(NULL), v4_brick(NULL), v5_brick(NULL),
density_brick_g(NULL), vdx_brick_g(NULL), vdy_brick_g(NULL), vdz_brick_g(NULL),
density_fft_g(NULL), u_brick_g(NULL), v0_brick_g(NULL), v1_brick_g(NULL), v2_brick_g(NULL),
v3_brick_g(NULL), v4_brick_g(NULL), v5_brick_g(NULL), density_brick_a0(NULL),
vdx_brick_a0(NULL), vdy_brick_a0(NULL), vdz_brick_a0(NULL), density_fft_a0(NULL),
u_brick_a0(NULL), v0_brick_a0(NULL), v1_brick_a0(NULL), v2_brick_a0(NULL),
v3_brick_a0(NULL), v4_brick_a0(NULL), v5_brick_a0(NULL), density_brick_a1(NULL),
vdx_brick_a1(NULL), vdy_brick_a1(NULL), vdz_brick_a1(NULL), density_fft_a1(NULL),
u_brick_a1(NULL), v0_brick_a1(NULL), v1_brick_a1(NULL), v2_brick_a1(NULL),
v3_brick_a1(NULL), v4_brick_a1(NULL), v5_brick_a1(NULL), density_brick_a2(NULL),
vdx_brick_a2(NULL), vdy_brick_a2(NULL), vdz_brick_a2(NULL), density_fft_a2(NULL),
u_brick_a2(NULL), v0_brick_a2(NULL), v1_brick_a2(NULL), v2_brick_a2(NULL),
v3_brick_a2(NULL), v4_brick_a2(NULL), v5_brick_a2(NULL), density_brick_a3(NULL),
vdx_brick_a3(NULL), vdy_brick_a3(NULL), vdz_brick_a3(NULL), density_fft_a3(NULL),
u_brick_a3(NULL), v0_brick_a3(NULL), v1_brick_a3(NULL), v2_brick_a3(NULL),
v3_brick_a3(NULL), v4_brick_a3(NULL), v5_brick_a3(NULL), density_brick_a4(NULL),
vdx_brick_a4(NULL), vdy_brick_a4(NULL), vdz_brick_a4(NULL), density_fft_a4(NULL),
u_brick_a4(NULL), v0_brick_a4(NULL), v1_brick_a4(NULL), v2_brick_a4(NULL),
v3_brick_a4(NULL), v4_brick_a4(NULL), v5_brick_a4(NULL), density_brick_a5(NULL),
vdx_brick_a5(NULL), vdy_brick_a5(NULL), vdz_brick_a5(NULL), density_fft_a5(NULL),
u_brick_a5(NULL), v0_brick_a5(NULL), v1_brick_a5(NULL), v2_brick_a5(NULL),
v3_brick_a5(NULL), v4_brick_a5(NULL), v5_brick_a5(NULL), density_brick_a6(NULL),
vdx_brick_a6(NULL), vdy_brick_a6(NULL), vdz_brick_a6(NULL), density_fft_a6(NULL),
u_brick_a6(NULL), v0_brick_a6(NULL), v1_brick_a6(NULL), v2_brick_a6(NULL),
v3_brick_a6(NULL), v4_brick_a6(NULL), v5_brick_a6(NULL), density_brick_none(NULL),
vdx_brick_none(NULL), vdy_brick_none(NULL), vdz_brick_none(NULL),
density_fft_none(NULL), u_brick_none(NULL), v0_brick_none(NULL), v1_brick_none(NULL),
v2_brick_none(NULL), v3_brick_none(NULL), v4_brick_none(NULL), v5_brick_none(NULL),
greensfn(NULL), vg(NULL), vg2(NULL), greensfn_6(NULL), vg_6(NULL), vg2_6(NULL),
fkx(NULL), fky(NULL), fkz(NULL), fkx2(NULL), fky2(NULL), fkz2(NULL), fkx_6(NULL),
fky_6(NULL), fkz_6(NULL), fkx2_6(NULL), fky2_6(NULL), fkz2_6(NULL), gf_b(NULL),
gf_b_6(NULL), sf_precoeff1(NULL), sf_precoeff2(NULL), sf_precoeff3(NULL),
sf_precoeff4(NULL), sf_precoeff5(NULL), sf_precoeff6(NULL), sf_precoeff1_6(NULL),
sf_precoeff2_6(NULL), sf_precoeff3_6(NULL), sf_precoeff4_6(NULL), sf_precoeff5_6(NULL),
sf_precoeff6_6(NULL), rho1d(NULL), rho_coeff(NULL), drho1d(NULL), drho_coeff(NULL),
rho1d_6(NULL), rho_coeff_6(NULL), drho1d_6(NULL), drho_coeff_6(NULL), work1(NULL),
work2(NULL), work1_6(NULL), work2_6(NULL), fft1(NULL), fft2(NULL), fft1_6(NULL),
fft2_6(NULL), remap(NULL), remap_6(NULL), cg(NULL), cg_peratom(NULL), cg_6(NULL),
cg_peratom_6(NULL), part2grid(NULL), part2grid_6(NULL), boxlo(NULL)
{
if (narg < 1) error->all(FLERR,"Illegal kspace_style pppm/disp command");
triclinic_support = 0;
pppmflag = dispersionflag = 1;
accuracy_relative = fabs(force->numeric(FLERR,arg[0]));
nfactors = 3;
factors = new int[nfactors];
factors[0] = 2;
factors[1] = 3;
factors[2] = 5;
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
csumflag = 0;
B = NULL;
cii = NULL;
csumi = NULL;
peratom_allocate_flag = 0;
density_brick = vdx_brick = vdy_brick = vdz_brick = NULL;
density_fft = NULL;
u_brick = v0_brick = v1_brick = v2_brick = v3_brick =
v4_brick = v5_brick = NULL;
density_brick_g = vdx_brick_g = vdy_brick_g = vdz_brick_g = NULL;
density_fft_g = NULL;
u_brick_g = v0_brick_g = v1_brick_g = v2_brick_g = v3_brick_g =
v4_brick_g = v5_brick_g = NULL;
density_brick_a0 = vdx_brick_a0 = vdy_brick_a0 = vdz_brick_a0 = NULL;
density_fft_a0 = NULL;
u_brick_a0 = v0_brick_a0 = v1_brick_a0 = v2_brick_a0 = v3_brick_a0 =
v4_brick_a0 = v5_brick_a0 = NULL;
density_brick_a1 = vdx_brick_a1 = vdy_brick_a1 = vdz_brick_a1 = NULL;
density_fft_a1 = NULL;
u_brick_a1 = v0_brick_a1 = v1_brick_a1 = v2_brick_a1 = v3_brick_a1 =
v4_brick_a1 = v5_brick_a1 = NULL;
density_brick_a2 = vdx_brick_a2 = vdy_brick_a2 = vdz_brick_a2 = NULL;
density_fft_a2 = NULL;
u_brick_a2 = v0_brick_a2 = v1_brick_a2 = v2_brick_a2 = v3_brick_a2 =
v4_brick_a2 = v5_brick_a2 = NULL;
density_brick_a3 = vdx_brick_a3 = vdy_brick_a3 = vdz_brick_a3 = NULL;
density_fft_a3 = NULL;
u_brick_a3 = v0_brick_a3 = v1_brick_a3 = v2_brick_a3 = v3_brick_a3 =
v4_brick_a3 = v5_brick_a3 = NULL;
density_brick_a4 = vdx_brick_a4 = vdy_brick_a4 = vdz_brick_a4 = NULL;
density_fft_a4 = NULL;
u_brick_a4 = v0_brick_a4 = v1_brick_a4 = v2_brick_a4 = v3_brick_a4 =
v4_brick_a4 = v5_brick_a4 = NULL;
density_brick_a5 = vdx_brick_a5 = vdy_brick_a5 = vdz_brick_a5 = NULL;
density_fft_a5 = NULL;
u_brick_a5 = v0_brick_a5 = v1_brick_a5 = v2_brick_a5 = v3_brick_a5 =
v4_brick_a5 = v5_brick_a5 = NULL;
density_brick_a6 = vdx_brick_a6 = vdy_brick_a6 = vdz_brick_a6 = NULL;
density_fft_a6 = NULL;
u_brick_a6 = v0_brick_a6 = v1_brick_a6 = v2_brick_a6 = v3_brick_a6 =
v4_brick_a6 = v5_brick_a6 = NULL;
density_brick_none = vdx_brick_none = vdy_brick_none = vdz_brick_none = NULL;
density_fft_none = NULL;
u_brick_none = v0_brick_none = v1_brick_none = v2_brick_none = v3_brick_none =
v4_brick_none = v5_brick_none = NULL;
greensfn = NULL;
greensfn_6 = NULL;
work1 = work2 = NULL;
work1_6 = work2_6 = NULL;
vg = NULL;
vg2 = NULL;
vg_6 = NULL;
vg2_6 = NULL;
fkx = fky = fkz = NULL;
fkx2 = fky2 = fkz2 = NULL;
fkx_6 = fky_6 = fkz_6 = NULL;
fkx2_6 = fky2_6 = fkz2_6 = NULL;
sf_precoeff1 = sf_precoeff2 = sf_precoeff3 = sf_precoeff4 =
sf_precoeff5 = sf_precoeff6 = NULL;
sf_precoeff1_6 = sf_precoeff2_6 = sf_precoeff3_6 = sf_precoeff4_6 =
sf_precoeff5_6 = sf_precoeff6_6 = NULL;
gf_b = NULL;
gf_b_6 = NULL;
rho1d = rho_coeff = NULL;
drho1d = drho_coeff = NULL;
rho1d_6 = rho_coeff_6 = NULL;
drho1d_6 = drho_coeff_6 = NULL;
fft1 = fft2 = NULL;
fft1_6 = fft2_6 = NULL;
remap = NULL;
remap_6 = NULL;
nmax = 0;
part2grid = NULL;
part2grid_6 = NULL;
cg = NULL;
cg_peratom = NULL;
cg_6 = NULL;
cg_peratom_6 = NULL;
memset(function, 0, EWALD_FUNCS*sizeof(int));
}
/* ----------------------------------------------------------------------
free all memory
------------------------------------------------------------------------- */
PPPMDisp::~PPPMDisp()
{
delete [] factors;
delete [] B;
B = NULL;
delete [] cii;
cii = NULL;
delete [] csumi;
csumi = NULL;
deallocate();
deallocate_peratom();
memory->destroy(part2grid);
memory->destroy(part2grid_6);
part2grid = part2grid_6 = NULL;
}
/* ----------------------------------------------------------------------
called once before run
------------------------------------------------------------------------- */
void PPPMDisp::init()
{
if (me == 0) {
if (screen) fprintf(screen,"PPPMDisp initialization ...\n");
if (logfile) fprintf(logfile,"PPPMDisp initialization ...\n");
}
triclinic_check();
if (domain->dimension == 2)
error->all(FLERR,"Cannot use PPPMDisp with 2d simulation");
if (comm->style != 0)
error->universe_all(FLERR,"PPPMDisp can only currently be used with "
"comm_style brick");
if (slabflag == 0 && domain->nonperiodic > 0)
error->all(FLERR,"Cannot use nonperiodic boundaries with PPPMDisp");
if (slabflag == 1) {
if (domain->xperiodic != 1 || domain->yperiodic != 1 ||
domain->boundary[2][0] != 1 || domain->boundary[2][1] != 1)
error->all(FLERR,"Incorrect boundaries with slab PPPMDisp");
}
if (order > MAXORDER || order_6 > MAXORDER) {
char str[128];
sprintf(str,"PPPMDisp coulomb order cannot be greater than %d",MAXORDER);
error->all(FLERR,str);
}
// free all arrays previously allocated
deallocate();
deallocate_peratom();
// check whether cutoff and pair style are set
triclinic = domain->triclinic;
pair_check();
int tmp;
Pair *pair = force->pair;
int *ptr = pair ? (int *) pair->extract("ewald_order",tmp) : NULL;
double *p_cutoff = pair ? (double *) pair->extract("cut_coul",tmp) : NULL;
double *p_cutoff_lj = pair ? (double *) pair->extract("cut_LJ",tmp) : NULL;
if (!(ptr||p_cutoff||p_cutoff_lj))
error->all(FLERR,"KSpace style is incompatible with Pair style");
cutoff = *p_cutoff;
cutoff_lj = *p_cutoff_lj;
double tmp2;
MPI_Allreduce(&cutoff, &tmp2,1,MPI_DOUBLE,MPI_SUM,world);
// check which types of potentials have to be calculated
int ewald_order = ptr ? *((int *) ptr) : 1<<1;
int ewald_mix = ptr ? *((int *) pair->extract("ewald_mix",tmp)) : GEOMETRIC;
memset(function, 0, EWALD_FUNCS*sizeof(int));
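// note (derived from the decoding below): ewald_order bit 1 -> 1/r (coulomb)
// -> function[0]; bit 6 -> 1/r^6 (dispersion) -> function[1] for geometric
// mixing, function[2] for arithmetic mixing, function[3] if no mixing rule applies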
for (int i=0; i<=EWALD_MAXORDER; ++i) // transcribe order
if (ewald_order&(1<<i)) { // from pair_style
int k=0;
char str[128];
switch (i) {
case 1:
k = 0; break;
case 6:
if ((ewald_mix==GEOMETRIC || ewald_mix==SIXTHPOWER ||
mixflag == 1) && mixflag!= 2) { k = 1; break; }
else if (ewald_mix==ARITHMETIC && mixflag!=2) { k = 2; break; }
else if (mixflag == 2) { k = 3; break; }
default:
sprintf(str, "Unsupported order in kspace_style "
"pppm/disp, pair_style %s", force->pair_style);
error->all(FLERR,str);
}
function[k] = 1;
}
// warn if function[0] is not set but the charge attribute is set
if (!function[0] && atom->q_flag && me == 0) {
char str[128];
sprintf(str, "Charges are set, but coulombic solver is not used");
error->warning(FLERR, str);
}
// show error message if pppm/disp is not used correctly
if (function[1] || function[2] || function[3]) {
if (!gridflag_6 && !gewaldflag_6 && accuracy_real_6 < 0
&& accuracy_kspace_6 < 0 && !auto_disp_flag) {
error->all(FLERR, "PPPMDisp used but no parameters set, "
"for further information please see the pppm/disp "
"documentation");
}
}
// compute qsum & qsqsum, if function[0] is set, warn if not charge-neutral
scale = 1.0;
qqrd2e = force->qqrd2e;
natoms_original = atom->natoms;
if (function[0]) qsum_qsq();
// if kspace is TIP4P, extract TIP4P params from pair style
// bond/angle are not yet init(), so ensure equilibrium request is valid
qdist = 0.0;
if (tip4pflag) {
int itmp;
double *p_qdist = (double *) force->pair->extract("qdist",itmp);
int *p_typeO = (int *) force->pair->extract("typeO",itmp);
int *p_typeH = (int *) force->pair->extract("typeH",itmp);
int *p_typeA = (int *) force->pair->extract("typeA",itmp);
int *p_typeB = (int *) force->pair->extract("typeB",itmp);
if (!p_qdist || !p_typeO || !p_typeH || !p_typeA || !p_typeB)
error->all(FLERR,"KSpace style is incompatible with Pair style");
qdist = *p_qdist;
typeO = *p_typeO;
typeH = *p_typeH;
int typeA = *p_typeA;
int typeB = *p_typeB;
if (force->angle == NULL || force->bond == NULL)
error->all(FLERR,"Bond and angle potentials must be defined for TIP4P");
if (typeA < 1 || typeA > atom->nangletypes ||
force->angle->setflag[typeA] == 0)
error->all(FLERR,"Bad TIP4P angle type for PPPMDisp/TIP4P");
if (typeB < 1 || typeB > atom->nbondtypes ||
force->bond->setflag[typeB] == 0)
error->all(FLERR,"Bad TIP4P bond type for PPPMDisp/TIP4P");
double theta = force->angle->equilibrium_angle(typeA);
double blen = force->bond->equilibrium_distance(typeB);
alpha = qdist / (cos(0.5*theta) * blen);
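// alpha is the fraction of the distance from O to the midpoint of the two H
// sites at which the TIP4P virtual charge sits: that midpoint lies
// cos(theta/2)*blen from O, and the virtual site is qdist from O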
}
+ //if g_ewald and g_ewald_6 have not been specified, set some initial value
+ // to avoid problems when calculating the energies!
+
+ if (!gewaldflag) g_ewald = 1;
+ if (!gewaldflag_6) g_ewald_6 = 1;
+
// initialize the pair style to get the coefficients
neighrequest_flag = 0;
pair->init();
neighrequest_flag = 1;
init_coeffs();
- //if g_ewald and g_ewald_6 have not been specified, set some initial value
- // to avoid problems when calculating the energies!
-
- if (!gewaldflag) g_ewald = 1;
- if (!gewaldflag_6) g_ewald_6 = 1;
-
// set accuracy (force units) from accuracy_relative or accuracy_absolute
if (accuracy_absolute >= 0.0) accuracy = accuracy_absolute;
else accuracy = accuracy_relative * two_charge_force;
int (*procneigh)[2] = comm->procneigh;
int iteration = 0;
if (function[0]) {
GridComm *cgtmp = NULL;
while (order >= minorder) {
if (iteration && me == 0)
error->warning(FLERR,"Reducing PPPMDisp Coulomb order "
"b/c stencil extends beyond neighbor processor");
iteration++;
// set grid for dispersion interaction and coulomb interactions
set_grid();
if (nx_pppm >= OFFSET || ny_pppm >= OFFSET || nz_pppm >= OFFSET)
error->all(FLERR,"PPPMDisp Coulomb grid is too large");
set_fft_parameters(nx_pppm, ny_pppm, nz_pppm,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in,
nxhi_in, nyhi_in, nzhi_in,
nxlo_out, nylo_out, nzlo_out,
nxhi_out, nyhi_out, nzhi_out,
nlower, nupper,
ngrid, nfft, nfft_both,
shift, shiftone, order);
if (overlap_allowed) break;
cgtmp = new GridComm(lmp, world,1,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,
nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
cgtmp->ghost_notify();
if (!cgtmp->ghost_overlap()) break;
delete cgtmp;
order--;
}
if (order < minorder)
error->all(FLERR,
"Coulomb PPPMDisp order has been reduced below minorder");
if (cgtmp) delete cgtmp;
// adjust g_ewald
if (!gewaldflag) adjust_gewald();
// calculate the final accuracy
double acc = final_accuracy();
// print stats
int ngrid_max,nfft_both_max;
MPI_Allreduce(&ngrid,&ngrid_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nfft_both,&nfft_both_max,1,MPI_INT,MPI_MAX,world);
if (me == 0) {
#ifdef FFT_SINGLE
const char fft_prec[] = "single";
#else
const char fft_prec[] = "double";
#endif
if (screen) {
fprintf(screen," Coulomb G vector (1/distance)= %g\n",g_ewald);
fprintf(screen," Coulomb grid = %d %d %d\n",nx_pppm,ny_pppm,nz_pppm);
fprintf(screen," Coulomb stencil order = %d\n",order);
fprintf(screen," Coulomb estimated absolute RMS force accuracy = %g\n",
acc);
fprintf(screen," Coulomb estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(screen," using %s precision FFTs\n",fft_prec);
fprintf(screen," 3d grid and FFT values/proc = %d %d\n",
ngrid_max, nfft_both_max);
}
if (logfile) {
fprintf(logfile," Coulomb G vector (1/distance) = %g\n",g_ewald);
fprintf(logfile," Coulomb grid = %d %d %d\n",nx_pppm,ny_pppm,nz_pppm);
fprintf(logfile," Coulomb stencil order = %d\n",order);
fprintf(logfile,
" Coulomb estimated absolute RMS force accuracy = %g\n",
acc);
fprintf(logfile," Coulomb estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(logfile," using %s precision FFTs\n",fft_prec);
fprintf(logfile," 3d grid and FFT values/proc = %d %d\n",
ngrid_max, nfft_both_max);
}
}
}
iteration = 0;
if (function[1] + function[2] + function[3]) {
GridComm *cgtmp = NULL;
while (order_6 >= minorder) {
if (iteration && me == 0)
error->warning(FLERR,"Reducing PPPMDisp dispersion order "
"b/c stencil extends beyond neighbor processor");
iteration++;
set_grid_6();
if (nx_pppm_6 >= OFFSET || ny_pppm_6 >= OFFSET || nz_pppm_6 >= OFFSET)
error->all(FLERR,"PPPMDisp Dispersion grid is too large");
set_fft_parameters(nx_pppm_6, ny_pppm_6, nz_pppm_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6,
nxhi_in_6, nyhi_in_6, nzhi_in_6,
nxlo_out_6, nylo_out_6, nzlo_out_6,
nxhi_out_6, nyhi_out_6, nzhi_out_6,
nlower_6, nupper_6,
ngrid_6, nfft_6, nfft_both_6,
shift_6, shiftone_6, order_6);
if (overlap_allowed) break;
cgtmp = new GridComm(lmp,world,1,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,
nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,
nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
cgtmp->ghost_notify();
if (!cgtmp->ghost_overlap()) break;
delete cgtmp;
order_6--;
}
if (order_6 < minorder)
error->all(FLERR,"Dispersion PPPMDisp order has been "
"reduced below minorder");
if (cgtmp) delete cgtmp;
// adjust g_ewald_6
if (!gewaldflag_6 && accuracy_kspace_6 == accuracy_real_6)
adjust_gewald_6();
// calculate the final accuracy
double acc, acc_real, acc_kspace;
final_accuracy_6(acc, acc_real, acc_kspace);
// print stats
int ngrid_max,nfft_both_max;
MPI_Allreduce(&ngrid_6,&ngrid_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nfft_both_6,&nfft_both_max,1,MPI_INT,MPI_MAX,world);
if (me == 0) {
#ifdef FFT_SINGLE
const char fft_prec[] = "single";
#else
const char fft_prec[] = "double";
#endif
if (screen) {
fprintf(screen," Dispersion G vector (1/distance)= %g\n",g_ewald_6);
fprintf(screen," Dispersion grid = %d %d %d\n",
nx_pppm_6,ny_pppm_6,nz_pppm_6);
fprintf(screen," Dispersion stencil order = %d\n",order_6);
fprintf(screen," Dispersion estimated absolute "
"RMS force accuracy = %g\n",acc);
fprintf(screen," Dispersion estimated absolute "
"real space RMS force accuracy = %g\n",acc_real);
fprintf(screen," Dispersion estimated absolute "
"kspace RMS force accuracy = %g\n",acc_kspace);
fprintf(screen," Dispersion estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(screen," using %s precision FFTs\n",fft_prec);
fprintf(screen," 3d grid and FFT values/proc dispersion = %d %d\n",
ngrid_max,nfft_both_max);
}
if (logfile) {
fprintf(logfile," Dispersion G vector (1/distance) = %g\n",g_ewald_6);
fprintf(logfile," Dispersion grid = %d %d %d\n",
nx_pppm_6,ny_pppm_6,nz_pppm_6);
fprintf(logfile," Dispersion stencil order = %d\n",order_6);
fprintf(logfile," Dispersion estimated absolute "
"RMS force accuracy = %g\n",acc);
fprintf(logfile," Dispersion estimated absolute "
"real space RMS force accuracy = %g\n",acc_real);
fprintf(logfile," Dispersion estimated absolute "
"kspace RMS force accuracy = %g\n",acc_kspace);
fprintf(logfile," Disperion estimated relative force accuracy = %g\n",
acc/two_charge_force);
fprintf(logfile," using %s precision FFTs\n",fft_prec);
fprintf(logfile," 3d grid and FFT values/proc dispersion = %d %d\n",
ngrid_max,nfft_both_max);
}
}
}
// allocate K-space dependent memory
allocate();
// pre-compute Green's function denominator expansion
// pre-compute 1d charge distribution coefficients
if (function[0]) {
compute_gf_denom(gf_b, order);
compute_rho_coeff(rho_coeff, drho_coeff, order);
cg->ghost_notify();
cg->setup();
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm, ny_pppm, nz_pppm, order,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
sf_precoeff1, sf_precoeff2, sf_precoeff3,
sf_precoeff4, sf_precoeff5, sf_precoeff6);
}
if (function[1] + function[2] + function[3]) {
compute_gf_denom(gf_b_6, order_6);
compute_rho_coeff(rho_coeff_6, drho_coeff_6, order_6);
cg_6->ghost_notify();
cg_6->setup();
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm_6, ny_pppm_6, nz_pppm_6, order_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
sf_precoeff1_6, sf_precoeff2_6, sf_precoeff3_6,
sf_precoeff4_6, sf_precoeff5_6, sf_precoeff6_6);
}
}
/* ----------------------------------------------------------------------
adjust PPPM coeffs, called initially and whenever volume has changed
------------------------------------------------------------------------- */
void PPPMDisp::setup()
{
if (slabflag == 0 && domain->nonperiodic > 0)
error->all(FLERR,"Cannot use nonperiodic boundaries with PPPMDisp");
if (slabflag == 1) {
if (domain->xperiodic != 1 || domain->yperiodic != 1 ||
domain->boundary[2][0] != 1 || domain->boundary[2][1] != 1)
error->all(FLERR,"Incorrect boundaries with slab PPPMDisp");
}
double *prd;
// volume-dependent factors
// adjust z dimension for 2d slab PPPM
// z dimension for 3d PPPM is zprd since slab_volfactor = 1.0
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
// compute fkx,fky,fkz for my FFT grid pts
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
//compute the virial coefficients and Green's functions
if (function[0]){
delxinv = nx_pppm/xprd;
delyinv = ny_pppm/yprd;
delzinv = nz_pppm/zprd_slab;
delvolinv = delxinv*delyinv*delzinv;
double per;
int i, j, k, n;
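// the index arithmetic below folds the grid index i into a signed wavevector
// index: per = i when 2*i < N and per = i - N otherwise, so fk* = 2*pi*per/L
// is the physical wavevector of FFT mode i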
for (i = nxlo_fft; i <= nxhi_fft; i++) {
per = i - nx_pppm*(2*i/nx_pppm);
fkx[i] = unitkx*per;
j = (nx_pppm - i) % nx_pppm;
per = j - nx_pppm*(2*j/nx_pppm);
fkx2[i] = unitkx*per;
}
for (i = nylo_fft; i <= nyhi_fft; i++) {
per = i - ny_pppm*(2*i/ny_pppm);
fky[i] = unitky*per;
j = (ny_pppm - i) % ny_pppm;
per = j - ny_pppm*(2*j/ny_pppm);
fky2[i] = unitky*per;
}
for (i = nzlo_fft; i <= nzhi_fft; i++) {
per = i - nz_pppm*(2*i/nz_pppm);
fkz[i] = unitkz*per;
j = (nz_pppm - i) % nz_pppm;
per = j - nz_pppm*(2*j/nz_pppm);
fkz2[i] = unitkz*per;
}
double sqk,vterm;
double gew2inv = 1/(g_ewald*g_ewald);
n = 0;
for (k = nzlo_fft; k <= nzhi_fft; k++) {
for (j = nylo_fft; j <= nyhi_fft; j++) {
for (i = nxlo_fft; i <= nxhi_fft; i++) {
sqk = fkx[i]*fkx[i] + fky[j]*fky[j] + fkz[k]*fkz[k];
if (sqk == 0.0) {
vg[n][0] = 0.0;
vg[n][1] = 0.0;
vg[n][2] = 0.0;
vg[n][3] = 0.0;
vg[n][4] = 0.0;
vg[n][5] = 0.0;
} else {
vterm = -2.0 * (1.0/sqk + 0.25*gew2inv);
vg[n][0] = 1.0 + vterm*fkx[i]*fkx[i];
vg[n][1] = 1.0 + vterm*fky[j]*fky[j];
vg[n][2] = 1.0 + vterm*fkz[k]*fkz[k];
vg[n][3] = vterm*fkx[i]*fky[j];
vg[n][4] = vterm*fkx[i]*fkz[k];
vg[n][5] = vterm*fky[j]*fkz[k];
vg2[n][0] = vterm*0.5*(fkx[i]*fky[j] + fkx2[i]*fky2[j]);
vg2[n][1] = vterm*0.5*(fkx[i]*fkz[k] + fkx2[i]*fkz2[k]);
vg2[n][2] = vterm*0.5*(fky[j]*fkz[k] + fky2[j]*fkz2[k]);
}
n++;
}
}
}
compute_gf();
if (differentiation_flag == 1) compute_sf_coeff();
}
if (function[1] + function[2] + function[3]) {
delxinv_6 = nx_pppm_6/xprd;
delyinv_6 = ny_pppm_6/yprd;
delzinv_6 = nz_pppm_6/zprd_slab;
delvolinv_6 = delxinv_6*delyinv_6*delzinv_6;
double per;
int i, j, k, n;
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
per = i - nx_pppm_6*(2*i/nx_pppm_6);
fkx_6[i] = unitkx*per;
j = (nx_pppm_6 - i) % nx_pppm_6;
per = j - nx_pppm_6*(2*j/nx_pppm_6);
fkx2_6[i] = unitkx*per;
}
for (i = nylo_fft_6; i <= nyhi_fft_6; i++) {
per = i - ny_pppm_6*(2*i/ny_pppm_6);
fky_6[i] = unitky*per;
j = (ny_pppm_6 - i) % ny_pppm_6;
per = j - ny_pppm_6*(2*j/ny_pppm_6);
fky2_6[i] = unitky*per;
}
for (i = nzlo_fft_6; i <= nzhi_fft_6; i++) {
per = i - nz_pppm_6*(2*i/nz_pppm_6);
fkz_6[i] = unitkz*per;
j = (nz_pppm_6 - i) % nz_pppm_6;
per = j - nz_pppm_6*(2*j/nz_pppm_6);
fkz2_6[i] = unitkz*per;
}
double sqk,vterm;
long double erft, expt,nom, denom;
long double b, bs, bt;
double rtpi = sqrt(MY_PI);
double gewinv = 1/g_ewald_6;
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++) {
for (j = nylo_fft_6; j <= nyhi_fft_6; j++) {
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
sqk = fkx_6[i]*fkx_6[i] + fky_6[j]*fky_6[j] + fkz_6[k]*fkz_6[k];
if (sqk == 0.0) {
vg_6[n][0] = 0.0;
vg_6[n][1] = 0.0;
vg_6[n][2] = 0.0;
vg_6[n][3] = 0.0;
vg_6[n][4] = 0.0;
vg_6[n][5] = 0.0;
} else {
b = 0.5*sqrt(sqk)*gewinv;
bs = b*b;
bt = bs*b;
erft = 2*bt*rtpi*erfc((double) b);
expt = exp(-bs);
nom = erft - 2*bs*expt;
denom = nom + expt;
if (denom == 0) vterm = 3.0/sqk;
else vterm = 3.0*nom/(sqk*denom);
vg_6[n][0] = 1.0 + vterm*fkx_6[i]*fkx_6[i];
vg_6[n][1] = 1.0 + vterm*fky_6[j]*fky_6[j];
vg_6[n][2] = 1.0 + vterm*fkz_6[k]*fkz_6[k];
vg_6[n][3] = vterm*fkx_6[i]*fky_6[j];
vg_6[n][4] = vterm*fkx_6[i]*fkz_6[k];
vg_6[n][5] = vterm*fky_6[j]*fkz_6[k];
vg2_6[n][0] = vterm*0.5*(fkx_6[i]*fky_6[j] + fkx2_6[i]*fky2_6[j]);
vg2_6[n][1] = vterm*0.5*(fkx_6[i]*fkz_6[k] + fkx2_6[i]*fkz2_6[k]);
vg2_6[n][2] = vterm*0.5*(fky_6[j]*fkz_6[k] + fky2_6[j]*fkz2_6[k]);
}
n++;
}
}
}
compute_gf_6();
if (differentiation_flag == 1) compute_sf_coeff_6();
}
}
/* ----------------------------------------------------------------------
reset local grid arrays and communication stencils
called by fix balance b/c it changed sizes of processor sub-domains
------------------------------------------------------------------------- */
void PPPMDisp::setup_grid()
{
// free all arrays previously allocated
deallocate();
deallocate_peratom();
// reset portion of global grid that each proc owns
if (function[0])
set_fft_parameters(nx_pppm, ny_pppm, nz_pppm,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in,
nxhi_in, nyhi_in, nzhi_in,
nxlo_out, nylo_out, nzlo_out,
nxhi_out, nyhi_out, nzhi_out,
nlower, nupper,
ngrid, nfft, nfft_both,
shift, shiftone, order);
if (function[1] + function[2] + function[3])
set_fft_parameters(nx_pppm_6, ny_pppm_6, nz_pppm_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6,
nxhi_in_6, nyhi_in_6, nzhi_in_6,
nxlo_out_6, nylo_out_6, nzlo_out_6,
nxhi_out_6, nyhi_out_6, nzhi_out_6,
nlower_6, nupper_6,
ngrid_6, nfft_6, nfft_both_6,
shift_6, shiftone_6, order_6);
// reallocate K-space dependent memory
// check if grid communication is now overlapping if not allowed
// don't invoke allocate_peratom(), compute() will allocate when needed
allocate();
if (function[0]) {
cg->ghost_notify();
if (overlap_allowed == 0 && cg->ghost_overlap())
error->all(FLERR,"PPPM grid stencil extends "
"beyond nearest neighbor processor");
cg->setup();
}
if (function[1] + function[2] + function[3]) {
cg_6->ghost_notify();
if (overlap_allowed == 0 && cg_6->ghost_overlap())
error->all(FLERR,"PPPM grid stencil extends "
"beyond nearest neighbor processor");
cg_6->setup();
}
// pre-compute Green's function denominator expansion
// pre-compute 1d charge distribution coefficients
if (function[0]) {
compute_gf_denom(gf_b, order);
compute_rho_coeff(rho_coeff, drho_coeff, order);
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm, ny_pppm, nz_pppm, order,
nxlo_fft, nylo_fft, nzlo_fft,
nxhi_fft, nyhi_fft, nzhi_fft,
sf_precoeff1, sf_precoeff2, sf_precoeff3,
sf_precoeff4, sf_precoeff5, sf_precoeff6);
}
if (function[1] + function[2] + function[3]) {
compute_gf_denom(gf_b_6, order_6);
compute_rho_coeff(rho_coeff_6, drho_coeff_6, order_6);
if (differentiation_flag == 1)
compute_sf_precoeff(nx_pppm_6, ny_pppm_6, nz_pppm_6, order_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6,
nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
sf_precoeff1_6, sf_precoeff2_6, sf_precoeff3_6,
sf_precoeff4_6, sf_precoeff5_6, sf_precoeff6_6);
}
// pre-compute volume-dependent coeffs
setup();
}
/* ----------------------------------------------------------------------
compute the PPPM long-range force, energy, virial
------------------------------------------------------------------------- */
void PPPMDisp::compute(int eflag, int vflag)
{
int i;
// convert atoms from box to lamda coords
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = evflag_atom = eflag_global = vflag_global =
eflag_atom = vflag_atom = 0;
if (evflag_atom && !peratom_allocate_flag) {
allocate_peratom();
if (function[0]) {
cg_peratom->ghost_notify();
cg_peratom->setup();
}
if (function[1] + function[2] + function[3]) {
cg_peratom_6->ghost_notify();
cg_peratom_6->setup();
}
peratom_allocate_flag = 1;
}
if (triclinic == 0) boxlo = domain->boxlo;
else {
boxlo = domain->boxlo_lamda;
domain->x2lamda(atom->nlocal);
}
// extend size of per-atom arrays if necessary
if (atom->nmax > nmax) {
if (function[0]) memory->destroy(part2grid);
if (function[1] + function[2] + function[3]) memory->destroy(part2grid_6);
nmax = atom->nmax;
if (function[0]) memory->create(part2grid,nmax,3,"pppm/disp:part2grid");
if (function[1] + function[2] + function[3])
memory->create(part2grid_6,nmax,3,"pppm/disp:part2grid_6");
}
energy = 0.0;
energy_1 = 0.0;
energy_6 = 0.0;
if (vflag) for (i = 0; i < 6; i++) virial_6[i] = virial_1[i] = 0.0;
// find grid points for all my particles
// distribute particles' charges/dispersion coefficients on the grid
// communication between processors and remapping to fft decomposition
// solution of Poisson's equation in k-space and back-transformation
// communication between processors
// calculation of forces
if (function[0]) {
//perform calculations for coulomb interactions only
particle_map_c(delxinv, delyinv, delzinv, shift, part2grid, nupper, nlower,
nxlo_out, nylo_out, nzlo_out, nxhi_out, nyhi_out, nzhi_out);
make_rho_c();
cg->reverse_comm(this,REVERSE_RHO);
brick2fft(nxlo_in, nylo_in, nzlo_in, nxhi_in, nyhi_in, nzhi_in,
density_brick, density_fft, work1,remap);
if (differentiation_flag == 1) {
poisson_ad(work1, work2, density_fft, fft1, fft2,
nx_pppm, ny_pppm, nz_pppm, nfft,
nxlo_fft, nylo_fft, nzlo_fft, nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in, nxhi_in, nyhi_in, nzhi_in,
energy_1, greensfn,
virial_1, vg,vg2,
u_brick, v0_brick, v1_brick, v2_brick, v3_brick, v4_brick, v5_brick);
cg->forward_comm(this,FORWARD_AD);
fieldforce_c_ad();
if (vflag_atom) cg_peratom->forward_comm(this, FORWARD_AD_PERATOM);
} else {
poisson_ik(work1, work2, density_fft, fft1, fft2,
nx_pppm, ny_pppm, nz_pppm, nfft,
nxlo_fft, nylo_fft, nzlo_fft, nxhi_fft, nyhi_fft, nzhi_fft,
nxlo_in, nylo_in, nzlo_in, nxhi_in, nyhi_in, nzhi_in,
energy_1, greensfn,
fkx, fky, fkz,fkx2, fky2, fkz2,
vdx_brick, vdy_brick, vdz_brick, virial_1, vg,vg2,
u_brick, v0_brick, v1_brick, v2_brick, v3_brick, v4_brick, v5_brick);
cg->forward_comm(this, FORWARD_IK);
fieldforce_c_ik();
if (evflag_atom) cg_peratom->forward_comm(this, FORWARD_IK_PERATOM);
}
if (evflag_atom) fieldforce_c_peratom();
}
if (function[1]) {
//perform calculations for geometric mixing
particle_map(delxinv_6, delyinv_6, delzinv_6, shift_6, part2grid_6, nupper_6, nlower_6,
nxlo_out_6, nylo_out_6, nzlo_out_6, nxhi_out_6, nyhi_out_6, nzhi_out_6);
make_rho_g();
cg_6->reverse_comm(this, REVERSE_RHO_G);
brick2fft(nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
density_brick_g, density_fft_g, work1_6,remap_6);
if (differentiation_flag == 1) {
poisson_ad(work1_6, work2_6, density_fft_g, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
virial_6, vg_6, vg2_6,
u_brick_g, v0_brick_g, v1_brick_g, v2_brick_g, v3_brick_g, v4_brick_g, v5_brick_g);
cg_6->forward_comm(this,FORWARD_AD_G);
fieldforce_g_ad();
if (vflag_atom) cg_peratom_6->forward_comm(this,FORWARD_AD_PERATOM_G);
} else {
poisson_ik(work1_6, work2_6, density_fft_g, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
fkx_6, fky_6, fkz_6,fkx2_6, fky2_6, fkz2_6,
vdx_brick_g, vdy_brick_g, vdz_brick_g, virial_6, vg_6, vg2_6,
u_brick_g, v0_brick_g, v1_brick_g, v2_brick_g, v3_brick_g, v4_brick_g, v5_brick_g);
cg_6->forward_comm(this,FORWARD_IK_G);
fieldforce_g_ik();
if (evflag_atom) cg_peratom_6->forward_comm(this, FORWARD_IK_PERATOM_G);
}
if (evflag_atom) fieldforce_g_peratom();
}
if (function[2]) {
//perform calculations for arithmetic mixing
particle_map(delxinv_6, delyinv_6, delzinv_6, shift_6, part2grid_6, nupper_6, nlower_6,
nxlo_out_6, nylo_out_6, nzlo_out_6, nxhi_out_6, nyhi_out_6, nzhi_out_6);
make_rho_a();
cg_6->reverse_comm(this, REVERSE_RHO_A);
brick2fft_a();
if ( differentiation_flag == 1) {
poisson_ad(work1_6, work2_6, density_fft_a3, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
virial_6, vg_6, vg2_6,
u_brick_a3, v0_brick_a3, v1_brick_a3, v2_brick_a3, v3_brick_a3, v4_brick_a3, v5_brick_a3);
poisson_2s_ad(density_fft_a0, density_fft_a6,
u_brick_a0, v0_brick_a0, v1_brick_a0, v2_brick_a0, v3_brick_a0, v4_brick_a0, v5_brick_a0,
u_brick_a6, v0_brick_a6, v1_brick_a6, v2_brick_a6, v3_brick_a6, v4_brick_a6, v5_brick_a6);
poisson_2s_ad(density_fft_a1, density_fft_a5,
u_brick_a1, v0_brick_a1, v1_brick_a1, v2_brick_a1, v3_brick_a1, v4_brick_a1, v5_brick_a1,
u_brick_a5, v0_brick_a5, v1_brick_a5, v2_brick_a5, v3_brick_a5, v4_brick_a5, v5_brick_a5);
poisson_2s_ad(density_fft_a2, density_fft_a4,
u_brick_a2, v0_brick_a2, v1_brick_a2, v2_brick_a2, v3_brick_a2, v4_brick_a2, v5_brick_a2,
u_brick_a4, v0_brick_a4, v1_brick_a4, v2_brick_a4, v3_brick_a4, v4_brick_a4, v5_brick_a4);
cg_6->forward_comm(this, FORWARD_AD_A);
fieldforce_a_ad();
if (evflag_atom) cg_peratom_6->forward_comm(this, FORWARD_AD_PERATOM_A);
} else {
poisson_ik(work1_6, work2_6, density_fft_a3, fft1_6, fft2_6,
nx_pppm_6, ny_pppm_6, nz_pppm_6, nfft_6,
nxlo_fft_6, nylo_fft_6, nzlo_fft_6, nxhi_fft_6, nyhi_fft_6, nzhi_fft_6,
nxlo_in_6, nylo_in_6, nzlo_in_6, nxhi_in_6, nyhi_in_6, nzhi_in_6,
energy_6, greensfn_6,
fkx_6, fky_6, fkz_6,fkx2_6, fky2_6, fkz2_6,
vdx_brick_a3, vdy_brick_a3, vdz_brick_a3, virial_6, vg_6, vg2_6,
u_brick_a3, v0_brick_a3, v1_brick_a3, v2_brick_a3, v3_brick_a3, v4_brick_a3, v5_brick_a3);
poisson_2s_ik(density_fft_a0, density_fft_a6,
vdx_brick_a0, vdy_brick_a0, vdz_brick_a0,
vdx_brick_a6, vdy_brick_a6, vdz_brick_a6,
u_brick_a0, v0_brick_a0, v1_brick_a0, v2_brick_a0, v3_brick_a0, v4_brick_a0, v5_brick_a0,
u_brick_a6, v0_brick_a6, v1_brick_a6, v2_brick_a6, v3_brick_a6, v4_brick_a6, v5_brick_a6);
poisson_2s_ik(density_fft_a1, density_fft_a5,
vdx_brick_a1, vdy_brick_a1, vdz_brick_a1,
vdx_brick_a5, vdy_brick_a5, vdz_brick_a5,
u_brick_a1, v0_brick_a1, v1_brick_a1, v2_brick_a1, v3_brick_a1, v4_brick_a1, v5_brick_a1,
u_brick_a5, v0_brick_a5, v1_brick_a5, v2_brick_a5, v3_brick_a5, v4_brick_a5, v5_brick_a5);
poisson_2s_ik(density_fft_a2, density_fft_a4,
vdx_brick_a2, vdy_brick_a2, vdz_brick_a2,
vdx_brick_a4, vdy_brick_a4, vdz_brick_a4,
u_brick_a2, v0_brick_a2, v1_brick_a2, v2_brick_a2, v3_brick_a2, v4_brick_a2, v5_brick_a2,
u_brick_a4, v0_brick_a4, v1_brick_a4, v2_brick_a4, v3_brick_a4, v4_brick_a4, v5_brick_a4);
cg_6->forward_comm(this, FORWARD_IK_A);
fieldforce_a_ik();
if (evflag_atom) cg_peratom_6->forward_comm(this, FORWARD_IK_PERATOM_A);
}
if (evflag_atom) fieldforce_a_peratom();
}
if (function[3]) {
//perform calculations if no mixing rule applies
particle_map(delxinv_6, delyinv_6, delzinv_6, shift_6, part2grid_6, nupper_6, nlower_6,
nxlo_out_6, nylo_out_6, nzlo_out_6, nxhi_out_6, nyhi_out_6, nzhi_out_6);
make_rho_none();
cg_6->reverse_comm(this, REVERSE_RHO_NONE);
brick2fft_none();
if (differentiation_flag == 1) {
int n = 0;
for (int k = 0; k<nsplit_alloc/2; k++) {
poisson_none_ad(n,n+1,density_fft_none[n],density_fft_none[n+1],
u_brick_none[n],u_brick_none[n+1],
v0_brick_none, v1_brick_none, v2_brick_none,
v3_brick_none, v4_brick_none, v5_brick_none);
n += 2;
}
cg_6->forward_comm(this,FORWARD_AD_NONE);
fieldforce_none_ad();
if (vflag_atom) cg_peratom_6->forward_comm(this,FORWARD_AD_PERATOM_NONE);
} else {
int n = 0;
for (int k = 0; k<nsplit_alloc/2; k++) {
poisson_none_ik(n,n+1,density_fft_none[n], density_fft_none[n+1],
vdx_brick_none[n], vdy_brick_none[n], vdz_brick_none[n],
vdx_brick_none[n+1], vdy_brick_none[n+1], vdz_brick_none[n+1],
u_brick_none, v0_brick_none, v1_brick_none, v2_brick_none,
v3_brick_none, v4_brick_none, v5_brick_none);
n += 2;
}
cg_6->forward_comm(this,FORWARD_IK_NONE);
fieldforce_none_ik();
if (evflag_atom)
cg_peratom_6->forward_comm(this, FORWARD_IK_PERATOM_NONE);
}
if (evflag_atom) fieldforce_none_peratom();
}
// update qsum and qsqsum, if atom count has changed and energy needed
if ((eflag_global || eflag_atom) && atom->natoms != natoms_original) {
qsum_qsq();
natoms_original = atom->natoms;
}
// sum energy across procs and add in volume-dependent term
const double qscale = force->qqrd2e * scale;
if (eflag_global) {
double energy_all;
MPI_Allreduce(&energy_1,&energy_all,1,MPI_DOUBLE,MPI_SUM,world);
energy_1 = energy_all;
MPI_Allreduce(&energy_6,&energy_all,1,MPI_DOUBLE,MPI_SUM,world);
energy_6 = energy_all;
energy_1 *= 0.5*volume;
energy_6 *= 0.5*volume;
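// subtract the usual Ewald self-energy and neutralizing-background terms:
// the coulomb part uses qsqsum and qsum; the dispersion part adds the
// analogous k=0 (csumij) and self (csum) corrections of the 1/r^6 sum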
energy_1 -= g_ewald*qsqsum/MY_PIS +
MY_PI2*qsum*qsum / (g_ewald*g_ewald*volume);
energy_6 += - MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumij +
1.0/12.0*pow(g_ewald_6,6)*csum;
energy_1 *= qscale;
}
// sum virial across procs
if (vflag_global) {
double virial_all[6];
MPI_Allreduce(virial_1,virial_all,6,MPI_DOUBLE,MPI_SUM,world);
for (i = 0; i < 6; i++) virial[i] = 0.5*qscale*volume*virial_all[i];
MPI_Allreduce(virial_6,virial_all,6,MPI_DOUBLE,MPI_SUM,world);
for (i = 0; i < 6; i++) virial[i] += 0.5*volume*virial_all[i];
if (function[1]+function[2]+function[3]){
double a = MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumij;
virial[0] -= a;
virial[1] -= a;
virial[2] -= a;
}
}
if (eflag_atom) {
if (function[0]) {
double *q = atom->q;
for (i = 0; i < atom->nlocal; i++) {
eatom[i] -= qscale*g_ewald*q[i]*q[i]/MY_PIS + qscale*MY_PI2*q[i]*qsum / (g_ewald*g_ewald*volume); //coulomb self energy correction
}
}
if (function[1] + function[2] + function[3]) {
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
eatom[i] += - MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumi[tmp] +
1.0/12.0*pow(g_ewald_6,6)*cii[tmp];
}
}
}
if (vflag_atom) {
if (function[1] + function[2] + function[3]) {
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
for (int n = 0; n < 3; n++) vatom[i][n] -= MY_PI*MY_PIS/(6*volume)*pow(g_ewald_6,3)*csumi[tmp]; //dispersion self virial correction
}
}
}
// 2d slab correction
if (slabflag) slabcorr(eflag);
if (function[0]) energy += energy_1;
if (function[1] + function[2] + function[3]) energy += energy_6;
// convert atoms back from lamda to box coords
if (triclinic) domain->lamda2x(atom->nlocal);
}
/* ----------------------------------------------------------------------
initialize coefficients needed for the dispersion density on the grids
------------------------------------------------------------------------- */
void PPPMDisp::init_coeffs() // local pair coeffs
{
int tmp;
int n = atom->ntypes;
int converged;
delete [] B;
B = NULL;
if (function[3] + function[2]) { // no mixing rule or arithmetic
if (function[2] && me == 0) {
if (screen) fprintf(screen," Optimizing splitting of Dispersion coefficients\n");
if (logfile) fprintf(logfile," Optimizing splitting of Dispersion coefficients\n");
}
// allocate data for eigenvalue decomposition
double **A=NULL;
double **Q=NULL;
if ( n > 1 ) {
// get dispersion coefficients
double **b = (double **) force->pair->extract("B",tmp);
memory->create(A,n,n,"pppm/disp:A");
memory->create(Q,n,n,"pppm/disp:Q");
// fill coefficients into matrix A
for (int i = 1; i <= n; i++)
for (int j = 1; j <= n; j++)
A[i-1][j-1] = b[i][j];
// initialize Q as the identity matrix
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
Q[i][j] = 0.0;
for (int i = 0; i < n; i++)
Q[i][i] = 1.0;
// perform eigenvalue decomposition with the QR algorithm
converged = qr_alg(A,Q,n);
if (function[3] && !converged) {
error->all(FLERR,"Matrix factorization to split dispersion coefficients failed");
}
// determine number of used eigenvalues
// based on maximum allowed number or cutoff criterion
// sort eigenvalues according to their size with bubble sort
double t;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n-1-i; j++) {
if (fabs(A[j][j]) < fabs(A[j+1][j+1])) {
t = A[j][j];
A[j][j] = A[j+1][j+1];
A[j+1][j+1] = t;
for (int k = 0; k < n; k++) {
t = Q[k][j];
Q[k][j] = Q[k][j+1];
Q[k][j+1] = t;
}
}
}
}
// check which eigenvalue is the first that is smaller
// than a specified tolerance
// check how many are maximum allowed by the user
double amax = fabs(A[0][0]);
double acrit = amax*splittol;
double bmax = 0;
double err = 0;
nsplit = 0;
for (int i = 0; i < n; i++) {
if (fabs(A[i][i]) > acrit) nsplit++;
else {
bmax = fabs(A[i][i]);
break;
}
}
err = bmax/amax;
if (err > 1.0e-4) {
char str[128];
sprintf(str,"Estimated error in splitting of dispersion coeffs is %g",err);
error->warning(FLERR, str);
}
// set B
B = new double[nsplit*n+nsplit];
for (int i = 0; i< nsplit; i++) {
B[i] = A[i][i];
for (int j = 0; j < n; j++) {
B[nsplit*(j+1) + i] = Q[j][i];
}
}
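// B now holds a truncated spectral decomposition of the dispersion
// coefficient matrix: B[0..nsplit-1] are the retained eigenvalues and
// B[nsplit*(j+1)+i] is component j of eigenvector i, i.e. roughly
// b[i+1][j+1] ~ sum_k B[k] * Q[i][k] * Q[j][k]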
nsplit_alloc = nsplit;
if (nsplit%2 == 1) nsplit_alloc = nsplit + 1;
} else
nsplit = 1; // use geometric mixing
// check if the function should preferably be [1] or [2] or [3]
if (nsplit == 1) {
if ( B ) delete [] B;
function[3] = 0;
function[2] = 0;
function[1] = 1;
if (me == 0) {
if (screen) fprintf(screen," Using geometric mixing for reciprocal space\n");
if (logfile) fprintf(logfile," Using geometric mixing for reciprocal space\n");
}
}
if (function[2] && nsplit <= 6) {
if (me == 0) {
if (screen) fprintf(screen," Using %d instead of 7 structure factors\n",nsplit);
if (logfile) fprintf(logfile," Using %d instead of 7 structure factors\n",nsplit);
}
function[3] = 1;
function[2] = 0;
}
if (function[2] && (nsplit > 6)) {
if (me == 0) {
if (screen) fprintf(screen," Using 7 structure factors\n");
if (logfile) fprintf(logfile," Using 7 structure factors\n");
}
if ( B ) delete [] B;
}
if (function[3]) {
if (me == 0) {
if (screen) fprintf(screen," Using %d structure factors\n",nsplit);
if (logfile) fprintf(logfile," Using %d structure factors\n",nsplit);
}
if (nsplit > 9) error->warning(FLERR, "Simulations might be very slow because of large number of structure factors");
}
memory->destroy(A);
memory->destroy(Q);
}
if (function[1]) { // geometric 1/r^6
double **b = (double **) force->pair->extract("B",tmp);
B = new double[n+1];
B[0] = 0.0;
for (int i=1; i<=n; ++i) B[i] = sqrt(fabs(b[i][i]));
}
if (function[2]) { // arithmetic 1/r^6
//cannot use epsilon, because this has not been set yet
double **epsilon = (double **) force->pair->extract("epsilon",tmp);
//cannot use sigma, because this has not been set yet
double **sigma = (double **) force->pair->extract("sigma",tmp);
if (!(epsilon&&sigma))
error->all(FLERR,"Epsilon or sigma reference not set by pair style in PPPMDisp");
double eps_i, sigma_i, sigma_n, *bi = B = new double[7*n+7];
double c[7] = {
1.0, sqrt(6.0), sqrt(15.0), sqrt(20.0), sqrt(15.0), sqrt(6.0), 1.0};
for (int i=0; i<=n; ++i) {
eps_i = sqrt(epsilon[i][i]);
sigma_i = sigma[i][i];
sigma_n = 1.0;
for (int j=0; j<7; ++j) {
*(bi++) = sigma_n*eps_i*c[j]*0.25;
sigma_n *= sigma_i;
}
}
}
}
/* ----------------------------------------------------------------------
Eigenvalue decomposition of a real, symmetric matrix with the QR
method (includes transformation to a tridiagonal matrix + Wilkinson
shift)
------------------------------------------------------------------------- */
int PPPMDisp::qr_alg(double **A, double **Q, int n)
{
int converged = 0;
double an1, an, bn1, d, mue;
// allocate some memory for the required operations
double **A0,**Qi,**C,**D,**E;
// make a copy of A for convergence check
memory->create(A0,n,n,"pppm/disp:A0");
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
A0[i][j] = A[i][j];
// allocate an auxiliary matrix Qi
memory->create(Qi,n,n,"pppm/disp:Qi");
// allocate auxiliary matrices for the matrix multiplication
memory->create(C,n,n,"pppm/disp:C");
memory->create(D,n,n,"pppm/disp:D");
memory->create(E,n,n,"pppm/disp:E");
// transform matrix A to tridiagonal form
hessenberg(A,Q,n);
// start loop for the matrix factorization
int count = 0;
int countmax = 100000;
while (1) {
// make a Wilkinson shift
an1 = A[n-2][n-2];
an = A[n-1][n-1];
bn1 = A[n-2][n-1];
d = (an1-an)/2;
mue = an + d - copysign(1.,d)*sqrt(d*d + bn1*bn1);
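// mue is the eigenvalue of the trailing 2x2 block [[an1,bn1],[bn1,an]]
// that lies closer to an (the standard Wilkinson shift for symmetric
// tridiagonal matrices)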
for (int i = 0; i < n; i++)
A[i][i] -= mue;
// perform a QR factorization for a tridiagonal matrix A
qr_tri(Qi,A,n);
// update the matrices
mmult(A,Qi,C,n);
mmult(Q,Qi,C,n);
// backward Wilkinson shift
for (int i = 0; i < n; i++)
A[i][i] += mue;
// check the convergence
converged = check_convergence(A,Q,A0,C,D,E,n);
if (converged) break;
count = count + 1;
if (count == countmax) break;
}
// free allocated memory
memory->destroy(Qi);
memory->destroy(A0);
memory->destroy(C);
memory->destroy(D);
memory->destroy(E);
return converged;
}
/* ----------------------------------------------------------------------
Transform a matrix to Hessenberg form (for symmetric matrices, the
result will be a tridiagonal matrix)
------------------------------------------------------------------------- */
void PPPMDisp::hessenberg(double **A, double **Q, int n)
{
double r,a,b,c,s,x1,x2;
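// Givens rotations in the (i+1,j) plane eliminate A[j][i] for j > i+1;
// applying each rotation from the left and from the right preserves
// symmetry, so the resulting Hessenberg form is in fact tridiagonal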
for (int i = 0; i < n-1; i++) {
for (int j = i+2; j < n; j++) {
// compute coeffs for the rotation matrix
a = A[i+1][i];
b = A[j][i];
r = sqrt(a*a + b*b);
c = a/r;
s = b/r;
// update the entries of A with multiplication from the left
for (int k = 0; k < n; k++) {
x1 = A[i+1][k];
x2 = A[j][k];
A[i+1][k] = c*x1 + s*x2;
A[j][k] = -s*x1 + c*x2;
}
// update the entries of A and Q with a multiplication from the right
for (int k = 0; k < n; k++) {
x1 = A[k][i+1];
x2 = A[k][j];
A[k][i+1] = c*x1 + s*x2;
A[k][j] = -s*x1 + c*x2;
x1 = Q[k][i+1];
x2 = Q[k][j];
Q[k][i+1] = c*x1 + s*x2;
Q[k][j] = -s*x1 + c*x2;
}
}
}
}
/* ----------------------------------------------------------------------
QR factorization for a tridiagonal matrix; Result of the factorization
is stored in A and Qi
------------------------------------------------------------------------- */
void PPPMDisp::qr_tri(double** Qi,double** A,int n)
{
double r,a,b,c,s,x1,x2;
int j,k,k0,kmax;
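// Givens-based QR of a tridiagonal matrix: each rotation mixes rows i and
// i+1 and only touches a small band of columns (k0..kmax-1); the orthogonal
// factor is accumulated column-wise in Qi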
// make Qi a unity matrix
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
Qi[i][j] = 0.0;
for (int i = 0; i < n; i++)
Qi[i][i] = 1.0;
// loop over main diagonal and first off-diagonal of A
for (int i = 0; i < n-1; i++) {
j = i+1;
// coefficients of the rotation matrix
a = A[i][i];
b = A[j][i];
r = sqrt(a*a + b*b);
c = a/r;
s = b/r;
// update the entries of A and Q
k0 = (i-1>0)?i-1:0; //min(i-1,0);
kmax = (i+3<n)?i+3:n; //min(i+3,n);
for (k = k0; k < kmax; k++) {
x1 = A[i][k];
x2 = A[j][k];
A[i][k] = c*x1 + s*x2;
A[j][k] = -s*x1 + c*x2;
}
for (k = 0; k < n; k++) {
x1 = Qi[k][i];
x2 = Qi[k][j];
Qi[k][i] = c*x1 + s*x2;
Qi[k][j] = -s*x1 + c*x2;
}
}
}
/* ----------------------------------------------------------------------
Multiply two matrices A and B, store the result in A; C provides
some memory to store intermediate results
------------------------------------------------------------------------- */
void PPPMDisp::mmult(double** A, double** B, double** C, int n)
{
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
C[i][j] = 0.0;
// perform matrix multiplication
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
for (int k = 0; k < n; k++)
C[i][j] += A[i][k] * B[k][j];
// copy the result back to matrix A
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
A[i][j] = C[i][j];
}
/* ----------------------------------------------------------------------
Check if the factorization has converged by comparing all elements of the
original matrix and the new matrix
------------------------------------------------------------------------- */
int PPPMDisp::check_convergence(double** A,double** Q,double** A0,
double** C,double** D,double** E,int n)
{
double eps = 1.0e-8;
int converged = 1;
double epsmax = -1;
double Bmax = 0.0;
double diff;
// get the largest element of the original matrix
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
Bmax = (Bmax>A0[i][j])?Bmax:A0[i][j]; //max(Bmax,A0[i][j]);
double epsabs = eps*Bmax;
// reconstruct the original matrix
// store the diagonal elements in D
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
D[i][j] = 0.0;
for (int i = 0; i < n; i++)
D[i][i] = A[i][i];
// store matrix Q in E
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
E[i][j] = Q[i][j];
// E = Q*D
mmult(E,D,C,n);
// store transpose of Q in D
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
D[i][j] = Q[j][i];
// E = Q*D*Q^T
mmult(E,D,C,n);
//compare the original matrix with the reconstructed matrix
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
diff = A0[i][j] - E[i][j];
epsmax = (epsmax>fabs(diff))?epsmax:fabs(diff);//max(epsmax,fabs(diff));
}
}
if (epsmax > epsabs) converged = 0;
return converged;
}
/* ----------------------------------------------------------------------
allocate memory that depends on # of K-vectors and order
------------------------------------------------------------------------- */
void PPPMDisp::allocate()
{
int (*procneigh)[2] = comm->procneigh;
if (function[0]) {
memory->create(work1,2*nfft_both,"pppm/disp:work1");
memory->create(work2,2*nfft_both,"pppm/disp:work2");
memory->create1d_offset(fkx,nxlo_fft,nxhi_fft,"pppm/disp:fkx");
memory->create1d_offset(fky,nylo_fft,nyhi_fft,"pppm/disp:fky");
memory->create1d_offset(fkz,nzlo_fft,nzhi_fft,"pppm/disp:fkz");
memory->create1d_offset(fkx2,nxlo_fft,nxhi_fft,"pppm/disp:fkx2");
memory->create1d_offset(fky2,nylo_fft,nyhi_fft,"pppm/disp:fky2");
memory->create1d_offset(fkz2,nzlo_fft,nzhi_fft,"pppm/disp:fkz2");
memory->create(gf_b,order,"pppm/disp:gf_b");
memory->create2d_offset(rho1d,3,-order/2,order/2,"pppm/disp:rho1d");
memory->create2d_offset(rho_coeff,order,(1-order)/2,order/2,"pppm/disp:rho_coeff");
memory->create2d_offset(drho1d,3,-order/2,order/2,"pppm/disp:rho1d");
memory->create2d_offset(drho_coeff,order,(1-order)/2,order/2,"pppm/disp:drho_coeff");
memory->create(greensfn,nfft_both,"pppm/disp:greensfn");
memory->create(vg,nfft_both,6,"pppm/disp:vg");
memory->create(vg2,nfft_both,3,"pppm/disp:vg2");
memory->create3d_offset(density_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:density_brick");
if ( differentiation_flag == 1) {
memory->create3d_offset(u_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:u_brick");
memory->create(sf_precoeff1,nfft_both,"pppm/disp:sf_precoeff1");
memory->create(sf_precoeff2,nfft_both,"pppm/disp:sf_precoeff2");
memory->create(sf_precoeff3,nfft_both,"pppm/disp:sf_precoeff3");
memory->create(sf_precoeff4,nfft_both,"pppm/disp:sf_precoeff4");
memory->create(sf_precoeff5,nfft_both,"pppm/disp:sf_precoeff5");
memory->create(sf_precoeff6,nfft_both,"pppm/disp:sf_precoeff6");
} else {
memory->create3d_offset(vdx_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:vdx_brick");
memory->create3d_offset(vdy_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:vdy_brick");
memory->create3d_offset(vdz_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:vdz_brick");
}
memory->create(density_fft,nfft_both,"pppm/disp:density_fft");
int tmp;
fft1 = new FFT3d(lmp,world,nx_pppm,ny_pppm,nz_pppm,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
0,0,&tmp,collective_flag);
fft2 = new FFT3d(lmp,world,nx_pppm,ny_pppm,nz_pppm,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
0,0,&tmp,collective_flag);
remap = new Remap(lmp,world,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_fft,nxhi_fft,nylo_fft,nyhi_fft,nzlo_fft,nzhi_fft,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg = new GridComm(lmp,world,1,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg = new GridComm(lmp,world,3,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
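// the two integer arguments after world appear to be the number of values
// communicated per grid point in the forward (1 potential for ad, 3 field
// components for ik) and reverse (1 density) direction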
}
if (function[1]) {
memory->create(work1_6,2*nfft_both_6,"pppm/disp:work1_6");
memory->create(work2_6,2*nfft_both_6,"pppm/disp:work2_6");
memory->create1d_offset(fkx_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx_6");
memory->create1d_offset(fky_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky_6");
memory->create1d_offset(fkz_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz_6");
memory->create1d_offset(fkx2_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx2_6");
memory->create1d_offset(fky2_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky2_6");
memory->create1d_offset(fkz2_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz2_6");
memory->create(gf_b_6,order_6,"pppm/disp:gf_b_6");
memory->create2d_offset(rho1d_6,3,-order_6/2,order_6/2,"pppm/disp:rho1d_6");
memory->create2d_offset(rho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:rho_coeff_6");
memory->create2d_offset(drho1d_6,3,-order_6/2,order_6/2,"pppm/disp:drho1d_6");
memory->create2d_offset(drho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:drho_coeff_6");
memory->create(greensfn_6,nfft_both_6,"pppm/disp:greensfn_6");
memory->create(vg_6,nfft_both_6,6,"pppm/disp:vg_6");
memory->create(vg2_6,nfft_both_6,3,"pppm/disp:vg2_6");
memory->create3d_offset(density_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_g");
if ( differentiation_flag == 1) {
memory->create3d_offset(u_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_g");
memory->create(sf_precoeff1_6,nfft_both_6,"pppm/disp:sf_precoeff1_6");
memory->create(sf_precoeff2_6,nfft_both_6,"pppm/disp:sf_precoeff2_6");
memory->create(sf_precoeff3_6,nfft_both_6,"pppm/disp:sf_precoeff3_6");
memory->create(sf_precoeff4_6,nfft_both_6,"pppm/disp:sf_precoeff4_6");
memory->create(sf_precoeff5_6,nfft_both_6,"pppm/disp:sf_precoeff5_6");
memory->create(sf_precoeff6_6,nfft_both_6,"pppm/disp:sf_precoeff6_6");
} else {
memory->create3d_offset(vdx_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_g");
memory->create3d_offset(vdy_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_g");
memory->create3d_offset(vdz_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_g");
}
memory->create(density_fft_g,nfft_both_6,"pppm/disp:density_fft_g");
int tmp;
fft1_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
0,0,&tmp,collective_flag);
fft2_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
0,0,&tmp,collective_flag);
remap_6 = new Remap(lmp,world,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_6 = new GridComm(lmp,world,1,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_6 = new GridComm(lmp,world,3,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[2]) {
memory->create(work1_6,2*nfft_both_6,"pppm/disp:work1_6");
memory->create(work2_6,2*nfft_both_6,"pppm/disp:work2_6");
memory->create1d_offset(fkx_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx_6");
memory->create1d_offset(fky_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky_6");
memory->create1d_offset(fkz_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz_6");
memory->create1d_offset(fkx2_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx2_6");
memory->create1d_offset(fky2_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky2_6");
memory->create1d_offset(fkz2_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz2_6");
memory->create(gf_b_6,order_6,"pppm/disp:gf_b_6");
memory->create2d_offset(rho1d_6,3,-order_6/2,order_6/2,"pppm/disp:rho1d_6");
memory->create2d_offset(rho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:rho_coeff_6");
memory->create2d_offset(drho1d_6,3,-order_6/2,order_6/2,"pppm/disp:drho1d_6");
memory->create2d_offset(drho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:drho_coeff_6");
memory->create(greensfn_6,nfft_both_6,"pppm/disp:greensfn_6");
memory->create(vg_6,nfft_both_6,6,"pppm/disp:vg_6");
memory->create(vg2_6,nfft_both_6,3,"pppm/disp:vg2_6");
memory->create3d_offset(density_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a0");
memory->create3d_offset(density_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a1");
memory->create3d_offset(density_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a2");
memory->create3d_offset(density_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a3");
memory->create3d_offset(density_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a4");
memory->create3d_offset(density_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a5");
memory->create3d_offset(density_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_a6");
memory->create(density_fft_a0,nfft_both_6,"pppm/disp:density_fft_a0");
memory->create(density_fft_a1,nfft_both_6,"pppm/disp:density_fft_a1");
memory->create(density_fft_a2,nfft_both_6,"pppm/disp:density_fft_a2");
memory->create(density_fft_a3,nfft_both_6,"pppm/disp:density_fft_a3");
memory->create(density_fft_a4,nfft_both_6,"pppm/disp:density_fft_a4");
memory->create(density_fft_a5,nfft_both_6,"pppm/disp:density_fft_a5");
memory->create(density_fft_a6,nfft_both_6,"pppm/disp:density_fft_a6");
if ( differentiation_flag == 1 ) {
memory->create3d_offset(u_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a0");
memory->create3d_offset(u_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a1");
memory->create3d_offset(u_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a2");
memory->create3d_offset(u_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a3");
memory->create3d_offset(u_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a4");
memory->create3d_offset(u_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a5");
memory->create3d_offset(u_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a6");
memory->create(sf_precoeff1_6,nfft_both_6,"pppm/disp:sf_precoeff1_6");
memory->create(sf_precoeff2_6,nfft_both_6,"pppm/disp:sf_precoeff2_6");
memory->create(sf_precoeff3_6,nfft_both_6,"pppm/disp:sf_precoeff3_6");
memory->create(sf_precoeff4_6,nfft_both_6,"pppm/disp:sf_precoeff4_6");
memory->create(sf_precoeff5_6,nfft_both_6,"pppm/disp:sf_precoeff5_6");
memory->create(sf_precoeff6_6,nfft_both_6,"pppm/disp:sf_precoeff6_6");
} else {
memory->create3d_offset(vdx_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a0");
memory->create3d_offset(vdy_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a0");
memory->create3d_offset(vdz_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a0");
memory->create3d_offset(vdx_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a1");
memory->create3d_offset(vdy_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a1");
memory->create3d_offset(vdz_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a1");
memory->create3d_offset(vdx_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a2");
memory->create3d_offset(vdy_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a2");
memory->create3d_offset(vdz_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a2");
memory->create3d_offset(vdx_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a3");
memory->create3d_offset(vdy_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a3");
memory->create3d_offset(vdz_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a3");
memory->create3d_offset(vdx_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a4");
memory->create3d_offset(vdy_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a4");
memory->create3d_offset(vdz_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a4");
memory->create3d_offset(vdx_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a5");
memory->create3d_offset(vdy_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a5");
memory->create3d_offset(vdz_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a5");
memory->create3d_offset(vdx_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_a6");
memory->create3d_offset(vdy_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_a6");
memory->create3d_offset(vdz_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_a6");
}
int tmp;
fft1_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
0,0,&tmp,collective_flag);
fft2_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
0,0,&tmp,collective_flag);
remap_6 = new Remap(lmp,world,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_6 = new GridComm(lmp,world,7,7,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_6 = new GridComm(lmp,world,21,7,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[3]) {
memory->create(work1_6,2*nfft_both_6,"pppm/disp:work1_6");
memory->create(work2_6,2*nfft_both_6,"pppm/disp:work2_6");
memory->create1d_offset(fkx_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx_6");
memory->create1d_offset(fky_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky_6");
memory->create1d_offset(fkz_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz_6");
memory->create1d_offset(fkx2_6,nxlo_fft_6,nxhi_fft_6,"pppm/disp:fkx2_6");
memory->create1d_offset(fky2_6,nylo_fft_6,nyhi_fft_6,"pppm/disp:fky2_6");
memory->create1d_offset(fkz2_6,nzlo_fft_6,nzhi_fft_6,"pppm/disp:fkz2_6");
memory->create(gf_b_6,order_6,"pppm/disp:gf_b_6");
memory->create2d_offset(rho1d_6,3,-order_6/2,order_6/2,"pppm/disp:rho1d_6");
memory->create2d_offset(rho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:rho_coeff_6");
memory->create2d_offset(drho1d_6,3,-order_6/2,order_6/2,"pppm/disp:drho1d_6");
memory->create2d_offset(drho_coeff_6,order_6,(1-order_6)/2,order_6/2,"pppm/disp:drho_coeff_6");
memory->create(greensfn_6,nfft_both_6,"pppm/disp:greensfn_6");
memory->create(vg_6,nfft_both_6,6,"pppm/disp:vg_6");
memory->create(vg2_6,nfft_both_6,3,"pppm/disp:vg2_6");
memory->create4d_offset(density_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:density_brick_none");
if ( differentiation_flag == 1) {
memory->create4d_offset(u_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_none");
memory->create(sf_precoeff1_6,nfft_both_6,"pppm/disp:sf_precoeff1_6");
memory->create(sf_precoeff2_6,nfft_both_6,"pppm/disp:sf_precoeff2_6");
memory->create(sf_precoeff3_6,nfft_both_6,"pppm/disp:sf_precoeff3_6");
memory->create(sf_precoeff4_6,nfft_both_6,"pppm/disp:sf_precoeff4_6");
memory->create(sf_precoeff5_6,nfft_both_6,"pppm/disp:sf_precoeff5_6");
memory->create(sf_precoeff6_6,nfft_both_6,"pppm/disp:sf_precoeff6_6");
} else {
memory->create4d_offset(vdx_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdx_brick_none");
memory->create4d_offset(vdy_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdy_brick_none");
memory->create4d_offset(vdz_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:vdz_brick_none");
}
memory->create(density_fft_none,nsplit_alloc,nfft_both_6,"pppm/disp:density_fft_none");
int tmp;
fft1_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
0,0,&tmp,collective_flag);
fft2_6 = new FFT3d(lmp,world,nx_pppm_6,ny_pppm_6,nz_pppm_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
0,0,&tmp,collective_flag);
remap_6 = new Remap(lmp,world,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_fft_6,nxhi_fft_6,nylo_fft_6,nyhi_fft_6,nzlo_fft_6,nzhi_fft_6,
1,0,0,FFT_PRECISION,collective_flag);
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_6 = new GridComm(lmp,world,nsplit_alloc,nsplit_alloc,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_6 = new GridComm(lmp,world,3*nsplit_alloc,nsplit_alloc,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
}
/* ----------------------------------------------------------------------
allocate memory that depends on # of K-vectors and order
for per atom calculations
------------------------------------------------------------------------- */
void PPPMDisp::allocate_peratom()
{
int (*procneigh)[2] = comm->procneigh;
if (function[0]) {
if (differentiation_flag != 1)
memory->create3d_offset(u_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:u_brick");
memory->create3d_offset(v0_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v0_brick");
memory->create3d_offset(v1_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v1_brick");
memory->create3d_offset(v2_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v2_brick");
memory->create3d_offset(v3_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v3_brick");
memory->create3d_offset(v4_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v4_brick");
memory->create3d_offset(v5_brick,nzlo_out,nzhi_out,nylo_out,nyhi_out,
nxlo_out,nxhi_out,"pppm/disp:v5_brick");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom =
new GridComm(lmp,world,6,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom =
new GridComm(lmp,world,7,1,
nxlo_in,nxhi_in,nylo_in,nyhi_in,nzlo_in,nzhi_in,
nxlo_out,nxhi_out,nylo_out,nyhi_out,nzlo_out,nzhi_out,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[1]) {
if ( differentiation_flag != 1 )
memory->create3d_offset(u_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_g");
memory->create3d_offset(v0_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_g");
memory->create3d_offset(v1_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_g");
memory->create3d_offset(v2_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_g");
memory->create3d_offset(v3_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_g");
memory->create3d_offset(v4_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_g");
memory->create3d_offset(v5_brick_g,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_g");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom_6 =
new GridComm(lmp,world,6,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom_6 =
new GridComm(lmp,world,7,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[2]) {
if ( differentiation_flag != 1 ) {
memory->create3d_offset(u_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a0");
memory->create3d_offset(u_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a1");
memory->create3d_offset(u_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a2");
memory->create3d_offset(u_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a3");
memory->create3d_offset(u_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a4");
memory->create3d_offset(u_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a5");
memory->create3d_offset(u_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_a6");
}
memory->create3d_offset(v0_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a0");
memory->create3d_offset(v1_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a0");
memory->create3d_offset(v2_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a0");
memory->create3d_offset(v3_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a0");
memory->create3d_offset(v4_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a0");
memory->create3d_offset(v5_brick_a0,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a0");
memory->create3d_offset(v0_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a1");
memory->create3d_offset(v1_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a1");
memory->create3d_offset(v2_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a1");
memory->create3d_offset(v3_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a1");
memory->create3d_offset(v4_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a1");
memory->create3d_offset(v5_brick_a1,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a1");
memory->create3d_offset(v0_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a2");
memory->create3d_offset(v1_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a2");
memory->create3d_offset(v2_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a2");
memory->create3d_offset(v3_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a2");
memory->create3d_offset(v4_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a2");
memory->create3d_offset(v5_brick_a2,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a2");
memory->create3d_offset(v0_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a3");
memory->create3d_offset(v1_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a3");
memory->create3d_offset(v2_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a3");
memory->create3d_offset(v3_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a3");
memory->create3d_offset(v4_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a3");
memory->create3d_offset(v5_brick_a3,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a3");
memory->create3d_offset(v0_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a4");
memory->create3d_offset(v1_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a4");
memory->create3d_offset(v2_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a4");
memory->create3d_offset(v3_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a4");
memory->create3d_offset(v4_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a4");
memory->create3d_offset(v5_brick_a4,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a4");
memory->create3d_offset(v0_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a5");
memory->create3d_offset(v1_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a5");
memory->create3d_offset(v2_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a5");
memory->create3d_offset(v3_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a5");
memory->create3d_offset(v4_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a5");
memory->create3d_offset(v5_brick_a5,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a5");
memory->create3d_offset(v0_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_a6");
memory->create3d_offset(v1_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_a6");
memory->create3d_offset(v2_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_a6");
memory->create3d_offset(v3_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_a6");
memory->create3d_offset(v4_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_a6");
memory->create3d_offset(v5_brick_a6,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_a6");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom_6 =
new GridComm(lmp,world,42,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom_6 =
new GridComm(lmp,world,49,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
if (function[3]) {
if ( differentiation_flag != 1 )
memory->create4d_offset(u_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:u_brick_none");
memory->create4d_offset(v0_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v0_brick_none");
memory->create4d_offset(v1_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v1_brick_none");
memory->create4d_offset(v2_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v2_brick_none");
memory->create4d_offset(v3_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v3_brick_none");
memory->create4d_offset(v4_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v4_brick_none");
memory->create4d_offset(v5_brick_none,nsplit_alloc,nzlo_out_6,nzhi_out_6,nylo_out_6,nyhi_out_6,
nxlo_out_6,nxhi_out_6,"pppm/disp:v5_brick_none");
// create ghost grid object for rho and electric field communication
if (differentiation_flag == 1)
cg_peratom_6 =
new GridComm(lmp,world,6*nsplit_alloc,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
else
cg_peratom_6 =
new GridComm(lmp,world,7*nsplit_alloc,1,
nxlo_in_6,nxhi_in_6,nylo_in_6,nyhi_in_6,nzlo_in_6,nzhi_in_6,
nxlo_out_6,nxhi_out_6,nylo_out_6,nyhi_out_6,nzlo_out_6,nzhi_out_6,
procneigh[0][0],procneigh[0][1],procneigh[1][0],
procneigh[1][1],procneigh[2][0],procneigh[2][1]);
}
}
/* ----------------------------------------------------------------------
deallocate memory that depends on # of K-vectors and order
------------------------------------------------------------------------- */
void PPPMDisp::deallocate()
{
memory->destroy3d_offset(density_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy3d_offset(vdx_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy3d_offset(vdy_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy3d_offset(vdz_brick,nzlo_out,nylo_out,nxlo_out);
memory->destroy(density_fft);
density_brick = vdx_brick = vdy_brick = vdz_brick = NULL;
density_fft = NULL;
memory->destroy3d_offset(density_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_g,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_g);
density_brick_g = vdx_brick_g = vdy_brick_g = vdz_brick_g = NULL;
density_fft_g = NULL;
memory->destroy3d_offset(density_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a0,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a0);
density_brick_a0 = vdx_brick_a0 = vdy_brick_a0 = vdz_brick_a0 = NULL;
density_fft_a0 = NULL;
memory->destroy3d_offset(density_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a1,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a1);
density_brick_a1 = vdx_brick_a1 = vdy_brick_a1 = vdz_brick_a1 = NULL;
density_fft_a1 = NULL;
memory->destroy3d_offset(density_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a2,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a2);
density_brick_a2 = vdx_brick_a2 = vdy_brick_a2 = vdz_brick_a2 = NULL;
density_fft_a2 = NULL;
memory->destroy3d_offset(density_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a3,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a3);
density_brick_a3 = vdx_brick_a3 = vdy_brick_a3 = vdz_brick_a3 = NULL;
density_fft_a3 = NULL;
memory->destroy3d_offset(density_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a4,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a4);
density_brick_a4 = vdx_brick_a4 = vdy_brick_a4 = vdz_brick_a4 = NULL;
density_fft_a4 = NULL;
memory->destroy3d_offset(density_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a5,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a5);
density_brick_a5 = vdx_brick_a5 = vdy_brick_a5 = vdz_brick_a5 = NULL;
density_fft_a5 = NULL;
memory->destroy3d_offset(density_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdx_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdy_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy3d_offset(vdz_brick_a6,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_a6);
density_brick_a6 = vdx_brick_a6 = vdy_brick_a6 = vdz_brick_a6 = NULL;
density_fft_a6 = NULL;
memory->destroy4d_offset(density_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy4d_offset(vdx_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy4d_offset(vdy_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy4d_offset(vdz_brick_none,nzlo_out_6,nylo_out_6,nxlo_out_6);
memory->destroy(density_fft_none);
density_brick_none = vdx_brick_none = vdy_brick_none = vdz_brick_none = NULL;
density_fft_none = NULL;
memory->destroy(sf_precoeff1);
memory->destroy(sf_precoeff2);
memory->destroy(sf_precoeff3);
memory->destroy(sf_precoeff4);
memory->destroy(sf_precoeff5);
memory->destroy(sf_precoeff6);
sf_precoeff1 = sf_precoeff2 = sf_precoeff3 = sf_precoeff4 = sf_precoeff5 = sf_precoeff6 = NULL;
memory->destroy(sf_precoeff1_6);
memory->destroy(sf_precoeff2_6);
memory->destroy(sf_precoeff3_6);
memory->destroy(sf_precoeff4_6);
memory->destroy(sf_precoeff5_6);
memory->destroy(sf_precoeff6_6);
sf_precoeff1_6 = sf_precoeff2_6 = sf_precoeff3_6 = sf_precoeff4_6 = sf_precoeff5_6 = sf_precoeff6_6 = NULL;
memory->destroy(greensfn);
memory->destroy(greensfn_6);
memory->destroy(work1);
memory->destroy(work2);
memory->destroy(work1_6);
memory->destroy(work2_6);
memory->destroy(vg);
memory->destroy(vg2);
memory->destroy(vg_6);
memory->destroy(vg2_6);
greensfn = greensfn_6 = NULL;
work1 = work2 = work1_6 = work2_6 = NULL;
vg = vg2 = vg_6 = vg2_6 = NULL;
memory->destroy1d_offset(fkx,nxlo_fft);
memory->destroy1d_offset(fky,nylo_fft);
memory->destroy1d_offset(fkz,nzlo_fft);
fkx = fky = fkz = NULL;
memory->destroy1d_offset(fkx2,nxlo_fft);
memory->destroy1d_offset(fky2,nylo_fft);
memory->destroy1d_offset(fkz2,nzlo_fft);
fkx2 = fky2 = fkz2 = NULL;
memory->destroy1d_offset(fkx_6,nxlo_fft_6);
memory->destroy1d_offset(fky_6,nylo_fft_6);
memory->destroy1d_offset(fkz_6,nzlo_fft_6);
fkx_6 = fky_6 = fkz_6 = NULL;
memory->destroy1d_offset(fkx2_6,nxlo_fft_6);
memory->destroy1d_offset(fky2_6,nylo_fft_6);
memory->destroy1d_offset(fkz2_6,nzlo_fft_6);
fkx2_6 = fky2_6 = fkz2_6 = NULL;
memory->destroy(gf_b);
memory->destroy2d_offset(rho1d,-order/2);
memory->destroy2d_offset(rho_coeff,(1-order)/2);
memory->destroy2d_offset(drho1d,-order/2);
memory->destroy2d_offset(drho_coeff, (1-order)/2);
gf_b = NULL;
rho1d = rho_coeff = drho1d = drho_coeff = NULL;
memory->destroy(gf_b_6);
memory->destroy2d_offset(rho1d_6,-order_6/2);
memory->destroy2d_offset(rho_coeff_6,(1-order_6)/2);
memory->destroy2d_offset(drho1d_6,-order_6/2);
memory->destroy2d_offset(drho_coeff_6,(1-order_6)/2);
gf_b_6 = NULL;
rho1d_6 = rho_coeff_6 = drho1d_6 = drho_coeff_6 = NULL;
delete fft1;
delete fft2;
delete remap;
delete cg;
fft1 = fft2 = NULL;
remap = NULL;
cg = NULL;
delete fft1_6;
delete fft2_6;
delete remap_6;
delete cg_6;
fft1_6 = fft2_6 = NULL;
remap_6 = NULL;
cg_6 = NULL;
}
/* ----------------------------------------------------------------------
deallocate memory that depends on # of K-vectors and order
for per atom calculations
------------------------------------------------------------------------- */
void PPPMDisp::deallocate_peratom()
{
peratom_allocate_flag = 0;
memory->destroy3d_offset(u_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v0_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v1_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v2_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v3_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v4_brick, nzlo_out, nylo_out, nxlo_out);
memory->destroy3d_offset(v5_brick, nzlo_out, nylo_out, nxlo_out);
u_brick = v0_brick = v1_brick = v2_brick = v3_brick = v4_brick = v5_brick = NULL;
memory->destroy3d_offset(u_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_g, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_g = v0_brick_g = v1_brick_g = v2_brick_g = v3_brick_g = v4_brick_g = v5_brick_g = NULL;
memory->destroy3d_offset(u_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a0, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a0 = v0_brick_a0 = v1_brick_a0 = v2_brick_a0 = v3_brick_a0 = v4_brick_a0 = v5_brick_a0 = NULL;
memory->destroy3d_offset(u_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a1, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a1 = v0_brick_a1 = v1_brick_a1 = v2_brick_a1 = v3_brick_a1 = v4_brick_a1 = v5_brick_a1 = NULL;
memory->destroy3d_offset(u_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a2, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a2 = v0_brick_a2 = v1_brick_a2 = v2_brick_a2 = v3_brick_a2 = v4_brick_a2 = v5_brick_a2 = NULL;
memory->destroy3d_offset(u_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a3, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a3 = v0_brick_a3 = v1_brick_a3 = v2_brick_a3 = v3_brick_a3 = v4_brick_a3 = v5_brick_a3 = NULL;
memory->destroy3d_offset(u_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a4, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a4 = v0_brick_a4 = v1_brick_a4 = v2_brick_a4 = v3_brick_a4 = v4_brick_a4 = v5_brick_a4 = NULL;
memory->destroy3d_offset(u_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a5, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a5 = v0_brick_a5 = v1_brick_a5 = v2_brick_a5 = v3_brick_a5 = v4_brick_a5 = v5_brick_a5 = NULL;
memory->destroy3d_offset(u_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v0_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v1_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v2_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v3_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v4_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy3d_offset(v5_brick_a6, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_a6 = v0_brick_a6 = v1_brick_a6 = v2_brick_a6 = v3_brick_a6 = v4_brick_a6 = v5_brick_a6 = NULL;
memory->destroy4d_offset(u_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v0_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v1_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v2_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v3_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v4_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
memory->destroy4d_offset(v5_brick_none, nzlo_out_6, nylo_out_6, nxlo_out_6);
u_brick_none = v0_brick_none = v1_brick_none = v2_brick_none = v3_brick_none = v4_brick_none = v5_brick_none = NULL;
delete cg_peratom;
delete cg_peratom_6;
cg_peratom = cg_peratom_6 = NULL;
}
/* ----------------------------------------------------------------------
set size of FFT grid (nx,ny,nz_pppm) and g_ewald
for Coulomb interactions
------------------------------------------------------------------------- */
void PPPMDisp::set_grid()
{
double q2 = qsqsum * force->qqrd2e;
// use xprd,yprd,zprd even if triclinic so grid size is the same
// adjust z dimension for 2d slab PPPM
// 3d PPPM just uses zprd since slab_volfactor = 1.0
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
// make initial g_ewald estimate
// based on desired accuracy and real space cutoff
// fluid-occupied volume used to estimate real-space error
// zprd used rather than zprd_slab
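// the estimate below inverts the real-space error formula also used in f():
//   err_r = 2*q2*exp(-(g*rc)^2) / sqrt(N*rc*Lx*Ly*Lz)
// setting err_r = accuracy and solving for g gives
//   g = sqrt(-ln(accuracy*sqrt(N*rc*Lx*Ly*Lz)/(2*q2))) / rc
// which is what the two statements inside the if block compute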
double h, h_x,h_y,h_z;
bigint natoms = atom->natoms;
if (!gewaldflag) {
g_ewald = accuracy*sqrt(natoms*cutoff*xprd*yprd*zprd) / (2.0*q2);
if (g_ewald >= 1.0)
error->all(FLERR,"KSpace accuracy too large to estimate G vector");
g_ewald = sqrt(-log(g_ewald)) / cutoff;
}
// set optimal nx_pppm,ny_pppm,nz_pppm based on order and accuracy
// nz_pppm uses extended zprd_slab instead of zprd
// reduce it until accuracy target is met
if (!gridflag) {
h = h_x = h_y = h_z = 4.0/g_ewald;
int count = 0;
while (1) {
// set grid dimension
nx_pppm = static_cast<int> (xprd/h_x);
ny_pppm = static_cast<int> (yprd/h_y);
nz_pppm = static_cast<int> (zprd_slab/h_z);
if (nx_pppm <= 1) nx_pppm = 2;
if (ny_pppm <= 1) ny_pppm = 2;
if (nz_pppm <= 1) nz_pppm = 2;
//set local grid dimension
int npey_fft,npez_fft;
if (nz_pppm >= nprocs) {
npey_fft = 1;
npez_fft = nprocs;
} else procs2grid2d(nprocs,ny_pppm,nz_pppm,&npey_fft,&npez_fft);
int me_y = me % npey_fft;
int me_z = me / npey_fft;
nxlo_fft = 0;
nxhi_fft = nx_pppm - 1;
nylo_fft = me_y*ny_pppm/npey_fft;
nyhi_fft = (me_y+1)*ny_pppm/npey_fft - 1;
nzlo_fft = me_z*nz_pppm/npez_fft;
nzhi_fft = (me_z+1)*nz_pppm/npez_fft - 1;
double qopt = compute_qopt();
double dfkspace = sqrt(qopt/natoms)*q2/(xprd*yprd*zprd_slab);
count++;
// break loop if the accuracy has been reached or too many loops have been performed
if (dfkspace <= accuracy) break;
if (count > 500) error->all(FLERR, "Could not compute grid size for Coulomb interaction");
h *= 0.95;
h_x = h_y = h_z = h;
}
}
// boost grid size until it is factorable
while (!factorable(nx_pppm)) nx_pppm++;
while (!factorable(ny_pppm)) ny_pppm++;
while (!factorable(nz_pppm)) nz_pppm++;
}
/* ----------------------------------------------------------------------
set the FFT parameters
------------------------------------------------------------------------- */
void PPPMDisp::set_fft_parameters(int& nx_p,int& ny_p,int& nz_p,
int& nxlo_f,int& nylo_f,int& nzlo_f,
int& nxhi_f,int& nyhi_f,int& nzhi_f,
int& nxlo_i,int& nylo_i,int& nzlo_i,
int& nxhi_i,int& nyhi_i,int& nzhi_i,
int& nxlo_o,int& nylo_o,int& nzlo_o,
int& nxhi_o,int& nyhi_o,int& nzhi_o,
int& nlow, int& nupp,
int& ng, int& nf, int& nfb,
double& sft,double& sftone, int& ord)
{
// global indices of PPPM grid range from 0 to N-1
// nlo_in,nhi_in = lower/upper limits of the 3d sub-brick of
// global PPPM grid that I own without ghost cells
// for slab PPPM, assign z grid as if it were not extended
nxlo_i = static_cast<int> (comm->xsplit[comm->myloc[0]] * nx_p);
nxhi_i = static_cast<int> (comm->xsplit[comm->myloc[0]+1] * nx_p) - 1;
nylo_i = static_cast<int> (comm->ysplit[comm->myloc[1]] * ny_p);
nyhi_i = static_cast<int> (comm->ysplit[comm->myloc[1]+1] * ny_p) - 1;
nzlo_i = static_cast<int>
(comm->zsplit[comm->myloc[2]] * nz_p/slab_volfactor);
nzhi_i = static_cast<int>
(comm->zsplit[comm->myloc[2]+1] * nz_p/slab_volfactor) - 1;
// nlow,nupp = stencil size for mapping particles to PPPM grid
nlow = -(ord-1)/2;
nupp = ord/2;
// sft values for particle <-> grid mapping
// add/subtract OFFSET to avoid int(-0.75) = 0 when we want it to be -1
if (ord % 2) sft = OFFSET + 0.5;
else sft = OFFSET;
if (ord % 2) sftone = 0.0;
else sftone = 0.5;
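// worked example (assuming OFFSET is the large positive integer, 16384,
// used by the other PPPM styles): for an even order, sft = OFFSET, so a
// scaled coordinate of -0.75 maps to int(-0.75 + 16384) - 16384 = -1,
// whereas a plain int(-0.75) would truncate toward zero and give 0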
// nlo_out,nhi_out = lower/upper limits of the 3d sub-brick of
// global PPPM grid that my particles can contribute charge to
// effectively nlo_in,nhi_in + ghost cells
// nlo,nhi = global coords of grid pt to "lower left" of smallest/largest
// position a particle in my box can be at
// dist[3] = particle position bound = subbox + skin/2.0 + qdist
// qdist = offset due to TIP4P fictitious charge
// convert to triclinic if necessary
// nlo_out,nhi_out = nlo,nhi + stencil size for particle mapping
// for slab PPPM, assign z grid as if it were not extended
double *prd,*sublo,*subhi;
if (triclinic == 0) {
prd = domain->prd;
boxlo = domain->boxlo;
sublo = domain->sublo;
subhi = domain->subhi;
} else {
prd = domain->prd_lamda;
boxlo = domain->boxlo_lamda;
sublo = domain->sublo_lamda;
subhi = domain->subhi_lamda;
}
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double dist[3];
double cuthalf = 0.5*neighbor->skin + qdist;
if (triclinic == 0) dist[0] = dist[1] = dist[2] = cuthalf;
else {
dist[0] = cuthalf/domain->prd[0];
dist[1] = cuthalf/domain->prd[1];
dist[2] = cuthalf/domain->prd[2];
}
int nlo,nhi;
nlo = static_cast<int> ((sublo[0]-dist[0]-boxlo[0]) *
nx_p/xprd + sft) - OFFSET;
nhi = static_cast<int> ((subhi[0]+dist[0]-boxlo[0]) *
nx_p/xprd + sft) - OFFSET;
nxlo_o = nlo + nlow;
nxhi_o = nhi + nupp;
nlo = static_cast<int> ((sublo[1]-dist[1]-boxlo[1]) *
ny_p/yprd + sft) - OFFSET;
nhi = static_cast<int> ((subhi[1]+dist[1]-boxlo[1]) *
ny_p/yprd + sft) - OFFSET;
nylo_o = nlo + nlow;
nyhi_o = nhi + nupp;
nlo = static_cast<int> ((sublo[2]-dist[2]-boxlo[2]) *
nz_p/zprd_slab + sft) - OFFSET;
nhi = static_cast<int> ((subhi[2]+dist[2]-boxlo[2]) *
nz_p/zprd_slab + sft) - OFFSET;
nzlo_o = nlo + nlow;
nzhi_o = nhi + nupp;
// for slab PPPM, change the grid boundary for processors at +z end
// to include the empty volume between periodically repeating slabs
// for slab PPPM, want charge data communicated from -z proc to +z proc,
// but not vice versa, also want field data communicated from +z proc to
// -z proc, but not vice versa
// this is accomplished by nzhi_i = nzhi_o on +z end (no ghost cells)
if (slabflag && (comm->myloc[2] == comm->procgrid[2]-1)) {
nzhi_i = nz_p - 1;
nzhi_o = nz_p - 1;
}
// decomposition of FFT mesh
// global indices range from 0 to N-1
// proc owns entire x-dimension, clump of columns in y,z dimensions
// npey_fft,npez_fft = # of procs in y,z dims
// if nprocs is small enough, proc can own 1 or more entire xy planes,
// else proc owns 2d sub-blocks of yz plane
// me_y,me_z = which proc (0-npe_fft-1) I am in y,z dimensions
// nlo_fft,nhi_fft = lower/upper limit of the section
// of the global FFT mesh that I own
int npey_fft,npez_fft;
if (nz_p >= nprocs) {
npey_fft = 1;
npez_fft = nprocs;
} else procs2grid2d(nprocs,ny_p,nz_p,&npey_fft,&npez_fft);
int me_y = me % npey_fft;
int me_z = me / npey_fft;
nxlo_f = 0;
nxhi_f = nx_p - 1;
nylo_f = me_y*ny_p/npey_fft;
nyhi_f = (me_y+1)*ny_p/npey_fft - 1;
nzlo_f = me_z*nz_p/npez_fft;
nzhi_f = (me_z+1)*nz_p/npez_fft - 1;
// PPPM grid for this proc, including ghosts
ng = (nxhi_o-nxlo_o+1) * (nyhi_o-nylo_o+1) *
(nzhi_o-nzlo_o+1);
// FFT arrays on this proc, without ghosts
// nfft = FFT points in FFT decomposition on this proc
// nfft_brick = FFT points in 3d brick-decomposition on this proc
// nfft_both = greater of 2 values
nf = (nxhi_f-nxlo_f+1) * (nyhi_f-nylo_f+1) *
(nzhi_f-nzlo_f+1);
int nfft_brick = (nxhi_i-nxlo_i+1) * (nyhi_i-nylo_i+1) *
(nzhi_i-nzlo_i+1);
nfb = MAX(nf,nfft_brick);
}
/* ----------------------------------------------------------------------
check if all factors of n are in list of factors
return 1 if yes, 0 if no
------------------------------------------------------------------------- */
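// example: with the usual PPPM factor list {2,3,5}, n = 90 = 2*3*3*5 is
// factorable while n = 14 = 2*7 is not (7 is not in the list); set_grid()
// and set_grid_6() bump the grid sizes up until this test passes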
int PPPMDisp::factorable(int n)
{
int i;
while (n > 1) {
for (i = 0; i < nfactors; i++) {
if (n % factors[i] == 0) {
n /= factors[i];
break;
}
}
if (i == nfactors) return 0;
}
return 1;
}
/* ----------------------------------------------------------------------
adjust g_ewald to the new grid size using a Newton solver
------------------------------------------------------------------------- */
void PPPMDisp::adjust_gewald()
{
// Use Newton solver to find g_ewald
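// f() returns (estimated real-space error) - (estimated k-space error),
// so the Newton iteration g_ewald -= f/f' seeks the g_ewald at which the
// two error estimates balance for the current grid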
double dx;
// Begin algorithm
for (int i = 0; i < LARGE; i++) {
dx = f() / derivf();
g_ewald -= dx; //Update g_ewald
if (fabs(f()) < SMALL) return;
}
// Failed to converge
char str[128];
sprintf(str, "Could not compute g_ewald");
error->all(FLERR, str);
}
/* ----------------------------------------------------------------------
Calculate f(x)
------------------------------------------------------------------------- */
double PPPMDisp::f()
{
double df_rspace, df_kspace;
double q2 = qsqsum * force->qqrd2e;
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
df_rspace = 2.0*q2*exp(-g_ewald*g_ewald*cutoff*cutoff) /
sqrt(natoms*cutoff*xprd*yprd*zprd);
double qopt = compute_qopt();
df_kspace = sqrt(qopt/natoms)*q2/(xprd*yprd*zprd_slab);
return df_rspace - df_kspace;
}
/* ----------------------------------------------------------------------
Calculate numerical derivative f'(x) using forward difference
[f(x + h) - f(x)] / h
------------------------------------------------------------------------- */
double PPPMDisp::derivf()
{
double h = 0.000001; //Derivative step-size
double df,f1,f2,g_ewald_old;
f1 = f();
g_ewald_old = g_ewald;
g_ewald += h;
f2 = f();
g_ewald = g_ewald_old;
df = (f2 - f1)/h;
return df;
}
/* ----------------------------------------------------------------------
Calculate the final estimator for the accuracy
------------------------------------------------------------------------- */
double PPPMDisp::final_accuracy()
{
double df_rspace, df_kspace;
double q2 = qsqsum * force->qqrd2e;
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
df_rspace = 2.0*q2 * exp(-g_ewald*g_ewald*cutoff*cutoff) /
sqrt(natoms*cutoff*xprd*yprd*zprd);
double qopt = compute_qopt();
df_kspace = sqrt(qopt/natoms)*q2/(xprd*yprd*zprd_slab);
double acc = sqrt(df_rspace*df_rspace + df_kspace*df_kspace);
return acc;
}
/* ----------------------------------------------------------------------
Calculate the final estimator for the Dispersion accuracy
------------------------------------------------------------------------- */
void PPPMDisp::final_accuracy_6(double& acc, double& acc_real, double& acc_kspace)
{
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
acc_real = lj_rspace_error();
double qopt = compute_qopt_6();
acc_kspace = sqrt(qopt/natoms)*csum/(xprd*yprd*zprd_slab);
acc = sqrt(acc_real*acc_real + acc_kspace*acc_kspace);
return;
}
/* ----------------------------------------------------------------------
Compute qopt for Coulomb interactions
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt()
{
double qopt;
if (differentiation_flag == 1) {
qopt = compute_qopt_ad();
} else {
qopt = compute_qopt_ik();
}
double qopt_all;
MPI_Allreduce(&qopt,&qopt_all,1,MPI_DOUBLE,MPI_SUM,world);
return qopt_all;
}
/* ----------------------------------------------------------------------
Compute qopt for Dispersion interactions
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt_6()
{
double qopt;
if (differentiation_flag == 1) {
qopt = compute_qopt_6_ad();
} else {
qopt = compute_qopt_6_ik();
}
double qopt_all;
MPI_Allreduce(&qopt,&qopt_all,1,MPI_DOUBLE,MPI_SUM,world);
return qopt_all;
}
/* ----------------------------------------------------------------------
Compute qopt for the ik differentiation scheme and Coulomb interaction
------------------------------------------------------------------------- */
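// rough reading of the triple loop below: for each k-vector the inner
// sums run over aliased images (nx,ny,nz in [-2,2]); sum1 collects the
// squared exact reciprocal-space kernel, sum2 its overlap with the
// aliased assignment functions (weighted by k . k_m), and sum3 the
// squared assignment function, so each k-vector contributes
// sum1 - sum2^2/(sum3^2*k^2), the residual error of the optimal
// influence function in the Hockney-Eastwood sense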
double PPPMDisp::compute_qopt_ik()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double sqk, u2;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double sum1,sum2, sum3,dot1,dot2;
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
mper = m - nz_pppm*(2*m/nz_pppm);
for (l = nylo_fft; l <= nyhi_fft; l++) {
lper = l - ny_pppm*(2*l/ny_pppm);
for (k = nxlo_fft; k <= nxhi_fft; k++) {
kper = k - nx_pppm*(2*k/nx_pppm);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm*nx);
sx = exp(-0.25*pow(qx/g_ewald,2.0));
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm;
if (argx != 0.0) wx = pow(sin(argx)/argx,order);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm*ny);
sy = exp(-0.25*pow(qy/g_ewald,2.0));
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm;
if (argy != 0.0) wy = pow(sin(argy)/argy,order);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm*nz);
sz = exp(-0.25*pow(qz/g_ewald,2.0));
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm;
if (argz != 0.0) wz = pow(sin(argz)/argz,order);
dot1 = unitkx*kper*qx + unitky*lper*qy + unitkz*mper*qz;
dot2 = qx*qx+qy*qy+qz*qz;
u2 = pow(wx*wy*wz,2.0);
sum1 += sx*sy*sz*sx*sy*sz/dot2*4.0*4.0*MY_PI*MY_PI;
sum2 += u2*sx*sy*sz*4.0*MY_PI/dot2*dot1;
sum3 += u2;
}
}
}
sum2 *= sum2;
sum3 *= sum3*sqk;
qopt += sum1 -sum2/sum3;
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
Compute qopt for the ad differentiation scheme and Coulomb interaction
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt_ad()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double u2, sqk;
double sum1,sum2,sum3,sum4,dot2;
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
mper = m - nz_pppm*(2*m/nz_pppm);
for (l = nylo_fft; l <= nyhi_fft; l++) {
lper = l - ny_pppm*(2*l/ny_pppm);
for (k = nxlo_fft; k <= nxhi_fft; k++) {
kper = k - nx_pppm*(2*k/nx_pppm);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
sum4 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm*nx);
sx = exp(-0.25*pow(qx/g_ewald,2.0));
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm;
if (argx != 0.0) wx = pow(sin(argx)/argx,order);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm*ny);
sy = exp(-0.25*pow(qy/g_ewald,2.0));
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm;
if (argy != 0.0) wy = pow(sin(argy)/argy,order);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm*nz);
sz = exp(-0.25*pow(qz/g_ewald,2.0));
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm;
if (argz != 0.0) wz = pow(sin(argz)/argz,order);
dot2 = qx*qx+qy*qy+qz*qz;
u2 = pow(wx*wy*wz,2.0);
sum1 += sx*sy*sz*sx*sy*sz/dot2*4.0*4.0*MY_PI*MY_PI;
sum2 += sx*sy*sz * u2*4.0*MY_PI;
sum3 += u2;
sum4 += dot2*u2;
}
}
}
sum2 *= sum2;
qopt += sum1 - sum2/(sum3*sum4);
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
Compute qopt for the ik differentiation scheme and Dispersion interaction
------------------------------------------------------------------------- */
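// same structure as compute_qopt_ik(), but for dispersion (1/r^6) the
// reciprocal-space kernel is not 4*pi/k^2; the "term" expression below
// combines exp(-k^2/(4*g_ewald_6^2)) and erfc(|k|/(2*g_ewald_6)), with
// inv2ew = 1/(2*g_ewald_6) and rtdot2 the magnitude of the aliased
// wavevector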
double PPPMDisp::compute_qopt_6_ik()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double sqk, u2;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double sum1,sum2, sum3;
double dot1,dot2, rtdot2, term;
double inv2ew = 2*g_ewald_6;
inv2ew = 1.0/inv2ew;
double rtpi = sqrt(MY_PI);
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
mper = m - nz_pppm_6*(2*m/nz_pppm_6);
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
lper = l - ny_pppm_6*(2*l/ny_pppm_6);
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
kper = k - nx_pppm_6*(2*k/nx_pppm_6);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm_6*nx);
sx = exp(-qx*qx*inv2ew*inv2ew);
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm_6;
if (argx != 0.0) wx = pow(sin(argx)/argx,order_6);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm_6*ny);
sy = exp(-qy*qy*inv2ew*inv2ew);
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm_6;
if (argy != 0.0) wy = pow(sin(argy)/argy,order_6);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm_6*nz);
sz = exp(-qz*qz*inv2ew*inv2ew);
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm_6;
if (argz != 0.0) wz = pow(sin(argz)/argz,order_6);
dot1 = unitkx*kper*qx + unitky*lper*qy + unitkz*mper*qz;
dot2 = qx*qx+qy*qy+qz*qz;
rtdot2 = sqrt(dot2);
term = (1-2*dot2*inv2ew*inv2ew)*sx*sy*sz +
2*dot2*rtdot2*inv2ew*inv2ew*inv2ew*rtpi*erfc(rtdot2*inv2ew);
term *= g_ewald_6*g_ewald_6*g_ewald_6;
u2 = pow(wx*wy*wz,2.0);
sum1 += term*term*MY_PI*MY_PI*MY_PI/9.0 * dot2;
sum2 += -u2*term*MY_PI*rtpi/3.0*dot1;
sum3 += u2;
}
}
}
sum2 *= sum2;
sum3 *= sum3*sqk;
qopt += sum1 -sum2/sum3;
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
Compute qopt for the ad differentiation scheme and Dispersion interaction
------------------------------------------------------------------------- */
double PPPMDisp::compute_qopt_6_ad()
{
double qopt = 0.0;
int k,l,m;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double u2, sqk;
double sum1,sum2,sum3,sum4;
double dot2, rtdot2, term;
double inv2ew = 2*g_ewald_6;
inv2ew = 1/inv2ew;
double rtpi = sqrt(MY_PI);
int nbx = 2;
int nby = 2;
int nbz = 2;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
mper = m - nz_pppm_6*(2*m/nz_pppm_6);
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
lper = l - ny_pppm_6*(2*l/ny_pppm_6);
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
kper = k - nx_pppm_6*(2*k/nx_pppm_6);
sqk = pow(unitkx*kper,2.0) + pow(unitky*lper,2.0) +
pow(unitkz*mper,2.0);
if (sqk != 0.0) {
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
sum4 = 0.0;
for (nx = -nbx; nx <= nbx; nx++) {
qx = unitkx*(kper+nx_pppm_6*nx);
sx = exp(-qx*qx*inv2ew*inv2ew);
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm_6;
if (argx != 0.0) wx = pow(sin(argx)/argx,order_6);
for (ny = -nby; ny <= nby; ny++) {
qy = unitky*(lper+ny_pppm_6*ny);
sy = exp(-qy*qy*inv2ew*inv2ew);
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm_6;
if (argy != 0.0) wy = pow(sin(argy)/argy,order_6);
for (nz = -nbz; nz <= nbz; nz++) {
qz = unitkz*(mper+nz_pppm_6*nz);
sz = exp(-qz*qz*inv2ew*inv2ew);
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm_6;
if (argz != 0.0) wz = pow(sin(argz)/argz,order_6);
dot2 = qx*qx+qy*qy+qz*qz;
rtdot2 = sqrt(dot2);
term = (1-2*dot2*inv2ew*inv2ew)*sx*sy*sz +
2*dot2*rtdot2*inv2ew*inv2ew*inv2ew*rtpi*erfc(rtdot2*inv2ew);
term *= g_ewald_6*g_ewald_6*g_ewald_6;
u2 = pow(wx*wy*wz,2.0);
sum1 += term*term*MY_PI*MY_PI*MY_PI/9.0 * dot2;
sum2 += -term*MY_PI*rtpi/3.0 * u2 * dot2;
sum3 += u2;
sum4 += dot2*u2;
}
}
}
sum2 *= sum2;
qopt += sum1 - sum2/(sum3*sum4);
}
}
}
}
return qopt;
}
/* ----------------------------------------------------------------------
set size of FFT grid and g_ewald_6
for Dispersion interactions
------------------------------------------------------------------------- */
void PPPMDisp::set_grid_6()
{
// Calculate csum
if (!csumflag) calc_csum();
if (!gewaldflag_6) set_init_g6();
if (!gridflag_6) set_n_pppm_6();
while (!factorable(nx_pppm_6)) nx_pppm_6++;
while (!factorable(ny_pppm_6)) ny_pppm_6++;
while (!factorable(nz_pppm_6)) nz_pppm_6++;
}
/* ----------------------------------------------------------------------
Calculate the sum of the squared dispersion coefficients and other
related quantities required for the calculations
------------------------------------------------------------------------- */
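// informally: csum sums the per-atom self coefficients c_ii over all
// atoms, csumi[i] sums N_j*c_ij over types j, and csumij is the double
// sum of c_ij over all pairs of atoms; csum enters the dispersion error
// estimates (lj_rspace_error() and the df_kspace expressions)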
void PPPMDisp::calc_csum()
{
csumij = 0.0;
csum = 0.0;
int ntypes = atom->ntypes;
int i,j,k;
delete [] cii;
cii = new double[ntypes +1];
for (i = 0; i<=ntypes; i++) cii[i] = 0.0;
delete [] csumi;
csumi = new double[ntypes +1];
for (i = 0; i<=ntypes; i++) csumi[i] = 0.0;
int *neach = new int[ntypes+1];
for (i = 0; i<=ntypes; i++) neach[i] = 0;
//the following variables are needed to distinguish between arithmetic
// and geometric mixing
if (function[1]) {
for (i = 1; i <= ntypes; i++)
cii[i] = B[i]*B[i];
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
neach[tmp]++;
csum += B[tmp]*B[tmp];
}
}
if (function[2]) {
for (i = 1; i <= ntypes; i++)
cii[i] = 64.0/20.0*B[7*i+3]*B[7*i+3];
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
neach[tmp]++;
csum += 64.0/20.0*B[7*tmp+3]*B[7*tmp+3];
}
}
if (function[3]) {
for (i = 1; i <= ntypes; i++)
for (j = 0; j < nsplit; j++)
cii[i] += B[j]*B[nsplit*i + j]*B[nsplit*i + j];
int tmp;
for (i = 0; i < atom->nlocal; i++) {
tmp = atom->type[i];
neach[tmp]++;
for (j = 0; j < nsplit; j++)
csum += B[j]*B[nsplit*tmp + j]*B[nsplit*tmp + j];
}
}
double tmp2;
MPI_Allreduce(&csum,&tmp2,1,MPI_DOUBLE,MPI_SUM,world);
csum = tmp2;
csumflag = 1;
int *neach_all = new int[ntypes+1];
MPI_Allreduce(neach,neach_all,ntypes+1,MPI_INT,MPI_SUM,world);
// compute csumij and csumi
double d1, d2;
if (function[1]){
for (i=1; i<=ntypes; i++) {
for (j=1; j<=ntypes; j++) {
csumi[i] += neach_all[j]*B[i]*B[j];
d1 = neach_all[i]*B[i];
d2 = neach_all[j]*B[j];
csumij += d1*d2;
//csumij += neach_all[i]*neach_all[j]*B[i]*B[j];
}
}
}
if (function[2]) {
for (i=1; i<=ntypes; i++) {
for (j=1; j<=ntypes; j++) {
for (k=0; k<=6; k++) {
csumi[i] += neach_all[j]*B[7*i + k]*B[7*(j+1)-k-1];
d1 = neach_all[i]*B[7*i + k];
d2 = neach_all[j]*B[7*(j+1)-k-1];
csumij += d1*d2;
//csumij += neach_all[i]*neach_all[j]*B[7*i + k]*B[7*(j+1)-k-1];
}
}
}
}
if (function[3]) {
for (i=1; i<=ntypes; i++) {
for (j=1; j<=ntypes; j++) {
for (k=0; k<nsplit; k++) {
csumi[i] += neach_all[j]*B[k]*B[nsplit*i+k]*B[nsplit*j+k];
d1 = neach_all[i]*B[nsplit*i+k];
d2 = neach_all[j]*B[nsplit*j+k];
csumij += B[k]*d1*d2;
}
}
}
}
delete [] neach;
delete [] neach_all;
}
/* ----------------------------------------------------------------------
adjust g_ewald_6 to the new grid size
------------------------------------------------------------------------- */
void PPPMDisp::adjust_gewald_6()
{
// Use Newton solver to find g_ewald_6
double dx;
// Start loop
for (int i = 0; i < LARGE; i++) {
dx = f_6() / derivf_6();
g_ewald_6 -= dx; //update g_ewald_6
if (fabs(f_6()) < SMALL) return;
}
// Failed to converge
char str[128];
sprintf(str, "Could not adjust g_ewald_6");
error->all(FLERR, str);
}
/* ----------------------------------------------------------------------
Calculate f(x) for Dispersion interaction
------------------------------------------------------------------------- */
double PPPMDisp::f_6()
{
double df_rspace, df_kspace;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
bigint natoms = atom->natoms;
df_rspace = lj_rspace_error();
double qopt = compute_qopt_6();
df_kspace = sqrt(qopt/natoms)*csum/(xprd*yprd*zprd_slab);
return df_rspace - df_kspace;
}
/* ----------------------------------------------------------------------
Calculate numerical derivative f'(x) using forward difference
[f(x + h) - f(x)] / h
------------------------------------------------------------------------- */
double PPPMDisp::derivf_6()
{
double h = 0.000001; //Derivative step-size
double df,f1,f2,g_ewald_old;
f1 = f_6();
g_ewald_old = g_ewald_6;
g_ewald_6 += h;
f2 = f_6();
g_ewald_6 = g_ewald_old;
df = (f2 - f1)/h;
return df;
}
/* ----------------------------------------------------------------------
calculate an initial value for g_ewald_6
---------------------------------------------------------------------- */
void PPPMDisp::set_init_g6()
{
// use xprd,yprd,zprd even if triclinic so grid size is the same
// adjust z dimension for 2d slab PPPM
// 3d PPPM just uses zprd since slab_volfactor = 1.0
// make initial g_ewald estimate
// based on desired error and real space cutoff
// compute initial value for df_real with g_ewald_6 = 1/cutoff_lj
// if df_real > 0, repeatedly multiply g_ewald_6 by 2 until df_real < 0
// else, repeatedly divide g_ewald_6 by 2 until df_real > 0
// then perform bisection between the last two bracketing values of g_ewald_6
double df_real;
double g_ewald_old;
double gmin, gmax;
// check if there is a user defined accuracy
double acc_rspace = accuracy;
if (accuracy_real_6 > 0) acc_rspace = accuracy_real_6;
g_ewald_old = g_ewald_6 = 1.0/cutoff_lj;
df_real = lj_rspace_error() - acc_rspace;
int counter = 0;
if (df_real > 0) {
while (df_real > 0 && counter < LARGE) {
counter++;
g_ewald_old = g_ewald_6;
g_ewald_6 *= 2;
df_real = lj_rspace_error() - acc_rspace;
}
}
if (df_real < 0) {
while (df_real < 0 && counter < LARGE) {
counter++;
g_ewald_old = g_ewald_6;
g_ewald_6 *= 0.5;
df_real = lj_rspace_error() - acc_rspace;
}
}
if (counter >= LARGE-1) error->all(FLERR,"Cannot compute initial g_ewald_disp");
gmin = MIN(g_ewald_6, g_ewald_old);
gmax = MAX(g_ewald_6, g_ewald_old);
g_ewald_6 = gmin + 0.5*(gmax-gmin);
counter = 0;
while (gmax-gmin > SMALL && counter < LARGE) {
counter++;
df_real = lj_rspace_error() -acc_rspace;
if (df_real < 0) gmax = g_ewald_6;
else gmin = g_ewald_6;
g_ewald_6 = gmin + 0.5*(gmax-gmin);
}
if (counter >= LARGE-1) error->all(FLERR,"Cannot compute initial g_ewald_disp");
}
/* ----------------------------------------------------------------------
calculate nx_pppm, ny_pppm, nz_pppm for dispersion interaction
---------------------------------------------------------------------- */
void PPPMDisp::set_n_pppm_6()
{
bigint natoms = atom->natoms;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double h, h_x,h_y,h_z;
double acc_kspace = accuracy;
if (accuracy_kspace_6 > 0.0) acc_kspace = accuracy_kspace_6;
// initial value for the grid spacing
h = h_x = h_y = h_z = 4.0/g_ewald_6;
// decrease the grid spacing until the required precision is obtained
int count = 0;
while(1) {
// set grid dimension
nx_pppm_6 = static_cast<int> (xprd/h_x);
ny_pppm_6 = static_cast<int> (yprd/h_y);
nz_pppm_6 = static_cast<int> (zprd_slab/h_z);
if (nx_pppm_6 <= 1) nx_pppm_6 = 2;
if (ny_pppm_6 <= 1) ny_pppm_6 = 2;
if (nz_pppm_6 <= 1) nz_pppm_6 = 2;
//set local grid dimension
int npey_fft,npez_fft;
if (nz_pppm_6 >= nprocs) {
npey_fft = 1;
npez_fft = nprocs;
} else procs2grid2d(nprocs,ny_pppm_6,nz_pppm_6,&npey_fft,&npez_fft);
int me_y = me % npey_fft;
int me_z = me / npey_fft;
nxlo_fft_6 = 0;
nxhi_fft_6 = nx_pppm_6 - 1;
nylo_fft_6 = me_y*ny_pppm_6/npey_fft;
nyhi_fft_6 = (me_y+1)*ny_pppm_6/npey_fft - 1;
nzlo_fft_6 = me_z*nz_pppm_6/npez_fft;
nzhi_fft_6 = (me_z+1)*nz_pppm_6/npez_fft - 1;
double qopt = compute_qopt_6();
double df_kspace = sqrt(qopt/natoms)*csum/(xprd*yprd*zprd_slab);
count++;
// break loop if the accuracy has been reached or too many loops have been performed
if (df_kspace <= acc_kspace) break;
if (count > 500) error->all(FLERR, "Could not compute grid size for Dispersion");
h *= 0.95;
h_x = h_y = h_z = h;
}
}
/* ----------------------------------------------------------------------
calculate the real space error for dispersion interactions
---------------------------------------------------------------------- */
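// with x = g_ewald_6*cutoff_lj, the estimate computed below reads
//   err_r = csum * sqrt(pi) * g_ewald_6^5 * exp(-x^2)
//           * (1 + 3/x^2 + 6/x^4 + 6/x^6)
//           / sqrt(natoms * cutoff_lj * xprd*yprd*zprd_slab)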
double PPPMDisp::lj_rspace_error()
{
bigint natoms = atom->natoms;
double xprd = domain->xprd;
double yprd = domain->yprd;
double zprd = domain->zprd;
double zprd_slab = zprd*slab_volfactor;
double deltaf;
double rgs = (cutoff_lj*g_ewald_6);
rgs *= rgs;
double rgs_inv = 1.0/rgs;
deltaf = csum/sqrt(natoms*xprd*yprd*zprd_slab*cutoff_lj)*sqrt(MY_PI)*pow(g_ewald_6, 5)*
exp(-rgs)*(1+rgs_inv*(3+rgs_inv*(6+rgs_inv*6)));
return deltaf;
}
/* ----------------------------------------------------------------------
Compute the modified (Hockney-Eastwood) Coulomb Green's function
---------------------------------------------------------------------- */
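// the value stored in greensfn[] below can be read as
//   G(k) = (4*pi/k^2) * exp(-k^2/(4*g_ewald^2)) * W(k)^2 / D(k)
// where W(k) = wx*wy*wz is the Fourier-transformed assignment function
// (each factor is squared inside the loops) and D(k) = gf_denom(snx2,sny2,snz2,...)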
void PPPMDisp::compute_gf()
{
int k,l,m,n;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int kper,lper,mper;
double snx,sny,snz,snx2,sny2,snz2;
double sqk;
double argx,argy,argz,wx,wy,wz,sx,sy,sz,qx,qy,qz;
double numerator,denominator;
n = 0;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
mper = m - nz_pppm*(2*m/nz_pppm);
qz = unitkz*mper;
snz = sin(0.5*qz*zprd_slab/nz_pppm);
snz2 = snz*snz;
sz = exp(-0.25*pow(qz/g_ewald,2.0));
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm;
if (argz != 0.0) wz = pow(sin(argz)/argz,order);
wz *= wz;
for (l = nylo_fft; l <= nyhi_fft; l++) {
lper = l - ny_pppm*(2*l/ny_pppm);
qy = unitky*lper;
sny = sin(0.5*qy*yprd/ny_pppm);
sny2 = sny*sny;
sy = exp(-0.25*pow(qy/g_ewald,2.0));
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm;
if (argy != 0.0) wy = pow(sin(argy)/argy,order);
wy *= wy;
for (k = nxlo_fft; k <= nxhi_fft; k++) {
kper = k - nx_pppm*(2*k/nx_pppm);
qx = unitkx*kper;
snx = sin(0.5*qx*xprd/nx_pppm);
snx2 = snx*snx;
sx = exp(-0.25*pow(qx/g_ewald,2.0));
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm;
if (argx != 0.0) wx = pow(sin(argx)/argx,order);
wx *= wx;
sqk = pow(qx,2.0) + pow(qy,2.0) + pow(qz,2.0);
if (sqk != 0.0) {
numerator = 4.0*MY_PI/sqk;
denominator = gf_denom(snx2,sny2,snz2, gf_b, order);
greensfn[n++] = numerator*sx*sy*sz*wx*wy*wz/denominator;
} else greensfn[n++] = 0.0;
}
}
}
}
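// reading off the loop above, each k-space point receives
//
//   G(k) = (4*pi/k^2) * exp(-k^2/(4*g_ewald^2))
//          * prod_a [ sin(k_a*h_a/2)/(k_a*h_a/2) ]^(2*order) / D(k)
//
// where h_a is the grid spacing in direction a and D(k) is the
// gf_denom() polynomial in sin^2(k_a*h_a/2); the sinc^(2*order)
// numerator and D(k) together compensate for the smearing of the
// order-th order charge assignment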
/* ----------------------------------------------------------------------
compute self force coefficients for ad-differentiation scheme
(generic helper, used for both the Coulomb and dispersion grids)
------------------------------------------------------------------------- */
void PPPMDisp::compute_sf_precoeff(int nxp, int nyp, int nzp, int ord,
int nxlo_ft, int nylo_ft, int nzlo_ft,
int nxhi_ft, int nyhi_ft, int nzhi_ft,
double *sf_pre1, double *sf_pre2, double *sf_pre3,
double *sf_pre4, double *sf_pre5, double *sf_pre6)
{
int i,k,l,m,n;
double *prd;
// volume-dependent factors
// adjust z dimension for 2d slab PPPM
// z dimension for 3d PPPM is zprd since slab_volfactor = 1.0
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int nx,ny,nz,kper,lper,mper;
double argx,argy,argz;
double wx0[5],wy0[5],wz0[5],wx1[5],wy1[5],wz1[5],wx2[5],wy2[5],wz2[5];
double qx0,qy0,qz0,qx1,qy1,qz1,qx2,qy2,qz2;
double u0,u1,u2,u3,u4,u5,u6;
double sum1,sum2,sum3,sum4,sum5,sum6;
int nb = 2;
n = 0;
for (m = nzlo_ft; m <= nzhi_ft; m++) {
mper = m - nzp*(2*m/nzp);
for (l = nylo_ft; l <= nyhi_ft; l++) {
lper = l - nyp*(2*l/nyp);
for (k = nxlo_ft; k <= nxhi_ft; k++) {
kper = k - nxp*(2*k/nxp);
sum1 = sum2 = sum3 = sum4 = sum5 = sum6 = 0.0;
for (i = -nb; i <= nb; i++) {
qx0 = unitkx*(kper+nxp*i);
qx1 = unitkx*(kper+nxp*(i+1));
qx2 = unitkx*(kper+nxp*(i+2));
wx0[i+2] = 1.0;
wx1[i+2] = 1.0;
wx2[i+2] = 1.0;
argx = 0.5*qx0*xprd/nxp;
if (argx != 0.0) wx0[i+2] = pow(sin(argx)/argx,ord);
argx = 0.5*qx1*xprd/nxp;
if (argx != 0.0) wx1[i+2] = pow(sin(argx)/argx,ord);
argx = 0.5*qx2*xprd/nxp;
if (argx != 0.0) wx2[i+2] = pow(sin(argx)/argx,ord);
qy0 = unitky*(lper+nyp*i);
qy1 = unitky*(lper+nyp*(i+1));
qy2 = unitky*(lper+nyp*(i+2));
wy0[i+2] = 1.0;
wy1[i+2] = 1.0;
wy2[i+2] = 1.0;
argy = 0.5*qy0*yprd/nyp;
if (argy != 0.0) wy0[i+2] = pow(sin(argy)/argy,ord);
argy = 0.5*qy1*yprd/nyp;
if (argy != 0.0) wy1[i+2] = pow(sin(argy)/argy,ord);
argy = 0.5*qy2*yprd/nyp;
if (argy != 0.0) wy2[i+2] = pow(sin(argy)/argy,ord);
qz0 = unitkz*(mper+nzp*i);
qz1 = unitkz*(mper+nzp*(i+1));
qz2 = unitkz*(mper+nzp*(i+2));
wz0[i+2] = 1.0;
wz1[i+2] = 1.0;
wz2[i+2] = 1.0;
argz = 0.5*qz0*zprd_slab/nzp;
if (argz != 0.0) wz0[i+2] = pow(sin(argz)/argz,ord);
argz = 0.5*qz1*zprd_slab/nzp;
if (argz != 0.0) wz1[i+2] = pow(sin(argz)/argz,ord);
argz = 0.5*qz2*zprd_slab/nzp;
if (argz != 0.0) wz2[i+2] = pow(sin(argz)/argz,ord);
}
for (nx = 0; nx <= 4; nx++) {
for (ny = 0; ny <= 4; ny++) {
for (nz = 0; nz <= 4; nz++) {
u0 = wx0[nx]*wy0[ny]*wz0[nz];
u1 = wx1[nx]*wy0[ny]*wz0[nz];
u2 = wx2[nx]*wy0[ny]*wz0[nz];
u3 = wx0[nx]*wy1[ny]*wz0[nz];
u4 = wx0[nx]*wy2[ny]*wz0[nz];
u5 = wx0[nx]*wy0[ny]*wz1[nz];
u6 = wx0[nx]*wy0[ny]*wz2[nz];
sum1 += u0*u1;
sum2 += u0*u2;
sum3 += u0*u3;
sum4 += u0*u4;
sum5 += u0*u5;
sum6 += u0*u6;
}
}
}
// store values
sf_pre1[n] = sum1;
sf_pre2[n] = sum2;
sf_pre3[n] = sum3;
sf_pre4[n] = sum4;
sf_pre5[n] = sum5;
sf_pre6[n++] = sum6;
}
}
}
}
/* ----------------------------------------------------------------------
compute the modified (Hockney-Eastwood) dispersion Green's function
---------------------------------------------------------------------- */
void PPPMDisp::compute_gf_6()
{
double *prd;
int k,l,m,n;
// volume-dependent factors
// adjust z dimension for 2d slab PPPM
// z dimension for 3d PPPM is zprd since slab_volfactor = 1.0
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double unitkx = (2.0*MY_PI/xprd);
double unitky = (2.0*MY_PI/yprd);
double unitkz = (2.0*MY_PI/zprd_slab);
int kper,lper,mper;
double sqk;
double snx,sny,snz,snx2,sny2,snz2;
double argx,argy,argz,wx,wy,wz,sx,sy,sz;
double qx,qy,qz;
double rtsqk, term;
double numerator,denominator;
double inv2ew = 2*g_ewald_6;
inv2ew = 1/inv2ew;
double rtpi = sqrt(MY_PI);
numerator = -MY_PI*rtpi*g_ewald_6*g_ewald_6*g_ewald_6/(3.0);
n = 0;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
mper = m - nz_pppm_6*(2*m/nz_pppm_6);
qz = unitkz*mper;
snz = sin(0.5*unitkz*mper*zprd_slab/nz_pppm_6);
snz2 = snz*snz;
sz = exp(-qz*qz*inv2ew*inv2ew);
wz = 1.0;
argz = 0.5*qz*zprd_slab/nz_pppm_6;
if (argz != 0.0) wz = pow(sin(argz)/argz,order_6);
wz *= wz;
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
lper = l - ny_pppm_6*(2*l/ny_pppm_6);
qy = unitky*lper;
sny = sin(0.5*unitky*lper*yprd/ny_pppm_6);
sny2 = sny*sny;
sy = exp(-qy*qy*inv2ew*inv2ew);
wy = 1.0;
argy = 0.5*qy*yprd/ny_pppm_6;
if (argy != 0.0) wy = pow(sin(argy)/argy,order_6);
wy *= wy;
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
kper = k - nx_pppm_6*(2*k/nx_pppm_6);
qx = unitkx*kper;
snx = sin(0.5*unitkx*kper*xprd/nx_pppm_6);
snx2 = snx*snx;
sx = exp(-qx*qx*inv2ew*inv2ew);
wx = 1.0;
argx = 0.5*qx*xprd/nx_pppm_6;
if (argx != 0.0) wx = pow(sin(argx)/argx,order_6);
wx *= wx;
sqk = pow(qx,2.0) + pow(qy,2.0) + pow(qz,2.0);
if (sqk != 0.0) {
denominator = gf_denom(snx2,sny2,snz2, gf_b_6, order_6);
rtsqk = sqrt(sqk);
term = (1-2*sqk*inv2ew*inv2ew)*sx*sy*sz +
2*sqk*rtsqk*inv2ew*inv2ew*inv2ew*rtpi*erfc(rtsqk*inv2ew);
greensfn_6[n++] = numerator*term*wx*wy*wz/denominator;
} else greensfn_6[n++] = 0.0;
}
}
}
}
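// for reference, the dispersion kernel assembled above is (g6 = g_ewald_6)
//
//   G6(k) = -(pi^(3/2) * g6^3 / 3)
//           * [ (1 - k^2/(2*g6^2)) * exp(-k^2/(4*g6^2))
//               + sqrt(pi)*k^3/(4*g6^3) * erfc(k/(2*g6)) ]
//           * prod_a [ sin(k_a*h_a/2)/(k_a*h_a/2) ]^(2*order_6) / D(k)
//
// the r^-6 analogue of the Coulomb kernel in compute_gf(), with D(k)
// again supplied by gf_denom()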
/* ----------------------------------------------------------------------
compute self force coefficients for ad-differentiation scheme
and Coulomb interaction
------------------------------------------------------------------------- */
void PPPMDisp::compute_sf_coeff()
{
int i,k,l,m,n;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
for (i = 0; i <= 5; i++) sf_coeff[i] = 0.0;
n = 0;
for (m = nzlo_fft; m <= nzhi_fft; m++) {
for (l = nylo_fft; l <= nyhi_fft; l++) {
for (k = nxlo_fft; k <= nxhi_fft; k++) {
sf_coeff[0] += sf_precoeff1[n]*greensfn[n];
sf_coeff[1] += sf_precoeff2[n]*greensfn[n];
sf_coeff[2] += sf_precoeff3[n]*greensfn[n];
sf_coeff[3] += sf_precoeff4[n]*greensfn[n];
sf_coeff[4] += sf_precoeff5[n]*greensfn[n];
sf_coeff[5] += sf_precoeff6[n]*greensfn[n];
++n;
}
}
}
// Compute the coefficients for the self-force correction
double prex, prey, prez;
prex = prey = prez = MY_PI/volume;
prex *= nx_pppm/xprd;
prey *= ny_pppm/yprd;
prez *= nz_pppm/zprd_slab;
sf_coeff[0] *= prex;
sf_coeff[1] *= prex*2;
sf_coeff[2] *= prey;
sf_coeff[3] *= prey*2;
sf_coeff[4] *= prez;
sf_coeff[5] *= prez*2;
// communicate values with other procs
double tmp[6];
MPI_Allreduce(sf_coeff,tmp,6,MPI_DOUBLE,MPI_SUM,world);
for (n = 0; n < 6; n++) sf_coeff[n] = tmp[n];
}
/* ----------------------------------------------------------------------
compute self force coefficients for ad-differentiation scheme
and Dispersion interaction
------------------------------------------------------------------------- */
void PPPMDisp::compute_sf_coeff_6()
{
int i,k,l,m,n;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
volume = xprd * yprd * zprd_slab;
for (i = 0; i <= 5; i++) sf_coeff_6[i] = 0.0;
n = 0;
for (m = nzlo_fft_6; m <= nzhi_fft_6; m++) {
for (l = nylo_fft_6; l <= nyhi_fft_6; l++) {
for (k = nxlo_fft_6; k <= nxhi_fft_6; k++) {
sf_coeff_6[0] += sf_precoeff1_6[n]*greensfn_6[n];
sf_coeff_6[1] += sf_precoeff2_6[n]*greensfn_6[n];
sf_coeff_6[2] += sf_precoeff3_6[n]*greensfn_6[n];
sf_coeff_6[3] += sf_precoeff4_6[n]*greensfn_6[n];
sf_coeff_6[4] += sf_precoeff5_6[n]*greensfn_6[n];
sf_coeff_6[5] += sf_precoeff6_6[n]*greensfn_6[n];
++n;
}
}
}
// perform multiplication with prefactors
double prex, prey, prez;
prex = prey = prez = MY_PI/volume;
prex *= nx_pppm_6/xprd;
prey *= ny_pppm_6/yprd;
prez *= nz_pppm_6/zprd_slab;
sf_coeff_6[0] *= prex;
sf_coeff_6[1] *= prex*2;
sf_coeff_6[2] *= prey;
sf_coeff_6[3] *= prey*2;
sf_coeff_6[4] *= prez;
sf_coeff_6[5] *= prez*2;
// communicate values with other procs
double tmp[6];
MPI_Allreduce(sf_coeff_6,tmp,6,MPI_DOUBLE,MPI_SUM,world);
for (n = 0; n < 6; n++) sf_coeff_6[n] = tmp[n];
}
/* ----------------------------------------------------------------------
denominator for Hockney-Eastwood Green's function
with x,y,z = sin(kx*deltax/2), etc:

S(n,k) = Sum_{j=-inf..inf} W(k + pi*j)^2
       = Sum_{l=0..n-1} b(l) * (z*z)^l
       = -(z*z)^n / (2n-1)! * (d/dx)^(2n-1) cot(x)   at z = sin(x)

gf_b = denominator expansion coeffs
------------------------------------------------------------------------- */
double PPPMDisp::gf_denom(double x, double y, double z, double *g_b, int ord)
{
double sx,sy,sz;
sz = sy = sx = 0.0;
for (int l = ord-1; l >= 0; l--) {
sx = g_b[l] + sx*x;
sy = g_b[l] + sy*y;
sz = g_b[l] + sz*z;
}
double s = sx*sy*sz;
return s*s;
}
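// gf_denom() evaluates the same expansion in each dimension by Horner's
// rule and returns the squared product; a minimal standalone sketch of
// the one-dimensional evaluation (horner() is illustrative only, not a
// member of this class):
//
//   // p(z) = b[0] + b[1]*z + ... + b[ord-1]*z^(ord-1),
//   // evaluated at z = sin^2(k*delta/2)
//   double horner(const double *b, int ord, double z) {
//     double s = 0.0;
//     for (int l = ord-1; l >= 0; l--) s = b[l] + s*z;
//     return s;
//   }
//
// the loop above interleaves three such evaluations (x, y, z) so the
// coefficient array is traversed only once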
/* ----------------------------------------------------------------------
pre-compute Green's function denominator expansion coeffs, Gamma(2n)
------------------------------------------------------------------------- */
void PPPMDisp::compute_gf_denom(double* gf, int ord)
{
int k,l,m;
for (l = 1; l < ord; l++) gf[l] = 0.0;
gf[0] = 1.0;
for (m = 1; m < ord; m++) {
for (l = m; l > 0; l--)
gf[l] = 4.0 * (gf[l]*(l-m)*(l-m-0.5)-gf[l-1]*(l-m-1)*(l-m-1));
gf[0] = 4.0 * (gf[0]*(l-m)*(l-m-0.5));
}
bigint ifact = 1;
for (k = 1; k < 2*ord; k++) ifact *= k;
double gaminv = 1.0/ifact;
for (l = 0; l < ord; l++) gf[l] *= gaminv;
}
/* ----------------------------------------------------------------------
ghost-swap to accumulate full density in brick decomposition
remap density from 3d brick decomposition to FFT decomposition
for coulomb interaction or dispersion interaction with geometric
mixing
------------------------------------------------------------------------- */
void PPPMDisp::brick2fft(int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
FFT_SCALAR*** dbrick, FFT_SCALAR* dfft, FFT_SCALAR* work,
LAMMPS_NS::Remap* rmp)
{
int n,ix,iy,iz;
// copy grabs inner portion of density from 3d brick
// remap could be done as pre-stage of FFT,
// but this works optimally only on double values, not complex values
n = 0;
for (iz = nzlo_i; iz <= nzhi_i; iz++)
for (iy = nylo_i; iy <= nyhi_i; iy++)
for (ix = nxlo_i; ix <= nxhi_i; ix++)
dfft[n++] = dbrick[iz][iy][ix];
rmp->perform(dfft,dfft,work);
}
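// layout note: after the copy above, the owned grid point (ix,iy,iz)
// sits at flat index
//
//   n = (iz-nzlo_i)*ny_in*nx_in + (iy-nylo_i)*nx_in + (ix-nxlo_i)
//
// with nx_in = nxhi_i-nxlo_i+1 and ny_in = nyhi_i-nylo_i+1, i.e. x
// varies fastest; the Remap then rearranges this brick-ordered data
// into the decomposition the parallel FFTs expect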
/* ----------------------------------------------------------------------
ghost-swap to accumulate full density in brick decomposition
remap density from 3d brick decomposition to FFT decomposition
for dispersion with arithmetic mixing rule
------------------------------------------------------------------------- */
void PPPMDisp::brick2fft_a()
{
int n,ix,iy,iz;
// copy grabs inner portion of density from 3d brick
// remap could be done as pre-stage of FFT,
// but this works optimally only on double values, not complex values
n = 0;
for (iz = nzlo_in_6; iz <= nzhi_in_6; iz++)
for (iy = nylo_in_6; iy <= nyhi_in_6; iy++)
for (ix = nxlo_in_6; ix <= nxhi_in_6; ix++) {
density_fft_a0[n] = density_brick_a0[iz][iy][ix];
density_fft_a1[n] = density_brick_a1[iz][iy][ix];
density_fft_a2[n] = density_brick_a2[iz][iy][ix];
density_fft_a3[n] = density_brick_a3[iz][iy][ix];
density_fft_a4[n] = density_brick_a4[iz][iy][ix];
density_fft_a5[n] = density_brick_a5[iz][iy][ix];
density_fft_a6[n++] = density_brick_a6[iz][iy][ix];
}
remap_6->perform(density_fft_a0,density_fft_a0,work1_6);
remap_6->perform(density_fft_a1,density_fft_a1,work1_6);
remap_6->perform(density_fft_a2,density_fft_a2,work1_6);
remap_6->perform(density_fft_a3,density_fft_a3,work1_6);
remap_6->perform(density_fft_a4,density_fft_a4,work1_6);
remap_6->perform(density_fft_a5,density_fft_a5,work1_6);
remap_6->perform(density_fft_a6,density_fft_a6,work1_6);
}
/* ----------------------------------------------------------------------
ghost-swap to accumulate full density in brick decomposition
remap density from 3d brick decomposition to FFT decomposition
for dispersion with special case
------------------------------------------------------------------------- */
void PPPMDisp::brick2fft_none()
{
int k,n,ix,iy,iz;
// copy grabs inner portion of density from 3d brick
// remap could be done as pre-stage of FFT,
// but this works optimally only on double values, not complex values
for (k = 0; k<nsplit_alloc; k++) {
n = 0;
for (iz = nzlo_in_6; iz <= nzhi_in_6; iz++)
for (iy = nylo_in_6; iy <= nyhi_in_6; iy++)
for (ix = nxlo_in_6; ix <= nxhi_in_6; ix++)
density_fft_none[k][n++] = density_brick_none[k][iz][iy][ix];
}
for (k=0; k<nsplit_alloc; k++)
remap_6->perform(density_fft_none[k],density_fft_none[k],work1_6);
}
/* ----------------------------------------------------------------------
find center grid pt for each of my particles
check that full stencil for the particle will fit in my 3d brick
store central grid pt indices in part2grid array
------------------------------------------------------------------------- */
void PPPMDisp::particle_map(double delx, double dely, double delz,
double sft, int** p2g, int nup, int nlow,
int nxlo, int nylo, int nzlo,
int nxhi, int nyhi, int nzhi)
{
int nx,ny,nz;
double **x = atom->x;
int nlocal = atom->nlocal;
if (!ISFINITE(boxlo[0]) || !ISFINITE(boxlo[1]) || !ISFINITE(boxlo[2]))
error->one(FLERR,"Non-numeric box dimensions - simulation unstable");
int flag = 0;
for (int i = 0; i < nlocal; i++) {
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// current particle coord can be outside global and local box
// add/subtract OFFSET to avoid int(-0.75) = 0 when want it to be -1
nx = static_cast<int> ((x[i][0]-boxlo[0])*delx+sft) - OFFSET;
ny = static_cast<int> ((x[i][1]-boxlo[1])*dely+sft) - OFFSET;
nz = static_cast<int> ((x[i][2]-boxlo[2])*delz+sft) - OFFSET;
p2g[i][0] = nx;
p2g[i][1] = ny;
p2g[i][2] = nz;
// check that entire stencil around nx,ny,nz will fit in my 3d brick
if (nx+nlow < nxlo || nx+nup > nxhi ||
ny+nlow < nylo || ny+nup > nyhi ||
nz+nlow < nzlo || nz+nup > nzhi)
flag = 1;
}
if (flag) error->one(FLERR,"Out of range atoms - cannot compute PPPMDisp");
}
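// worked example of the OFFSET trick used above: plain int truncation
// rounds toward zero, so static_cast<int>(-0.75) yields 0 where the
// stencil logic needs -1.  assuming OFFSET is a large power of two
// (16384 in the PPPM sources), a coordinate that maps to -0.75 becomes
//
//   static_cast<int>(16384 - 0.75) - 16384 = 16383 - 16384 = -1
//
// adding OFFSET shifts the value into a range where truncation and
// floor() agree, and subtracting it afterwards restores the signed index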
void PPPMDisp::particle_map_c(double delx, double dely, double delz,
double sft, int** p2g, int nup, int nlow,
int nxlo, int nylo, int nzlo,
int nxhi, int nyhi, int nzhi)
{
particle_map(delx, dely, delz, sft, p2g, nup, nlow,
nxlo, nylo, nzlo, nxhi, nyhi, nzhi);
}
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = charge "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_c()
{
int l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
// clear 3d density array
memset(&(density_brick[nzlo_out][nylo_out][nxlo_out]),0,
ngrid*sizeof(FFT_SCALAR));
// loop over my charges, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
double *q = atom->q;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
z0 = delvolinv * q[i];
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
y0 = z0*rho1d[2][n];
for (m = nlower; m <= nupper; m++) {
my = m+ny;
x0 = y0*rho1d[1][m];
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
density_brick[mz][my][mx] += x0*rho1d[0][l];
}
}
}
}
}
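// in formula form, the loops above add to each grid point
//
//   density_brick[nz+n][ny+m][nx+l] +=
//       (q_i / V_cell) * rho1d[0][l] * rho1d[1][m] * rho1d[2][n]
//
// for l,m,n in [nlower,nupper], where V_cell = 1/delvolinv is the grid
// cell volume and rho1d holds the 1d charge-assignment weights; the
// z0/y0/x0 intermediates simply factor the triple product so each
// partial product is formed once per loop level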
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = dispersion "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid --- geometric mixing
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_g()
{
int l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
// clear 3d density array
memset(&(density_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
// loop over my charges, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
int type;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
type = atom->type[i];
z0 = delvolinv_6 * B[type];
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
y0 = z0*rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
x0 = y0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
density_brick_g[mz][my][mx] += x0*rho1d_6[0][l];
}
}
}
}
}
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = dispersion "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid --- arithmetic mixing
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_a()
{
int l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0,w;
// clear 3d density array
memset(&(density_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
memset(&(density_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
// loop over my particles, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
int type;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
// do the following for all 7 arithmetic-mixing grids (a0 ... a6)
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
type = atom->type[i];
z0 = delvolinv_6;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
y0 = z0*rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
x0 = y0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
w = x0*rho1d_6[0][l];
density_brick_a0[mz][my][mx] += w*B[7*type];
density_brick_a1[mz][my][mx] += w*B[7*type+1];
density_brick_a2[mz][my][mx] += w*B[7*type+2];
density_brick_a3[mz][my][mx] += w*B[7*type+3];
density_brick_a4[mz][my][mx] += w*B[7*type+4];
density_brick_a5[mz][my][mx] += w*B[7*type+5];
density_brick_a6[mz][my][mx] += w*B[7*type+6];
}
}
}
}
}
/* ----------------------------------------------------------------------
create discretized "density" on section of global grid due to my particles
density(x,y,z) = dispersion "density" at grid points of my 3d brick
(nxlo:nxhi,nylo:nyhi,nzlo:nzhi) is extent of my brick (including ghosts)
in global grid --- case when mixing rules don't apply
------------------------------------------------------------------------- */
void PPPMDisp::make_rho_none()
{
int k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0,w;
// clear 3d density array
for (k = 0; k < nsplit_alloc; k++)
memset(&(density_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6]),0,
ngrid_6*sizeof(FFT_SCALAR));
// loop over my particles, add their contribution to nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
int type;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
// do the following for all nsplit grids
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
type = atom->type[i];
z0 = delvolinv_6;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
y0 = z0*rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
x0 = y0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
w = x0*rho1d_6[0][l];
for (k = 0; k < nsplit; k++)
density_brick_none[k][mz][my][mx] += w*B[nsplit*type + k];
}
}
}
}
}
/* ----------------------------------------------------------------------
FFT-based Poisson solver for ik differentiation
------------------------------------------------------------------------- */
void PPPMDisp::poisson_ik(FFT_SCALAR* wk1, FFT_SCALAR* wk2,
FFT_SCALAR* dfft, LAMMPS_NS::FFT3d* ft1,LAMMPS_NS::FFT3d* ft2,
int nx_p, int ny_p, int nz_p, int nft,
int nxlo_ft, int nylo_ft, int nzlo_ft,
int nxhi_ft, int nyhi_ft, int nzhi_ft,
int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
double& egy, double* gfn,
double* kx, double* ky, double* kz,
double* kx2, double* ky2, double* kz2,
FFT_SCALAR*** vx_brick, FFT_SCALAR*** vy_brick, FFT_SCALAR*** vz_brick,
double* vir, double** vcoeff, double** vcoeff2,
FFT_SCALAR*** u_pa, FFT_SCALAR*** v0_pa, FFT_SCALAR*** v1_pa, FFT_SCALAR*** v2_pa,
FFT_SCALAR*** v3_pa, FFT_SCALAR*** v4_pa, FFT_SCALAR*** v5_pa)
{
int i,j,k,n;
double eng;
// transform charge/dispersion density (r -> k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] = dfft[i];
wk1[n++] = ZEROF;
}
ft1->compute(wk1,wk1,1);
// if requested, compute energy and virial contribution
double scaleinv = 1.0/(nx_p*ny_p*nz_p);
double s2 = scaleinv*scaleinv;
if (eflag_global || vflag_global) {
if (vflag_global) {
n = 0;
for (i = 0; i < nft; i++) {
eng = s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
for (j = 0; j < 6; j++) vir[j] += eng*vcoeff[i][j];
if (eflag_global) egy += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nft; i++) {
egy +=
s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
n += 2;
}
}
}
// scale by 1/total-grid-pts to get rho(k)
// multiply by Green's function to get V(k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] *= scaleinv * gfn[i];
wk1[n++] *= scaleinv * gfn[i];
}
// compute gradients of V(r) in each of 3 dims by transforming -ik*V(k)
// FFT leaves data in 3d brick decomposition
// copy it into inner portion of vdx,vdy,vdz arrays
// x & y direction gradient
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = 0.5*(kx[i]-kx2[i])*wk1[n+1] + 0.5*(ky[j]-ky2[j])*wk1[n];
wk2[n+1] = -0.5*(kx[i]-kx2[i])*wk1[n] + 0.5*(ky[j]-ky2[j])*wk1[n+1];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
vx_brick[k][j][i] = wk2[n++];
vy_brick[k][j][i] = wk2[n++];
}
if (!eflag_atom) {
// z direction gradient only
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = kz[k]*wk1[n+1];
wk2[n+1] = -kz[k]*wk1[n];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
vz_brick[k][j][i] = wk2[n];
n += 2;
}
}
else {
// z direction gradient & per-atom energy
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = 0.5*(kz[k]-kz2[k])*wk1[n+1] - wk1[n+1];
wk2[n+1] = -0.5*(kz[k]-kz2[k])*wk1[n] + wk1[n];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
vz_brick[k][j][i] = wk2[n++];
u_pa[k][j][i] = wk2[n++];
}
}
if (vflag_atom) poisson_peratom(wk1, wk2, ft2, vcoeff, vcoeff2, nft,
nxlo_i, nylo_i, nzlo_i, nxhi_i, nyhi_i, nzhi_i,
v0_pa, v1_pa, v2_pa, v3_pa, v4_pa, v5_pa);
}
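// note on the combined x/y pass above: both gradient fields are real in
// r-space, so they can share one backward FFT.  writing Ex(k) and Ey(k)
// for their transforms, the loop packs wk2(k) = Ex(k) + i*Ey(k); by
// linearity the backward transform returns Ex(r) + i*Ey(r), so the real
// slots are copied into vx_brick and the imaginary slots into vy_brick.
// the same packing idea (two real fields per complex transform) is what
// lets the poisson_2s_* and poisson_none_* routines below handle a pair
// of dispersion densities with a single forward FFT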
/* ----------------------------------------------------------------------
FFT-based Poisson solver for ad differentiation
------------------------------------------------------------------------- */
void PPPMDisp::poisson_ad(FFT_SCALAR* wk1, FFT_SCALAR* wk2,
FFT_SCALAR* dfft, LAMMPS_NS::FFT3d* ft1,LAMMPS_NS::FFT3d* ft2,
int nx_p, int ny_p, int nz_p, int nft,
int nxlo_ft, int nylo_ft, int nzlo_ft,
int nxhi_ft, int nyhi_ft, int nzhi_ft,
int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
double& egy, double* gfn,
double* vir, double** vcoeff, double** vcoeff2,
FFT_SCALAR*** u_pa, FFT_SCALAR*** v0_pa, FFT_SCALAR*** v1_pa, FFT_SCALAR*** v2_pa,
FFT_SCALAR*** v3_pa, FFT_SCALAR*** v4_pa, FFT_SCALAR*** v5_pa)
{
int i,j,k,n;
double eng;
// transform charge/dispersion density (r -> k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] = dfft[i];
wk1[n++] = ZEROF;
}
ft1->compute(wk1,wk1,1);
// if requested, compute energy and virial contribution
double scaleinv = 1.0/(nx_p*ny_p*nz_p);
double s2 = scaleinv*scaleinv;
if (eflag_global || vflag_global) {
if (vflag_global) {
n = 0;
for (i = 0; i < nft; i++) {
eng = s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
for (j = 0; j < 6; j++) vir[j] += eng*vcoeff[i][j];
if (eflag_global) egy += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nft; i++) {
egy +=
s2 * gfn[i] * (wk1[n]*wk1[n] + wk1[n+1]*wk1[n+1]);
n += 2;
}
}
}
// scale by 1/total-grid-pts to get rho(k)
// multiply by Green's function to get V(k)
n = 0;
for (i = 0; i < nft; i++) {
wk1[n++] *= scaleinv * gfn[i];
wk1[n++] *= scaleinv * gfn[i];
}
n = 0;
for (k = nzlo_ft; k <= nzhi_ft; k++)
for (j = nylo_ft; j <= nyhi_ft; j++)
for (i = nxlo_ft; i <= nxhi_ft; i++) {
wk2[n] = wk1[n];
wk2[n+1] = wk1[n+1];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
u_pa[k][j][i] = wk2[n++];
n++;
}
if (vflag_atom) poisson_peratom(wk1, wk2, ft2, vcoeff, vcoeff2, nft,
nxlo_i, nylo_i, nzlo_i, nxhi_i, nyhi_i, nzhi_i,
v0_pa, v1_pa, v2_pa, v3_pa, v4_pa, v5_pa);
}
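// unlike poisson_ik(), the ad (analytic differentiation) path above only
// brings the potential u back onto the grid; no gradient transforms are
// done here.  the forces are assembled later in the fieldforce_*_ad()
// routines by differentiating the assignment weights (compute_drho1d)
// and then subtracting the sf_coeff self-force terms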
/* ----------------------------------------------------------------------
Fourier transform for per-atom virial calculations
------------------------------------------------------------------------- */
void PPPMDisp::poisson_peratom(FFT_SCALAR* wk1, FFT_SCALAR* wk2, LAMMPS_NS::FFT3d* ft2,
double** vcoeff, double** vcoeff2, int nft,
int nxlo_i, int nylo_i, int nzlo_i,
int nxhi_i, int nyhi_i, int nzhi_i,
FFT_SCALAR*** v0_pa, FFT_SCALAR*** v1_pa, FFT_SCALAR*** v2_pa,
FFT_SCALAR*** v3_pa, FFT_SCALAR*** v4_pa, FFT_SCALAR*** v5_pa)
{
//v0 & v1 term
int n, i, j, k;
n = 0;
for (i = 0; i < nft; i++) {
wk2[n] = wk1[n]*vcoeff[i][0] - wk1[n+1]*vcoeff[i][1];
wk2[n+1] = wk1[n+1]*vcoeff[i][0] + wk1[n]*vcoeff[i][1];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
v0_pa[k][j][i] = wk2[n++];
v1_pa[k][j][i] = wk2[n++];
}
//v2 & v3 term
n = 0;
for (i = 0; i < nft; i++) {
wk2[n] = wk1[n]*vcoeff[i][2] - wk1[n+1]*vcoeff2[i][0];
wk2[n+1] = wk1[n+1]*vcoeff[i][2] + wk1[n]*vcoeff2[i][0];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
v2_pa[k][j][i] = wk2[n++];
v3_pa[k][j][i] = wk2[n++];
}
//v4 & v5 term
n = 0;
for (i = 0; i < nft; i++) {
wk2[n] = wk1[n]*vcoeff2[i][1] - wk1[n+1]*vcoeff2[i][2];
wk2[n+1] = wk1[n+1]*vcoeff2[i][1] + wk1[n]*vcoeff2[i][2];
n += 2;
}
ft2->compute(wk2,wk2,-1);
n = 0;
for (k = nzlo_i; k <= nzhi_i; k++)
for (j = nylo_i; j <= nyhi_i; j++)
for (i = nxlo_i; i <= nxhi_i; i++) {
v4_pa[k][j][i] = wk2[n++];
v5_pa[k][j][i] = wk2[n++];
}
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::poisson_2s_ik(FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** vxbrick_1, FFT_SCALAR*** vybrick_1, FFT_SCALAR*** vzbrick_1,
FFT_SCALAR*** vxbrick_2, FFT_SCALAR*** vybrick_2, FFT_SCALAR*** vzbrick_2,
FFT_SCALAR*** u_pa_1, FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** u_pa_2, FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = 2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
// compute gradients of V(r) in each of 3 dims by transforming -ik*V(k)
// FFT leaves data in 3d brick decomposition
// copy it into inner portion of vdx,vdy,vdz arrays
// x direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vxbrick_1[k][j][i] = work2_6[n++];
vxbrick_2[k][j][i] = work2_6[n++];
}
// y direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fky_6[j]-fky2_6[j])*work1_6[n+1];
work2_6[n+1] = -0.5*(fky_6[j]-fky2_6[j])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vybrick_1[k][j][i] = work2_6[n++];
vybrick_2[k][j][i] = work2_6[n++];
}
// z direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vzbrick_1[k][j][i] = work2_6[n++];
vzbrick_2[k][j][i] = work2_6[n++];
}
//Per-atom energy
if (eflag_atom) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa_1[k][j][i] = work2_6[n++];
u_pa_2[k][j][i] = work2_6[n++];
}
}
if (vflag_atom) poisson_2s_peratom(v0_pa_1, v1_pa_1, v2_pa_1, v3_pa_1, v4_pa_1, v5_pa_1,
v0_pa_2, v1_pa_2, v2_pa_2, v3_pa_2, v4_pa_2, v5_pa_2);
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ik scheme, 'none' mixing rule (per-type split densities)
------------------------------------------------------------------------- */
void PPPMDisp::poisson_none_ik(int n1, int n2,FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** vxbrick_1, FFT_SCALAR*** vybrick_1, FFT_SCALAR*** vzbrick_1,
FFT_SCALAR*** vxbrick_2, FFT_SCALAR*** vybrick_2, FFT_SCALAR*** vzbrick_2,
FFT_SCALAR**** u_pa, FFT_SCALAR**** v0_pa, FFT_SCALAR**** v1_pa, FFT_SCALAR**** v2_pa,
FFT_SCALAR**** v3_pa, FFT_SCALAR**** v4_pa, FFT_SCALAR**** v5_pa)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
// compute gradients of V(r) in each of 3 dims by transforming -ik*V(k)
// FFT leaves data in 3d brick decomposition
// copy it into inner portion of vdx,vdy,vdz arrays
// x direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkx_6[i]-fkx2_6[i])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vxbrick_1[k][j][i] = B[n1]*work2_6[n++];
vxbrick_2[k][j][i] = B[n2]*work2_6[n++];
}
// y direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fky_6[j]-fky2_6[j])*work1_6[n+1];
work2_6[n+1] = -0.5*(fky_6[j]-fky2_6[j])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vybrick_1[k][j][i] = B[n1]*work2_6[n++];
vybrick_2[k][j][i] = B[n2]*work2_6[n++];
}
// z direction gradient
n = 0;
for (k = nzlo_fft_6; k <= nzhi_fft_6; k++)
for (j = nylo_fft_6; j <= nyhi_fft_6; j++)
for (i = nxlo_fft_6; i <= nxhi_fft_6; i++) {
work2_6[n] = 0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n+1];
work2_6[n+1] = -0.5*(fkz_6[k]-fkz2_6[k])*work1_6[n];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
vzbrick_1[k][j][i] = B[n1]*work2_6[n++];
vzbrick_2[k][j][i] = B[n2]*work2_6[n++];
}
//Per-atom energy
if (eflag_atom) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa[n1][k][j][i] = B[n1]*work2_6[n++];
u_pa[n2][k][j][i] = B[n2]*work2_6[n++];
}
}
if (vflag_atom) poisson_none_peratom(n1,n2,
v0_pa[n1], v1_pa[n1], v2_pa[n1], v3_pa[n1], v4_pa[n1], v5_pa[n1],
v0_pa[n2], v1_pa[n2], v2_pa[n2], v3_pa[n2], v4_pa[n2], v5_pa[n2]);
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::poisson_2s_ad(FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** u_pa_1, FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** u_pa_2, FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = 2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
2 * s2 * greensfn_6[i] * (work1_6[n]*work2_6[n+1] - work1_6[n+1]*work2_6[n]);
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa_1[k][j][i] = work2_6[n++];
u_pa_2[k][j][i] = work2_6[n++];
}
if (vflag_atom) poisson_2s_peratom(v0_pa_1, v1_pa_1, v2_pa_1, v3_pa_1, v4_pa_1, v5_pa_1,
v0_pa_2, v1_pa_2, v2_pa_2, v3_pa_2, v4_pa_2, v5_pa_2);
}
/* ----------------------------------------------------------------------
Poisson solver for one mesh with 2 different dispersion densities
for ad scheme, 'none' mixing rule (per-type split densities)
------------------------------------------------------------------------- */
void PPPMDisp::poisson_none_ad(int n1, int n2, FFT_SCALAR* dfft_1, FFT_SCALAR* dfft_2,
FFT_SCALAR*** u_pa_1, FFT_SCALAR*** u_pa_2,
FFT_SCALAR**** v0_pa, FFT_SCALAR**** v1_pa, FFT_SCALAR**** v2_pa,
FFT_SCALAR**** v3_pa, FFT_SCALAR**** v4_pa, FFT_SCALAR**** v5_pa)
{
int i,j,k,n;
double eng;
double scaleinv = 1.0/(nx_pppm_6*ny_pppm_6*nz_pppm_6);
// transform charge/dispersion density (r -> k)
// only one transform is required when energies and pressures do not
// need to be calculated
if (eflag_global + vflag_global == 0) {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] = dfft_1[i];
work1_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
}
// two transforms are required when energies and pressures are
// calculated
else {
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n] = dfft_1[i];
work2_6[n++] = ZEROF;
work1_6[n] = ZEROF;
work2_6[n++] = dfft_2[i];
}
fft1_6->compute(work1_6,work1_6,1);
fft1_6->compute(work2_6,work2_6,1);
double s2 = scaleinv*scaleinv;
if (vflag_global) {
n = 0;
for (i = 0; i < nfft_6; i++) {
eng = s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
for (j = 0; j < 6; j++) virial_6[j] += eng*vg_6[i][j];
if (eflag_global) energy_6 += eng;
n += 2;
}
} else {
n = 0;
for (i = 0; i < nfft_6; i++) {
energy_6 +=
s2 * greensfn_6[i] * (B[n1]*(work1_6[n]*work1_6[n] + work1_6[n+1]*work1_6[n+1]) + B[n2]*(work2_6[n]*work2_6[n] + work2_6[n+1]*work2_6[n+1]));
n += 2;
}
}
// unify the two transformed vectors for efficient calculations later
for ( i = 0; i < 2*nfft_6; i++) {
work1_6[i] += work2_6[i];
}
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work1_6[n++] *= scaleinv * greensfn_6[i];
work1_6[n++] *= scaleinv * greensfn_6[i];
}
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n];
work2_6[n+1] = work1_6[n+1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
u_pa_1[k][j][i] = B[n1]*work2_6[n++];
u_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
if (vflag_atom) poisson_none_peratom(n1,n2,
v0_pa[n1], v1_pa[n1], v2_pa[n1], v3_pa[n1], v4_pa[n1], v5_pa[n1],
v0_pa[n2], v1_pa[n2], v2_pa[n2], v3_pa[n2], v4_pa[n2], v5_pa[n2]);
}
/* ----------------------------------------------------------------------
Fourier transform for per-atom virial calculations
------------------------------------------------------------------------- */
void PPPMDisp::poisson_2s_peratom(FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
//Compute first virial term v0
int n, i, j, k;
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v0_pa_1[k][j][i] = work2_6[n++];
v0_pa_2[k][j][i] = work2_6[n++];
}
//Compute second virial term v1
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v1_pa_1[k][j][i] = work2_6[n++];
v1_pa_2[k][j][i] = work2_6[n++];
}
//Compute third virial term v2
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v2_pa_1[k][j][i] = work2_6[n++];
v2_pa_2[k][j][i] = work2_6[n++];
}
//Compute fourth virial term v3
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v3_pa_1[k][j][i] = work2_6[n++];
v3_pa_2[k][j][i] = work2_6[n++];
}
//Compute fifth virial term v4
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v4_pa_1[k][j][i] = work2_6[n++];
v4_pa_2[k][j][i] = work2_6[n++];
}
//Compute last virial term v5
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v5_pa_1[k][j][i] = work2_6[n++];
v5_pa_2[k][j][i] = work2_6[n++];
}
}
/* ----------------------------------------------------------------------
Fourier transform for per-atom virial calculations
------------------------------------------------------------------------- */
void PPPMDisp::poisson_none_peratom(int n1, int n2,
FFT_SCALAR*** v0_pa_1, FFT_SCALAR*** v1_pa_1, FFT_SCALAR*** v2_pa_1,
FFT_SCALAR*** v3_pa_1, FFT_SCALAR*** v4_pa_1, FFT_SCALAR*** v5_pa_1,
FFT_SCALAR*** v0_pa_2, FFT_SCALAR*** v1_pa_2, FFT_SCALAR*** v2_pa_2,
FFT_SCALAR*** v3_pa_2, FFT_SCALAR*** v4_pa_2, FFT_SCALAR*** v5_pa_2)
{
//Compute first virial term v0
int n, i, j, k;
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v0_pa_1[k][j][i] = B[n1]*work2_6[n++];
v0_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute second virial term v1
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v1_pa_1[k][j][i] = B[n1]*work2_6[n++];
v1_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute third virial term v2
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v2_pa_1[k][j][i] = B[n1]*work2_6[n++];
v2_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute fourth virial term v3
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][0];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][0];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v3_pa_1[k][j][i] = B[n1]*work2_6[n++];
v3_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute fifth virial term v4
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][1];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][1];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v4_pa_1[k][j][i] = B[n1]*work2_6[n++];
v4_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
//Compute last virial term v5
n = 0;
for (i = 0; i < nfft_6; i++) {
work2_6[n] = work1_6[n]*vg2_6[i][2];
work2_6[n+1] = work1_6[n+1]*vg2_6[i][2];
n += 2;
}
fft2_6->compute(work2_6,work2_6,-1);
n = 0;
for (k = nzlo_in_6; k <= nzhi_in_6; k++)
for (j = nylo_in_6; j <= nyhi_in_6; j++)
for (i = nxlo_in_6; i <= nxhi_in_6; i++) {
v5_pa_1[k][j][i] = B[n1]*work2_6[n++];
v5_pa_2[k][j][i] = B[n2]*work2_6[n++];
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get electric field & force on my particles
for ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_c_ik()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx,eky,ekz;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of E-field on particle
double *q = atom->q;
double **x = atom->x;
double **f = atom->f;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
ekx = eky = ekz = ZEROF;
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
z0 = rho1d[2][n];
for (m = nlower; m <= nupper; m++) {
my = m+ny;
y0 = z0*rho1d[1][m];
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
x0 = y0*rho1d[0][l];
ekx -= x0*vdx_brick[mz][my][mx];
eky -= x0*vdy_brick[mz][my][mx];
ekz -= x0*vdz_brick[mz][my][mx];
}
}
}
// convert E-field to force
const double qfactor = force->qqrd2e * scale * q[i];
f[i][0] += qfactor*ekx;
f[i][1] += qfactor*eky;
if (slabflag != 2) f[i][2] += qfactor*ekz;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get electric field & force on my particles
for ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_c_ad()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz;
FFT_SCALAR ekx,eky,ekz;
double s1,s2,s3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm/xprd;
double hy_inv = ny_pppm/yprd;
double hz_inv = nz_pppm/zprd_slab;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of E-field on particle
double *q = atom->q;
double **x = atom->x;
double **f = atom->f;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
compute_drho1d(dx,dy,dz, order, drho_coeff, drho1d);
ekx = eky = ekz = ZEROF;
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
for (m = nlower; m <= nupper; m++) {
my = m+ny;
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
ekx += drho1d[0][l]*rho1d[1][m]*rho1d[2][n]*u_brick[mz][my][mx];
eky += rho1d[0][l]*drho1d[1][m]*rho1d[2][n]*u_brick[mz][my][mx];
ekz += rho1d[0][l]*rho1d[1][m]*drho1d[2][n]*u_brick[mz][my][mx];
}
}
}
ekx *= hx_inv;
eky *= hy_inv;
ekz *= hz_inv;
// convert E-field to force and subtract self forces
const double qfactor = force->qqrd2e * scale;
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
sf = sf_coeff[0]*sin(2*MY_PI*s1);
sf += sf_coeff[1]*sin(4*MY_PI*s1);
sf *= 2*q[i]*q[i];
f[i][0] += qfactor*(ekx*q[i] - sf);
sf = sf_coeff[2]*sin(2*MY_PI*s2);
sf += sf_coeff[3]*sin(4*MY_PI*s2);
sf *= 2*q[i]*q[i];
f[i][1] += qfactor*(eky*q[i] - sf);
sf = sf_coeff[4]*sin(2*MY_PI*s3);
sf += sf_coeff[5]*sin(4*MY_PI*s3);
sf *= 2*q[i]*q[i];
if (slabflag != 2) f[i][2] += qfactor*(ekz*q[i] - sf);
}
}
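// written out, the x-component applied above for atom i is
//
//   f_x += qqrd2e*scale * ( q_i*ekx
//          - 2*q_i^2 * ( sf_coeff[0]*sin(2*pi*x_i/h_x)
//                      + sf_coeff[1]*sin(4*pi*x_i/h_x) ) )
//
// with ekx = (1/h_x) * sum_{l,m,n} drho1d[0][l]*rho1d[1][m]*rho1d[2][n]
// * u_brick[...]; the sine terms remove the spurious force an atom exerts
// on itself through the grid in the ad scheme (y and z are analogous)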
/* ----------------------------------------------------------------------
interpolate from grid to get electric field & force on my particles
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_c_peratom()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR u_pa,v0,v1,v2,v3,v4,v5;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of E-field on particle
double *q = atom->q;
double **x = atom->x;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid[i][0];
ny = part2grid[i][1];
nz = part2grid[i][2];
dx = nx+shiftone - (x[i][0]-boxlo[0])*delxinv;
dy = ny+shiftone - (x[i][1]-boxlo[1])*delyinv;
dz = nz+shiftone - (x[i][2]-boxlo[2])*delzinv;
compute_rho1d(dx,dy,dz, order, rho_coeff, rho1d);
u_pa = v0 = v1 = v2 = v3 = v4 = v5 = ZEROF;
for (n = nlower; n <= nupper; n++) {
mz = n+nz;
z0 = rho1d[2][n];
for (m = nlower; m <= nupper; m++) {
my = m+ny;
y0 = z0*rho1d[1][m];
for (l = nlower; l <= nupper; l++) {
mx = l+nx;
x0 = y0*rho1d[0][l];
if (eflag_atom) u_pa += x0*u_brick[mz][my][mx];
if (vflag_atom) {
v0 += x0*v0_brick[mz][my][mx];
v1 += x0*v1_brick[mz][my][mx];
v2 += x0*v2_brick[mz][my][mx];
v3 += x0*v3_brick[mz][my][mx];
v4 += x0*v4_brick[mz][my][mx];
v5 += x0*v5_brick[mz][my][mx];
}
}
}
}
// accumulate per-atom energy and virial contributions
const double qfactor = 0.5*force->qqrd2e * scale * q[i];
if (eflag_atom) eatom[i] += u_pa*qfactor;
if (vflag_atom) {
vatom[i][0] += v0*qfactor;
vatom[i][1] += v1*qfactor;
vatom[i][2] += v2*qfactor;
vatom[i][3] += v3*qfactor;
vatom[i][4] += v4*qfactor;
vatom[i][5] += v5*qfactor;
}
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for geometric mixing rule
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_g_ik()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx,eky,ekz;
// loop over my particles, interpolate dispersion field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
ekx = eky = ekz = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
ekx -= x0*vdx_brick_g[mz][my][mx];
eky -= x0*vdy_brick_g[mz][my][mx];
ekz -= x0*vdz_brick_g[mz][my][mx];
}
}
}
// convert dispersion field to force
type = atom->type[i];
lj = B[type];
f[i][0] += lj*ekx;
f[i][1] += lj*eky;
if (slabflag != 2) f[i][2] += lj*ekz;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for geometric mixing rule for ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_g_ad()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz;
FFT_SCALAR ekx,eky,ekz;
double s1,s2,s3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm_6/xprd;
double hy_inv = ny_pppm_6/yprd;
double hz_inv = nz_pppm_6/zprd_slab;
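// ad scheme: per-particle fields are obtained by analytically differentiating the assignment weights; hx_inv, hy_inv, hz_inv convert stencil-space derivatives to real-space gradients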
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
compute_drho1d(dx,dy,dz, order_6, drho_coeff_6, drho1d_6);
ekx = eky = ekz = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
ekx += drho1d_6[0][l]*rho1d_6[1][m]*rho1d_6[2][n]*u_brick_g[mz][my][mx];
eky += rho1d_6[0][l]*drho1d_6[1][m]*rho1d_6[2][n]*u_brick_g[mz][my][mx];
ekz += rho1d_6[0][l]*rho1d_6[1][m]*drho1d_6[2][n]*u_brick_g[mz][my][mx];
}
}
}
ekx *= hx_inv;
eky *= hy_inv;
ekz *= hz_inv;
// convert E-field to force
type = atom->type[i];
lj = B[type];
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
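// subtract the self-force correction of the ad scheme: the sf_coeff_6 sine terms cancel the spurious force a particle's own smeared density exerts on it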
sf = sf_coeff_6[0]*sin(2*MY_PI*s1);
sf += sf_coeff_6[1]*sin(4*MY_PI*s1);
sf *= 2*lj*lj;
f[i][0] += ekx*lj - sf;
sf = sf_coeff_6[2]*sin(2*MY_PI*s2);
sf += sf_coeff_6[3]*sin(4*MY_PI*s2);
sf *= 2*lj*lj;
f[i][1] += eky*lj - sf;
sf = sf_coeff_6[4]*sin(2*MY_PI*s3);
sf += sf_coeff_6[5]*sin(4*MY_PI*s3);
sf *= 2*lj*lj;
if (slabflag != 2) f[i][2] += ekz*lj - sf;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for geometric mixing rule for per atom quantities
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_g_peratom()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR u_pa,v0,v1,v2,v3,v4,v5;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
u_pa = v0 = v1 = v2 = v3 = v4 = v5 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
if (eflag_atom) u_pa += x0*u_brick_g[mz][my][mx];
if (vflag_atom) {
v0 += x0*v0_brick_g[mz][my][mx];
v1 += x0*v1_brick_g[mz][my][mx];
v2 += x0*v2_brick_g[mz][my][mx];
v3 += x0*v3_brick_g[mz][my][mx];
v4 += x0*v4_brick_g[mz][my][mx];
v5 += x0*v5_brick_g[mz][my][mx];
}
}
}
}
// convert interpolated potential/virial to per-atom energy/virial
type = atom->type[i];
lj = B[type]*0.5;
if (eflag_atom) eatom[i] += u_pa*lj;
if (vflag_atom) {
vatom[i][0] += v0*lj;
vatom[i][1] += v1*lj;
vatom[i][2] += v2*lj;
vatom[i][3] += v3*lj;
vatom[i][4] += v4*lj;
vatom[i][5] += v5*lj;
}
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for arithmetic mixing rule and ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_a_ik()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx0, eky0, ekz0, ekx1, eky1, ekz1, ekx2, eky2, ekz2;
FFT_SCALAR ekx3, eky3, ekz3, ekx4, eky4, ekz4, ekx5, eky5, ekz5;
FFT_SCALAR ekx6, eky6, ekz6;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj0, lj1, lj2, lj3, lj4, lj5, lj6;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
ekx0 = eky0 = ekz0 = ZEROF;
ekx1 = eky1 = ekz1 = ZEROF;
ekx2 = eky2 = ekz2 = ZEROF;
ekx3 = eky3 = ekz3 = ZEROF;
ekx4 = eky4 = ekz4 = ZEROF;
ekx5 = eky5 = ekz5 = ZEROF;
ekx6 = eky6 = ekz6 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
ekx0 -= x0*vdx_brick_a0[mz][my][mx];
eky0 -= x0*vdy_brick_a0[mz][my][mx];
ekz0 -= x0*vdz_brick_a0[mz][my][mx];
ekx1 -= x0*vdx_brick_a1[mz][my][mx];
eky1 -= x0*vdy_brick_a1[mz][my][mx];
ekz1 -= x0*vdz_brick_a1[mz][my][mx];
ekx2 -= x0*vdx_brick_a2[mz][my][mx];
eky2 -= x0*vdy_brick_a2[mz][my][mx];
ekz2 -= x0*vdz_brick_a2[mz][my][mx];
ekx3 -= x0*vdx_brick_a3[mz][my][mx];
eky3 -= x0*vdy_brick_a3[mz][my][mx];
ekz3 -= x0*vdz_brick_a3[mz][my][mx];
ekx4 -= x0*vdx_brick_a4[mz][my][mx];
eky4 -= x0*vdy_brick_a4[mz][my][mx];
ekz4 -= x0*vdz_brick_a4[mz][my][mx];
ekx5 -= x0*vdx_brick_a5[mz][my][mx];
eky5 -= x0*vdy_brick_a5[mz][my][mx];
ekz5 -= x0*vdz_brick_a5[mz][my][mx];
ekx6 -= x0*vdx_brick_a6[mz][my][mx];
eky6 -= x0*vdy_brick_a6[mz][my][mx];
ekz6 -= x0*vdz_brick_a6[mz][my][mx];
}
}
}
// convert D-field to force
type = atom->type[i];
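// arithmetic mixing expands the dispersion coefficient into 7 terms; B holds the 7 per-type factors, paired with the 7 grids (a0..a6) in reverse order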
lj0 = B[7*type+6];
lj1 = B[7*type+5];
lj2 = B[7*type+4];
lj3 = B[7*type+3];
lj4 = B[7*type+2];
lj5 = B[7*type+1];
lj6 = B[7*type];
f[i][0] += lj0*ekx0 + lj1*ekx1 + lj2*ekx2 + lj3*ekx3 + lj4*ekx4 + lj5*ekx5 + lj6*ekx6;
f[i][1] += lj0*eky0 + lj1*eky1 + lj2*eky2 + lj3*eky3 + lj4*eky4 + lj5*eky5 + lj6*eky6;
if (slabflag != 2) f[i][2] += lj0*ekz0 + lj1*ekz1 + lj2*ekz2 + lj3*ekz3 + lj4*ekz4 + lj5*ekz5 + lj6*ekz6;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for arithmetic mixing rule for the ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_a_ad()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR ekx0, eky0, ekz0, ekx1, eky1, ekz1, ekx2, eky2, ekz2;
FFT_SCALAR ekx3, eky3, ekz3, ekx4, eky4, ekz4, ekx5, eky5, ekz5;
FFT_SCALAR ekx6, eky6, ekz6;
double s1,s2,s3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm_6/xprd;
double hy_inv = ny_pppm_6/yprd;
double hz_inv = nz_pppm_6/zprd_slab;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj0, lj1, lj2, lj3, lj4, lj5, lj6;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
compute_drho1d(dx,dy,dz, order_6, drho_coeff_6, drho1d_6);
ekx0 = eky0 = ekz0 = ZEROF;
ekx1 = eky1 = ekz1 = ZEROF;
ekx2 = eky2 = ekz2 = ZEROF;
ekx3 = eky3 = ekz3 = ZEROF;
ekx4 = eky4 = ekz4 = ZEROF;
ekx5 = eky5 = ekz5 = ZEROF;
ekx6 = eky6 = ekz6 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = drho1d_6[0][l]*rho1d_6[1][m]*rho1d_6[2][n];
y0 = rho1d_6[0][l]*drho1d_6[1][m]*rho1d_6[2][n];
z0 = rho1d_6[0][l]*rho1d_6[1][m]*drho1d_6[2][n];
ekx0 += x0*u_brick_a0[mz][my][mx];
eky0 += y0*u_brick_a0[mz][my][mx];
ekz0 += z0*u_brick_a0[mz][my][mx];
ekx1 += x0*u_brick_a1[mz][my][mx];
eky1 += y0*u_brick_a1[mz][my][mx];
ekz1 += z0*u_brick_a1[mz][my][mx];
ekx2 += x0*u_brick_a2[mz][my][mx];
eky2 += y0*u_brick_a2[mz][my][mx];
ekz2 += z0*u_brick_a2[mz][my][mx];
ekx3 += x0*u_brick_a3[mz][my][mx];
eky3 += y0*u_brick_a3[mz][my][mx];
ekz3 += z0*u_brick_a3[mz][my][mx];
ekx4 += x0*u_brick_a4[mz][my][mx];
eky4 += y0*u_brick_a4[mz][my][mx];
ekz4 += z0*u_brick_a4[mz][my][mx];
ekx5 += x0*u_brick_a5[mz][my][mx];
eky5 += y0*u_brick_a5[mz][my][mx];
ekz5 += z0*u_brick_a5[mz][my][mx];
ekx6 += x0*u_brick_a6[mz][my][mx];
eky6 += y0*u_brick_a6[mz][my][mx];
ekz6 += z0*u_brick_a6[mz][my][mx];
}
}
}
ekx0 *= hx_inv;
eky0 *= hy_inv;
ekz0 *= hz_inv;
ekx1 *= hx_inv;
eky1 *= hy_inv;
ekz1 *= hz_inv;
ekx2 *= hx_inv;
eky2 *= hy_inv;
ekz2 *= hz_inv;
ekx3 *= hx_inv;
eky3 *= hy_inv;
ekz3 *= hz_inv;
ekx4 *= hx_inv;
eky4 *= hy_inv;
ekz4 *= hz_inv;
ekx5 *= hx_inv;
eky5 *= hy_inv;
ekz5 *= hz_inv;
ekx6 *= hx_inv;
eky6 *= hy_inv;
ekz6 *= hz_inv;
// convert D-field to force
type = atom->type[i];
lj0 = B[7*type+6];
lj1 = B[7*type+5];
lj2 = B[7*type+4];
lj3 = B[7*type+3];
lj4 = B[7*type+2];
lj5 = B[7*type+1];
lj6 = B[7*type];
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
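// self-force prefactor below equals 2*sum_k lj_k*lj_(6-k), i.e. all cross products of the 7-term split for this atom type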
sf = sf_coeff_6[0]*sin(2*MY_PI*s1);
sf += sf_coeff_6[1]*sin(4*MY_PI*s1);
sf *= 4*lj0*lj6 + 4*lj1*lj5 + 4*lj2*lj4 + 2*lj3*lj3;
f[i][0] += lj0*ekx0 + lj1*ekx1 + lj2*ekx2 + lj3*ekx3 + lj4*ekx4 + lj5*ekx5 + lj6*ekx6 - sf;
sf = sf_coeff_6[2]*sin(2*MY_PI*s2);
sf += sf_coeff_6[3]*sin(4*MY_PI*s2);
sf *= 4*lj0*lj6 + 4*lj1*lj5 + 4*lj2*lj4 + 2*lj3*lj3;
f[i][1] += lj0*eky0 + lj1*eky1 + lj2*eky2 + lj3*eky3 + lj4*eky4 + lj5*eky5 + lj6*eky6 - sf;
sf = sf_coeff_6[4]*sin(2*MY_PI*s3);
sf += sf_coeff_6[5]*sin(4*MY_PI*s3);
sf *= 4*lj0*lj6 + 4*lj1*lj5 + 4*lj2*lj4 + 2*lj3*lj3;
if (slabflag != 2) f[i][2] += lj0*ekz0 + lj1*ekz1 + lj2*ekz2 + lj3*ekz3 + lj4*ekz4 + lj5*ekz5 + lj6*ekz6 - sf;
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for arithmetic mixing rule for per atom quantities
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_a_peratom()
{
int i,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR u_pa0,v00,v10,v20,v30,v40,v50;
FFT_SCALAR u_pa1,v01,v11,v21,v31,v41,v51;
FFT_SCALAR u_pa2,v02,v12,v22,v32,v42,v52;
FFT_SCALAR u_pa3,v03,v13,v23,v33,v43,v53;
FFT_SCALAR u_pa4,v04,v14,v24,v34,v44,v54;
FFT_SCALAR u_pa5,v05,v15,v25,v35,v45,v55;
FFT_SCALAR u_pa6,v06,v16,v26,v36,v46,v56;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
int type;
double lj0, lj1, lj2, lj3, lj4, lj5, lj6;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
u_pa0 = v00 = v10 = v20 = v30 = v40 = v50 = ZEROF;
u_pa1 = v01 = v11 = v21 = v31 = v41 = v51 = ZEROF;
u_pa2 = v02 = v12 = v22 = v32 = v42 = v52 = ZEROF;
u_pa3 = v03 = v13 = v23 = v33 = v43 = v53 = ZEROF;
u_pa4 = v04 = v14 = v24 = v34 = v44 = v54 = ZEROF;
u_pa5 = v05 = v15 = v25 = v35 = v45 = v55 = ZEROF;
u_pa6 = v06 = v16 = v26 = v36 = v46 = v56 = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
if (eflag_atom) {
u_pa0 += x0*u_brick_a0[mz][my][mx];
u_pa1 += x0*u_brick_a1[mz][my][mx];
u_pa2 += x0*u_brick_a2[mz][my][mx];
u_pa3 += x0*u_brick_a3[mz][my][mx];
u_pa4 += x0*u_brick_a4[mz][my][mx];
u_pa5 += x0*u_brick_a5[mz][my][mx];
u_pa6 += x0*u_brick_a6[mz][my][mx];
}
if (vflag_atom) {
v00 += x0*v0_brick_a0[mz][my][mx];
v10 += x0*v1_brick_a0[mz][my][mx];
v20 += x0*v2_brick_a0[mz][my][mx];
v30 += x0*v3_brick_a0[mz][my][mx];
v40 += x0*v4_brick_a0[mz][my][mx];
v50 += x0*v5_brick_a0[mz][my][mx];
v01 += x0*v0_brick_a1[mz][my][mx];
v11 += x0*v1_brick_a1[mz][my][mx];
v21 += x0*v2_brick_a1[mz][my][mx];
v31 += x0*v3_brick_a1[mz][my][mx];
v41 += x0*v4_brick_a1[mz][my][mx];
v51 += x0*v5_brick_a1[mz][my][mx];
v02 += x0*v0_brick_a2[mz][my][mx];
v12 += x0*v1_brick_a2[mz][my][mx];
v22 += x0*v2_brick_a2[mz][my][mx];
v32 += x0*v3_brick_a2[mz][my][mx];
v42 += x0*v4_brick_a2[mz][my][mx];
v52 += x0*v5_brick_a2[mz][my][mx];
v03 += x0*v0_brick_a3[mz][my][mx];
v13 += x0*v1_brick_a3[mz][my][mx];
v23 += x0*v2_brick_a3[mz][my][mx];
v33 += x0*v3_brick_a3[mz][my][mx];
v43 += x0*v4_brick_a3[mz][my][mx];
v53 += x0*v5_brick_a3[mz][my][mx];
v04 += x0*v0_brick_a4[mz][my][mx];
v14 += x0*v1_brick_a4[mz][my][mx];
v24 += x0*v2_brick_a4[mz][my][mx];
v34 += x0*v3_brick_a4[mz][my][mx];
v44 += x0*v4_brick_a4[mz][my][mx];
v54 += x0*v5_brick_a4[mz][my][mx];
v05 += x0*v0_brick_a5[mz][my][mx];
v15 += x0*v1_brick_a5[mz][my][mx];
v25 += x0*v2_brick_a5[mz][my][mx];
v35 += x0*v3_brick_a5[mz][my][mx];
v45 += x0*v4_brick_a5[mz][my][mx];
v55 += x0*v5_brick_a5[mz][my][mx];
v06 += x0*v0_brick_a6[mz][my][mx];
v16 += x0*v1_brick_a6[mz][my][mx];
v26 += x0*v2_brick_a6[mz][my][mx];
v36 += x0*v3_brick_a6[mz][my][mx];
v46 += x0*v4_brick_a6[mz][my][mx];
v56 += x0*v5_brick_a6[mz][my][mx];
}
}
}
}
// convert interpolated potential/virial to per-atom energy/virial
type = atom->type[i];
lj0 = B[7*type+6]*0.5;
lj1 = B[7*type+5]*0.5;
lj2 = B[7*type+4]*0.5;
lj3 = B[7*type+3]*0.5;
lj4 = B[7*type+2]*0.5;
lj5 = B[7*type+1]*0.5;
lj6 = B[7*type]*0.5;
if (eflag_atom)
eatom[i] += u_pa0*lj0 + u_pa1*lj1 + u_pa2*lj2 +
u_pa3*lj3 + u_pa4*lj4 + u_pa5*lj5 + u_pa6*lj6;
if (vflag_atom) {
vatom[i][0] += v00*lj0 + v01*lj1 + v02*lj2 + v03*lj3 +
v04*lj4 + v05*lj5 + v06*lj6;
vatom[i][1] += v10*lj0 + v11*lj1 + v12*lj2 + v13*lj3 +
v14*lj4 + v15*lj5 + v16*lj6;
vatom[i][2] += v20*lj0 + v21*lj1 + v22*lj2 + v23*lj3 +
v24*lj4 + v25*lj5 + v26*lj6;
vatom[i][3] += v30*lj0 + v31*lj1 + v32*lj2 + v33*lj3 +
v34*lj4 + v35*lj5 + v36*lj6;
vatom[i][4] += v40*lj0 + v41*lj1 + v42*lj2 + v43*lj3 +
v44*lj4 + v45*lj5 + v46*lj6;
vatom[i][5] += v50*lj0 + v51*lj1 + v52*lj2 + v53*lj3 +
v54*lj4 + v55*lj5 + v56*lj6;
}
}
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for no mixing rule and ik scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_none_ik()
{
int i,k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR *ekx, *eky, *ekz;
ekx = new FFT_SCALAR[nsplit];
eky = new FFT_SCALAR[nsplit];
ekz = new FFT_SCALAR[nsplit];
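// with no mixing rule the dispersion sum is treated as nsplit independent terms, each with its own grids; keep one field accumulator per term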
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
for (k = 0; k < nsplit; k++)
ekx[k] = eky[k] = ekz[k] = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
for (k = 0; k < nsplit; k++) {
ekx[k] -= x0*vdx_brick_none[k][mz][my][mx];
eky[k] -= x0*vdy_brick_none[k][mz][my][mx];
ekz[k] -= x0*vdz_brick_none[k][mz][my][mx];
}
}
}
}
// convert D-field to force
type = atom->type[i];
for (k = 0; k < nsplit; k++) {
lj = B[nsplit*type + k];
f[i][0] += lj*ekx[k];
f[i][1] += lj*eky[k];
if (slabflag != 2) f[i][2] += lj*ekz[k];
}
}
delete [] ekx;
delete [] eky;
delete [] ekz;
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for no mixing rule and the ad scheme
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_none_ad()
{
int i,k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR *ekx, *eky, *ekz;
ekx = new FFT_SCALAR[nsplit];
eky = new FFT_SCALAR[nsplit];
ekz = new FFT_SCALAR[nsplit];
double s1,s2,s3;
double sf1,sf2,sf3;
double sf = 0.0;
double *prd;
if (triclinic == 0) prd = domain->prd;
else prd = domain->prd_lamda;
double xprd = prd[0];
double yprd = prd[1];
double zprd = prd[2];
double zprd_slab = zprd*slab_volfactor;
double hx_inv = nx_pppm_6/xprd;
double hy_inv = ny_pppm_6/yprd;
double hz_inv = nz_pppm_6/zprd_slab;
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
double **f = atom->f;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
compute_drho1d(dx,dy,dz, order_6, drho_coeff_6, drho1d_6);
for (k = 0; k < nsplit; k++)
ekx[k] = eky[k] = ekz[k] = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = drho1d_6[0][l]*rho1d_6[1][m]*rho1d_6[2][n];
y0 = rho1d_6[0][l]*drho1d_6[1][m]*rho1d_6[2][n];
z0 = rho1d_6[0][l]*rho1d_6[1][m]*drho1d_6[2][n];
for (k = 0; k < nsplit; k++) {
ekx[k] += x0*u_brick_none[k][mz][my][mx];
eky[k] += y0*u_brick_none[k][mz][my][mx];
ekz[k] += z0*u_brick_none[k][mz][my][mx];
}
}
}
}
for (k = 0; k < nsplit; k++) {
ekx[k] *= hx_inv;
eky[k] *= hy_inv;
ekz[k] *= hz_inv;
}
// convert D-field to force
type = atom->type[i];
s1 = x[i][0]*hx_inv;
s2 = x[i][1]*hy_inv;
s3 = x[i][2]*hz_inv;
sf1 = sf_coeff_6[0]*sin(2*MY_PI*s1);
sf1 += sf_coeff_6[1]*sin(4*MY_PI*s1);
sf2 = sf_coeff_6[2]*sin(2*MY_PI*s2);
sf2 += sf_coeff_6[3]*sin(4*MY_PI*s2);
sf3 = sf_coeff_6[4]*sin(2*MY_PI*s3);
sf3 += sf_coeff_6[5]*sin(4*MY_PI*s3);
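// apply the ad-scheme self-force correction per splitting term, analogous to the correction in the geometric/arithmetic ad routines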
for (k = 0; k < nsplit; k++) {
lj = B[nsplit*type + k];
sf = sf1*B[k]*2*lj*lj;
f[i][0] += lj*ekx[k] - sf;
sf = sf2*B[k]*2*lj*lj;
f[i][1] += lj*eky[k] - sf;
sf = sf3*B[k]*2*lj*lj;
if (slabflag != 2) f[i][2] += lj*ekz[k] - sf;
}
}
delete [] ekx;
delete [] eky;
delete [] ekz;
}
/* ----------------------------------------------------------------------
interpolate from grid to get dispersion field & force on my particles
for no mixing rule for per atom quantities
------------------------------------------------------------------------- */
void PPPMDisp::fieldforce_none_peratom()
{
int i,k,l,m,n,nx,ny,nz,mx,my,mz;
FFT_SCALAR dx,dy,dz,x0,y0,z0;
FFT_SCALAR *u_pa,*v0,*v1,*v2,*v3,*v4,*v5;
u_pa = new FFT_SCALAR[nsplit];
v0 = new FFT_SCALAR[nsplit];
v1 = new FFT_SCALAR[nsplit];
v2 = new FFT_SCALAR[nsplit];
v3 = new FFT_SCALAR[nsplit];
v4 = new FFT_SCALAR[nsplit];
v5 = new FFT_SCALAR[nsplit];
// loop over my charges, interpolate electric field from nearby grid points
// (nx,ny,nz) = global coords of grid pt to "lower left" of charge
// (dx,dy,dz) = distance to "lower left" grid pt
// (mx,my,mz) = global coords of moving stencil pt
// ek = 3 components of dispersion field on particle
double **x = atom->x;
int type;
double lj;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
nx = part2grid_6[i][0];
ny = part2grid_6[i][1];
nz = part2grid_6[i][2];
dx = nx+shiftone_6 - (x[i][0]-boxlo[0])*delxinv_6;
dy = ny+shiftone_6 - (x[i][1]-boxlo[1])*delyinv_6;
dz = nz+shiftone_6 - (x[i][2]-boxlo[2])*delzinv_6;
compute_rho1d(dx,dy,dz, order_6, rho_coeff_6, rho1d_6);
for (k = 0; k < nsplit; k++)
u_pa[k] = v0[k] = v1[k] = v2[k] = v3[k] = v4[k] = v5[k] = ZEROF;
for (n = nlower_6; n <= nupper_6; n++) {
mz = n+nz;
z0 = rho1d_6[2][n];
for (m = nlower_6; m <= nupper_6; m++) {
my = m+ny;
y0 = z0*rho1d_6[1][m];
for (l = nlower_6; l <= nupper_6; l++) {
mx = l+nx;
x0 = y0*rho1d_6[0][l];
if (eflag_atom) {
for (k = 0; k < nsplit; k++)
u_pa[k] += x0*u_brick_none[k][mz][my][mx];
}
if (vflag_atom) {
for (k = 0; k < nsplit; k++) {
v0[k] += x0*v0_brick_none[k][mz][my][mx];
v1[k] += x0*v1_brick_none[k][mz][my][mx];
v2[k] += x0*v2_brick_none[k][mz][my][mx];
v3[k] += x0*v3_brick_none[k][mz][my][mx];
v4[k] += x0*v4_brick_none[k][mz][my][mx];
v5[k] += x0*v5_brick_none[k][mz][my][mx];
}
}
}
}
}
// convert interpolated potential/virial to per-atom energy/virial
type = atom->type[i];
for (k = 0; k < nsplit; k++) {
lj = B[nsplit*type + k]*0.5;
if (eflag_atom) {
eatom[i] += u_pa[k]*lj;
}
if (vflag_atom) {
vatom[i][0] += v0[k]*lj;
vatom[i][1] += v1[k]*lj;
vatom[i][2] += v2[k]*lj;
vatom[i][3] += v3[k]*lj;
vatom[i][4] += v4[k]*lj;
vatom[i][5] += v5[k]*lj;
}
}
}
delete [] u_pa;
delete [] v0;
delete [] v1;
delete [] v2;
delete [] v3;
delete [] v4;
delete [] v5;
}
/* ----------------------------------------------------------------------
pack values to buf to send to another proc
------------------------------------------------------------------------- */
void PPPMDisp::pack_forward(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
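// forward communication: flag selects which set of grid bricks to copy; values at the ghost-cell indices in list are packed into buf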
switch (flag) {
// Coulomb interactions
case FORWARD_IK: {
FFT_SCALAR *xsrc = &vdx_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *ysrc = &vdy_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *zsrc = &vdz_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc[list[i]];
buf[n++] = ysrc[list[i]];
buf[n++] = zsrc[list[i]];
}
break;
}
case FORWARD_AD: {
FFT_SCALAR *src = &u_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
break;
}
case FORWARD_IK_PERATOM: {
FFT_SCALAR *esrc = &u_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) buf[n++] = esrc[list[i]];
if (vflag_atom) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
break;
}
case FORWARD_AD_PERATOM: {
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
break;
}
// Dispersion interactions, geometric mixing
case FORWARD_IK_G: {
FFT_SCALAR *xsrc = &vdx_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc = &vdy_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc = &vdz_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc[list[i]];
buf[n++] = ysrc[list[i]];
buf[n++] = zsrc[list[i]];
}
break;
}
case FORWARD_AD_G: {
FFT_SCALAR *src = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
break;
}
case FORWARD_IK_PERATOM_G: {
FFT_SCALAR *esrc = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) buf[n++] = esrc[list[i]];
if (vflag_atom) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
break;
}
case FORWARD_AD_PERATOM_G: {
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
break;
}
// Dispersion interactions, arithmetic mixing
case FORWARD_IK_A: {
FFT_SCALAR *xsrc0 = &vdx_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc0 = &vdy_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc0 = &vdz_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc1 = &vdx_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc1 = &vdy_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc1 = &vdz_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc2 = &vdx_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc2 = &vdy_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc2 = &vdz_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc3 = &vdx_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc3 = &vdy_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc3 = &vdz_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc4 = &vdx_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc4 = &vdy_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc4 = &vdz_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc5 = &vdx_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc5 = &vdy_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc5 = &vdz_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xsrc6 = &vdx_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc6 = &vdy_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc6 = &vdz_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc0[list[i]];
buf[n++] = ysrc0[list[i]];
buf[n++] = zsrc0[list[i]];
buf[n++] = xsrc1[list[i]];
buf[n++] = ysrc1[list[i]];
buf[n++] = zsrc1[list[i]];
buf[n++] = xsrc2[list[i]];
buf[n++] = ysrc2[list[i]];
buf[n++] = zsrc2[list[i]];
buf[n++] = xsrc3[list[i]];
buf[n++] = ysrc3[list[i]];
buf[n++] = zsrc3[list[i]];
buf[n++] = xsrc4[list[i]];
buf[n++] = ysrc4[list[i]];
buf[n++] = zsrc4[list[i]];
buf[n++] = xsrc5[list[i]];
buf[n++] = ysrc5[list[i]];
buf[n++] = zsrc5[list[i]];
buf[n++] = xsrc6[list[i]];
buf[n++] = ysrc6[list[i]];
buf[n++] = zsrc6[list[i]];
}
break;
}
case FORWARD_AD_A: {
FFT_SCALAR *src0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = src0[list[i]];
buf[n++] = src1[list[i]];
buf[n++] = src2[list[i]];
buf[n++] = src3[list[i]];
buf[n++] = src4[list[i]];
buf[n++] = src5[list[i]];
buf[n++] = src6[list[i]];
}
break;
}
case FORWARD_IK_PERATOM_A: {
FFT_SCALAR *esrc0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) {
buf[n++] = esrc0[list[i]];
buf[n++] = esrc1[list[i]];
buf[n++] = esrc2[list[i]];
buf[n++] = esrc3[list[i]];
buf[n++] = esrc4[list[i]];
buf[n++] = esrc5[list[i]];
buf[n++] = esrc6[list[i]];
}
if (vflag_atom) {
buf[n++] = v0src0[list[i]];
buf[n++] = v1src0[list[i]];
buf[n++] = v2src0[list[i]];
buf[n++] = v3src0[list[i]];
buf[n++] = v4src0[list[i]];
buf[n++] = v5src0[list[i]];
buf[n++] = v0src1[list[i]];
buf[n++] = v1src1[list[i]];
buf[n++] = v2src1[list[i]];
buf[n++] = v3src1[list[i]];
buf[n++] = v4src1[list[i]];
buf[n++] = v5src1[list[i]];
buf[n++] = v0src2[list[i]];
buf[n++] = v1src2[list[i]];
buf[n++] = v2src2[list[i]];
buf[n++] = v3src2[list[i]];
buf[n++] = v4src2[list[i]];
buf[n++] = v5src2[list[i]];
buf[n++] = v0src3[list[i]];
buf[n++] = v1src3[list[i]];
buf[n++] = v2src3[list[i]];
buf[n++] = v3src3[list[i]];
buf[n++] = v4src3[list[i]];
buf[n++] = v5src3[list[i]];
buf[n++] = v0src4[list[i]];
buf[n++] = v1src4[list[i]];
buf[n++] = v2src4[list[i]];
buf[n++] = v3src4[list[i]];
buf[n++] = v4src4[list[i]];
buf[n++] = v5src4[list[i]];
buf[n++] = v0src5[list[i]];
buf[n++] = v1src5[list[i]];
buf[n++] = v2src5[list[i]];
buf[n++] = v3src5[list[i]];
buf[n++] = v4src5[list[i]];
buf[n++] = v5src5[list[i]];
buf[n++] = v0src6[list[i]];
buf[n++] = v1src6[list[i]];
buf[n++] = v2src6[list[i]];
buf[n++] = v3src6[list[i]];
buf[n++] = v4src6[list[i]];
buf[n++] = v5src6[list[i]];
}
}
break;
}
case FORWARD_AD_PERATOM_A: {
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src0[list[i]];
buf[n++] = v1src0[list[i]];
buf[n++] = v2src0[list[i]];
buf[n++] = v3src0[list[i]];
buf[n++] = v4src0[list[i]];
buf[n++] = v5src0[list[i]];
buf[n++] = v0src1[list[i]];
buf[n++] = v1src1[list[i]];
buf[n++] = v2src1[list[i]];
buf[n++] = v3src1[list[i]];
buf[n++] = v4src1[list[i]];
buf[n++] = v5src1[list[i]];
buf[n++] = v0src2[list[i]];
buf[n++] = v1src2[list[i]];
buf[n++] = v2src2[list[i]];
buf[n++] = v3src2[list[i]];
buf[n++] = v4src2[list[i]];
buf[n++] = v5src2[list[i]];
buf[n++] = v0src3[list[i]];
buf[n++] = v1src3[list[i]];
buf[n++] = v2src3[list[i]];
buf[n++] = v3src3[list[i]];
buf[n++] = v4src3[list[i]];
buf[n++] = v5src3[list[i]];
buf[n++] = v0src4[list[i]];
buf[n++] = v1src4[list[i]];
buf[n++] = v2src4[list[i]];
buf[n++] = v3src4[list[i]];
buf[n++] = v4src4[list[i]];
buf[n++] = v5src4[list[i]];
buf[n++] = v0src5[list[i]];
buf[n++] = v1src5[list[i]];
buf[n++] = v2src5[list[i]];
buf[n++] = v3src5[list[i]];
buf[n++] = v4src5[list[i]];
buf[n++] = v5src5[list[i]];
buf[n++] = v0src6[list[i]];
buf[n++] = v1src6[list[i]];
buf[n++] = v2src6[list[i]];
buf[n++] = v3src6[list[i]];
buf[n++] = v4src6[list[i]];
buf[n++] = v5src6[list[i]];
}
break;
}
// Dispersion interactions, no mixing
case FORWARD_IK_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *xsrc = &vdx_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ysrc = &vdy_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zsrc = &vdz_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = xsrc[list[i]];
buf[n++] = ysrc[list[i]];
buf[n++] = zsrc[list[i]];
}
}
break;
}
case FORWARD_AD_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *src = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
buf[n++] = src[list[i]];
}
break;
}
case FORWARD_IK_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *esrc = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) buf[n++] = esrc[list[i]];
if (vflag_atom) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
}
break;
}
case FORWARD_AD_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = v0src[list[i]];
buf[n++] = v1src[list[i]];
buf[n++] = v2src[list[i]];
buf[n++] = v3src[list[i]];
buf[n++] = v4src[list[i]];
buf[n++] = v5src[list[i]];
}
}
break;
}
}
}
/* ----------------------------------------------------------------------
unpack another proc's own values from buf and set own ghost values
------------------------------------------------------------------------- */
void PPPMDisp::unpack_forward(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
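// mirror of pack_forward(): values received from a neighbor proc are scattered into this proc's ghost grid cells in the same order they were packed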
switch (flag) {
// Coulomb interactions
case FORWARD_IK: {
FFT_SCALAR *xdest = &vdx_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *ydest = &vdy_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *zdest = &vdz_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
xdest[list[i]] = buf[n++];
ydest[list[i]] = buf[n++];
zdest[list[i]] = buf[n++];
}
break;
}
case FORWARD_AD: {
FFT_SCALAR *dest = &u_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
dest[list[i]] = buf[n++];
break;
}
case FORWARD_IK_PERATOM: {
FFT_SCALAR *esrc = &u_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) esrc[list[i]] = buf[n++];
if (vflag_atom) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_PERATOM: {
FFT_SCALAR *v0src = &v0_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v1src = &v1_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v2src = &v2_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v3src = &v3_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v4src = &v4_brick[nzlo_out][nylo_out][nxlo_out];
FFT_SCALAR *v5src = &v5_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
break;
}
// Dispersion interactions, geometric mixing
case FORWARD_IK_G: {
FFT_SCALAR *xdest = &vdx_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest = &vdy_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest = &vdz_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
xdest[list[i]] = buf[n++];
ydest[list[i]] = buf[n++];
zdest[list[i]] = buf[n++];
}
break;
}
case FORWARD_AD_G: {
FFT_SCALAR *dest = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] = buf[n++];
break;
}
case FORWARD_IK_PERATOM_G: {
FFT_SCALAR *esrc = &u_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) esrc[list[i]] = buf[n++];
if (vflag_atom) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_PERATOM_G: {
FFT_SCALAR *v0src = &v0_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
break;
}
// Dispersion interactions, arithmetic mixing
case FORWARD_IK_A: {
FFT_SCALAR *xdest0 = &vdx_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest0 = &vdy_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest0 = &vdz_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest1 = &vdx_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest1 = &vdy_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest1 = &vdz_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest2 = &vdx_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest2 = &vdy_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest2 = &vdz_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest3 = &vdx_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest3 = &vdy_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest3 = &vdz_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest4 = &vdx_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest4 = &vdy_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest4 = &vdz_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest5 = &vdx_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest5 = &vdy_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest5 = &vdz_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *xdest6 = &vdx_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest6 = &vdy_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest6 = &vdz_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
xdest0[list[i]] = buf[n++];
ydest0[list[i]] = buf[n++];
zdest0[list[i]] = buf[n++];
xdest1[list[i]] = buf[n++];
ydest1[list[i]] = buf[n++];
zdest1[list[i]] = buf[n++];
xdest2[list[i]] = buf[n++];
ydest2[list[i]] = buf[n++];
zdest2[list[i]] = buf[n++];
xdest3[list[i]] = buf[n++];
ydest3[list[i]] = buf[n++];
zdest3[list[i]] = buf[n++];
xdest4[list[i]] = buf[n++];
ydest4[list[i]] = buf[n++];
zdest4[list[i]] = buf[n++];
xdest5[list[i]] = buf[n++];
ydest5[list[i]] = buf[n++];
zdest5[list[i]] = buf[n++];
xdest6[list[i]] = buf[n++];
ydest6[list[i]] = buf[n++];
zdest6[list[i]] = buf[n++];
}
break;
}
case FORWARD_AD_A: {
FFT_SCALAR *dest0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
dest0[list[i]] = buf[n++];
dest1[list[i]] = buf[n++];
dest2[list[i]] = buf[n++];
dest3[list[i]] = buf[n++];
dest4[list[i]] = buf[n++];
dest5[list[i]] = buf[n++];
dest6[list[i]] = buf[n++];
}
break;
}
case FORWARD_IK_PERATOM_A: {
FFT_SCALAR *esrc0 = &u_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc1 = &u_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc2 = &u_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc3 = &u_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc4 = &u_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc5 = &u_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *esrc6 = &u_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) {
esrc0[list[i]] = buf[n++];
esrc1[list[i]] = buf[n++];
esrc2[list[i]] = buf[n++];
esrc3[list[i]] = buf[n++];
esrc4[list[i]] = buf[n++];
esrc5[list[i]] = buf[n++];
esrc6[list[i]] = buf[n++];
}
if (vflag_atom) {
v0src0[list[i]] = buf[n++];
v1src0[list[i]] = buf[n++];
v2src0[list[i]] = buf[n++];
v3src0[list[i]] = buf[n++];
v4src0[list[i]] = buf[n++];
v5src0[list[i]] = buf[n++];
v0src1[list[i]] = buf[n++];
v1src1[list[i]] = buf[n++];
v2src1[list[i]] = buf[n++];
v3src1[list[i]] = buf[n++];
v4src1[list[i]] = buf[n++];
v5src1[list[i]] = buf[n++];
v0src2[list[i]] = buf[n++];
v1src2[list[i]] = buf[n++];
v2src2[list[i]] = buf[n++];
v3src2[list[i]] = buf[n++];
v4src2[list[i]] = buf[n++];
v5src2[list[i]] = buf[n++];
v0src3[list[i]] = buf[n++];
v1src3[list[i]] = buf[n++];
v2src3[list[i]] = buf[n++];
v3src3[list[i]] = buf[n++];
v4src3[list[i]] = buf[n++];
v5src3[list[i]] = buf[n++];
v0src4[list[i]] = buf[n++];
v1src4[list[i]] = buf[n++];
v2src4[list[i]] = buf[n++];
v3src4[list[i]] = buf[n++];
v4src4[list[i]] = buf[n++];
v5src4[list[i]] = buf[n++];
v0src5[list[i]] = buf[n++];
v1src5[list[i]] = buf[n++];
v2src5[list[i]] = buf[n++];
v3src5[list[i]] = buf[n++];
v4src5[list[i]] = buf[n++];
v5src5[list[i]] = buf[n++];
v0src6[list[i]] = buf[n++];
v1src6[list[i]] = buf[n++];
v2src6[list[i]] = buf[n++];
v3src6[list[i]] = buf[n++];
v4src6[list[i]] = buf[n++];
v5src6[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_PERATOM_A: {
FFT_SCALAR *v0src0 = &v0_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src0 = &v1_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src0 = &v2_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src0 = &v3_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src0 = &v4_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src0 = &v5_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src1 = &v0_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src1 = &v1_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src1 = &v2_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src1 = &v3_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src1 = &v4_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src1 = &v5_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src2 = &v0_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src2 = &v1_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src2 = &v2_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src2 = &v3_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src2 = &v4_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src2 = &v5_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src3 = &v0_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src3 = &v1_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src3 = &v2_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src3 = &v3_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src3 = &v4_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src3 = &v5_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src4 = &v0_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src4 = &v1_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src4 = &v2_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src4 = &v3_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src4 = &v4_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src4 = &v5_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src5 = &v0_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src5 = &v1_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src5 = &v2_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src5 = &v3_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src5 = &v4_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src5 = &v5_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src6 = &v0_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src6 = &v1_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src6 = &v2_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src6 = &v3_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src6 = &v4_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src6 = &v5_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
v0src0[list[i]] = buf[n++];
v1src0[list[i]] = buf[n++];
v2src0[list[i]] = buf[n++];
v3src0[list[i]] = buf[n++];
v4src0[list[i]] = buf[n++];
v5src0[list[i]] = buf[n++];
v0src1[list[i]] = buf[n++];
v1src1[list[i]] = buf[n++];
v2src1[list[i]] = buf[n++];
v3src1[list[i]] = buf[n++];
v4src1[list[i]] = buf[n++];
v5src1[list[i]] = buf[n++];
v0src2[list[i]] = buf[n++];
v1src2[list[i]] = buf[n++];
v2src2[list[i]] = buf[n++];
v3src2[list[i]] = buf[n++];
v4src2[list[i]] = buf[n++];
v5src2[list[i]] = buf[n++];
v0src3[list[i]] = buf[n++];
v1src3[list[i]] = buf[n++];
v2src3[list[i]] = buf[n++];
v3src3[list[i]] = buf[n++];
v4src3[list[i]] = buf[n++];
v5src3[list[i]] = buf[n++];
v0src4[list[i]] = buf[n++];
v1src4[list[i]] = buf[n++];
v2src4[list[i]] = buf[n++];
v3src4[list[i]] = buf[n++];
v4src4[list[i]] = buf[n++];
v5src4[list[i]] = buf[n++];
v0src5[list[i]] = buf[n++];
v1src5[list[i]] = buf[n++];
v2src5[list[i]] = buf[n++];
v3src5[list[i]] = buf[n++];
v4src5[list[i]] = buf[n++];
v5src5[list[i]] = buf[n++];
v0src6[list[i]] = buf[n++];
v1src6[list[i]] = buf[n++];
v2src6[list[i]] = buf[n++];
v3src6[list[i]] = buf[n++];
v4src6[list[i]] = buf[n++];
v5src6[list[i]] = buf[n++];
}
break;
}
// Dispersion interactions, no mixing
case FORWARD_IK_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *xdest = &vdx_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *ydest = &vdy_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *zdest = &vdz_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
xdest[list[i]] = buf[n++];
ydest[list[i]] = buf[n++];
zdest[list[i]] = buf[n++];
}
}
break;
}
case FORWARD_AD_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *dest = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] = buf[n++];
}
break;
}
case FORWARD_IK_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *esrc = &u_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
if (eflag_atom) esrc[list[i]] = buf[n++];
if (vflag_atom) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
}
break;
}
case FORWARD_AD_PERATOM_NONE: {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *v0src = &v0_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v1src = &v1_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v2src = &v2_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v3src = &v3_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v4src = &v4_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *v5src = &v5_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
v0src[list[i]] = buf[n++];
v1src[list[i]] = buf[n++];
v2src[list[i]] = buf[n++];
v3src[list[i]] = buf[n++];
v4src[list[i]] = buf[n++];
v5src[list[i]] = buf[n++];
}
}
break;
}
}
}
/* ----------------------------------------------------------------------
pack ghost values into buf to send to another proc
------------------------------------------------------------------------- */
void PPPMDisp::pack_reverse(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
//Coulomb interactions
if (flag == REVERSE_RHO) {
FFT_SCALAR *src = &density_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
//Dispersion interactions, geometric mixing
} else if (flag == REVERSE_RHO_G) {
FFT_SCALAR *src = &density_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
buf[i] = src[list[i]];
//Dispersion interactions, arithmetic mixing
} else if (flag == REVERSE_RHO_A) {
FFT_SCALAR *src0 = &density_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src1 = &density_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src2 = &density_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src3 = &density_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src4 = &density_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src5 = &density_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *src6 = &density_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = src0[list[i]];
buf[n++] = src1[list[i]];
buf[n++] = src2[list[i]];
buf[n++] = src3[list[i]];
buf[n++] = src4[list[i]];
buf[n++] = src5[list[i]];
buf[n++] = src6[list[i]];
}
//Dispersion interactions, no mixing
} else if (flag == REVERSE_RHO_NONE) {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *src = &density_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
buf[n++] = src[list[i]];
}
}
}
}
/* ----------------------------------------------------------------------
unpack another proc's ghost values from buf and add to own values
------------------------------------------------------------------------- */
void PPPMDisp::unpack_reverse(int flag, FFT_SCALAR *buf, int nlist, int *list)
{
int n = 0;
//Coulomb interactions
if (flag == REVERSE_RHO) {
FFT_SCALAR *dest = &density_brick[nzlo_out][nylo_out][nxlo_out];
for (int i = 0; i < nlist; i++)
dest[list[i]] += buf[i];
//Dispersion interactions, geometric mixing
} else if (flag == REVERSE_RHO_G) {
FFT_SCALAR *dest = &density_brick_g[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] += buf[i];
//Dispersion interactions, arithmetic mixing
} else if (flag == REVERSE_RHO_A) {
FFT_SCALAR *dest0 = &density_brick_a0[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest1 = &density_brick_a1[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest2 = &density_brick_a2[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest3 = &density_brick_a3[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest4 = &density_brick_a4[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest5 = &density_brick_a5[nzlo_out_6][nylo_out_6][nxlo_out_6];
FFT_SCALAR *dest6 = &density_brick_a6[nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++) {
dest0[list[i]] += buf[n++];
dest1[list[i]] += buf[n++];
dest2[list[i]] += buf[n++];
dest3[list[i]] += buf[n++];
dest4[list[i]] += buf[n++];
dest5[list[i]] += buf[n++];
dest6[list[i]] += buf[n++];
}
//Dispersion interactions, no mixing
} else if (flag == REVERSE_RHO_NONE) {
for (int k = 0; k < nsplit_alloc; k++) {
FFT_SCALAR *dest = &density_brick_none[k][nzlo_out_6][nylo_out_6][nxlo_out_6];
for (int i = 0; i < nlist; i++)
dest[list[i]] += buf[n++];
}
}
}
/* ----------------------------------------------------------------------
map nprocs to NX by NY grid as PX by PY procs - return optimal px,py
------------------------------------------------------------------------- */
void PPPMDisp::procs2grid2d(int nprocs, int nx, int ny, int *px, int *py)
{
// loop through all possible factorizations of nprocs
// surf = surface area of largest proc sub-domain
// innermost if test minimizes surface area and surface/volume ratio
int bestsurf = 2 * (nx + ny);
int bestboxx = 0;
int bestboxy = 0;
int boxx,boxy,surf,ipx,ipy;
ipx = 1;
while (ipx <= nprocs) {
if (nprocs % ipx == 0) {
ipy = nprocs/ipx;
boxx = nx/ipx;
if (nx % ipx) boxx++;
boxy = ny/ipy;
if (ny % ipy) boxy++;
surf = boxx + boxy;
if (surf < bestsurf ||
(surf == bestsurf && boxx*boxy > bestboxx*bestboxy)) {
bestsurf = surf;
bestboxx = boxx;
bestboxy = boxy;
*px = ipx;
*py = ipy;
}
}
ipx++;
}
}
/* ----------------------------------------------------------------------
charge assignment into rho1d
dx,dy,dz = distance of particle from "lower left" grid point
------------------------------------------------------------------------- */
void PPPMDisp::compute_rho1d(const FFT_SCALAR &dx, const FFT_SCALAR &dy,
const FFT_SCALAR &dz, int ord,
FFT_SCALAR **rho_c, FFT_SCALAR **r1d)
{
int k,l;
FFT_SCALAR r1,r2,r3;
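// for each of the 'ord' grid points the particle contributes to (index k),
// evaluate the degree-(ord-1) assignment polynomial at dx, dy, dz via
// Horner's rule over the packed coefficients rho_c[l][k]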
for (k = (1-ord)/2; k <= ord/2; k++) {
r1 = r2 = r3 = ZEROF;
for (l = ord-1; l >= 0; l--) {
r1 = rho_c[l][k] + r1*dx;
r2 = rho_c[l][k] + r2*dy;
r3 = rho_c[l][k] + r3*dz;
}
r1d[0][k] = r1;
r1d[1][k] = r2;
r1d[2][k] = r3;
}
}
/* ----------------------------------------------------------------------
charge assignment into drho1d
dx,dy,dz = distance of particle from "lower left" grid point
------------------------------------------------------------------------- */
void PPPMDisp::compute_drho1d(const FFT_SCALAR &dx, const FFT_SCALAR &dy,
const FFT_SCALAR &dz, int ord,
FFT_SCALAR **drho_c, FFT_SCALAR **dr1d)
{
int k,l;
FFT_SCALAR r1,r2,r3;
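// same Horner evaluation as compute_rho1d(), but using the derivative
// coefficients; the derivative polynomial is one degree lower, so the
// inner loop starts at l = ord-2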
for (k = (1-ord)/2; k <= ord/2; k++) {
r1 = r2 = r3 = ZEROF;
for (l = ord-2; l >= 0; l--) {
r1 = drho_c[l][k] + r1*dx;
r2 = drho_c[l][k] + r2*dy;
r3 = drho_c[l][k] + r3*dz;
}
dr1d[0][k] = r1;
dr1d[1][k] = r2;
dr1d[2][k] = r3;
}
}
/* ----------------------------------------------------------------------
generate coefficients for the weight function of order n
Wn(x) = Sum over k = -(n-1), -(n-1)+2, ..., (n-1)-2, (n-1) of wn(k,x)
(k runs over every other integer: odd k if n is even, even k if n is odd)
wn(k,x) = Sum from l=0 to n-1 of a(l,k)*(x-k/2)^l   if |x-k/2| < 1/2
        = 0                                          otherwise
the a coefficients are packed into the array rho_coeff to eliminate zeros:
rho_coeff(l,(k+mod(n+1,2))/2) = a(l,k)
------------------------------------------------------------------------- */
void PPPMDisp::compute_rho_coeff(FFT_SCALAR **coeff , FFT_SCALAR **dcoeff,
int ord)
{
int j,k,l,m;
FFT_SCALAR s;
FFT_SCALAR **a;
memory->create2d_offset(a,ord,-ord,ord,"pppm/disp:a");
for (k = -ord; k <= ord; k++)
for (l = 0; l < ord; l++)
a[l][k] = 0.0;
a[0][0] = 1.0;
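// build the a(l,k) coefficients recursively: pass j derives the order j+1
// coefficients from the order j ones via a(l+1,k) = (a(l,k+1)-a(l,k-1))/(l+1),
// then sets the constant term a(0,k) from the accumulated sum s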
for (j = 1; j < ord; j++) {
for (k = -j; k <= j; k += 2) {
s = 0.0;
for (l = 0; l < j; l++) {
a[l+1][k] = (a[l][k+1]-a[l][k-1]) / (l+1);
#ifdef FFT_SINGLE
s += powf(0.5,(float) l+1) *
(a[l][k-1] + powf(-1.0,(float) l) * a[l][k+1]) / (l+1);
#else
s += pow(0.5,(double) l+1) *
(a[l][k-1] + pow(-1.0,(double) l) * a[l][k+1]) / (l+1);
#endif
}
a[0][k] = s;
}
}
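// pack the nonzero a(l,k) (every other k) into coeff, and the derivative
// coefficients l*a(l,k) into dcoeff, using a shifted index m that starts
// at (1-ord)/2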
m = (1-ord)/2;
for (k = -(ord-1); k < ord; k += 2) {
for (l = 0; l < ord; l++)
coeff[l][m] = a[l][k];
for (l = 1; l < ord; l++)
dcoeff[l-1][m] = l*a[l][k];
m++;
}
memory->destroy2d_offset(a,-ord);
}
/* ----------------------------------------------------------------------
Slab-geometry correction term to dampen inter-slab interactions between
periodically repeating slabs. Yields good approximation to 2D Ewald if
adequate empty space is left between repeating slabs (J. Chem. Phys.
111, 3155). Slabs defined here to be parallel to the xy plane. Also
extended to non-neutral systems (J. Chem. Phys. 131, 094107).
------------------------------------------------------------------------- */
void PPPMDisp::slabcorr(int eflag)
{
// compute local contribution to global dipole moment
double *q = atom->q;
double **x = atom->x;
double zprd = domain->zprd;
int nlocal = atom->nlocal;
double dipole = 0.0;
for (int i = 0; i < nlocal; i++) dipole += q[i]*x[i][2];
// sum local contributions to get global dipole moment
double dipole_all;
MPI_Allreduce(&dipole,&dipole_all,1,MPI_DOUBLE,MPI_SUM,world);
// need to make non-neutral systems and/or
// per-atom energy translationally invariant
double dipole_r2 = 0.0;
if (eflag_atom || fabs(qsum) > SMALL) {
for (int i = 0; i < nlocal; i++)
dipole_r2 += q[i]*x[i][2]*x[i][2];
// sum local contributions
double tmp;
MPI_Allreduce(&dipole_r2,&tmp,1,MPI_DOUBLE,MPI_SUM,world);
dipole_r2 = tmp;
}
// compute corrections
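// energy correction: E_slab = 2*pi/V * (M_z^2 - qsum*Sum(q_i*z_i^2)
//                                       - qsum^2*Lz^2/12)
// where M_z = dipole_all and Sum(q_i*z_i^2) = dipole_r2; scaled by qqrd2e*scale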
const double e_slabcorr = MY_2PI*(dipole_all*dipole_all -
qsum*dipole_r2 - qsum*qsum*zprd*zprd/12.0)/volume;
const double qscale = force->qqrd2e * scale;
if (eflag_global) energy_1 += qscale * e_slabcorr;
// per-atom energy
if (eflag_atom) {
double efact = qscale * MY_2PI/volume;
for (int i = 0; i < nlocal; i++)
eatom[i] += efact * q[i]*(x[i][2]*dipole_all - 0.5*(dipole_r2 +
qsum*x[i][2]*x[i][2]) - qsum*zprd*zprd/12.0);
}
// add on force corrections
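// per-atom force correction: F_z,i = -4*pi*q_i/V * (M_z - qsum*z_i),
// scaled by qqrd2e*scale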
double ffact = qscale * (-4.0*MY_PI/volume);
double **f = atom->f;
for (int i = 0; i < nlocal; i++) f[i][2] += ffact * q[i]*(dipole_all - qsum*x[i][2]);
}
/* ----------------------------------------------------------------------
perform and time the 1d FFTs required for N timesteps
------------------------------------------------------------------------- */
int PPPMDisp::timing_1d(int n, double &time1d)
{
double time1,time2;
int mixing = 1;
if (function[2]) mixing = 4;
if (function[3]) mixing = nsplit_alloc/2;
if (function[0]) for (int i = 0; i < 2*nfft_both; i++) work1[i] = ZEROF;
if (function[1] + function[2] + function[3])
for (int i = 0; i < 2*nfft_both_6; i++) work1_6[i] = ZEROF;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[0]) {
for (int i = 0; i < n; i++) {
fft1->timing1d(work1,nfft_both,1);
fft2->timing1d(work1,nfft_both,-1);
if (differentiation_flag != 1){
fft2->timing1d(work1,nfft_both,-1);
fft2->timing1d(work1,nfft_both,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time1d = time2 - time1;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[1] + function[2] + function[3]) {
for (int i = 0; i < n; i++) {
fft1_6->timing1d(work1_6,nfft_both_6,1);
fft2_6->timing1d(work1_6,nfft_both_6,-1);
if (differentiation_flag != 1){
fft2_6->timing1d(work1_6,nfft_both_6,-1);
fft2_6->timing1d(work1_6,nfft_both_6,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time1d += (time2 - time1)*mixing;
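// return value = number of 1d FFTs per timestep:
// 2 (one forward, one back) with ad differentiation,
// 4 with ik differentiation (extra back transforms for the field components)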
if (differentiation_flag) return 2;
return 4;
}
/* ----------------------------------------------------------------------
perform and time the 3d FFTs required for N timesteps
------------------------------------------------------------------------- */
int PPPMDisp::timing_3d(int n, double &time3d)
{
double time1,time2;
int mixing = 1;
if (function[2]) mixing = 4;
if (function[3]) mixing = nsplit_alloc/2;
if (function[0]) for (int i = 0; i < 2*nfft_both; i++) work1[i] = ZEROF;
if (function[1] + function[2] + function[3])
for (int i = 0; i < 2*nfft_both_6; i++) work1_6[i] = ZEROF;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[0]) {
for (int i = 0; i < n; i++) {
fft1->compute(work1,work1,1);
fft2->compute(work1,work1,-1);
if (differentiation_flag != 1) {
fft2->compute(work1,work1,-1);
fft2->compute(work1,work1,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time3d = time2 - time1;
MPI_Barrier(world);
time1 = MPI_Wtime();
if (function[1] + function[2] + function[3]) {
for (int i = 0; i < n; i++) {
fft1_6->compute(work1_6,work1_6,1);
fft2_6->compute(work1_6,work1_6,-1);
if (differentiation_flag != 1) {
fft2_6->compute(work1_6,work1_6,-1);
fft2_6->compute(work1_6,work1_6,-1);
}
}
}
MPI_Barrier(world);
time2 = MPI_Wtime();
time3d += (time2 - time1) * mixing;
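// as in timing_1d(): return 2 FFTs per timestep for ad differentiation, 4 for ik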
if (differentiation_flag) return 2;
return 4;
}
/* ----------------------------------------------------------------------
memory usage of local arrays
------------------------------------------------------------------------- */
double PPPMDisp::memory_usage()
{
double bytes = nmax*3 * sizeof(double);
int mixing = 1;
int diff = 3; //depends on differentiation
int per = 7; //depends on per atom calculations
if (differentiation_flag) {
diff = 1;
per = 6;
}
if (!evflag_atom) per = 0;
if (function[2]) mixing = 7;
if (function[3]) mixing = nsplit_alloc;
if (function[0]) {
int nbrick = (nxhi_out-nxlo_out+1) * (nyhi_out-nylo_out+1) *
(nzhi_out-nzlo_out+1);
bytes += (1 + diff + per) * nbrick * sizeof(FFT_SCALAR); //brick memory
bytes += 6 * nfft_both * sizeof(double); // vg
bytes += nfft_both * sizeof(double); // greensfn
bytes += nfft_both * 3 * sizeof(FFT_SCALAR); // density_FFT, work1, work2
if (cg) bytes += cg->memory_usage();
}
if (function[1] + function[2] + function[3]) {
int nbrick = (nxhi_out_6-nxlo_out_6+1) * (nyhi_out_6-nylo_out_6+1) *
(nzhi_out_6-nzlo_out_6+1);
bytes += (1 + diff + per ) * nbrick * sizeof(FFT_SCALAR) * mixing; // density_brick + vd_brick + per atom bricks
bytes += 6 * nfft_both_6 * sizeof(double); // vg
bytes += nfft_both_6 * sizeof(double); // greensfn
bytes += nfft_both_6 * (mixing + 2) * sizeof(FFT_SCALAR); // density_FFT, work1, work2
if (cg_6) bytes += cg_6->memory_usage();
}
return bytes;
}
diff --git a/src/MC/fix_gcmc.cpp b/src/MC/fix_gcmc.cpp
index cba5a0a17..73758e362 100644
--- a/src/MC/fix_gcmc.cpp
+++ b/src/MC/fix_gcmc.cpp
@@ -1,2475 +1,2476 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Paul Crozier, Aidan Thompson (SNL)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "fix_gcmc.h"
#include "atom.h"
#include "atom_vec.h"
#include "atom_vec_hybrid.h"
#include "molecule.h"
#include "update.h"
#include "modify.h"
#include "fix.h"
#include "comm.h"
#include "compute.h"
#include "group.h"
#include "domain.h"
#include "region.h"
#include "random_park.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "dihedral.h"
#include "improper.h"
#include "kspace.h"
#include "math_extra.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
#include "thermo.h"
#include "output.h"
#include "neighbor.h"
#include <iostream>
using namespace std;
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathConst;
// large energy value used to signal overlap
#define MAXENERGYSIGNAL 1.0e100
// MAXENERGYTEST must be much lower than MAXENERGYSIGNAL, so that a total
// energy that starts at MAXENERGYSIGNAL still exceeds it after negative
// energy contributions are added
#define MAXENERGYTEST 1.0e50
enum{ATOM,MOLECULE};
/* ---------------------------------------------------------------------- */
FixGCMC::FixGCMC(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg),
idregion(NULL), full_flag(0), ngroups(0), groupstrings(NULL), ngrouptypes(0), grouptypestrings(NULL),
grouptypebits(NULL), grouptypes(NULL), local_gas_list(NULL), atom_coord(NULL), random_equal(NULL), random_unequal(NULL),
coords(NULL), imageflags(NULL), fixrigid(NULL), fixshake(NULL), idrigid(NULL), idshake(NULL)
{
if (narg < 11) error->all(FLERR,"Illegal fix gcmc command");
if (atom->molecular == 2)
error->all(FLERR,"Fix gcmc does not (yet) work with atom_style template");
dynamic_group_allow = 1;
vector_flag = 1;
size_vector = 8;
global_freq = 1;
extvector = 0;
restart_global = 1;
time_depend = 1;
// required args
nevery = force->inumeric(FLERR,arg[3]);
nexchanges = force->inumeric(FLERR,arg[4]);
nmcmoves = force->inumeric(FLERR,arg[5]);
ngcmc_type = force->inumeric(FLERR,arg[6]);
seed = force->inumeric(FLERR,arg[7]);
reservoir_temperature = force->numeric(FLERR,arg[8]);
chemical_potential = force->numeric(FLERR,arg[9]);
displace = force->numeric(FLERR,arg[10]);
if (nevery <= 0) error->all(FLERR,"Illegal fix gcmc command");
if (nexchanges < 0) error->all(FLERR,"Illegal fix gcmc command");
if (nmcmoves < 0) error->all(FLERR,"Illegal fix gcmc command");
if (seed <= 0) error->all(FLERR,"Illegal fix gcmc command");
if (reservoir_temperature < 0.0)
error->all(FLERR,"Illegal fix gcmc command");
if (displace < 0.0) error->all(FLERR,"Illegal fix gcmc command");
// read options from end of input line
options(narg-11,&arg[11]);
// random number generator, same for all procs
random_equal = new RanPark(lmp,seed);
// random number generator, not the same for all procs
random_unequal = new RanPark(lmp,seed);
// error checks on region and its extent being inside simulation box
region_xlo = region_xhi = region_ylo = region_yhi =
region_zlo = region_zhi = 0.0;
if (regionflag) {
if (domain->regions[iregion]->bboxflag == 0)
error->all(FLERR,"Fix gcmc region does not support a bounding box");
if (domain->regions[iregion]->dynamic_check())
error->all(FLERR,"Fix gcmc region cannot be dynamic");
region_xlo = domain->regions[iregion]->extent_xlo;
region_xhi = domain->regions[iregion]->extent_xhi;
region_ylo = domain->regions[iregion]->extent_ylo;
region_yhi = domain->regions[iregion]->extent_yhi;
region_zlo = domain->regions[iregion]->extent_zlo;
region_zhi = domain->regions[iregion]->extent_zhi;
if (region_xlo < domain->boxlo[0] || region_xhi > domain->boxhi[0] ||
region_ylo < domain->boxlo[1] || region_yhi > domain->boxhi[1] ||
region_zlo < domain->boxlo[2] || region_zhi > domain->boxhi[2])
error->all(FLERR,"Fix gcmc region extends outside simulation box");
// estimate region volume using MC trials
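// sample random points in the region's bounding box; the fraction that falls
// inside the region, times the bounding-box volume, estimates the region volume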
double coord[3];
int inside = 0;
int attempts = 10000000;
for (int i = 0; i < attempts; i++) {
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
if (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) != 0)
inside++;
}
double max_region_volume = (region_xhi - region_xlo)*
(region_yhi - region_ylo)*(region_zhi - region_zlo);
region_volume = max_region_volume*static_cast<double> (inside)/
static_cast<double> (attempts);
}
// error check and further setup for mode = MOLECULE
if (mode == MOLECULE) {
if (onemols[imol]->xflag == 0)
error->all(FLERR,"Fix gcmc molecule must have coordinates");
if (onemols[imol]->typeflag == 0)
error->all(FLERR,"Fix gcmc molecule must have atom types");
if (ngcmc_type != 0)
error->all(FLERR,"Atom type must be zero in fix gcmc mol command");
if (onemols[imol]->qflag == 1 && atom->q == NULL)
error->all(FLERR,"Fix gcmc molecule has charges, but atom style does not");
if (atom->molecular == 2 && onemols != atom->avec->onemols)
error->all(FLERR,"Fix gcmc molecule template ID must be same "
"as atom_style template ID");
onemols[imol]->check_attributes(0);
}
if (charge_flag && atom->q == NULL)
error->all(FLERR,"Fix gcmc atom has charge, but atom style does not");
if (rigidflag && mode == ATOM)
error->all(FLERR,"Cannot use fix gcmc rigid and not molecule");
if (shakeflag && mode == ATOM)
error->all(FLERR,"Cannot use fix gcmc shake and not molecule");
if (rigidflag && shakeflag)
error->all(FLERR,"Cannot use fix gcmc rigid and shake");
// setup of coords and imageflags array
if (mode == ATOM) natoms_per_molecule = 1;
else natoms_per_molecule = onemols[imol]->natoms;
memory->create(coords,natoms_per_molecule,3,"gcmc:coords");
memory->create(imageflags,natoms_per_molecule,"gcmc:imageflags");
memory->create(atom_coord,natoms_per_molecule,3,"gcmc:atom_coord");
// compute the number of MC cycles that occur nevery timesteps
ncycles = nexchanges + nmcmoves;
// set up reneighboring
force_reneighbor = 1;
next_reneighbor = update->ntimestep + 1;
// zero out counters
ntranslation_attempts = 0.0;
ntranslation_successes = 0.0;
nrotation_attempts = 0.0;
nrotation_successes = 0.0;
ndeletion_attempts = 0.0;
ndeletion_successes = 0.0;
ninsertion_attempts = 0.0;
ninsertion_successes = 0.0;
gcmc_nmax = 0;
local_gas_list = NULL;
}
/* ----------------------------------------------------------------------
parse optional parameters at end of input line
------------------------------------------------------------------------- */
void FixGCMC::options(int narg, char **arg)
{
if (narg < 0) error->all(FLERR,"Illegal fix gcmc command");
// defaults
mode = ATOM;
max_rotation_angle = 10*MY_PI/180;
regionflag = 0;
iregion = -1;
region_volume = 0;
max_region_attempts = 1000;
molecule_group = 0;
molecule_group_bit = 0;
molecule_group_inversebit = 0;
exclusion_group = 0;
exclusion_group_bit = 0;
pressure_flag = false;
pressure = 0.0;
fugacity_coeff = 1.0;
rigidflag = 0;
shakeflag = 0;
charge = 0.0;
charge_flag = false;
full_flag = false;
ngroups = 0;
int ngroupsmax = 0;
groupstrings = NULL;
ngrouptypes = 0;
int ngrouptypesmax = 0;
grouptypestrings = NULL;
grouptypes = NULL;
grouptypebits = NULL;
energy_intra = 0.0;
tfac_insert = 1.0;
- overlap_cutoff = 0.0;
+ overlap_cutoffsq = 0.0;
overlap_flag = 0;
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"mol") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
imol = atom->find_molecule(arg[iarg+1]);
if (imol == -1)
error->all(FLERR,"Molecule template ID for fix gcmc does not exist");
if (atom->molecules[imol]->nset > 1 && comm->me == 0)
error->warning(FLERR,"Molecule template for "
"fix gcmc has multiple molecules");
mode = MOLECULE;
onemols = atom->molecules;
nmol = onemols[imol]->nset;
iarg += 2;
} else if (strcmp(arg[iarg],"region") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
iregion = domain->find_region(arg[iarg+1]);
if (iregion == -1)
error->all(FLERR,"Region ID for fix gcmc does not exist");
int n = strlen(arg[iarg+1]) + 1;
idregion = new char[n];
strcpy(idregion,arg[iarg+1]);
regionflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"maxangle") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
max_rotation_angle = force->numeric(FLERR,arg[iarg+1]);
max_rotation_angle *= MY_PI/180;
iarg += 2;
} else if (strcmp(arg[iarg],"pressure") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
pressure = force->numeric(FLERR,arg[iarg+1]);
pressure_flag = true;
iarg += 2;
} else if (strcmp(arg[iarg],"fugacity_coeff") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
fugacity_coeff = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"charge") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
charge = force->numeric(FLERR,arg[iarg+1]);
charge_flag = true;
iarg += 2;
} else if (strcmp(arg[iarg],"rigid") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
int n = strlen(arg[iarg+1]) + 1;
delete [] idrigid;
idrigid = new char[n];
strcpy(idrigid,arg[iarg+1]);
rigidflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"shake") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
int n = strlen(arg[iarg+1]) + 1;
delete [] idshake;
idshake = new char[n];
strcpy(idshake,arg[iarg+1]);
shakeflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"full_energy") == 0) {
full_flag = true;
iarg += 1;
} else if (strcmp(arg[iarg],"group") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
if (ngroups >= ngroupsmax) {
ngroupsmax = ngroups+1;
groupstrings = (char **)
memory->srealloc(groupstrings,
ngroupsmax*sizeof(char *),
"fix_gcmc:groupstrings");
}
int n = strlen(arg[iarg+1]) + 1;
groupstrings[ngroups] = new char[n];
strcpy(groupstrings[ngroups],arg[iarg+1]);
ngroups++;
iarg += 2;
} else if (strcmp(arg[iarg],"grouptype") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix gcmc command");
if (ngrouptypes >= ngrouptypesmax) {
ngrouptypesmax = ngrouptypes+1;
grouptypes = (int*) memory->srealloc(grouptypes,ngrouptypesmax*sizeof(int),
"fix_gcmc:grouptypes");
grouptypestrings = (char**)
memory->srealloc(grouptypestrings,
ngrouptypesmax*sizeof(char *),
"fix_gcmc:grouptypestrings");
}
grouptypes[ngrouptypes] = atoi(arg[iarg+1]);
int n = strlen(arg[iarg+2]) + 1;
grouptypestrings[ngrouptypes] = new char[n];
strcpy(grouptypestrings[ngrouptypes],arg[iarg+2]);
ngrouptypes++;
iarg += 3;
} else if (strcmp(arg[iarg],"intra_energy") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
energy_intra = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"tfac_insert") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
tfac_insert = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"overlap_cutoff") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix gcmc command");
- overlap_cutoff = force->numeric(FLERR,arg[iarg+1]);
+ double rtmp = force->numeric(FLERR,arg[iarg+1]);
+ overlap_cutoffsq = rtmp*rtmp;
overlap_flag = 1;
iarg += 2;
} else error->all(FLERR,"Illegal fix gcmc command");
}
}
/* ---------------------------------------------------------------------- */
FixGCMC::~FixGCMC()
{
if (regionflag) delete [] idregion;
delete random_equal;
delete random_unequal;
memory->destroy(local_gas_list);
memory->destroy(atom_coord);
memory->destroy(coords);
memory->destroy(imageflags);
delete [] idrigid;
delete [] idshake;
if (ngroups > 0) {
for (int igroup = 0; igroup < ngroups; igroup++)
delete [] groupstrings[igroup];
memory->sfree(groupstrings);
}
if (ngrouptypes > 0) {
memory->destroy(grouptypes);
memory->destroy(grouptypebits);
for (int igroup = 0; igroup < ngrouptypes; igroup++)
delete [] grouptypestrings[igroup];
memory->sfree(grouptypestrings);
}
if (full_flag && group) {
int igroupall = group->find("all");
neighbor->exclusion_group_group_delete(exclusion_group,igroupall);
}
}
/* ---------------------------------------------------------------------- */
int FixGCMC::setmask()
{
int mask = 0;
mask |= PRE_EXCHANGE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixGCMC::init()
{
triclinic = domain->triclinic;
// decide whether to switch to the full_energy option
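// per-atom energy evaluation via Pair::single() cannot be used with kspace,
// a missing pair style, pair styles without single(), hybrid or eam pair
// styles, or tail corrections; in those cases recompute the full system energy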
if (!full_flag) {
if ((force->kspace) ||
(force->pair == NULL) ||
(force->pair->single_enable == 0) ||
(force->pair_match("hybrid",0)) ||
(force->pair_match("eam",0)) ||
(force->pair->tail_flag)
) {
full_flag = true;
if (comm->me == 0)
error->warning(FLERR,"Fix gcmc using full_energy option");
}
}
if (full_flag) {
char *id_pe = (char *) "thermo_pe";
int ipe = modify->find_compute(id_pe);
c_pe = modify->compute[ipe];
}
int *type = atom->type;
if (mode == ATOM) {
if (ngcmc_type <= 0 || ngcmc_type > atom->ntypes)
error->all(FLERR,"Invalid atom type in fix gcmc command");
}
// if mode == ATOM, warn if any deletable atom has a mol ID
if ((mode == ATOM) && atom->molecule_flag) {
tagint *molecule = atom->molecule;
int flag = 0;
for (int i = 0; i < atom->nlocal; i++)
if (type[i] == ngcmc_type)
if (molecule[i]) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->all(FLERR,
"Fix gcmc cannot exchange individual atoms belonging to a molecule");
}
// if mode == MOLECULE, check for unset mol IDs
if (mode == MOLECULE) {
tagint *molecule = atom->molecule;
int *mask = atom->mask;
int flag = 0;
for (int i = 0; i < atom->nlocal; i++)
if (mask[i] == groupbit)
if (molecule[i] == 0) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->all(FLERR,
"All mol IDs should be set for fix gcmc group atoms");
}
if (((mode == MOLECULE) && (atom->molecule_flag == 0)) ||
((mode == MOLECULE) && (!atom->tag_enable || !atom->map_style)))
error->all(FLERR,
"Fix gcmc molecule command requires that "
"atoms have molecule attributes");
// if rigidflag defined, check for rigid/small fix
// its molecule template must be same as this one
fixrigid = NULL;
if (rigidflag) {
int ifix = modify->find_fix(idrigid);
if (ifix < 0) error->all(FLERR,"Fix gcmc rigid fix does not exist");
fixrigid = modify->fix[ifix];
int tmp;
if (onemols != (Molecule **) fixrigid->extract("onemol",tmp))
error->all(FLERR,
"Fix gcmc and fix rigid/small not using "
"same molecule template ID");
}
// if shakeflag defined, check for SHAKE fix
// its molecule template must be same as this one
fixshake = NULL;
if (shakeflag) {
int ifix = modify->find_fix(idshake);
if (ifix < 0) error->all(FLERR,"Fix gcmc shake fix does not exist");
fixshake = modify->fix[ifix];
int tmp;
if (onemols != (Molecule **) fixshake->extract("onemol",tmp))
error->all(FLERR,"Fix gcmc and fix shake not using "
"same molecule template ID");
}
if (domain->dimension == 2)
error->all(FLERR,"Cannot use fix gcmc in a 2d simulation");
// create a new group for interaction exclusions
// used for attempted atom or molecule deletions
// skip if already exists from previous init()
if (full_flag && !exclusion_group_bit) {
char **group_arg = new char*[4];
// create unique group name for atoms to be excluded
int len = strlen(id) + 30;
group_arg[0] = new char[len];
sprintf(group_arg[0],"FixGCMC:gcmc_exclusion_group:%s",id);
group_arg[1] = (char *) "subtract";
group_arg[2] = (char *) "all";
group_arg[3] = (char *) "all";
group->assign(4,group_arg);
exclusion_group = group->find(group_arg[0]);
if (exclusion_group == -1)
error->all(FLERR,"Could not find fix gcmc exclusion group ID");
exclusion_group_bit = group->bitmask[exclusion_group];
// neighbor list exclusion setup
// turn off interactions between group all and the exclusion group
int narg = 4;
char **arg = new char*[narg];
arg[0] = (char *) "exclude";
arg[1] = (char *) "group";
arg[2] = group_arg[0];
arg[3] = (char *) "all";
neighbor->modify_params(narg,arg);
delete [] group_arg[0];
delete [] group_arg;
delete [] arg;
}
// create a new group for temporary use with selected molecules
if (mode == MOLECULE) {
char **group_arg = new char*[3];
// create unique group name for atoms to be rotated
int len = strlen(id) + 30;
group_arg[0] = new char[len];
sprintf(group_arg[0],"FixGCMC:rotation_gas_atoms:%s",id);
group_arg[1] = (char *) "molecule";
char digits[12];
sprintf(digits,"%d",-1);
group_arg[2] = digits;
group->assign(3,group_arg);
molecule_group = group->find(group_arg[0]);
if (molecule_group == -1)
error->all(FLERR,"Could not find fix gcmc rotation group ID");
molecule_group_bit = group->bitmask[molecule_group];
molecule_group_inversebit = molecule_group_bit ^ ~0;
delete [] group_arg[0];
delete [] group_arg;
}
// get all of the needed molecule data if mode == MOLECULE,
// otherwise just get the gas mass
if (mode == MOLECULE) {
onemols[imol]->compute_mass();
onemols[imol]->compute_com();
gas_mass = onemols[imol]->masstotal;
for (int i = 0; i < onemols[imol]->natoms; i++) {
onemols[imol]->x[i][0] -= onemols[imol]->com[0];
onemols[imol]->x[i][1] -= onemols[imol]->com[1];
onemols[imol]->x[i][2] -= onemols[imol]->com[2];
}
} else gas_mass = atom->mass[ngcmc_type];
if (gas_mass <= 0.0)
error->all(FLERR,"Illegal fix gcmc gas mass <= 0");
// check that no deletable atoms are in atom->firstgroup
// deleting such an atom would not leave firstgroup atoms first
if (atom->firstgroup >= 0) {
int *mask = atom->mask;
int firstgroupbit = group->bitmask[atom->firstgroup];
int flag = 0;
for (int i = 0; i < atom->nlocal; i++)
if ((mask[i] == groupbit) && (mask[i] & firstgroupbit)) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall)
error->all(FLERR,"Cannot do GCMC on atoms in atom_modify first group");
}
// compute beta, lambda, sigma, and the zz factor
// For LJ units, lambda=1
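// lambda = thermal de Broglie wavelength; zz = exp(beta*mu)/lambda^3 is the
// gas activity (overridden by pressure*fugacity_coeff*beta when a pressure
// is specified); sigma = velocity scale for velocities of inserted atoms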
beta = 1.0/(force->boltz*reservoir_temperature);
if (strcmp(update->unit_style,"lj") == 0)
zz = exp(beta*chemical_potential);
else {
double lambda = sqrt(force->hplanck*force->hplanck/
(2.0*MY_PI*gas_mass*force->mvv2e*
force->boltz*reservoir_temperature));
zz = exp(beta*chemical_potential)/(pow(lambda,3.0));
}
sigma = sqrt(force->boltz*reservoir_temperature*tfac_insert/gas_mass/force->mvv2e);
if (pressure_flag) zz = pressure*fugacity_coeff*beta/force->nktv2p;
imagezero = ((imageint) IMGMAX << IMG2BITS) |
((imageint) IMGMAX << IMGBITS) | IMGMAX;
// construct group bitmask for all new atoms
// aggregated over all group keywords
groupbitall = 1 | groupbit;
for (int igroup = 0; igroup < ngroups; igroup++) {
int jgroup = group->find(groupstrings[igroup]);
if (jgroup == -1)
error->all(FLERR,"Could not find specified fix gcmc group ID");
groupbitall |= group->bitmask[jgroup];
}
// construct group type bitmasks
// not aggregated over all group keywords
if (ngrouptypes > 0) {
memory->create(grouptypebits,ngrouptypes,"fix_gcmc:grouptypebits");
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
int jgroup = group->find(grouptypestrings[igroup]);
if (jgroup == -1)
error->all(FLERR,"Could not find specified fix gcmc group ID");
grouptypebits[igroup] = group->bitmask[jgroup];
}
}
}
/* ----------------------------------------------------------------------
attempt Monte Carlo translations, rotations, insertions, and deletions
done before exchange, borders, reneighbor
so that ghost atoms and neighbor lists will be correct
------------------------------------------------------------------------- */
void FixGCMC::pre_exchange()
{
// just return if this fix should not be invoked on this timestep
if (next_reneighbor != update->ntimestep) return;
xlo = domain->boxlo[0];
xhi = domain->boxhi[0];
ylo = domain->boxlo[1];
yhi = domain->boxhi[1];
zlo = domain->boxlo[2];
zhi = domain->boxhi[2];
if (triclinic) {
sublo = domain->sublo_lamda;
subhi = domain->subhi_lamda;
} else {
sublo = domain->sublo;
subhi = domain->subhi;
}
if (regionflag) volume = region_volume;
else volume = domain->xprd * domain->yprd * domain->zprd;
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
if (full_flag) {
energy_stored = energy_full();
if (overlap_flag && energy_stored > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
if (mode == MOLECULE) {
for (int i = 0; i < ncycles; i++) {
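// each cycle attempts an MC move (translation or rotation, 50/50) with
// probability nmcmoves/ncycles, otherwise an exchange (deletion or insertion, 50/50)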
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
if (random_equal->uniform() < 0.5) attempt_molecule_translation_full();
else attempt_molecule_rotation_full();
} else {
if (random_equal->uniform() < 0.5) attempt_molecule_deletion_full();
else attempt_molecule_insertion_full();
}
}
} else {
for (int i = 0; i < ncycles; i++) {
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
attempt_atomic_translation_full();
} else {
if (random_equal->uniform() < 0.5) attempt_atomic_deletion_full();
else attempt_atomic_insertion_full();
}
}
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
} else {
if (mode == MOLECULE) {
for (int i = 0; i < ncycles; i++) {
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
if (random_equal->uniform() < 0.5) attempt_molecule_translation();
else attempt_molecule_rotation();
} else {
if (random_equal->uniform() < 0.5) attempt_molecule_deletion();
else attempt_molecule_insertion();
}
}
} else {
for (int i = 0; i < ncycles; i++) {
int random_int_fraction =
static_cast<int>(random_equal->uniform()*ncycles) + 1;
if (random_int_fraction <= nmcmoves) {
attempt_atomic_translation();
} else {
if (random_equal->uniform() < 0.5) attempt_atomic_deletion();
else attempt_atomic_insertion();
}
}
}
}
next_reneighbor = update->ntimestep + nevery;
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_translation()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
int i = pick_random_gas_atom();
int success = 0;
if (i >= 0) {
double **x = atom->x;
double energy_before = energy(i,ngcmc_type,-1,x[i]);
if (overlap_flag && energy_before > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
double rsq = 1.1;
double rx,ry,rz;
rx = ry = rz = 0.0;
double coord[3];
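// pick a random displacement uniformly inside the unit sphere (rejection
// sampling from the unit cube), then scale by the maximum displacement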
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
if (regionflag) {
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
}
}
if (!domain->inside_nonperiodic(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
double energy_after = energy(i,ngcmc_type,-1,coord);
if (energy_after < MAXENERGYTEST &&
random_unequal->uniform() <
exp(beta*(energy_before - energy_after))) {
x[i][0] = coord[0];
x[i][1] = coord[1];
x[i][2] = coord[2];
success = 1;
}
}
int success_all = 0;
MPI_Allreduce(&success,&success_all,1,MPI_INT,MPI_MAX,world);
if (success_all) {
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ntranslation_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_deletion()
{
ndeletion_attempts += 1.0;
if (ngas == 0) return;
int i = pick_random_gas_atom();
int success = 0;
if (i >= 0) {
double deletion_energy = energy(i,ngcmc_type,-1,atom->x[i]);
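// grand-canonical acceptance probability for an atomic deletion:
// ngas * exp(+beta*U_i) / (zz*V)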
if (random_unequal->uniform() <
ngas*exp(beta*deletion_energy)/(zz*volume)) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
success = 1;
}
}
int success_all = 0;
MPI_Allreduce(&success,&success_all,1,MPI_INT,MPI_MAX,world);
if (success_all) {
atom->natoms--;
if (atom->tag_enable) {
if (atom->map_style) atom->map_init();
}
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ndeletion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_insertion()
{
double lamda[3];
ninsertion_attempts += 1.0;
// pick coordinates for insertion point
double coord[3];
if (regionflag) {
int region_attempt = 0;
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(coord,lamda);
} else {
if (triclinic == 0) {
coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,coord);
}
}
int proc_flag = 0;
if (triclinic == 0) {
domain->remap(coord);
if (!domain->inside(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
if (coord[0] >= sublo[0] && coord[0] < subhi[0] &&
coord[1] >= sublo[1] && coord[1] < subhi[1] &&
coord[2] >= sublo[2] && coord[2] < subhi[2]) proc_flag = 1;
} else {
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) proc_flag = 1;
}
int success = 0;
if (proc_flag) {
int ii = -1;
if (charge_flag) {
ii = atom->nlocal + atom->nghost;
if (ii >= atom->nmax) atom->avec->grow(0);
atom->q[ii] = charge;
}
double insertion_energy = energy(ii,ngcmc_type,-1,coord);
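// grand-canonical acceptance probability for an atomic insertion:
// zz * V * exp(-beta*U_new) / (ngas+1)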
if (insertion_energy < MAXENERGYTEST &&
random_unequal->uniform() <
zz*volume*exp(-beta*insertion_energy)/(ngas+1)) {
atom->avec->create_atom(ngcmc_type,coord);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->v[m][0] = random_unequal->gaussian()*sigma;
atom->v[m][1] = random_unequal->gaussian()*sigma;
atom->v[m][2] = random_unequal->gaussian()*sigma;
modify->create_attribute(m);
success = 1;
}
}
int success_all = 0;
MPI_Allreduce(&success,&success_all,1,MPI_INT,MPI_MAX,world);
if (success_all) {
atom->natoms++;
if (atom->tag_enable) {
atom->tag_extend();
if (atom->map_style) atom->map_init();
}
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ninsertion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_translation()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
tagint translation_molecule = pick_random_gas_molecule();
if (translation_molecule == -1) return;
double energy_before_sum = molecule_energy(translation_molecule);
if (overlap_flag && energy_before_sum > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
double **x = atom->x;
double rx,ry,rz;
double com_displace[3],coord[3];
double rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
int nlocal = atom->nlocal;
if (regionflag) {
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
}
double energy_after = 0.0;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
coord[0] = x[i][0] + com_displace[0];
coord[1] = x[i][1] + com_displace[1];
coord[2] = x[i][2] + com_displace[2];
if (!domain->inside_nonperiodic(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
energy_after += energy(i,atom->type[i],translation_molecule,coord);
}
}
double energy_after_sum = 0.0;
MPI_Allreduce(&energy_after,&energy_after_sum,1,MPI_DOUBLE,MPI_SUM,world);
if (energy_after_sum < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before_sum - energy_after_sum))) {
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
x[i][0] += com_displace[0];
x[i][1] += com_displace[1];
x[i][2] += com_displace[2];
}
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ntranslation_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_rotation()
{
nrotation_attempts += 1.0;
if (ngas == 0) return;
tagint rotation_molecule = pick_random_gas_molecule();
if (rotation_molecule == -1) return;
double energy_before_sum = molecule_energy(rotation_molecule);
if (overlap_flag && energy_before_sum > MAXENERGYTEST)
error->warning(FLERR,"Energy of old configuration in "
"fix gcmc is > MAXENERGYTEST.");
int nlocal = atom->nlocal;
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == rotation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
// generate point in unit cube
// then restrict to unit sphere
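// the accepted point is the rotation axis; the rotation angle is
// uniform in [0, max_rotation_angle)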
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * max_rotation_angle;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double **x = atom->x;
imageint *image = atom->image;
double energy_after = 0.0;
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
double xtmp[3];
domain->unmap(x[i],image[i],xtmp);
xtmp[0] -= com[0];
xtmp[1] -= com[1];
xtmp[2] -= com[2];
MathExtra::matvec(rotmat,xtmp,atom_coord[n]);
atom_coord[n][0] += com[0];
atom_coord[n][1] += com[1];
atom_coord[n][2] += com[2];
xtmp[0] = atom_coord[n][0];
xtmp[1] = atom_coord[n][1];
xtmp[2] = atom_coord[n][2];
domain->remap(xtmp);
if (!domain->inside(xtmp))
error->one(FLERR,"Fix gcmc put atom outside box");
energy_after += energy(i,atom->type[i],rotation_molecule,xtmp);
n++;
}
}
double energy_after_sum = 0.0;
MPI_Allreduce(&energy_after,&energy_after_sum,1,MPI_DOUBLE,MPI_SUM,world);
if (energy_after_sum < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before_sum - energy_after_sum))) {
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
image[i] = imagezero;
x[i][0] = atom_coord[n][0];
x[i][1] = atom_coord[n][1];
x[i][2] = atom_coord[n][2];
domain->remap(x[i],image[i]);
n++;
}
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
nrotation_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_deletion()
{
ndeletion_attempts += 1.0;
if (ngas == 0) return;
tagint deletion_molecule = pick_random_gas_molecule();
if (deletion_molecule == -1) return;
double deletion_energy_sum = molecule_energy(deletion_molecule);
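// acceptance probability for a molecule deletion:
// ngas * exp(+beta*U_mol) / (zz * V * natoms_per_molecule)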
if (random_equal->uniform() <
ngas*exp(beta*deletion_energy_sum)/(zz*volume*natoms_per_molecule)) {
int i = 0;
while (i < atom->nlocal) {
if (atom->molecule[i] == deletion_molecule) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
} else i++;
}
atom->natoms -= natoms_per_molecule;
if (atom->map_style) atom->map_init();
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ndeletion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_insertion()
{
double lamda[3];
ninsertion_attempts += 1.0;
double com_coord[3];
if (regionflag) {
int region_attempt = 0;
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
while (domain->regions[iregion]->match(com_coord[0],com_coord[1],
com_coord[2]) == 0) {
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(com_coord,lamda);
} else {
if (triclinic == 0) {
com_coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
com_coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
com_coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,com_coord);
}
}
// generate point in unit cube
// then restrict to unit sphere
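// the accepted point is the rotation axis; the rotation angle is
// uniform in [0, 2*pi)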
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * MY_2PI;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double insertion_energy = 0.0;
bool procflag[natoms_per_molecule];
for (int i = 0; i < natoms_per_molecule; i++) {
MathExtra::matvec(rotmat,onemols[imol]->x[i],atom_coord[i]);
atom_coord[i][0] += com_coord[0];
atom_coord[i][1] += com_coord[1];
atom_coord[i][2] += com_coord[2];
// use temporary variable for remapped position
// so unmapped position is preserved in atom_coord
double xtmp[3];
xtmp[0] = atom_coord[i][0];
xtmp[1] = atom_coord[i][1];
xtmp[2] = atom_coord[i][2];
domain->remap(xtmp);
if (!domain->inside(xtmp))
error->one(FLERR,"Fix gcmc put atom outside box");
procflag[i] = false;
if (triclinic == 0) {
if (xtmp[0] >= sublo[0] && xtmp[0] < subhi[0] &&
xtmp[1] >= sublo[1] && xtmp[1] < subhi[1] &&
xtmp[2] >= sublo[2] && xtmp[2] < subhi[2]) procflag[i] = true;
} else {
domain->x2lamda(xtmp,lamda);
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) procflag[i] = true;
}
if (procflag[i]) {
int ii = -1;
if (onemols[imol]->qflag == 1) {
ii = atom->nlocal + atom->nghost;
if (ii >= atom->nmax) atom->avec->grow(0);
atom->q[ii] = onemols[imol]->q[i];
}
insertion_energy += energy(ii,onemols[imol]->type[i],-1,xtmp);
}
}
double insertion_energy_sum = 0.0;
MPI_Allreduce(&insertion_energy,&insertion_energy_sum,1,
MPI_DOUBLE,MPI_SUM,world);
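// acceptance probability for a molecule insertion:
// zz * V * natoms_per_molecule * exp(-beta*U_new) / (ngas + natoms_per_molecule)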
if (insertion_energy_sum < MAXENERGYTEST &&
random_equal->uniform() < zz*volume*natoms_per_molecule*
exp(-beta*insertion_energy_sum)/(ngas + natoms_per_molecule)) {
tagint maxmol = 0;
for (int i = 0; i < atom->nlocal; i++) maxmol = MAX(maxmol,atom->molecule[i]);
tagint maxmol_all;
MPI_Allreduce(&maxmol,&maxmol_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
maxmol_all++;
if (maxmol_all >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available molecule IDs");
tagint maxtag = 0;
for (int i = 0; i < atom->nlocal; i++) maxtag = MAX(maxtag,atom->tag[i]);
tagint maxtag_all;
MPI_Allreduce(&maxtag,&maxtag_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
int nlocalprev = atom->nlocal;
double vnew[3];
vnew[0] = random_equal->gaussian()*sigma;
vnew[1] = random_equal->gaussian()*sigma;
vnew[2] = random_equal->gaussian()*sigma;
for (int i = 0; i < natoms_per_molecule; i++) {
if (procflag[i]) {
atom->avec->create_atom(onemols[imol]->type[i],atom_coord[i]);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->image[m] = imagezero;
domain->remap(atom->x[m],atom->image[m]);
atom->molecule[m] = maxmol_all;
if (maxtag_all+i+1 >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available atom IDs");
atom->tag[m] = maxtag_all + i + 1;
atom->v[m][0] = vnew[0];
atom->v[m][1] = vnew[1];
atom->v[m][2] = vnew[2];
atom->add_molecule_atom(onemols[imol],i,m,maxtag_all);
modify->create_attribute(m);
}
}
// FixRigidSmall::set_molecule stores rigid body attributes
// FixShake::set_molecule stores shake info for molecule
if (rigidflag)
fixrigid->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
else if (shakeflag)
fixshake->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
atom->natoms += natoms_per_molecule;
if (atom->natoms < 0)
error->all(FLERR,"Too many total atoms");
atom->nbonds += onemols[imol]->nbonds;
atom->nangles += onemols[imol]->nangles;
atom->ndihedrals += onemols[imol]->ndihedrals;
atom->nimpropers += onemols[imol]->nimpropers;
if (atom->map_style) atom->map_init();
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
update_gas_atoms_list();
ninsertion_successes += 1.0;
}
}
/* ----------------------------------------------------------------------
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_translation_full()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
double energy_before = energy_stored;
int i = pick_random_gas_atom();
double **x = atom->x;
double xtmp[3];
xtmp[0] = xtmp[1] = xtmp[2] = 0.0;
tagint tmptag = -1;
if (i >= 0) {
double rsq = 1.1;
double rx,ry,rz;
rx = ry = rz = 0.0;
double coord[3];
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
if (regionflag) {
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_unequal->uniform() - 1.0;
ry = 2*random_unequal->uniform() - 1.0;
rz = 2*random_unequal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = x[i][0] + displace*rx;
coord[1] = x[i][1] + displace*ry;
coord[2] = x[i][2] + displace*rz;
}
}
if (!domain->inside_nonperiodic(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
xtmp[0] = x[i][0];
xtmp[1] = x[i][1];
xtmp[2] = x[i][2];
x[i][0] = coord[0];
x[i][1] = coord[1];
x[i][2] = coord[2];
tmptag = atom->tag[i];
}
double energy_after = energy_full();
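// Metropolis acceptance for the trial displacement:
// accept with probability min[1, exp(-beta*(energy_after - energy_before))],
// rejecting outright if the new energy exceeds MAXENERGYTEST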
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before - energy_after))) {
energy_stored = energy_after;
ntranslation_successes += 1.0;
} else {
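// rejected move: restore the old coordinates; the displaced atom may have
// migrated to another proc inside energy_full(), so broadcast its tag and
// old position and let whichever proc now owns it do the restore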
tagint tmptag_all;
MPI_Allreduce(&tmptag,&tmptag_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
double xtmp_all[3];
MPI_Allreduce(&xtmp,&xtmp_all,3,MPI_DOUBLE,MPI_SUM,world);
for (int i = 0; i < atom->nlocal; i++) {
if (tmptag_all == atom->tag[i]) {
x[i][0] = xtmp_all[0];
x[i][1] = xtmp_all[1];
x[i][2] = xtmp_all[2];
}
}
energy_stored = energy_before;
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo deletion of a random gas atom, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_deletion_full()
{
double q_tmp;
const int q_flag = atom->q_flag;
ndeletion_attempts += 1.0;
if (ngas == 0) return;
double energy_before = energy_stored;
const int i = pick_random_gas_atom();
int tmpmask;
if (i >= 0) {
tmpmask = atom->mask[i];
atom->mask[i] = exclusion_group_bit;
if (q_flag) {
q_tmp = atom->q[i];
atom->q[i] = 0.0;
}
}
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
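// grand-canonical deletion acceptance:
// accept with probability min[1, ngas*exp(-beta*(energy_after - energy_before))/(zz*volume)]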
if (random_equal->uniform() <
ngas*exp(beta*(energy_before - energy_after))/(zz*volume)) {
if (i >= 0) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
}
atom->natoms--;
if (atom->map_style) atom->map_init();
ndeletion_successes += 1.0;
energy_stored = energy_after;
} else {
if (i >= 0) {
atom->mask[i] = tmpmask;
if (q_flag) atom->q[i] = q_tmp;
}
if (force->kspace) force->kspace->qsum_qsq();
energy_stored = energy_before;
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo insertion of a single gas atom, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_atomic_insertion_full()
{
double lamda[3];
ninsertion_attempts += 1.0;
double energy_before = energy_stored;
double coord[3];
if (regionflag) {
int region_attempt = 0;
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
coord[0] = region_xlo + random_equal->uniform() * (region_xhi-region_xlo);
coord[1] = region_ylo + random_equal->uniform() * (region_yhi-region_ylo);
coord[2] = region_zlo + random_equal->uniform() * (region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(coord,lamda);
} else {
if (triclinic == 0) {
coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,coord);
}
}
int proc_flag = 0;
if (triclinic == 0) {
domain->remap(coord);
if (!domain->inside(coord))
error->one(FLERR,"Fix gcmc put atom outside box");
if (coord[0] >= sublo[0] && coord[0] < subhi[0] &&
coord[1] >= sublo[1] && coord[1] < subhi[1] &&
coord[2] >= sublo[2] && coord[2] < subhi[2]) proc_flag = 1;
} else {
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) proc_flag = 1;
}
if (proc_flag) {
atom->avec->create_atom(ngcmc_type,coord);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->v[m][0] = random_unequal->gaussian()*sigma;
atom->v[m][1] = random_unequal->gaussian()*sigma;
atom->v[m][2] = random_unequal->gaussian()*sigma;
if (charge_flag) atom->q[m] = charge;
modify->create_attribute(m);
}
atom->natoms++;
if (atom->tag_enable) {
atom->tag_extend();
if (atom->map_style) atom->map_init();
}
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
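// grand-canonical insertion acceptance:
// accept with probability min[1, zz*volume*exp(-beta*(energy_after - energy_before))/(ngas+1)],
// rejecting outright if the new energy exceeds MAXENERGYTEST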
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
zz*volume*exp(beta*(energy_before - energy_after))/(ngas+1)) {
ninsertion_successes += 1.0;
energy_stored = energy_after;
} else {
atom->natoms--;
if (proc_flag) atom->nlocal--;
if (force->kspace) force->kspace->qsum_qsq();
energy_stored = energy_before;
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo translation of a random gas molecule, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_translation_full()
{
ntranslation_attempts += 1.0;
if (ngas == 0) return;
tagint translation_molecule = pick_random_gas_molecule();
if (translation_molecule == -1) return;
double energy_before = energy_stored;
double **x = atom->x;
double rx,ry,rz;
double com_displace[3],coord[3];
double rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
int nlocal = atom->nlocal;
if (regionflag) {
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
while (domain->regions[iregion]->match(coord[0],coord[1],coord[2]) == 0) {
rsq = 1.1;
while (rsq > 1.0) {
rx = 2*random_equal->uniform() - 1.0;
ry = 2*random_equal->uniform() - 1.0;
rz = 2*random_equal->uniform() - 1.0;
rsq = rx*rx + ry*ry + rz*rz;
}
coord[0] = com[0] + displace*rx;
coord[1] = com[1] + displace*ry;
coord[2] = com[2] + displace*rz;
}
com_displace[0] = displace*rx;
com_displace[1] = displace*ry;
com_displace[2] = displace*rz;
}
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
x[i][0] += com_displace[0];
x[i][1] += com_displace[1];
x[i][2] += com_displace[2];
if (!domain->inside_nonperiodic(x[i]))
error->one(FLERR,"Fix gcmc put atom outside box");
}
}
double energy_after = energy_full();
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before - energy_after))) {
ntranslation_successes += 1.0;
energy_stored = energy_after;
} else {
energy_stored = energy_before;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == translation_molecule) {
x[i][0] -= com_displace[0];
x[i][1] -= com_displace[1];
x[i][2] -= com_displace[2];
}
}
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo rotation of a random gas molecule about its center of mass,
using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_rotation_full()
{
nrotation_attempts += 1.0;
if (ngas == 0) return;
tagint rotation_molecule = pick_random_gas_molecule();
if (rotation_molecule == -1) return;
double energy_before = energy_stored;
int nlocal = atom->nlocal;
int *mask = atom->mask;
for (int i = 0; i < nlocal; i++) {
if (atom->molecule[i] == rotation_molecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
// generate point in unit cube
// then restrict to unit sphere
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * max_rotation_angle;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double **x = atom->x;
imageint *image = atom->image;
imageint image_orig[natoms_per_molecule];
int n = 0;
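// rotate each atom of the molecule about the center of mass: unwrap its
// position, shift into the COM frame, apply the rotation matrix, shift back,
// and remap into the periodic box; original coords and image flags are
// saved in case the move is rejected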
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
atom_coord[n][0] = x[i][0];
atom_coord[n][1] = x[i][1];
atom_coord[n][2] = x[i][2];
image_orig[n] = image[i];
double xtmp[3];
domain->unmap(x[i],image[i],xtmp);
xtmp[0] -= com[0];
xtmp[1] -= com[1];
xtmp[2] -= com[2];
MathExtra::matvec(rotmat,xtmp,x[i]);
x[i][0] += com[0];
x[i][1] += com[1];
x[i][2] += com[2];
image[i] = imagezero;
domain->remap(x[i],image[i]);
if (!domain->inside(x[i]))
error->one(FLERR,"Fix gcmc put atom outside box");
n++;
}
}
double energy_after = energy_full();
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() <
exp(beta*(energy_before - energy_after))) {
nrotation_successes += 1.0;
energy_stored = energy_after;
} else {
energy_stored = energy_before;
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & molecule_group_bit) {
x[i][0] = atom_coord[n][0];
x[i][1] = atom_coord[n][1];
x[i][2] = atom_coord[n][2];
image[i] = image_orig[n];
n++;
}
}
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo deletion of a random gas molecule, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_deletion_full()
{
ndeletion_attempts += 1.0;
if (ngas == 0) return;
tagint deletion_molecule = pick_random_gas_molecule();
if (deletion_molecule == -1) return;
double energy_before = energy_stored;
int m = 0;
double q_tmp[natoms_per_molecule];
int tmpmask[atom->nlocal];
for (int i = 0; i < atom->nlocal; i++) {
if (atom->molecule[i] == deletion_molecule) {
tmpmask[i] = atom->mask[i];
atom->mask[i] = exclusion_group_bit;
toggle_intramolecular(i);
if (atom->q_flag) {
q_tmp[m] = atom->q[i];
m++;
atom->q[i] = 0.0;
}
}
}
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
// energy_before is corrected by energy_intra so that only the molecule's
// interaction with the rest of the system enters the acceptance test
double deltaphi = ngas*exp(beta*((energy_before - energy_intra) - energy_after))/(zz*volume*natoms_per_molecule);
if (random_equal->uniform() < deltaphi) {
int i = 0;
while (i < atom->nlocal) {
if (atom->molecule[i] == deletion_molecule) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
} else i++;
}
atom->natoms -= natoms_per_molecule;
if (atom->map_style) atom->map_init();
ndeletion_successes += 1.0;
energy_stored = energy_after;
} else {
energy_stored = energy_before;
int m = 0;
for (int i = 0; i < atom->nlocal; i++) {
if (atom->molecule[i] == deletion_molecule) {
atom->mask[i] = tmpmask[i];
toggle_intramolecular(i);
if (atom->q_flag) {
atom->q[i] = q_tmp[m];
m++;
}
}
}
if (force->kspace) force->kspace->qsum_qsq();
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
attempt Monte Carlo insertion of a gas molecule at a random position and
orientation, using full-system energy
------------------------------------------------------------------------- */
void FixGCMC::attempt_molecule_insertion_full()
{
double lamda[3];
ninsertion_attempts += 1.0;
double energy_before = energy_stored;
tagint maxmol = 0;
for (int i = 0; i < atom->nlocal; i++) maxmol = MAX(maxmol,atom->molecule[i]);
tagint maxmol_all;
MPI_Allreduce(&maxmol,&maxmol_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
maxmol_all++;
if (maxmol_all >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available molecule IDs");
int insertion_molecule = maxmol_all;
tagint maxtag = 0;
for (int i = 0; i < atom->nlocal; i++) maxtag = MAX(maxtag,atom->tag[i]);
tagint maxtag_all;
MPI_Allreduce(&maxtag,&maxtag_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
int nlocalprev = atom->nlocal;
double com_coord[3];
if (regionflag) {
int region_attempt = 0;
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
while (domain->regions[iregion]->match(com_coord[0],com_coord[1],
com_coord[2]) == 0) {
com_coord[0] = region_xlo + random_equal->uniform() *
(region_xhi-region_xlo);
com_coord[1] = region_ylo + random_equal->uniform() *
(region_yhi-region_ylo);
com_coord[2] = region_zlo + random_equal->uniform() *
(region_zhi-region_zlo);
region_attempt++;
if (region_attempt >= max_region_attempts) return;
}
if (triclinic) domain->x2lamda(com_coord,lamda);
} else {
if (triclinic == 0) {
com_coord[0] = xlo + random_equal->uniform() * (xhi-xlo);
com_coord[1] = ylo + random_equal->uniform() * (yhi-ylo);
com_coord[2] = zlo + random_equal->uniform() * (zhi-zlo);
} else {
lamda[0] = random_equal->uniform();
lamda[1] = random_equal->uniform();
lamda[2] = random_equal->uniform();
// wasteful, but necessary
if (lamda[0] == 1.0) lamda[0] = 0.0;
if (lamda[1] == 1.0) lamda[1] = 0.0;
if (lamda[2] == 1.0) lamda[2] = 0.0;
domain->lamda2x(lamda,com_coord);
}
}
// generate point in unit cube
// then restrict to unit sphere
double r[3],rotmat[3][3],quat[4];
double rsq = 1.1;
while (rsq > 1.0) {
r[0] = 2.0*random_equal->uniform() - 1.0;
r[1] = 2.0*random_equal->uniform() - 1.0;
r[2] = 2.0*random_equal->uniform() - 1.0;
rsq = MathExtra::dot3(r, r);
}
double theta = random_equal->uniform() * MY_2PI;
MathExtra::norm3(r);
MathExtra::axisangle_to_quat(r,theta,quat);
MathExtra::quat_to_mat(quat,rotmat);
double vnew[3];
vnew[0] = random_equal->gaussian()*sigma;
vnew[1] = random_equal->gaussian()*sigma;
vnew[2] = random_equal->gaussian()*sigma;
for (int i = 0; i < natoms_per_molecule; i++) {
double xtmp[3];
MathExtra::matvec(rotmat,onemols[imol]->x[i],xtmp);
xtmp[0] += com_coord[0];
xtmp[1] += com_coord[1];
xtmp[2] += com_coord[2];
// need to adjust image flags in remap()
imageint imagetmp = imagezero;
domain->remap(xtmp,imagetmp);
if (!domain->inside(xtmp))
error->one(FLERR,"Fix gcmc put atom outside box");
int proc_flag = 0;
if (triclinic == 0) {
if (xtmp[0] >= sublo[0] && xtmp[0] < subhi[0] &&
xtmp[1] >= sublo[1] && xtmp[1] < subhi[1] &&
xtmp[2] >= sublo[2] && xtmp[2] < subhi[2]) proc_flag = 1;
} else {
domain->x2lamda(xtmp,lamda);
if (lamda[0] >= sublo[0] && lamda[0] < subhi[0] &&
lamda[1] >= sublo[1] && lamda[1] < subhi[1] &&
lamda[2] >= sublo[2] && lamda[2] < subhi[2]) proc_flag = 1;
}
if (proc_flag) {
atom->avec->create_atom(onemols[imol]->type[i],xtmp);
int m = atom->nlocal - 1;
// add to groups
// optionally add to type-based groups
atom->mask[m] = groupbitall;
for (int igroup = 0; igroup < ngrouptypes; igroup++) {
if (ngcmc_type == grouptypes[igroup])
atom->mask[m] |= grouptypebits[igroup];
}
atom->image[m] = imagetmp;
atom->molecule[m] = insertion_molecule;
if (maxtag_all+i+1 >= MAXTAGINT)
error->all(FLERR,"Fix gcmc ran out of available atom IDs");
atom->tag[m] = maxtag_all + i + 1;
atom->v[m][0] = vnew[0];
atom->v[m][1] = vnew[1];
atom->v[m][2] = vnew[2];
atom->add_molecule_atom(onemols[imol],i,m,maxtag_all);
modify->create_attribute(m);
}
}
// FixRigidSmall::set_molecule stores rigid body attributes
// FixShake::set_molecule stores shake info for molecule
if (rigidflag)
fixrigid->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
else if (shakeflag)
fixshake->set_molecule(nlocalprev,maxtag_all,imol,com_coord,vnew,quat);
atom->natoms += natoms_per_molecule;
if (atom->natoms < 0)
error->all(FLERR,"Too many total atoms");
atom->nbonds += onemols[imol]->nbonds;
atom->nangles += onemols[imol]->nangles;
atom->ndihedrals += onemols[imol]->ndihedrals;
atom->nimpropers += onemols[imol]->nimpropers;
if (atom->map_style) atom->map_init();
atom->nghost = 0;
if (triclinic) domain->x2lamda(atom->nlocal);
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
if (force->kspace) force->kspace->qsum_qsq();
double energy_after = energy_full();
// energy_after is corrected by energy_intra so that only the molecule's
// interaction with the rest of the system enters the acceptance test
double deltaphi = zz*volume*natoms_per_molecule*
exp(beta*(energy_before - (energy_after - energy_intra)))/(ngas + natoms_per_molecule);
if (energy_after < MAXENERGYTEST &&
random_equal->uniform() < deltaphi) {
ninsertion_successes += 1.0;
energy_stored = energy_after;
} else {
atom->nbonds -= onemols[imol]->nbonds;
atom->nangles -= onemols[imol]->nangles;
atom->ndihedrals -= onemols[imol]->ndihedrals;
atom->nimpropers -= onemols[imol]->nimpropers;
atom->natoms -= natoms_per_molecule;
energy_stored = energy_before;
int i = 0;
while (i < atom->nlocal) {
if (atom->molecule[i] == insertion_molecule) {
atom->avec->copy(atom->nlocal-1,i,1);
atom->nlocal--;
} else i++;
}
if (force->kspace) force->kspace->qsum_qsq();
}
update_gas_atoms_list();
}
/* ----------------------------------------------------------------------
compute particle's interaction energy with the rest of the system
------------------------------------------------------------------------- */
double FixGCMC::energy(int i, int itype, tagint imolecule, double *coord)
{
double delx,dely,delz,rsq;
double **x = atom->x;
int *type = atom->type;
tagint *molecule = atom->molecule;
int nall = atom->nlocal + atom->nghost;
pair = force->pair;
cutsq = force->pair->cutsq;
double fpair = 0.0;
double factor_coul = 1.0;
double factor_lj = 1.0;
double total_energy = 0.0;
for (int j = 0; j < nall; j++) {
if (i == j) continue;
if (mode == MOLECULE)
if (imolecule == molecule[j]) continue;
delx = coord[0] - x[j][0];
dely = coord[1] - x[j][1];
delz = coord[2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
// if an overlap check was requested and an overlap is found,
// return the signal value for the energy
- if (overlap_flag && rsq < overlap_cutoff)
+ if (overlap_flag && rsq < overlap_cutoffsq)
return MAXENERGYSIGNAL;
if (rsq < cutsq[itype][jtype])
total_energy +=
pair->single(i,j,itype,jtype,rsq,factor_coul,factor_lj,fpair);
}
return total_energy;
}
/* ----------------------------------------------------------------------
compute the energy of the given gas molecule in its current position
sum across all procs that own atoms of the given molecule
------------------------------------------------------------------------- */
double FixGCMC::molecule_energy(tagint gas_molecule_id)
{
double mol_energy = 0.0;
for (int i = 0; i < atom->nlocal; i++)
if (atom->molecule[i] == gas_molecule_id) {
mol_energy += energy(i,atom->type[i],gas_molecule_id,atom->x[i]);
}
double mol_energy_sum = 0.0;
MPI_Allreduce(&mol_energy,&mol_energy_sum,1,MPI_DOUBLE,MPI_SUM,world);
return mol_energy_sum;
}
/* ----------------------------------------------------------------------
compute system potential energy
------------------------------------------------------------------------- */
double FixGCMC::energy_full()
{
int imolecule;
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->exchange();
atom->nghost = 0;
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
if (modify->n_pre_neighbor) modify->pre_neighbor();
neighbor->build();
int eflag = 1;
int vflag = 0;
// if an overlap check was requested and an overlap is found,
// return the signal value for the energy
if (overlap_flag) {
int overlaptestall;
int overlaptest = 0;
double delx,dely,delz,rsq;
double **x = atom->x;
tagint *molecule = atom->molecule;
int nall = atom->nlocal + atom->nghost;
for (int i = 0; i < atom->nlocal; i++) {
if (mode == MOLECULE) imolecule = molecule[i];
for (int j = i+1; j < nall; j++) {
if (mode == MOLECULE)
if (imolecule == molecule[j]) continue;
delx = x[i][0] - x[j][0];
dely = x[i][1] - x[j][1];
delz = x[i][2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
- if (rsq < overlap_cutoff) {
+ if (rsq < overlap_cutoffsq) {
overlaptest = 1;
break;
}
}
if (overlaptest) break;
}
MPI_Allreduce(&overlaptest, &overlaptestall, 1,
MPI_INT, MPI_MAX, world);
if (overlaptestall) return MAXENERGYSIGNAL;
}
// clear forces so they don't accumulate over multiple
// calls within fix gcmc timestep, e.g. for fix shake
size_t nbytes = sizeof(double) * (atom->nlocal + atom->nghost);
if (nbytes) memset(&atom->f[0][0],0,3*nbytes);
if (modify->n_pre_force) modify->pre_force(vflag);
if (force->pair) force->pair->compute(eflag,vflag);
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
}
if (force->kspace) force->kspace->compute(eflag,vflag);
// unlike Verlet, not performing a reverse_comm() or forces here
// b/c GCMC does not care about forces
// don't think it will mess up energy due to any post_force() fixes
if (modify->n_post_force) modify->post_force(vflag);
if (modify->n_end_of_step) modify->end_of_step();
// NOTE: all fixes with THERMO_ENERGY mask set and which
// operate at pre_force() or post_force() or end_of_step()
// and which the user has enabled via fix_modify thermo yes,
// will contribute to total MC energy via pe->compute_scalar()
update->eflag_global = update->ntimestep;
double total_energy = c_pe->compute_scalar();
return total_energy;
}
/* ----------------------------------------------------------------------
pick a random gas atom; return its local index, or -1 if not owned by this proc
------------------------------------------------------------------------- */
int FixGCMC::pick_random_gas_atom()
{
int i = -1;
int iwhichglobal = static_cast<int> (ngas*random_equal->uniform());
if ((iwhichglobal >= ngas_before) &&
(iwhichglobal < ngas_before + ngas_local)) {
int iwhichlocal = iwhichglobal - ngas_before;
i = local_gas_list[iwhichlocal];
}
return i;
}
/* ----------------------------------------------------------------------
pick a random gas molecule; return its molecule ID, identical on all procs
------------------------------------------------------------------------- */
tagint FixGCMC::pick_random_gas_molecule()
{
int iwhichglobal = static_cast<int> (ngas*random_equal->uniform());
tagint gas_molecule_id = 0;
if ((iwhichglobal >= ngas_before) &&
(iwhichglobal < ngas_before + ngas_local)) {
int iwhichlocal = iwhichglobal - ngas_before;
int i = local_gas_list[iwhichlocal];
gas_molecule_id = atom->molecule[i];
}
tagint gas_molecule_id_all = 0;
MPI_Allreduce(&gas_molecule_id,&gas_molecule_id_all,1,
MPI_LMP_TAGINT,MPI_MAX,world);
return gas_molecule_id_all;
}
/* ----------------------------------------------------------------------
toggle intramolecular interactions of atom i on/off by negating its
bond/angle/dihedral/improper types
------------------------------------------------------------------------- */
void FixGCMC::toggle_intramolecular(int i)
{
if (atom->avec->bonds_allow)
for (int m = 0; m < atom->num_bond[i]; m++)
atom->bond_type[i][m] = -atom->bond_type[i][m];
if (atom->avec->angles_allow)
for (int m = 0; m < atom->num_angle[i]; m++)
atom->angle_type[i][m] = -atom->angle_type[i][m];
if (atom->avec->dihedrals_allow)
for (int m = 0; m < atom->num_dihedral[i]; m++)
atom->dihedral_type[i][m] = -atom->dihedral_type[i][m];
if (atom->avec->impropers_allow)
for (int m = 0; m < atom->num_improper[i]; m++)
atom->improper_type[i][m] = -atom->improper_type[i][m];
}
/* ----------------------------------------------------------------------
update the list of gas atoms
------------------------------------------------------------------------- */
void FixGCMC::update_gas_atoms_list()
{
int nlocal = atom->nlocal;
int *mask = atom->mask;
tagint *molecule = atom->molecule;
double **x = atom->x;
if (atom->nmax > gcmc_nmax) {
memory->sfree(local_gas_list);
gcmc_nmax = atom->nmax;
local_gas_list = (int *) memory->smalloc(gcmc_nmax*sizeof(int),
"GCMC:local_gas_list");
}
ngas_local = 0;
if (regionflag) {
if (mode == MOLECULE) {
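// for molecular gas, test the region against each molecule's
// center of mass rather than against individual atom positions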
tagint maxmol = 0;
for (int i = 0; i < nlocal; i++) maxmol = MAX(maxmol,molecule[i]);
tagint maxmol_all;
MPI_Allreduce(&maxmol,&maxmol_all,1,MPI_LMP_TAGINT,MPI_MAX,world);
double comx[maxmol_all];
double comy[maxmol_all];
double comz[maxmol_all];
for (int imolecule = 0; imolecule < maxmol_all; imolecule++) {
for (int i = 0; i < nlocal; i++) {
if (molecule[i] == imolecule) {
mask[i] |= molecule_group_bit;
} else {
mask[i] &= molecule_group_inversebit;
}
}
double com[3];
com[0] = com[1] = com[2] = 0.0;
group->xcm(molecule_group,gas_mass,com);
// remap unwrapped com into periodic box
domain->remap(com);
comx[imolecule] = com[0];
comy[imolecule] = com[1];
comz[imolecule] = com[2];
}
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
if (domain->regions[iregion]->match(comx[molecule[i]],
comy[molecule[i]],comz[molecule[i]]) == 1) {
local_gas_list[ngas_local] = i;
ngas_local++;
}
}
}
} else {
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
if (domain->regions[iregion]->match(x[i][0],x[i][1],x[i][2]) == 1) {
local_gas_list[ngas_local] = i;
ngas_local++;
}
}
}
}
} else {
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
local_gas_list[ngas_local] = i;
ngas_local++;
}
}
}
MPI_Allreduce(&ngas_local,&ngas,1,MPI_INT,MPI_SUM,world);
MPI_Scan(&ngas_local,&ngas_before,1,MPI_INT,MPI_SUM,world);
ngas_before -= ngas_local;
}
/* ----------------------------------------------------------------------
return attempt and success counts used to compute acceptance ratios
------------------------------------------------------------------------- */
double FixGCMC::compute_vector(int n)
{
if (n == 0) return ntranslation_attempts;
if (n == 1) return ntranslation_successes;
if (n == 2) return ninsertion_attempts;
if (n == 3) return ninsertion_successes;
if (n == 4) return ndeletion_attempts;
if (n == 5) return ndeletion_successes;
if (n == 6) return nrotation_attempts;
if (n == 7) return nrotation_successes;
return 0.0;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixGCMC::memory_usage()
{
double bytes = gcmc_nmax * sizeof(int);
return bytes;
}
/* ----------------------------------------------------------------------
pack entire state of Fix into one write
------------------------------------------------------------------------- */
void FixGCMC::write_restart(FILE *fp)
{
int n = 0;
double list[4];
list[n++] = random_equal->state();
list[n++] = random_unequal->state();
list[n++] = next_reneighbor;
if (comm->me == 0) {
int size = n * sizeof(double);
fwrite(&size,sizeof(int),1,fp);
fwrite(list,sizeof(double),n,fp);
}
}
/* ----------------------------------------------------------------------
use state info from restart file to restart the Fix
------------------------------------------------------------------------- */
void FixGCMC::restart(char *buf)
{
int n = 0;
double *list = (double *) buf;
seed = static_cast<int> (list[n++]);
random_equal->reset(seed);
seed = static_cast<int> (list[n++]);
random_unequal->reset(seed);
next_reneighbor = static_cast<int> (list[n++]);
}
diff --git a/src/MC/fix_gcmc.h b/src/MC/fix_gcmc.h
index 2519c0096..8a5375eed 100644
--- a/src/MC/fix_gcmc.h
+++ b/src/MC/fix_gcmc.h
@@ -1,304 +1,304 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef FIX_CLASS
FixStyle(gcmc,FixGCMC)
#else
#ifndef LMP_FIX_GCMC_H
#define LMP_FIX_GCMC_H
#include <stdio.h>
#include "fix.h"
namespace LAMMPS_NS {
class FixGCMC : public Fix {
public:
FixGCMC(class LAMMPS *, int, char **);
~FixGCMC();
int setmask();
void init();
void pre_exchange();
void attempt_atomic_translation();
void attempt_atomic_deletion();
void attempt_atomic_insertion();
void attempt_molecule_translation();
void attempt_molecule_rotation();
void attempt_molecule_deletion();
void attempt_molecule_insertion();
void attempt_atomic_translation_full();
void attempt_atomic_deletion_full();
void attempt_atomic_insertion_full();
void attempt_molecule_translation_full();
void attempt_molecule_rotation_full();
void attempt_molecule_deletion_full();
void attempt_molecule_insertion_full();
double energy(int, int, tagint, double *);
double molecule_energy(tagint);
double energy_full();
int pick_random_gas_atom();
tagint pick_random_gas_molecule();
void toggle_intramolecular(int);
void update_gas_atoms_list();
double compute_vector(int);
double memory_usage();
void write_restart(FILE *);
void restart(char *);
private:
int molecule_group,molecule_group_bit;
int molecule_group_inversebit;
int exclusion_group,exclusion_group_bit;
int ngcmc_type,nevery,seed;
int ncycles,nexchanges,nmcmoves;
int ngas; // # of gas atoms on all procs
int ngas_local; // # of gas atoms on this proc
int ngas_before; // # of gas atoms on procs < this proc
int mode; // ATOM or MOLECULE
int regionflag; // 0 = anywhere in box, 1 = specific region
int iregion; // gcmc region
char *idregion; // gcmc region id
bool pressure_flag; // true if user specified reservoir pressure
bool charge_flag; // true if user specified atomic charge
bool full_flag; // true if doing full system energy calculations
int natoms_per_molecule; // number of atoms in each gas molecule
int groupbitall; // group bitmask for inserted atoms
int ngroups; // number of group-ids for inserted atoms
char** groupstrings; // list of group-ids for inserted atoms
int ngrouptypes; // number of type-based group-ids for inserted atoms
char** grouptypestrings; // list of type-based group-ids for inserted atoms
int* grouptypebits; // list of type-based group bitmasks
int* grouptypes; // list of type-based group types
double ntranslation_attempts;
double ntranslation_successes;
double nrotation_attempts;
double nrotation_successes;
double ndeletion_attempts;
double ndeletion_successes;
double ninsertion_attempts;
double ninsertion_successes;
int gcmc_nmax;
int max_region_attempts;
double gas_mass;
double reservoir_temperature;
double tfac_insert;
double chemical_potential;
double displace;
double max_rotation_angle;
double beta,zz,sigma,volume;
double pressure,fugacity_coeff,charge;
double xlo,xhi,ylo,yhi,zlo,zhi;
double region_xlo,region_xhi,region_ylo,region_yhi,region_zlo,region_zhi;
double region_volume;
double energy_stored; // full energy of old/current configuration
double *sublo,*subhi;
int *local_gas_list;
double **cutsq;
double **atom_coord;
imageint imagezero;
- double overlap_cutoff;
+ double overlap_cutoffsq; // square distance cutoff for overlap
int overlap_flag;
double energy_intra;
class Pair *pair;
class RanPark *random_equal;
class RanPark *random_unequal;
class Atom *model_atom;
class Molecule **onemols;
int imol,nmol;
double **coords;
imageint *imageflags;
class Fix *fixrigid, *fixshake;
int rigidflag, shakeflag;
char *idrigid, *idshake;
int triclinic; // 0 = orthog box, 1 = triclinic
class Compute *c_pe;
void options(int, char **);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Fix gcmc does not (yet) work with atom_style template
Self-explanatory.
E: Fix gcmc region does not support a bounding box
Not all regions represent bounded volumes. You cannot use
such a region with the fix gcmc command.
E: Fix gcmc region cannot be dynamic
Only static regions can be used with fix gcmc.
E: Fix gcmc region extends outside simulation box
Self-explanatory.
E: Fix gcmc molecule must have coordinates
The defined molecule does not specify coordinates.
E: Fix gcmc molecule must have atom types
The defined molecule does not specify atom types.
E: Atom type must be zero in fix gcmc mol command
Self-explanatory.
E: Fix gcmc molecule has charges, but atom style does not
Self-explanatory.
E: Fix gcmc molecule template ID must be same as atom_style template ID
When using atom_style template, you cannot insert molecules that are
not in that template.
E: Fix gcmc atom has charge, but atom style does not
Self-explanatory.
E: Cannot use fix gcmc shake and not molecule
Self-explanatory.
E: Molecule template ID for fix gcmc does not exist
Self-explanatory.
W: Molecule template for fix gcmc has multiple molecules
The fix gcmc command will only create molecules of a single type,
i.e. the first molecule in the template.
E: Region ID for fix gcmc does not exist
Self-explanatory.
W: Fix gcmc using full_energy option
Fix gcmc has automatically turned on the full_energy option since it
is required for systems like the one specified by the user. User input
included one or more of the following: kspace, a hybrid
pair style, an eam pair style, tail correction,
or no "single" function for the pair style.
W: Energy of old configuration in fix gcmc is > MAXENERGYTEST.
This probably means that a pair of atoms are closer than the
overlap cutoff distance for keyword overlap_cutoff.
E: Invalid atom type in fix gcmc command
The atom type specified in the gcmc command does not exist.
E: Fix gcmc cannot exchange individual atoms belonging to a molecule
This is an error since you should not delete only one atom of a
molecule. The user has specified atomic (non-molecular) gas
exchanges, but an atom belonging to a molecule could be deleted.
E: All mol IDs should be set for fix gcmc group atoms
The molecule flag is on, yet not all molecule ids in the fix group
have been set to non-zero positive values by the user. This is an
error since all atoms in the fix gcmc group are eligible for deletion,
rotation, and translation and therefore must have valid molecule ids.
E: Fix gcmc molecule command requires that atoms have molecule attributes
Should not choose the gcmc molecule feature if no molecules are being
simulated. The general molecule flag is off, but gcmc's molecule flag
is on.
E: Fix gcmc shake fix does not exist
Self-explanatory.
E: Fix gcmc and fix shake not using same molecule template ID
Self-explanatory.
E: Fix gcmc can not currently be used with fix rigid or fix rigid/small
Self-explanatory.
E: Cannot use fix gcmc in a 2d simulation
Fix gcmc is set up to run in 3d only. No 2d simulations with fix gcmc
are allowed.
E: Could not find fix gcmc exclusion group ID
Self-explanatory.
E: Could not find fix gcmc rotation group ID
Self-explanatory.
E: Illegal fix gcmc gas mass <= 0
The computed mass of the designated gas molecule or atom type was less
than or equal to zero.
E: Cannot do GCMC on atoms in atom_modify first group
This is a restriction due to the way atoms are organized in a list to
enable the atom_modify first command.
E: Could not find specified fix gcmc group ID
Self-explanatory.
E: Fix gcmc put atom outside box
This should not normally happen. Contact the developers.
E: Fix gcmc ran out of available molecule IDs
See the setting for tagint in the src/lmptype.h file.
E: Fix gcmc ran out of available atom IDs
See the setting for tagint in the src/lmptype.h file.
E: Too many total atoms
See the setting for bigint in the src/lmptype.h file.
*/
diff --git a/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp b/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp
index c75da63ca..af19f3eb3 100644
--- a/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp
+++ b/src/MOLECULE/pair_lj_charmmfsw_coul_charmmfsh.cpp
@@ -1,546 +1,546 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Paul Crozier (SNL)
The lj-fsw/coul-fsh (force-switched and force-shifted) sections
were provided by Robert Meissner
and Lucio Colombi Ciacchi of Bremen University, Bremen, Germany,
with additional assistance from Robert A. Latour, Clemson University
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pair_lj_charmmfsw_coul_charmmfsh.h"
#include "atom.h"
#include "comm.h"
#include "force.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulCharmmfsh::PairLJCharmmfswCoulCharmmfsh(LAMMPS *lmp) :
Pair(lmp)
{
implicit = 0;
mix_flag = ARITHMETIC;
writedata = 1;
+
+ // short-range/long-range flag accessed by DihedralCharmmfsw
+
+ dihedflag = 0;
}
/* ---------------------------------------------------------------------- */
PairLJCharmmfswCoulCharmmfsh::~PairLJCharmmfswCoulCharmmfsh()
{
if (!copymode) {
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(epsilon);
memory->destroy(sigma);
memory->destroy(eps14);
memory->destroy(sigma14);
memory->destroy(lj1);
memory->destroy(lj2);
memory->destroy(lj3);
memory->destroy(lj4);
memory->destroy(lj14_1);
memory->destroy(lj14_2);
memory->destroy(lj14_3);
memory->destroy(lj14_4);
}
}
}
/* ---------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::compute(int eflag, int vflag)
{
int i,j,ii,jj,inum,jnum,itype,jtype;
double qtmp,xtmp,ytmp,ztmp,delx,dely,delz,evdwl,evdwl12,evdwl6,ecoul,fpair;
double r,rinv,r3inv,rsq,r2inv,r6inv,forcecoul,forcelj,factor_coul,factor_lj;
double switch1;
int *ilist,*jlist,*numneigh,**firstneigh;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x;
double **f = atom->f;
double *q = atom->q;
int *type = atom->type;
int nlocal = atom->nlocal;
double *special_coul = force->special_coul;
double *special_lj = force->special_lj;
int newton_pair = force->newton_pair;
double qqrd2e = force->qqrd2e;
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// loop over neighbors of my atoms
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
qtmp = q[i];
xtmp = x[i][0];
ytmp = x[i][1];
ztmp = x[i][2];
itype = type[i];
jlist = firstneigh[i];
jnum = numneigh[i];
for (jj = 0; jj < jnum; jj++) {
j = jlist[jj];
factor_lj = special_lj[sbmask(j)];
factor_coul = special_coul[sbmask(j)];
j &= NEIGHMASK;
delx = xtmp - x[j][0];
dely = ytmp - x[j][1];
delz = ztmp - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < cut_bothsq) {
r2inv = 1.0/rsq;
r = sqrt(rsq);
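// force-shifted Coulomb (CHARMM fsh): pair force ~ qi*qj*(1/r^2 - 1/rc^2),
// which goes smoothly to zero at the Coulomb cutoff rc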
if (rsq < cut_coulsq) {
forcecoul = qqrd2e * qtmp*q[j]*
(sqrt(r2inv) - r*cut_coulinv*cut_coulinv);
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
jtype = type[j];
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fpair = (factor_coul*forcecoul + factor_lj*forcelj) * r2inv;
f[i][0] += delx*fpair;
f[i][1] += dely*fpair;
f[i][2] += delz*fpair;
if (newton_pair || j < nlocal) {
f[j][0] -= delx*fpair;
f[j][1] -= dely*fpair;
f[j][2] -= delz*fpair;
}
if (eflag) {
if (rsq < cut_coulsq) {
ecoul = qqrd2e * qtmp*q[j]*
(sqrt(r2inv) + cut_coulinv*cut_coulinv*r - 2.0*cut_coulinv);
ecoul *= factor_coul;
} else ecoul = 0.0;
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
rinv = 1.0/r;
r3inv = rinv*rinv*rinv;
evdwl12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
evdwl6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
evdwl = evdwl12 + evdwl6;
} else {
evdwl12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
evdwl6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
evdwl = evdwl12 + evdwl6;
}
evdwl *= factor_lj;
} else evdwl = 0.0;
}
if (evflag) ev_tally(i,j,nlocal,newton_pair,
evdwl,ecoul,fpair,delx,dely,delz);
}
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(epsilon,n+1,n+1,"pair:epsilon");
memory->create(sigma,n+1,n+1,"pair:sigma");
memory->create(eps14,n+1,n+1,"pair:eps14");
memory->create(sigma14,n+1,n+1,"pair:sigma14");
memory->create(lj1,n+1,n+1,"pair:lj1");
memory->create(lj2,n+1,n+1,"pair:lj2");
memory->create(lj3,n+1,n+1,"pair:lj3");
memory->create(lj4,n+1,n+1,"pair:lj4");
memory->create(lj14_1,n+1,n+1,"pair:lj14_1");
memory->create(lj14_2,n+1,n+1,"pair:lj14_2");
memory->create(lj14_3,n+1,n+1,"pair:lj14_3");
memory->create(lj14_4,n+1,n+1,"pair:lj14_4");
}
/* ----------------------------------------------------------------------
global settings
unlike other pair styles,
there are no individual pair settings that these override
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::settings(int narg, char **arg)
{
if (narg != 2 && narg != 3)
error->all(FLERR,"Illegal pair_style command");
cut_lj_inner = force->numeric(FLERR,arg[0]);
cut_lj = force->numeric(FLERR,arg[1]);
if (narg == 2) {
cut_coul = cut_lj;
} else {
cut_coul = force->numeric(FLERR,arg[2]);
}
-
- // indicates pair_style being used for dihedral_charmm
-
- dihedflag = 0;
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::coeff(int narg, char **arg)
{
if (narg != 4 && narg != 6)
error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);
double epsilon_one = force->numeric(FLERR,arg[2]);
double sigma_one = force->numeric(FLERR,arg[3]);
double eps14_one = epsilon_one;
double sigma14_one = sigma_one;
if (narg == 6) {
eps14_one = force->numeric(FLERR,arg[4]);
sigma14_one = force->numeric(FLERR,arg[5]);
}
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
epsilon[i][j] = epsilon_one;
sigma[i][j] = sigma_one;
eps14[i][j] = eps14_one;
sigma14[i][j] = sigma14_one;
setflag[i][j] = 1;
count++;
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::init_style()
{
if (!atom->q_flag)
error->all(FLERR,"Pair style lj/charmmfsw/coul/charmmfsh "
"requires atom attribute q");
neighbor->request(this,instance_me);
// require cut_lj_inner < cut_lj
if (cut_lj_inner >= cut_lj)
error->all(FLERR,"Pair inner lj cutoff >= Pair outer lj cutoff");
cut_lj_innersq = cut_lj_inner * cut_lj_inner;
cut_ljsq = cut_lj * cut_lj;
cut_ljinv = 1.0/cut_lj;
cut_lj_innerinv = 1.0/cut_lj_inner;
cut_lj3 = cut_lj * cut_lj * cut_lj;
cut_lj3inv = cut_ljinv * cut_ljinv * cut_ljinv;
cut_lj_inner3inv = cut_lj_innerinv * cut_lj_innerinv * cut_lj_innerinv;
cut_lj_inner3 = cut_lj_inner * cut_lj_inner * cut_lj_inner;
cut_lj6 = cut_ljsq * cut_ljsq * cut_ljsq;
cut_lj6inv = cut_lj3inv * cut_lj3inv;
cut_lj_inner6inv = cut_lj_inner3inv * cut_lj_inner3inv;
cut_lj_inner6 = cut_lj_innersq * cut_lj_innersq * cut_lj_innersq;
cut_coulsq = cut_coul * cut_coul;
cut_coulinv = 1.0/cut_coul;
cut_bothsq = MAX(cut_ljsq,cut_coulsq);
denom_lj = (cut_ljsq-cut_lj_innersq) * (cut_ljsq-cut_lj_innersq) *
(cut_ljsq-cut_lj_innersq);
denom_lj12 = 1.0/(cut_lj6 - cut_lj_inner6);
denom_lj6 = 1.0/(cut_lj3 - cut_lj_inner3);
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairLJCharmmfswCoulCharmmfsh::init_one(int i, int j)
{
if (setflag[i][j] == 0) {
epsilon[i][j] = mix_energy(epsilon[i][i],epsilon[j][j],
sigma[i][i],sigma[j][j]);
sigma[i][j] = mix_distance(sigma[i][i],sigma[j][j]);
eps14[i][j] = mix_energy(eps14[i][i],eps14[j][j],
sigma14[i][i],sigma14[j][j]);
sigma14[i][j] = mix_distance(sigma14[i][i],sigma14[j][j]);
}
double cut = MAX(cut_lj,cut_coul);
lj1[i][j] = 48.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj2[i][j] = 24.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj3[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],12.0);
lj4[i][j] = 4.0 * epsilon[i][j] * pow(sigma[i][j],6.0);
lj14_1[i][j] = 48.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_2[i][j] = 24.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj14_3[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],12.0);
lj14_4[i][j] = 4.0 * eps14[i][j] * pow(sigma14[i][j],6.0);
lj1[j][i] = lj1[i][j];
lj2[j][i] = lj2[i][j];
lj3[j][i] = lj3[i][j];
lj4[j][i] = lj4[i][j];
lj14_1[j][i] = lj14_1[i][j];
lj14_2[j][i] = lj14_2[i][j];
lj14_3[j][i] = lj14_3[i][j];
lj14_4[j][i] = lj14_4[i][j];
return cut;
}
/* ----------------------------------------------------------------------
proc 0 writes to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_data(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
fprintf(fp,"%d %g %g %g %g\n",
i,epsilon[i][i],sigma[i][i],eps14[i][i],sigma14[i][i]);
}
/* ----------------------------------------------------------------------
proc 0 writes all pairs to data file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_data_all(FILE *fp)
{
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
fprintf(fp,"%d %d %g %g %g %g\n",i,j,
epsilon[i][j],sigma[i][j],eps14[i][j],sigma14[i][j]);
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_restart(FILE *fp)
{
write_restart_settings(fp);
int i,j;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
fwrite(&setflag[i][j],sizeof(int),1,fp);
if (setflag[i][j]) {
fwrite(&epsilon[i][j],sizeof(double),1,fp);
fwrite(&sigma[i][j],sizeof(double),1,fp);
fwrite(&eps14[i][j],sizeof(double),1,fp);
fwrite(&sigma14[i][j],sizeof(double),1,fp);
}
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::read_restart(FILE *fp)
{
read_restart_settings(fp);
allocate();
int i,j;
int me = comm->me;
for (i = 1; i <= atom->ntypes; i++)
for (j = i; j <= atom->ntypes; j++) {
if (me == 0) fread(&setflag[i][j],sizeof(int),1,fp);
MPI_Bcast(&setflag[i][j],1,MPI_INT,0,world);
if (setflag[i][j]) {
if (me == 0) {
fread(&epsilon[i][j],sizeof(double),1,fp);
fread(&sigma[i][j],sizeof(double),1,fp);
fread(&eps14[i][j],sizeof(double),1,fp);
fread(&sigma14[i][j],sizeof(double),1,fp);
}
MPI_Bcast(&epsilon[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&eps14[i][j],1,MPI_DOUBLE,0,world);
MPI_Bcast(&sigma14[i][j],1,MPI_DOUBLE,0,world);
}
}
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::write_restart_settings(FILE *fp)
{
fwrite(&cut_lj_inner,sizeof(double),1,fp);
fwrite(&cut_lj,sizeof(double),1,fp);
fwrite(&cut_coul,sizeof(double),1,fp);
fwrite(&offset_flag,sizeof(int),1,fp);
fwrite(&mix_flag,sizeof(int),1,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairLJCharmmfswCoulCharmmfsh::read_restart_settings(FILE *fp)
{
if (comm->me == 0) {
fread(&cut_lj_inner,sizeof(double),1,fp);
fread(&cut_lj,sizeof(double),1,fp);
fread(&cut_coul,sizeof(double),1,fp);
fread(&offset_flag,sizeof(int),1,fp);
fread(&mix_flag,sizeof(int),1,fp);
}
MPI_Bcast(&cut_lj_inner,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_lj,1,MPI_DOUBLE,0,world);
MPI_Bcast(&cut_coul,1,MPI_DOUBLE,0,world);
MPI_Bcast(&offset_flag,1,MPI_INT,0,world);
MPI_Bcast(&mix_flag,1,MPI_INT,0,world);
}
/* ---------------------------------------------------------------------- */
double PairLJCharmmfswCoulCharmmfsh::
single(int i, int j, int itype, int jtype,
double rsq, double factor_coul, double factor_lj, double &fforce)
{
double r,rinv,r2inv,r3inv,r6inv,forcecoul,forcelj;
double phicoul,philj,philj12,philj6;
double switch1;
r2inv = 1.0/rsq;
r = sqrt(rsq);
rinv = 1.0/r;
if (rsq < cut_coulsq) {
forcecoul = force->qqrd2e * atom->q[i]*atom->q[j] *
(sqrt(r2inv) - r*cut_coulinv*cut_coulinv);
} else forcecoul = 0.0;
if (rsq < cut_ljsq) {
r6inv = r2inv*r2inv*r2inv;
r3inv = rinv*rinv*rinv;
forcelj = r6inv * (lj1[itype][jtype]*r6inv - lj2[itype][jtype]);
if (rsq > cut_lj_innersq) {
switch1 = (cut_ljsq-rsq) * (cut_ljsq-rsq) *
(cut_ljsq + 2.0*rsq - 3.0*cut_lj_innersq) / denom_lj;
forcelj = forcelj*switch1;
}
} else forcelj = 0.0;
fforce = (factor_coul*forcecoul + factor_lj*forcelj) * r2inv;
double eng = 0.0;
if (rsq < cut_coulsq) {
phicoul = force->qqrd2e * atom->q[i]*atom->q[j] *
(sqrt(r2inv) + cut_coulinv*cut_coulinv*r - 2.0*cut_coulinv);
eng += factor_coul*phicoul;
}
if (rsq < cut_ljsq) {
if (rsq > cut_lj_innersq) {
philj12 = lj3[itype][jtype]*cut_lj6*denom_lj12 *
(r6inv - cut_lj6inv)*(r6inv - cut_lj6inv);
philj6 = -lj4[itype][jtype]*cut_lj3*denom_lj6 *
(r3inv - cut_lj3inv)*(r3inv - cut_lj3inv);
philj = philj12 + philj6;
} else {
philj12 = r6inv*lj3[itype][jtype]*r6inv -
lj3[itype][jtype]*cut_lj_inner6inv*cut_lj6inv;
philj6 = -lj4[itype][jtype]*r6inv +
lj4[itype][jtype]*cut_lj_inner3inv*cut_lj3inv;
philj = philj12 + philj6;
}
eng += factor_lj*philj;
}
return eng;
}
/* ---------------------------------------------------------------------- */
void *PairLJCharmmfswCoulCharmmfsh::extract(const char *str, int &dim)
{
dim = 2;
if (strcmp(str,"lj14_1") == 0) return (void *) lj14_1;
if (strcmp(str,"lj14_2") == 0) return (void *) lj14_2;
if (strcmp(str,"lj14_3") == 0) return (void *) lj14_3;
if (strcmp(str,"lj14_4") == 0) return (void *) lj14_4;
dim = 0;
if (strcmp(str,"implicit") == 0) return (void *) &implicit;
- // info extracted by dihedral_charmmf
+ // info extracted by dihedral_charmmfsw
if (strcmp(str,"cut_coul") == 0) return (void *) &cut_coul;
if (strcmp(str,"cut_lj_inner") == 0) return (void *) &cut_lj_inner;
if (strcmp(str,"cut_lj") == 0) return (void *) &cut_lj;
if (strcmp(str,"dihedflag") == 0) return (void *) &dihedflag;
return NULL;
}
diff --git a/src/Makefile b/src/Makefile
index 59f954014..32f9c3787 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -1,379 +1,381 @@
# LAMMPS multiple-machine -*- Makefile -*-
SHELL = /bin/bash
PYTHON = python
#.IGNORE:
# Definitions
ROOT = lmp
EXE = lmp_$@
ARLIB = liblammps_$@.a
SHLIB = liblammps_$@.so
ARLINK = liblammps.a
SHLINK = liblammps.so
OBJDIR = Obj_$@
OBJSHDIR = Obj_shared_$@
SRC = $(wildcard *.cpp)
INC = $(wildcard *.h)
OBJ = $(SRC:.cpp=.o)
SRCLIB = $(filter-out main.cpp,$(SRC))
OBJLIB = $(filter-out main.o,$(OBJ))
# Command-line options for mode: exe (default), shexe, lib, shlib
mode = exe
objdir = $(OBJDIR)
ifeq ($(mode),shexe)
objdir = $(OBJSHDIR)
endif
ifeq ($(mode),lib)
objdir = $(OBJDIR)
endif
ifeq ($(mode),shlib)
objdir = $(OBJSHDIR)
endif
# Package variables
# PACKAGE = standard packages
# PACKUSER = user packages
# PACKLIB = all packages that require an additional lib
+# should be PACKSYS + PACKINT + PACKEXT
# PACKSYS = subset that require a common system library
+# include MPIIO and LB b/c require full MPI, not just STUBS
# PACKINT = subset that require an internal (provided) library
# PACKEXT = subset that require an external (downloaded) library
-# PACKLIB = PACKSYS + PACKING + PACKEXT
-# PACKSCRIPT = libs under lammps/lib that have an Install.py script
PACKAGE = asphere body class2 colloid compress coreshell dipole gpu \
granular kim kokkos kspace manybody mc meam misc molecule \
mpiio mscg opt peri poems \
python qeq reax replica rigid shock snap srd voronoi
-PACKUSER = user-atc user-awpmd user-cg-cmm user-cgdna user-colvars \
+PACKUSER = user-atc user-awpmd user-cgdna user-cgsdk user-colvars \
user-diffraction user-dpd user-drude user-eff user-fep user-h5md \
user-intel user-lb user-manifold user-mgpt user-misc user-molfile \
- user-nc-dump user-omp user-phonon user-qmmm user-qtb \
+ user-netcdf user-omp user-phonon user-qmmm user-qtb \
user-quip user-reaxc user-smd user-smtbq user-sph user-tally \
user-vtk
PACKLIB = compress gpu kim kokkos meam mpiio mscg poems \
python reax voronoi \
- user-atc user-awpmd user-colvars user-h5md user-molfile \
- user-nc-dump user-qmmm user-quip user-smd user-vtk
+ user-atc user-awpmd user-colvars user-h5md user-lb user-molfile \
+ user-netcdf user-qmmm user-quip user-smd user-vtk
-PACKSYS = compress mpiio python
+PACKSYS = compress mpiio python user-lb
PACKINT = gpu kokkos meam poems reax user-atc user-awpmd user-colvars
PACKEXT = kim mscg voronoi \
- user-h5md user-molfile user-nc-dump user-qmmm user-quip \
+ user-h5md user-molfile user-netcdf user-qmmm user-quip \
user-smd user-vtk
-PACKSCRIPT = voronoi
-
PACKALL = $(PACKAGE) $(PACKUSER)
PACKAGEUC = $(shell echo $(PACKAGE) | tr a-z A-Z)
PACKUSERUC = $(shell echo $(PACKUSER) | tr a-z A-Z)
YESDIR = $(shell echo $(@:yes-%=%) | tr a-z A-Z)
NODIR = $(shell echo $(@:no-%=%) | tr a-z A-Z)
LIBDIR = $(shell echo $(@:lib-%=%))
+LIBUSERDIR = $(shell echo $(@:lib-user-%=%))
# List of all targets
help:
@echo ''
@echo 'make clean-all delete all object files'
@echo 'make clean-machine delete object files for one machine'
@echo 'make mpi-stubs build dummy MPI library in STUBS'
@echo 'make install-python install LAMMPS wrapper in Python'
@echo 'make tar create lmp_src.tar.gz for src dir and packages'
@echo ''
@echo 'make package list available packages and their dependencies'
@echo 'make package-status (ps) status of all packages'
@echo 'make yes-package install a single pkg in src dir'
@echo 'make no-package remove a single pkg from src dir'
@echo 'make yes-all install all pkgs in src dir'
@echo 'make no-all remove all pkgs from src dir'
@echo 'make yes-standard (yes-std) install all standard pkgs'
@echo 'make no-standard (no-std) remove all standard pkgs'
@echo 'make yes-user install all user pkgs'
@echo 'make no-user remove all user pkgs'
- @echo 'make yes-lib install all pkgs with libs (incldued or ext)'
+ @echo 'make yes-lib install all pkgs with libs (included or ext)'
@echo 'make no-lib remove all pkgs with libs (included or ext)'
@echo 'make yes-ext install all pkgs with external libs'
@echo 'make no-ext remove all pkgs with external libs'
@echo ''
@echo 'make package-update (pu) replace src files with updated package files'
@echo 'make package-overwrite replace package files with src files'
@echo 'make package-diff (pd) diff src files against package files'
@echo ''
@echo 'make lib-package download/build/install a package library'
@echo 'make purge purge obsolete copies of source files'
@echo ''
@echo 'make machine build LAMMPS for machine'
@echo 'make mode=lib machine build LAMMPS as static lib for machine'
@echo 'make mode=shlib machine build LAMMPS as shared lib for machine'
@echo 'make mode=shexe machine build LAMMPS as shared exe for machine'
@echo 'make makelist create Makefile.list used by old makes'
@echo 'make -f Makefile.list machine build LAMMPS for machine (old)'
@echo ''
@echo 'machine is one of these from src/MAKE:'
@echo ''
@files="`ls MAKE/Makefile.*`"; \
for file in $$files; do head -1 $$file; done
@echo ''
@echo '... or one of these from src/MAKE/OPTIONS:'
@echo ''
@files="`ls MAKE/OPTIONS/Makefile.*`"; \
for file in $$files; do head -1 $$file; done
@echo ''
@echo '... or one of these from src/MAKE/MACHINES:'
@echo ''
@files="`ls MAKE/MACHINES/Makefile.*`"; \
for file in $$files; do head -1 $$file; done
@echo ''
@echo '... or one of these from src/MAKE/MINE:'
@echo ''
@files="`ls MAKE/MINE/Makefile.* 2>/dev/null`"; \
for file in $$files; do head -1 $$file; done
@echo ''
# Build LAMMPS in one of 4 modes
# exe = exe with static compile in Obj_machine (default)
# shexe = exe with shared compile in Obj_shared_machine
# lib = static lib in Obj_machine
# shlib = shared lib in Obj_shared_machine
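# illustrative invocations (assuming the stock Makefile.serial and Makefile.mpi
# under src/MAKE; adjust the machine name to your own makefile):
#   make serial              static executable built in Obj_serial
#   make mode=shlib mpi      shared library built in Obj_shared_mpi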
.DEFAULT:
@if [ $@ = "serial" -a ! -f STUBS/libmpi_stubs.a ]; \
then $(MAKE) mpi-stubs; fi
@test -f MAKE/Makefile.$@ -o -f MAKE/OPTIONS/Makefile.$@ -o \
-f MAKE/MACHINES/Makefile.$@ -o -f MAKE/MINE/Makefile.$@
@if [ ! -d $(objdir) ]; then mkdir $(objdir); fi
@$(SHELL) Make.sh style
@if [ -f MAKE/MACHINES/Makefile.$@ ]; \
then cp MAKE/MACHINES/Makefile.$@ $(objdir)/Makefile; fi
@if [ -f MAKE/OPTIONS/Makefile.$@ ]; \
then cp MAKE/OPTIONS/Makefile.$@ $(objdir)/Makefile; fi
@if [ -f MAKE/Makefile.$@ ]; \
then cp MAKE/Makefile.$@ $(objdir)/Makefile; fi
@if [ -f MAKE/MINE/Makefile.$@ ]; \
then cp MAKE/MINE/Makefile.$@ $(objdir)/Makefile; fi
@if [ ! -e Makefile.package ]; \
then cp Makefile.package.empty Makefile.package; fi
@if [ ! -e Makefile.package.settings ]; \
then cp Makefile.package.settings.empty Makefile.package.settings; fi
@cp Makefile.package Makefile.package.settings $(objdir)
@cd $(objdir); rm -f .depend; \
$(MAKE) $(MFLAGS) "SRC = $(SRC)" "INC = $(INC)" depend || :
ifeq ($(mode),exe)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJ)" "INC = $(INC)" "SHFLAGS =" \
"EXE = ../$(EXE)" ../$(EXE)
endif
ifeq ($(mode),shexe)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJ)" "INC = $(INC)" \
"EXE = ../$(EXE)" ../$(EXE)
endif
ifeq ($(mode),lib)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJLIB)" "INC = $(INC)" "SHFLAGS =" \
"EXE = ../$(ARLIB)" lib
@rm -f $(ARLINK)
@ln -s $(ARLIB) $(ARLINK)
endif
ifeq ($(mode),shlib)
@cd $(objdir); \
$(MAKE) $(MFLAGS) "OBJ = $(OBJLIB)" "INC = $(INC)" \
"EXE = ../$(SHLIB)" shlib
@rm -f $(SHLINK)
@ln -s $(SHLIB) $(SHLINK)
endif
# Remove machine-specific object files
clean:
@echo 'make clean-all delete all object files'
@echo 'make clean-machine delete object files for one machine'
clean-all:
rm -rf Obj_*
clean-%:
rm -rf Obj_$(@:clean-%=%) Obj_shared_$(@:clean-%=%)
# Create Makefile.list
makelist:
@$(SHELL) Make.sh style
@$(SHELL) Make.sh Makefile.list
# Make MPI STUBS library
mpi-stubs:
@cd STUBS; $(MAKE) clean; $(MAKE)
# install LAMMPS shared lib and Python wrapper for Python usage
# include python package settings to automatically adapt name of python interpreter
sinclude ../lib/python/Makefile.lammps
install-python:
@$(PYTHON) ../python/install.py
# Create a tarball of src dir and packages
tar:
@cd STUBS; $(MAKE) clean
@cd ..; tar cvzf src/$(ROOT)_src.tar.gz \
src/Make* src/Package.sh src/Depend.sh src/Install.sh \
src/MAKE src/DEPEND src/*.cpp src/*.h src/STUBS \
$(patsubst %,src/%,$(PACKAGEUC)) $(patsubst %,src/%,$(PACKUSERUC)) \
--exclude=*/.svn
@cd STUBS; $(MAKE)
@echo "Created $(ROOT)_src.tar.gz"
# Package management
package:
@echo 'Standard packages:' $(PACKAGE)
@echo ''
@echo 'User-contributed packages:' $(PACKUSER)
@echo ''
@echo 'Packages that need system libraries:' $(PACKSYS)
@echo ''
@echo 'Packages that need provided libraries:' $(PACKINT)
@echo ''
@echo 'Packages that need external libraries:' $(PACKEXT)
@echo ''
@echo 'make package list available packages'
@echo 'make package-status (ps) status of all packages'
@echo 'make yes-package install a single pkg in src dir'
@echo 'make no-package remove a single pkg from src dir'
@echo 'make yes-all install all pkgs in src dir'
@echo 'make no-all remove all pkgs from src dir'
@echo 'make yes-standard (yes-std) install all standard pkgs'
@echo 'make no-standard (no-std) remove all standard pkgs'
@echo 'make yes-user install all user pkgs'
@echo 'make no-user remove all user pkgs'
@echo 'make yes-lib install all pkgs with libs (included or ext)'
@echo 'make no-lib remove all pkgs with libs (included or ext)'
@echo 'make yes-ext install all pkgs with external libs'
@echo 'make no-ext remove all pkgs with external libs'
@echo ''
@echo 'make package-update (pu) replace src files with package files'
@echo 'make package-overwrite replace package files with src files'
@echo 'make package-diff (pd) diff src files against package files'
@echo ''
- @echo 'make lib-package download/build/install a package library'
+ @echo 'make lib-package build and/or download a package library'
yes-all:
@for p in $(PACKALL); do $(MAKE) yes-$$p; done
no-all:
@for p in $(PACKALL); do $(MAKE) no-$$p; done
yes-standard yes-std:
@for p in $(PACKAGE); do $(MAKE) yes-$$p; done
no-standard no-std:
@for p in $(PACKAGE); do $(MAKE) no-$$p; done
yes-user:
@for p in $(PACKUSER); do $(MAKE) yes-$$p; done
no-user:
@for p in $(PACKUSER); do $(MAKE) no-$$p; done
yes-lib:
@for p in $(PACKLIB); do $(MAKE) yes-$$p; done
no-lib:
@for p in $(PACKLIB); do $(MAKE) no-$$p; done
yes-ext:
@for p in $(PACKEXT); do $(MAKE) yes-$$p; done
no-ext:
@for p in $(PACKEXT); do $(MAKE) no-$$p; done
yes-%:
@if [ ! -e Makefile.package ]; \
then cp Makefile.package.empty Makefile.package; fi
@if [ ! -e Makefile.package.settings ]; \
then cp Makefile.package.settings.empty Makefile.package.settings; fi
@if [ ! -e $(YESDIR) ]; then \
echo "Package $(@:yes-%=%) does not exist"; \
elif [ -e $(YESDIR)/Install.sh ]; then \
echo "Installing package $(@:yes-%=%)"; \
cd $(YESDIR); $(SHELL) Install.sh 1; cd ..; \
$(SHELL) Depend.sh $(YESDIR) 1; \
else \
echo "Installing package $(@:yes-%=%)"; \
cd $(YESDIR); $(SHELL) ../Install.sh 1; cd ..; \
$(SHELL) Depend.sh $(YESDIR) 1; \
fi;
no-%:
@if [ ! -e $(NODIR) ]; then \
echo "Package $(@:no-%=%) does not exist"; \
elif [ -e $(NODIR)/Install.sh ]; then \
echo "Uninstalling package $(@:no-%=%)"; \
cd $(NODIR); $(SHELL) Install.sh 0; cd ..; \
$(SHELL) Depend.sh $(NODIR) 0; \
else \
echo "Uninstalling package $(@:no-%=%)"; \
cd $(NODIR); $(SHELL) ../Install.sh 0; cd ..; \
$(SHELL) Depend.sh $(NODIR) 0; \
fi;
# download/build/install a package library
lib-%:
- @if [ ! -e ../lib/$(LIBDIR)/Install.py ]; then \
- echo "Install script for lib $(@:lib-%=%) does not exist"; \
- else \
- echo "Installing lib for package $(@:lib-%=%)"; \
+ @if [ -e ../lib/$(LIBDIR)/Install.py ]; then \
+ echo "Installing lib $(@:lib-%=%)"; \
cd ../lib/$(LIBDIR); python Install.py $(args); \
+ elif [ -e ../lib/$(LIBUSERDIR)/Install.py ]; then \
+ echo "Installing lib $(@:lib-user-%=%)"; \
+ cd ../lib/$(LIBUSERDIR); python Install.py $(args); \
+ else \
+ echo "Install script for lib $(@:lib-%=%) does not exist"; \
fi;
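# illustrative usage (the args string is passed verbatim to the library's
# Install.py; see lib/<name>/README or the script itself for supported options):
#   make lib-voronoi args="..."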
# status = list src files that differ from package files
# update = replace src files with newer package files
# overwrite = overwrite package files with newer src files
# diff = show differences between src and package files
# purge = delete obsolete and auto-generated package files
package-status ps:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p status; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p status; done
package-update pu:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p update; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p update; done
package-overwrite:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p overwrite; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p overwrite; done
package-diff pd:
@for p in $(PACKAGEUC); do $(SHELL) Package.sh $$p diff; done
@echo ''
@for p in $(PACKUSERUC); do $(SHELL) Package.sh $$p diff; done
purge: Purge.list
@echo 'Purging obsolete and auto-generated source files'
@for f in `grep -v '#' Purge.list` ; \
do test -f $$f && rm $$f && echo $$f || : ; \
done
diff --git a/src/Purge.list b/src/Purge.list
index 554c5df82..6326dbadf 100644
--- a/src/Purge.list
+++ b/src/Purge.list
@@ -1,451 +1,470 @@
# auto-generated style files
style_angle.h
style_atom.h
style_bond.h
style_command.h
style_compute.h
style_dihedral.h
style_dump.h
style_fix.h
style_improper.h
style_integrate.h
style_kspace.h
style_minimize.h
style_pair.h
style_region.h
style_neigh_bin.h
style_neigh_pair.h
style_neigh_stencil.h
+# deleted on 4 May 2017
+pair_reax_c.cpp
+pair_reax_c.h
+fix_reax_c_bonds.cpp
+fix_reax_c_bonds.h
+fix_reax_c_species.cpp
+fix_reax_c_species.h
+pair_reax_c_kokkos.cpp
+pair_reax_c_kokkos.h
+fix_reax_c_bonds_kokkos.cpp
+fix_reax_c_bonds_kokkos.h
+fix_reax_c_species_kokkos.cpp
+fix_reax_c_species_kokkos.h
+# deleted on 19 April 2017
+vmdplugin.h
+molfile_plugin.h
+# deleted on 13 April 2017
+dihedral_charmmfsh.cpp
+dihedral_charmmfsh.h
# deleted on ## XXX 2016
accelerator_intel.h
neigh_bond.cpp
neigh_bond.h
neigh_derive.cpp
neigh_derive.h
neigh_full.cpp
neigh_full.h
neigh_gran.cpp
neigh_gran.h
neigh_half_bin.cpp
neigh_half_bin.h
neigh_half_multi.cpp
neigh_half_multi.h
neigh_half_nsq.cpp
neigh_half_nsq.h
neigh_respa.cpp
neigh_respa.h
neigh_shardlow.cpp
neigh_shardlow.h
neigh_stencil.cpp
neigh_half_bin_intel.cpp
neigh_full_kokkos.h
neighbor_omp.h
neigh_derive_omp.cpp
neigh_full_omp.cpp
neigh_gran_omp.cpp
neigh_half_bin_omp.cpp
neigh_half_multi_omp.cpp
neigh_half_nsq_omp.cpp
neigh_respa_omp.cpp
# deleted on 20 Sep 2016
fix_ti_rs.cpp
fix_ti_rs.h
# deleted on 31 May 2016
fix_ave_spatial_sphere.cpp
fix_ave_spatial_sphere.h
atom_vec_angle_cuda.cpp
atom_vec_angle_cuda.h
atom_vec_atomic_cuda.cpp
atom_vec_atomic_cuda.h
atom_vec_charge_cuda.cpp
atom_vec_charge_cuda.h
atom_vec_full_cuda.cpp
atom_vec_full_cuda.h
comm_cuda.cpp
comm_cuda.h
compute_pe_cuda.cpp
compute_pe_cuda.h
compute_pressure_cuda.cpp
compute_pressure_cuda.h
compute_temp_cuda.cpp
compute_temp_cuda.h
compute_temp_partial_cuda.cpp
compute_temp_partial_cuda.h
cuda.cpp
cuda_data.h
cuda_modify_flags.h
cuda_neigh_list.cpp
cuda_neigh_list.h
domain_cuda.cpp
domain_cuda.h
fft3d_cuda.cpp
fft3d_cuda.h
fft3d_wrap_cuda.cpp
fft3d_wrap_cuda.h
fix_addforce_cuda.cpp
fix_addforce_cuda.h
fix_aveforce_cuda.cpp
fix_aveforce_cuda.h
fix_enforce2d_cuda.cpp
fix_enforce2d_cuda.h
fix_freeze_cuda.cpp
fix_freeze_cuda.h
fix_gravity_cuda.cpp
fix_gravity_cuda.h
fix_nh_cuda.cpp
fix_nh_cuda.h
fix_npt_cuda.cpp
fix_npt_cuda.h
fix_nve_cuda.cpp
fix_nve_cuda.h
fix_nvt_cuda.cpp
fix_nvt_cuda.h
fix_set_force_cuda.cpp
fix_set_force_cuda.h
fix_shake_cuda.cpp
fix_shake_cuda.h
fix_temp_berendsen_cuda.cpp
fix_temp_berendsen_cuda.h
fix_temp_rescale_cuda.cpp
fix_temp_rescale_cuda.h
fix_temp_rescale_limit_cuda.cpp
fix_temp_rescale_limit_cuda.h
fix_viscous_cuda.cpp
fix_viscous_cuda.h
modify_cuda.cpp
modify_cuda.h
neighbor_cuda.cpp
neighbor_cuda.h
neigh_full_cuda.cpp
pair_born_coul_long_cuda.cpp
pair_born_coul_long_cuda.h
pair_buck_coul_cut_cuda.cpp
pair_buck_coul_cut_cuda.h
pair_buck_coul_long_cuda.cpp
pair_buck_coul_long_cuda.h
pair_buck_cuda.cpp
pair_buck_cuda.h
pair_eam_alloy_cuda.cpp
pair_eam_alloy_cuda.h
pair_eam_cuda.cpp
pair_eam_cuda.h
pair_eam_fs_cuda.cpp
pair_eam_fs_cuda.h
pair_gran_hooke_cuda.cpp
pair_gran_hooke_cuda.h
pair_lj96_cut_cuda.cpp
pair_lj96_cut_cuda.h
pair_lj_charmm_coul_charmm_cuda.cpp
pair_lj_charmm_coul_charmm_cuda.h
pair_lj_charmm_coul_charmm_implicit_cuda.cpp
pair_lj_charmm_coul_charmm_implicit_cuda.h
pair_lj_charmm_coul_long_cuda.cpp
pair_lj_charmm_coul_long_cuda.h
pair_lj_class2_coul_cut_cuda.cpp
pair_lj_class2_coul_cut_cuda.h
pair_lj_class2_coul_long_cuda.cpp
pair_lj_class2_coul_long_cuda.h
pair_lj_class2_cuda.cpp
pair_lj_class2_cuda.h
pair_lj_cut_coul_cut_cuda.cpp
pair_lj_cut_coul_cut_cuda.h
pair_lj_cut_coul_debye_cuda.cpp
pair_lj_cut_coul_debye_cuda.h
pair_lj_cut_coul_long_cuda.cpp
pair_lj_cut_coul_long_cuda.h
pair_lj_cut_cuda.cpp
pair_lj_cut_cuda.h
pair_lj_cut_experimental_cuda.cpp
pair_lj_cut_experimental_cuda.h
pair_lj_expand_cuda.cpp
pair_lj_expand_cuda.h
pair_lj_gromacs_coul_gromacs_cuda.cpp
pair_lj_gromacs_coul_gromacs_cuda.h
pair_lj_gromacs_cuda.cpp
pair_lj_gromacs_cuda.h
pair_lj_sdk_coul_long_cuda.cpp
pair_lj_sdk_coul_long_cuda.h
pair_lj_sdk_cuda.cpp
pair_lj_sdk_cuda.h
pair_lj_smooth_cuda.cpp
pair_lj_smooth_cuda.h
pair_morse_cuda.cpp
pair_morse_cuda.h
pair_sw_cuda.cpp
pair_sw_cuda.h
pair_tersoff_cuda.cpp
pair_tersoff_cuda.h
pair_tersoff_zbl_cuda.cpp
pair_tersoff_zbl_cuda.h
pppm_cuda.cpp
pppm_cuda.h
pppm_old.cpp
pppm_old.h
user_cuda.h
verlet_cuda.cpp
verlet_cuda.h
# deleted on 11 May 2016
pair_dpd_conservative.cpp
pair_dpd_conservative.h
# deleted on 21 Mar 2016
verlet_intel.cpp
verlet_intel.h
verlet_split_intel.cpp
verlet_split_intel.h
# deleted on 15 Jan 2016
pair_line_lj_omp.cpp
pair_line_lj_omp.h
pair_tri_lj_omp.cpp
pair_tri_lj_omp.h
# deleted on 13 May 14
commgrid.cpp
commgrid.h
# deleted on 5 May 14
reaxc_basic_comm.cpp
reaxc_basic_comm.h
# deleted on 15 Apr 14
pppm_old.cpp
pppm_old.h
# deleted on Thu Jun 6 15:19:12 2013 +0000
pair_dipole_cut.h
pair_dipole_cut.cpp
pair_dipole_cut_gpu.h
pair_dipole_cut_gpu.cpp
pair_dipole_cut_omp.h
pair_dipole_cut_omp.cpp
pair_dipole_sf.h
pair_dipole_sf.cpp
pair_dipole_sf_omp.h
pair_dipole_sf_omp.cpp
pair_dipole_sf_gpu.h
pair_dipole_sf_gpu.cpp
# deleted on Wed May 8 15:24:36 2013 +0000
compute_spec_atom.cpp
compute_spec_atom.h
fix_species.cpp
fix_species.h
# deleted on Fri Oct 19 15:27:15 2012 +0000
pair_lj_charmm_coul_long_proxy_omp.cpp
pair_lj_charmm_coul_long_proxy_omp.h
pair_lj_class2_coul_long_proxy_omp.cpp
pair_lj_class2_coul_long_proxy_omp.h
pair_lj_cut_coul_long_proxy_omp.cpp
pair_lj_cut_coul_long_proxy_omp.h
pair_lj_cut_tip4p_long_proxy_omp.cpp
pair_lj_cut_tip4p_long_proxy_omp.h
pppm_proxy.cpp
pppm_proxy.h
pppm_tip4p_proxy.cpp
pppm_tip4p_proxy.h
# deleted on Wed Oct 3 15:17:27 2012 +0000
pair_lj_cut_coul_long_proxy_tip4p_omp.cpp
pair_lj_cut_coul_long_proxy_tip4p_omp.h
# deleted on Wed Oct 3 15:06:24 2012 +0000
pair_lj_cut_coul_long_tip4p_opt.cpp
pair_lj_cut_coul_long_tip4p_opt.h
# deleted on Wed Oct 3 14:53:43 2012 +0000
pair_lj_charmm_coul_long_proxy_omp.cpp
pair_lj_charmm_coul_long_proxy_omp.h
pair_lj_class2_coul_long_proxy_omp.cpp
pair_lj_class2_coul_long_proxy_omp.h
pair_lj_cut_coul_long_proxy_omp.cpp
pair_lj_cut_coul_long_proxy_omp.h
pair_lj_cut_coul_long_tip4p_omp.cpp
pair_lj_cut_coul_long_tip4p_omp.h
# deleted on Wed Oct 3 14:50:44 2012 +0000
pair_buck_disp_coul_long_omp.cpp
pair_buck_disp_coul_long_omp.h
pair_lj_disp_coul_long_omp.cpp
pair_lj_disp_coul_long_omp.h
# deleted on Wed Oct 3 14:46:42 2012 +0000
pair_lj_cut_coul_long_tip4p.cpp
pair_lj_cut_coul_long_tip4p.h
# deleted on Wed Oct 3 14:46:23 2012 +0000
pair_buck_disp_coul_long.cpp
pair_buck_disp_coul_long.h
pair_lj_disp_coul_long.cpp
pair_lj_disp_coul_long.h
pair_lj_disp_coul_long_tip4p.cpp
pair_lj_disp_coul_long_tip4p.h
# deleted on Tue Oct 2 22:50:58 2012 +0000
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
pair_lj_coul_omp.cpp
pair_lj_coul_omp.h
# deleted on Tue Oct 2 20:12:27 2012 +0000
pair_lj_charmm_coul_pppm_omp.cpp
pair_lj_charmm_coul_pppm_omp.h
pair_lj_class2_coul_pppm_omp.cpp
pair_lj_class2_coul_pppm_omp.h
pair_lj_cut_coul_pppm_omp.cpp
pair_lj_cut_coul_pppm_omp.h
pair_lj_cut_coul_pppm_tip4p_omp.cpp
pair_lj_cut_coul_pppm_tip4p_omp.h
# deleted on Tue Oct 2 19:59:40 2012 +0000
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
pair_lj_coul_omp.cpp
pair_lj_coul_omp.h
pair_lj_cut_coul_long_tip4p_omp.cpp
pair_lj_cut_coul_long_tip4p_omp.h
pppm_proxy.cpp
pppm_proxy.h
pppm_tip4p_proxy.cpp
pppm_tip4p_proxy.h
# deleted on Tue Oct 2 19:58:21 2012 +0000
pair_lj_cut_coul_pppm_omp.cpp
pair_lj_cut_coul_pppm_omp.h
pair_lj_cut_coul_pppm_tip4p_omp.cpp
pair_lj_cut_coul_pppm_tip4p_omp.h
# deleted on Tue Oct 2 19:58:03 2012 +0000
pair_lj_charmm_coul_pppm_omp.cpp
pair_lj_charmm_coul_pppm_omp.h
pair_lj_class2_coul_pppm_omp.cpp
pair_lj_class2_coul_pppm_omp.h
# deleted on Tue Oct 2 16:36:24 2012 +0000
ewald_n.cpp
ewald_n.h
pair_buck_coul.cpp
pair_buck_coul.h
pair_lj_coul.cpp
pair_lj_coul.h
# deleted on Wed Jul 25 15:17:24 2012 +0000
pair_lj_sdk_coul_cut_cuda.cpp
pair_lj_sdk_coul_cut_cuda.h
pair_lj_sdk_coul_debye_cuda.cpp
pair_lj_sdk_coul_debye_cuda.h
# deleted on Tue Jul 24 14:55:49 2012 +0000
pair_cg_cmm_coul_cut_cuda.cpp
pair_cg_cmm_coul_cut_cuda.h
pair_cg_cmm_coul_debye_cuda.cpp
pair_cg_cmm_coul_debye_cuda.h
pair_cg_cmm_coul_long_cuda.cpp
pair_cg_cmm_coul_long_cuda.h
pair_cg_cmm_cuda.cpp
pair_cg_cmm_cuda.h
# deleted on Sat Dec 31 20:27:05 2011 -0500
ewald_cg.cpp
ewald_cg.h
# deleted on Sat Dec 31 20:01:21 2011 -0500
dihedral_omp.cpp
dihedral_omp.h
pair_cg_cmm_omp.cpp
pair_cg_cmm_omp.h
pair_lj_cut_coul_long_tip4p_omp.cpp
pair_lj_cut_coul_long_tip4p_omp.h
pair_omp.cpp
pair_omp.h
# deleted on Thu Dec 8 23:13:51 2011 +0000
pair_cg_cmm_coul_long_gpu.cpp
pair_cg_cmm_coul_long_gpu.h
pair_cg_cmm_gpu.cpp
pair_cg_cmm_gpu.h
# deleted on Mon Nov 7 19:32:59 2011 -0500
pair_cg_cmm_coul_long_gpu.cpp
pair_cg_cmm_coul_long_gpu.h
pair_cg_cmm_gpu.cpp
pair_cg_cmm_gpu.h
# deleted on Tue Oct 25 23:04:03 2011 -0400
lj_sdk_common.cpp
# deleted on Fri Oct 7 08:55:40 2011 -0400
pair_hybrid_overlay_omp.cpp
pair_hybrid_overlay_omp.h
# deleted on Fri Oct 7 08:54:38 2011 -0400
angle_hybrid_omp.cpp
angle_hybrid_omp.h
bond_hybrid_omp.cpp
bond_hybrid_omp.h
dihedral_hybrid_omp.cpp
dihedral_hybrid_omp.h
improper_hybrid_omp.cpp
improper_hybrid_omp.h
pair_hybrid_omp.cpp
pair_hybrid_omp.h
# deleted on Mon Aug 22 13:48:15 2011 -0400
omp_thr.cpp
omp_thr.h
# deleted on Mon Aug 8 22:56:28 2011 +0000
dihedral_cosineshiftexp.cpp
dihedral_cosineshiftexp.h
# deleted on Mon Aug 8 22:55:20 2011 +0000
angle_cosineshift.cpp
angle_cosineshift.h
angle_cosineshiftexp.cpp
angle_cosineshiftexp.h
# deleted on Mon Aug 8 19:25:08 2011 +0000
pppm_gpu_double.cpp
pppm_gpu_double.h
pppm_gpu_single.cpp
pppm_gpu_single.h
# deleted on Fri Apr 15 20:57:03 2011 -0400
pair_lj_charmm_coul_long_gpu2.cpp
pair_lj_charmm_coul_long_gpu2.h
# deleted on Wed Apr 13 21:40:14 2011 +0000
atom_vec_colloid.cpp
atom_vec_colloid.h
atom_vec_granular.cpp
atom_vec_granular.h
# deleted on Fri Nov 19 12:53:07 2010 -0500
fix_pour_omp.cpp
fix_pour_omp.h
# deleted on Thu Aug 19 23:20:14 2010 +0000
fix_qeq.cpp
fix_qeq.h
# deleted on Thu Jun 17 01:34:38 2010 +0000
compute_vsum.cpp
compute_vsum.h
# deleted on Mon Jun 14 11:06:46 2010 -0400
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
pair_lj_coul_omp.cpp
pair_lj_coul_omp.h
# deleted on Thu Jun 10 15:39:08 2010 -0400
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
# deleted on Tue Jun 8 15:42:51 2010 -0400
pair_buck_coul_omp.cpp
pair_buck_coul_omp.h
# deleted on Thu Dec 17 23:52:31 2009 +0000
dump_bond.cpp
dump_bond.h
# deleted on Mon Nov 9 18:20:20 2009 +0000
atom_vec_dpd.cpp
atom_vec_dpd.h
style_dpd.h
# deleted on Mon Jun 22 21:11:31 2009 +0000
fix_write_reax_bonds.cpp
fix_write_reax_bonds.h
# deleted on Thu Jan 8 16:53:09 2009 +0000
pair_gran_hertzian.cpp
pair_gran_hertzian.h
pair_gran_history.cpp
pair_gran_history.h
pair_gran_no_history.cpp
pair_gran_no_history.h
# deleted on Mon Mar 17 23:24:44 2008 +0000
compute_temp_dipole.cpp
compute_temp_dipole.h
fix_nve_dipole.cpp
fix_nve_dipole.h
# deleted on Mon Mar 17 23:23:24 2008 +0000
fix_nve_gran.cpp
fix_nve_gran.h
# deleted on Fri Nov 30 21:49:20 2007 +0000
fix_gran_diag.cpp
fix_gran_diag.h
atom_angle.cpp
atom_angle.h
atom_bond.cpp
atom_bond.h
atom_full.cpp
atom_full.h
atom_molecular.cpp
atom_molecular.h
# deleted on Tue Jan 30 00:22:05 2007 +0000
atom_dpd.cpp
atom_dpd.h
atom_granular.cpp
atom_granular.h
# deleted on Wed Dec 13 00:34:21 2006 +0000
fix_insert.cpp
fix_insert.h
diff --git a/src/QEQ/fix_qeq_point.cpp b/src/QEQ/fix_qeq_point.cpp
index 9af70a445..63d20ad91 100644
--- a/src/QEQ/fix_qeq_point.cpp
+++ b/src/QEQ/fix_qeq_point.cpp
@@ -1,173 +1,173 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (Sandia)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fix_qeq_point.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "update.h"
#include "force.h"
#include "group.h"
#include "kspace.h"
#include "respa.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
FixQEqPoint::FixQEqPoint(LAMMPS *lmp, int narg, char **arg) :
FixQEq(lmp, narg, arg) {}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::init()
{
if (!atom->q_flag)
error->all(FLERR,"Fix qeq/point requires atom attribute q");
ngroup = group->count(igroup);
if (ngroup == 0) error->all(FLERR,"Fix qeq/point group has no atoms");
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
int ntypes = atom->ntypes;
- memory->create(shld,ntypes+1,ntypes+1,"qeq:shileding");
+ memory->create(shld,ntypes+1,ntypes+1,"qeq:shielding");
if (strstr(update->integrate_style,"respa"))
nlevels_respa = ((Respa *) update->integrate)->nlevels;
}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::pre_force(int vflag)
{
if (update->ntimestep % nevery) return;
nlocal = atom->nlocal;
if( atom->nmax > nmax ) reallocate_storage();
if( nlocal > n_cap*DANGER_ZONE || m_fill > m_cap*DANGER_ZONE )
reallocate_matrix();
init_matvec();
matvecs = CG(b_s, s); // CG on s - parallel
matvecs += CG(b_t, t); // CG on t - parallel
calculate_Q();
if (force->kspace) force->kspace->qsum_qsq();
}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::init_matvec()
{
compute_H();
int inum, ii, i;
int *ilist;
inum = list->inum;
ilist = list->ilist;
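// per-atom setup: diagonal preconditioner from eta, right-hand sides from
// the electronegativity chi and the precomputed chizj term, and initial
// guesses for s and t extrapolated from their stored solution histories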
for( ii = 0; ii < inum; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
Hdia_inv[i] = 1. / eta[ atom->type[i] ];
b_s[i] = -( chi[atom->type[i]] + chizj[i] );
b_t[i] = -1.0;
t[i] = t_hist[i][2] + 3 * ( t_hist[i][0] - t_hist[i][1] );
s[i] = 4*(s_hist[i][0]+s_hist[i][2])-(6*s_hist[i][1]+s_hist[i][3]);
}
}
pack_flag = 2;
comm->forward_comm_fix(this); //Dist_vector( s );
pack_flag = 3;
comm->forward_comm_fix(this); //Dist_vector( t );
}
/* ---------------------------------------------------------------------- */
void FixQEqPoint::compute_H()
{
int inum, jnum, *ilist, *jlist, *numneigh, **firstneigh;
int i, j, ii, jj;
double **x;
double dx, dy, dz, r_sqr, r;
x = atom->x;
int *mask = atom->mask;
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
// fill in the H matrix
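// H is stored in a sparse row format: for atom i, H.firstnbr[i] and
// H.numnbrs[i] delimit its entries in the shared H.jlist / H.val arrays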
m_fill = 0;
for( ii = 0; ii < inum; ii++ ) {
i = ilist[ii];
if (mask[i] & groupbit) {
jlist = firstneigh[i];
jnum = numneigh[i];
H.firstnbr[i] = m_fill;
for( jj = 0; jj < jnum; jj++ ) {
j = jlist[jj];
j &= NEIGHMASK;
dx = x[j][0] - x[i][0];
dy = x[j][1] - x[i][1];
dz = x[j][2] - x[i][2];
r_sqr = dx*dx + dy*dy + dz*dz;
if (r_sqr <= cutoff_sq) {
H.jlist[m_fill] = j;
r = sqrt(r_sqr);
H.val[m_fill] = 0.5/r;
m_fill++;
}
}
H.numnbrs[i] = m_fill - H.firstnbr[i];
}
}
if (m_fill >= H.m) {
char str[128];
sprintf(str,"H matrix size has been exceeded: m_fill=%d H.m=%d\n",
m_fill, H.m );
error->warning(FLERR,str);
error->all(FLERR,"Fix qeq/point has insufficient QEq matrix size");
}
}
/* ---------------------------------------------------------------------- */
diff --git a/src/RIGID/fix_shake.cpp b/src/RIGID/fix_shake.cpp
index 1fe704efb..5c993ee85 100644
--- a/src/RIGID/fix_shake.cpp
+++ b/src/RIGID/fix_shake.cpp
@@ -1,2811 +1,2820 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "fix_shake.h"
#include "fix_rattle.h"
#include "atom.h"
#include "atom_vec.h"
#include "molecule.h"
#include "update.h"
#include "respa.h"
#include "modify.h"
#include "domain.h"
#include "force.h"
#include "bond.h"
#include "angle.h"
#include "comm.h"
#include "group.h"
#include "fix_respa.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace FixConst;
using namespace MathConst;
// allocate space for static class variable
FixShake *FixShake::fsptr;
#define BIG 1.0e20
#define MASSDELTA 0.1
/* ---------------------------------------------------------------------- */
FixShake::FixShake(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg), bond_flag(NULL), angle_flag(NULL),
type_flag(NULL), mass_list(NULL), bond_distance(NULL), angle_distance(NULL),
loop_respa(NULL), step_respa(NULL), x(NULL), v(NULL), f(NULL), ftmp(NULL),
vtmp(NULL), mass(NULL), rmass(NULL), type(NULL), shake_flag(NULL),
shake_atom(NULL), shake_type(NULL), xshake(NULL), nshake(NULL),
list(NULL), b_count(NULL), b_count_all(NULL), b_ave(NULL), b_max(NULL),
b_min(NULL), b_ave_all(NULL), b_max_all(NULL), b_min_all(NULL),
a_count(NULL), a_count_all(NULL), a_ave(NULL), a_max(NULL), a_min(NULL),
a_ave_all(NULL), a_max_all(NULL), a_min_all(NULL), atommols(NULL),
onemols(NULL)
{
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
virial_flag = 1;
create_attribute = 1;
dof_flag = 1;
// error check
molecular = atom->molecular;
if (molecular == 0)
error->all(FLERR,"Cannot use fix shake with non-molecular system");
// perform initial allocation of atom-based arrays
// register with Atom class
shake_flag = NULL;
shake_atom = NULL;
shake_type = NULL;
xshake = NULL;
ftmp = NULL;
vtmp = NULL;
grow_arrays(atom->nmax);
atom->add_callback(0);
// set comm size needed by this fix
comm_forward = 3;
// parse SHAKE args
if (narg < 8) error->all(FLERR,"Illegal fix shake command");
tolerance = force->numeric(FLERR,arg[3]);
max_iter = force->inumeric(FLERR,arg[4]);
output_every = force->inumeric(FLERR,arg[5]);
// parse SHAKE args for bond and angle types
// will be used by find_clusters
// store args for "b" "a" "t" as flags in (1:n) list for fast access
// store args for "m" in list of length nmass for looping over
// for "m" verify that atom masses have been set
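// illustrative input line (not taken from this file):
//   fix 1 all shake 0.0001 20 10 b 4 a 31
// -> tolerance 1.0e-4, at most 20 iterations, statistics every 10 steps,
//    constrain bonds of type 4 and angles of type 31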
bond_flag = new int[atom->nbondtypes+1];
for (int i = 1; i <= atom->nbondtypes; i++) bond_flag[i] = 0;
angle_flag = new int[atom->nangletypes+1];
for (int i = 1; i <= atom->nangletypes; i++) angle_flag[i] = 0;
type_flag = new int[atom->ntypes+1];
for (int i = 1; i <= atom->ntypes; i++) type_flag[i] = 0;
mass_list = new double[atom->ntypes];
nmass = 0;
char mode = '\0';
int next = 6;
while (next < narg) {
if (strcmp(arg[next],"b") == 0) mode = 'b';
else if (strcmp(arg[next],"a") == 0) mode = 'a';
else if (strcmp(arg[next],"t") == 0) mode = 't';
else if (strcmp(arg[next],"m") == 0) {
mode = 'm';
atom->check_mass(FLERR);
// break if keyword that is not b,a,t,m
} else if (isalpha(arg[next][0])) break;
// read numeric args of b,a,t,m
else if (mode == 'b') {
int i = force->inumeric(FLERR,arg[next]);
if (i < 1 || i > atom->nbondtypes)
error->all(FLERR,"Invalid bond type index for fix shake");
bond_flag[i] = 1;
} else if (mode == 'a') {
int i = force->inumeric(FLERR,arg[next]);
if (i < 1 || i > atom->nangletypes)
error->all(FLERR,"Invalid angle type index for fix shake");
angle_flag[i] = 1;
} else if (mode == 't') {
int i = force->inumeric(FLERR,arg[next]);
if (i < 1 || i > atom->ntypes)
error->all(FLERR,"Invalid atom type index for fix shake");
type_flag[i] = 1;
} else if (mode == 'm') {
double massone = force->numeric(FLERR,arg[next]);
if (massone == 0.0) error->all(FLERR,"Invalid atom mass for fix shake");
if (nmass == atom->ntypes)
error->all(FLERR,"Too many masses for fix shake");
mass_list[nmass++] = massone;
} else error->all(FLERR,"Illegal fix shake command");
next++;
}
// parse optional args
onemols = NULL;
int iarg = next;
while (iarg < narg) {
if (strcmp(arg[iarg],"mol") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix shake command");
int imol = atom->find_molecule(arg[iarg+1]);
if (imol == -1)
error->all(FLERR,"Molecule template ID for fix shake does not exist");
if (atom->molecules[imol]->nset > 1 && comm->me == 0)
error->warning(FLERR,"Molecule template for "
"fix shake has multiple molecules");
onemols = &atom->molecules[imol];
nmol = onemols[0]->nset;
iarg += 2;
} else error->all(FLERR,"Illegal fix shake command");
}
// error check for Molecule template
if (onemols) {
for (int i = 0; i < nmol; i++)
if (onemols[i]->shakeflag == 0)
error->all(FLERR,"Fix shake molecule template must have shake info");
}
// allocate bond and angle distance arrays, indexed from 1 to n
bond_distance = new double[atom->nbondtypes+1];
angle_distance = new double[atom->nangletypes+1];
// allocate statistics arrays
if (output_every) {
int nb = atom->nbondtypes + 1;
b_count = new int[nb];
b_count_all = new int[nb];
b_ave = new double[nb];
b_ave_all = new double[nb];
b_max = new double[nb];
b_max_all = new double[nb];
b_min = new double[nb];
b_min_all = new double[nb];
int na = atom->nangletypes + 1;
a_count = new int[na];
a_count_all = new int[na];
a_ave = new double[na];
a_ave_all = new double[na];
a_max = new double[na];
a_max_all = new double[na];
a_min = new double[na];
a_min_all = new double[na];
}
// SHAKE vs RATTLE
rattle = 0;
// identify all SHAKE clusters
find_clusters();
// initialize list of SHAKE clusters to constrain
maxlist = 0;
list = NULL;
}
/* ---------------------------------------------------------------------- */
FixShake::~FixShake()
{
// unregister callbacks to this fix from Atom class
atom->delete_callback(id,0);
// set bond_type and angle_type back to positive for SHAKE clusters
// must set for all SHAKE bonds and angles stored by each atom
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
else if (shake_flag[i] == 1) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],1);
angletype_findset(i,shake_atom[i][1],shake_atom[i][2],1);
} else if (shake_flag[i] == 2) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
} else if (shake_flag[i] == 3) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],1);
} else if (shake_flag[i] == 4) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][3],1);
}
}
// delete locally stored arrays
memory->destroy(shake_flag);
memory->destroy(shake_atom);
memory->destroy(shake_type);
memory->destroy(xshake);
memory->destroy(ftmp);
memory->destroy(vtmp);
delete [] bond_flag;
delete [] angle_flag;
delete [] type_flag;
delete [] mass_list;
delete [] bond_distance;
delete [] angle_distance;
if (output_every) {
delete [] b_count;
delete [] b_count_all;
delete [] b_ave;
delete [] b_ave_all;
delete [] b_max;
delete [] b_max_all;
delete [] b_min;
delete [] b_min_all;
delete [] a_count;
delete [] a_count_all;
delete [] a_ave;
delete [] a_ave_all;
delete [] a_max;
delete [] a_max_all;
delete [] a_min;
delete [] a_min_all;
}
memory->destroy(list);
}
/* ---------------------------------------------------------------------- */
int FixShake::setmask()
{
int mask = 0;
mask |= PRE_NEIGHBOR;
mask |= POST_FORCE;
mask |= POST_FORCE_RESPA;
return mask;
}
/* ----------------------------------------------------------------------
set bond and angle distances
this init must happen after force->bond and force->angle inits
------------------------------------------------------------------------- */
void FixShake::init()
{
int i,m,flag,flag_all,type1,type2,bond1_type,bond2_type;
double rsq,angle;
// error if more than one shake fix
int count = 0;
for (i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"shake") == 0) count++;
if (count > 1) error->all(FLERR,"More than one fix shake");
// cannot use with minimization since SHAKE turns off bonds
// that should contribute to potential energy
if (update->whichflag == 2)
error->all(FLERR,"Fix shake cannot be used with minimization");
// error if npt,nph fix comes before shake fix
for (i = 0; i < modify->nfix; i++) {
if (strcmp(modify->fix[i]->style,"npt") == 0) break;
if (strcmp(modify->fix[i]->style,"nph") == 0) break;
}
if (i < modify->nfix) {
for (int j = i; j < modify->nfix; j++)
if (strcmp(modify->fix[j]->style,"shake") == 0)
error->all(FLERR,"Shake fix must come before NPT/NPH fix");
}
// if rRESPA, find associated fix that must exist
// could have changed locations in fix list since created
// set ptrs to rRESPA variables
if (strstr(update->integrate_style,"respa")) {
for (i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"RESPA") == 0) ifix_respa = i;
nlevels_respa = ((Respa *) update->integrate)->nlevels;
loop_respa = ((Respa *) update->integrate)->loop;
step_respa = ((Respa *) update->integrate)->step;
}
// set equilibrium bond distances
if (force->bond == NULL)
error->all(FLERR,"Bond potential must be defined for SHAKE");
for (i = 1; i <= atom->nbondtypes; i++)
bond_distance[i] = force->bond->equilibrium_distance(i);
// set equilibrium angle distances
int nlocal = atom->nlocal;
for (i = 1; i <= atom->nangletypes; i++) {
if (angle_flag[i] == 0) continue;
if (force->angle == NULL)
error->all(FLERR,"Angle potential must be defined for SHAKE");
// scan all atoms for a SHAKE angle cluster
// extract bond types for the 2 bonds in the cluster
// bond types must be same in all clusters of this angle type,
// else set error flag
flag = 0;
bond1_type = bond2_type = 0;
for (m = 0; m < nlocal; m++) {
if (shake_flag[m] != 1) continue;
if (shake_type[m][2] != i) continue;
type1 = MIN(shake_type[m][0],shake_type[m][1]);
type2 = MAX(shake_type[m][0],shake_type[m][1]);
if (bond1_type > 0) {
if (type1 != bond1_type || type2 != bond2_type) {
flag = 1;
break;
}
}
bond1_type = type1;
bond2_type = type2;
}
// error check for any bond types that are not the same
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_MAX,world);
if (flag_all) error->all(FLERR,"Shake angles have different bond types");
// ensure all procs have bond types
MPI_Allreduce(&bond1_type,&flag_all,1,MPI_INT,MPI_MAX,world);
bond1_type = flag_all;
MPI_Allreduce(&bond2_type,&flag_all,1,MPI_INT,MPI_MAX,world);
bond2_type = flag_all;
// if bond types are 0, no SHAKE angles of this type exist
// just skip this angle
if (bond1_type == 0) {
angle_distance[i] = 0.0;
continue;
}
// compute the angle distance as a function of 2 bond distances
// formula is now correct for bonds of same or different lengths (Oct15)
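// law of cosines: d^2 = b1^2 + b2^2 - 2*b1*b2*cos(theta)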
angle = force->angle->equilibrium_angle(i);
const double b1 = bond_distance[bond1_type];
const double b2 = bond_distance[bond2_type];
rsq = b1*b1 + b2*b2 - 2.0*b1*b2*cos(angle);
angle_distance[i] = sqrt(rsq);
}
}
/* ----------------------------------------------------------------------
SHAKE as pre-integrator constraint
------------------------------------------------------------------------- */
void FixShake::setup(int vflag)
{
pre_neighbor();
if (output_every) stats();
// setup SHAKE output
bigint ntimestep = update->ntimestep;
if (output_every) {
next_output = ntimestep + output_every;
if (ntimestep % output_every != 0)
next_output = (ntimestep/output_every)*output_every + output_every;
} else next_output = -1;
// set respa to 0 if verlet is used and to 1 otherwise
if (strstr(update->integrate_style,"verlet"))
respa = 0;
else
respa = 1;
if (!respa) {
dtv = update->dt;
dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
if (!rattle) dtfsq = update->dt * update->dt * force->ftm2v;
} else {
dtv = step_respa[0];
dtf_innerhalf = 0.5 * step_respa[0] * force->ftm2v;
dtf_inner = dtf_innerhalf;
}
// correct geometry of cluster if necessary
correct_coordinates(vflag);
// remove velocities along any bonds
correct_velocities();
// precalculate constraining forces for first integration step
shake_end_of_step(vflag);
}
/* ----------------------------------------------------------------------
build list of SHAKE clusters to constrain
if one or more atoms in cluster are on this proc,
this proc lists the cluster exactly once
------------------------------------------------------------------------- */
void FixShake::pre_neighbor()
{
int atom1,atom2,atom3,atom4;
// local copies of atom quantities
// used by SHAKE until next re-neighboring
x = atom->x;
v = atom->v;
f = atom->f;
mass = atom->mass;
rmass = atom->rmass;
type = atom->type;
nlocal = atom->nlocal;
// extend size of SHAKE list if necessary
if (nlocal > maxlist) {
maxlist = nlocal;
memory->destroy(list);
memory->create(list,maxlist,"shake:list");
}
// build list of SHAKE clusters I compute
nlist = 0;
for (int i = 0; i < nlocal; i++)
if (shake_flag[i]) {
if (shake_flag[i] == 2) {
atom1 = atom->map(shake_atom[i][0]);
atom2 = atom->map(shake_atom[i][1]);
if (atom1 == -1 || atom2 == -1) {
char str[128];
sprintf(str,"Shake atoms " TAGINT_FORMAT " " TAGINT_FORMAT
" missing on proc %d at step " BIGINT_FORMAT,
shake_atom[i][0],shake_atom[i][1],me,update->ntimestep);
error->one(FLERR,str);
}
if (i <= atom1 && i <= atom2) list[nlist++] = i;
} else if (shake_flag[i] % 2 == 1) {
atom1 = atom->map(shake_atom[i][0]);
atom2 = atom->map(shake_atom[i][1]);
atom3 = atom->map(shake_atom[i][2]);
if (atom1 == -1 || atom2 == -1 || atom3 == -1) {
char str[128];
sprintf(str,"Shake atoms "
TAGINT_FORMAT " " TAGINT_FORMAT " " TAGINT_FORMAT
" missing on proc %d at step " BIGINT_FORMAT,
shake_atom[i][0],shake_atom[i][1],shake_atom[i][2],
me,update->ntimestep);
error->one(FLERR,str);
}
if (i <= atom1 && i <= atom2 && i <= atom3) list[nlist++] = i;
} else {
atom1 = atom->map(shake_atom[i][0]);
atom2 = atom->map(shake_atom[i][1]);
atom3 = atom->map(shake_atom[i][2]);
atom4 = atom->map(shake_atom[i][3]);
if (atom1 == -1 || atom2 == -1 || atom3 == -1 || atom4 == -1) {
char str[128];
sprintf(str,"Shake atoms "
TAGINT_FORMAT " " TAGINT_FORMAT " "
TAGINT_FORMAT " " TAGINT_FORMAT
" missing on proc %d at step " BIGINT_FORMAT,
shake_atom[i][0],shake_atom[i][1],
shake_atom[i][2],shake_atom[i][3],
me,update->ntimestep);
error->one(FLERR,str);
}
if (i <= atom1 && i <= atom2 && i <= atom3 && i <= atom4)
list[nlist++] = i;
}
}
}
/* ----------------------------------------------------------------------
compute the force adjustment for SHAKE constraint
------------------------------------------------------------------------- */
void FixShake::post_force(int vflag)
{
if (update->ntimestep == next_output) stats();
// xshake = unconstrained move with current v,f
// communicate results if necessary
unconstrained_update();
if (nprocs > 1) comm->forward_comm_fix(this);
// virial setup
if (vflag) v_setup(vflag);
else evflag = 0;
// loop over clusters to add constraint forces
int m;
for (int i = 0; i < nlist; i++) {
m = list[i];
if (shake_flag[m] == 2) shake(m);
else if (shake_flag[m] == 3) shake3(m);
else if (shake_flag[m] == 4) shake4(m);
else shake3angle(m);
}
// store vflag for coordinate_constraints_end_of_step()
vflag_post_force = vflag;
}
/* ----------------------------------------------------------------------
enforce SHAKE constraints from rRESPA
xshake prediction portion is different from the Verlet case
------------------------------------------------------------------------- */
void FixShake::post_force_respa(int vflag, int ilevel, int iloop)
{
// call stats only on outermost level
if (ilevel == nlevels_respa-1 && update->ntimestep == next_output) stats();
// might be OK to skip enforcing SHAKE constraints
// on last iteration of inner levels if pressure not requested
// however, leads to slightly different trajectories
//if (ilevel < nlevels_respa-1 && iloop == loop_respa[ilevel]-1 && !vflag)
// return;
// xshake = unconstrained move with current v,f as function of level
// communicate results if necessary
unconstrained_update_respa(ilevel);
if (nprocs > 1) comm->forward_comm_fix(this);
// virial setup only needed on last iteration of innermost level
// and if pressure is requested
// virial accumulation happens via evflag at last iteration of each level
if (ilevel == 0 && iloop == loop_respa[ilevel]-1 && vflag) v_setup(vflag);
if (iloop == loop_respa[ilevel]-1) evflag = 1;
else evflag = 0;
// loop over clusters to add constraint forces
int m;
for (int i = 0; i < nlist; i++) {
m = list[i];
if (shake_flag[m] == 2) shake(m);
else if (shake_flag[m] == 3) shake3(m);
else if (shake_flag[m] == 4) shake4(m);
else shake3angle(m);
}
// store vflag for coordinate_constraints_end_of_step()
vflag_post_force = vflag;
}
/* ----------------------------------------------------------------------
count # of degrees-of-freedom removed by SHAKE for atoms in igroup
------------------------------------------------------------------------- */
int FixShake::dof(int igroup)
{
int groupbit = group->bitmask[igroup];
int *mask = atom->mask;
tagint *tag = atom->tag;
int nlocal = atom->nlocal;
// count dof in a cluster if and only if
// the central atom is in group and atom i is the central atom
int n = 0;
for (int i = 0; i < nlocal; i++) {
if (!(mask[i] & groupbit)) continue;
if (shake_flag[i] == 0) continue;
if (shake_atom[i][0] != tag[i]) continue;
if (shake_flag[i] == 1) n += 3;
else if (shake_flag[i] == 2) n += 1;
else if (shake_flag[i] == 3) n += 2;
else if (shake_flag[i] == 4) n += 3;
}
int nall;
MPI_Allreduce(&n,&nall,1,MPI_INT,MPI_SUM,world);
return nall;
}
/* ----------------------------------------------------------------------
identify whether each atom is in a SHAKE cluster
only include atoms in fix group and those bonds/angles specified in input
test whether all clusters are valid
set shake_flag, shake_atom, shake_type values
set bond,angle types negative so will be ignored in neighbor lists
------------------------------------------------------------------------- */
void FixShake::find_clusters()
{
int i,j,m,n,imol,iatom;
int flag,flag_all,nbuf,size;
tagint tagprev;
double massone;
tagint *buf;
if (me == 0 && screen) {
if (!rattle) fprintf(screen,"Finding SHAKE clusters ...\n");
else fprintf(screen,"Finding RATTLE clusters ...\n");
}
atommols = atom->avec->onemols;
tagint *tag = atom->tag;
int *type = atom->type;
int *mask = atom->mask;
double *mass = atom->mass;
double *rmass = atom->rmass;
int **nspecial = atom->nspecial;
tagint **special = atom->special;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
int nlocal = atom->nlocal;
int angles_allow = atom->avec->angles_allow;
// setup ring of procs
int next = me + 1;
int prev = me - 1;
if (next == nprocs) next = 0;
if (prev < 0) prev = nprocs - 1;
// -----------------------------------------------------
// allocate arrays for self (1d) and bond partners (2d)
// max = max # of bond partners for owned atoms = 2nd dim of partner arrays
// npartner[i] = # of bonds attached to atom i
// nshake[i] = # of SHAKE bonds attached to atom i
// partner_tag[i][] = global IDs of each partner
// partner_mask[i][] = mask of each partner
// partner_type[i][] = type of each partner
// partner_massflag[i][] = 1 if partner meets mass criterion, 0 if not
// partner_bondtype[i][] = type of bond attached to each partner
// partner_shake[i][] = 1 if SHAKE bonded to partner, 0 if not
// partner_nshake[i][] = nshake value for each partner
// -----------------------------------------------------
int max = 0;
if (molecular == 1) {
for (i = 0; i < nlocal; i++) max = MAX(max,nspecial[i][0]);
} else {
for (i = 0; i < nlocal; i++) {
imol = molindex[i];
if (imol < 0) continue;
iatom = molatom[i];
max = MAX(max,atommols[imol]->nspecial[iatom][0]);
}
}
int *npartner;
memory->create(npartner,nlocal,"shake:npartner");
memory->create(nshake,nlocal,"shake:nshake");
tagint **partner_tag;
int **partner_mask,**partner_type,**partner_massflag;
int **partner_bondtype,**partner_shake,**partner_nshake;
memory->create(partner_tag,nlocal,max,"shake:partner_tag");
memory->create(partner_mask,nlocal,max,"shake:partner_mask");
memory->create(partner_type,nlocal,max,"shake:partner_type");
memory->create(partner_massflag,nlocal,max,"shake:partner_massflag");
memory->create(partner_bondtype,nlocal,max,"shake:partner_bondtype");
memory->create(partner_shake,nlocal,max,"shake:partner_shake");
memory->create(partner_nshake,nlocal,max,"shake:partner_nshake");
// -----------------------------------------------------
// set npartner and partner_tag from special arrays
// -----------------------------------------------------
if (molecular == 1) {
for (i = 0; i < nlocal; i++) {
npartner[i] = nspecial[i][0];
for (j = 0; j < npartner[i]; j++)
partner_tag[i][j] = special[i][j];
}
} else {
for (i = 0; i < nlocal; i++) {
imol = molindex[i];
if (imol < 0) continue;
iatom = molatom[i];
tagprev = tag[i] - iatom - 1;
npartner[i] = atommols[imol]->nspecial[iatom][0];
for (j = 0; j < npartner[i]; j++)
partner_tag[i][j] = atommols[imol]->special[iatom][j] + tagprev;
}
}
// -----------------------------------------------------
// set partner_mask, partner_type, partner_massflag, partner_bondtype
// for bonded partners
// requires communication for off-proc partners
// -----------------------------------------------------
// fill in mask, type, massflag, bondtype if own bond partner
// info to store in buf for each off-proc bond = nper = 6
// 2 atoms IDs in bond, space for mask, type, massflag, bondtype
// nbufmax = largest buffer needed to hold info from any proc
int nper = 6;
nbuf = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
partner_mask[i][j] = 0;
partner_type[i][j] = 0;
partner_massflag[i][j] = 0;
partner_bondtype[i][j] = 0;
m = atom->map(partner_tag[i][j]);
if (m >= 0 && m < nlocal) {
partner_mask[i][j] = mask[m];
partner_type[i][j] = type[m];
if (nmass) {
if (rmass) massone = rmass[m];
else massone = mass[type[m]];
partner_massflag[i][j] = masscheck(massone);
}
n = bondtype_findset(i,tag[i],partner_tag[i][j],0);
if (n) partner_bondtype[i][j] = n;
else {
n = bondtype_findset(m,tag[i],partner_tag[i][j],0);
if (n) partner_bondtype[i][j] = n;
}
} else nbuf += nper;
}
}
memory->create(buf,nbuf,"shake:buf");
// fill buffer with info
size = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
m = atom->map(partner_tag[i][j]);
if (m < 0 || m >= nlocal) {
buf[size] = tag[i];
buf[size+1] = partner_tag[i][j];
buf[size+2] = 0;
buf[size+3] = 0;
buf[size+4] = 0;
n = bondtype_findset(i,tag[i],partner_tag[i][j],0);
if (n) buf[size+5] = n;
else buf[size+5] = 0;
size += nper;
}
}
}
// cycle buffer around ring of procs back to self
fsptr = this;
comm->ring(size,sizeof(tagint),buf,1,ring_bonds,buf);
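// comm->ring() passes buf around all procs; each proc's ring_bonds() callback
// fills in entries for partner atoms it owns, and the completed buffer
// arrives back at this proc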
// store partner info returned to me
m = 0;
while (m < size) {
i = atom->map(buf[m]);
for (j = 0; j < npartner[i]; j++)
if (buf[m+1] == partner_tag[i][j]) break;
partner_mask[i][j] = buf[m+2];
partner_type[i][j] = buf[m+3];
partner_massflag[i][j] = buf[m+4];
partner_bondtype[i][j] = buf[m+5];
m += nper;
}
memory->destroy(buf);
// error check for unfilled partner info
// if partner_type not set, is an error
// partner_bondtype may not be set if special list is not consistent
// with bondatom (e.g. due to delete_bonds command)
// this is OK if one or both atoms are not in fix group, since
// bond won't be SHAKEn anyway
// else it's an error
flag = 0;
for (i = 0; i < nlocal; i++)
for (j = 0; j < npartner[i]; j++) {
if (partner_type[i][j] == 0) flag = 1;
if (!(mask[i] & groupbit)) continue;
if (!(partner_mask[i][j] & groupbit)) continue;
if (partner_bondtype[i][j] == 0) flag = 1;
}
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_SUM,world);
if (flag_all) error->all(FLERR,"Did not find fix shake partner info");
// -----------------------------------------------------
// identify SHAKEable bonds
// set nshake[i] = # of SHAKE bonds attached to atom i
// set partner_shake[i][] = 1 if SHAKE bonded to partner, 0 if not
// both atoms must be in group, bondtype must be > 0
// check if bondtype is in input bond_flag
// check if type of either atom is in input type_flag
// check if mass of either atom is in input mass_list
// -----------------------------------------------------
int np;
for (i = 0; i < nlocal; i++) {
nshake[i] = 0;
np = npartner[i];
for (j = 0; j < np; j++) {
partner_shake[i][j] = 0;
if (!(mask[i] & groupbit)) continue;
if (!(partner_mask[i][j] & groupbit)) continue;
if (partner_bondtype[i][j] <= 0) continue;
if (bond_flag[partner_bondtype[i][j]]) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
}
if (type_flag[type[i]] || type_flag[partner_type[i][j]]) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
}
if (nmass) {
if (partner_massflag[i][j]) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
} else {
if (rmass) massone = rmass[i];
else massone = mass[type[i]];
if (masscheck(massone)) {
partner_shake[i][j] = 1;
nshake[i]++;
continue;
}
}
}
}
}
// -----------------------------------------------------
// set partner_nshake for bonded partners
// requires communication for off-proc partners
// -----------------------------------------------------
// fill in partner_nshake if own bond partner
// info to store in buf for each off-proc bond =
// 2 atoms IDs in bond, space for nshake value
// nbufmax = largest buffer needed to hold info from any proc
nbuf = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
m = atom->map(partner_tag[i][j]);
if (m >= 0 && m < nlocal) partner_nshake[i][j] = nshake[m];
else nbuf += 3;
}
}
memory->create(buf,nbuf,"shake:buf");
// fill buffer with info
size = 0;
for (i = 0; i < nlocal; i++) {
for (j = 0; j < npartner[i]; j++) {
m = atom->map(partner_tag[i][j]);
if (m < 0 || m >= nlocal) {
buf[size] = tag[i];
buf[size+1] = partner_tag[i][j];
size += 3;
}
}
}
// cycle buffer around ring of procs back to self
fsptr = this;
comm->ring(size,sizeof(tagint),buf,2,ring_nshake,buf);
// store partner info returned to me
m = 0;
while (m < size) {
i = atom->map(buf[m]);
for (j = 0; j < npartner[i]; j++)
if (buf[m+1] == partner_tag[i][j]) break;
partner_nshake[i][j] = buf[m+2];
m += 3;
}
memory->destroy(buf);
// -----------------------------------------------------
// error checks
// no atom with nshake > 3
// no connected atoms which both have nshake > 1
// -----------------------------------------------------
flag = 0;
for (i = 0; i < nlocal; i++) if (nshake[i] > 3) flag = 1;
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_SUM,world);
if (flag_all) error->all(FLERR,"Shake cluster of more than 4 atoms");
flag = 0;
for (i = 0; i < nlocal; i++) {
if (nshake[i] <= 1) continue;
for (j = 0; j < npartner[i]; j++)
if (partner_shake[i][j] && partner_nshake[i][j] > 1) flag = 1;
}
MPI_Allreduce(&flag,&flag_all,1,MPI_INT,MPI_SUM,world);
if (flag_all) error->all(FLERR,"Shake clusters are connected");
// -----------------------------------------------------
// set SHAKE arrays that are stored with atoms & add angle constraints
// zero shake arrays for all owned atoms
// if I am central atom set shake_flag & shake_atom & shake_type
// for 2-atom clusters, I am central atom if my atom ID < partner ID
// for 3-atom clusters, test for angle constraint
// angle will be stored by this atom if it exists
// if angle type matches angle_flag, then it is angle-constrained
// shake_flag[] = 0 if atom not in SHAKE cluster
// 2,3,4 = size of bond-only cluster
// 1 = 3-atom angle cluster
// shake_atom[][] = global IDs of 2,3,4 atoms in cluster
// central atom is 1st
// for 2-atom cluster, lowest ID is 1st
// shake_type[][] = bondtype of each bond in cluster
// for 3-atom angle cluster, 3rd value is angletype
// -----------------------------------------------------
for (i = 0; i < nlocal; i++) {
shake_flag[i] = 0;
shake_atom[i][0] = 0;
shake_atom[i][1] = 0;
shake_atom[i][2] = 0;
shake_atom[i][3] = 0;
shake_type[i][0] = 0;
shake_type[i][1] = 0;
shake_type[i][2] = 0;
if (nshake[i] == 1) {
for (j = 0; j < npartner[i]; j++)
if (partner_shake[i][j]) break;
if (partner_nshake[i][j] == 1 && tag[i] < partner_tag[i][j]) {
shake_flag[i] = 2;
shake_atom[i][0] = tag[i];
shake_atom[i][1] = partner_tag[i][j];
shake_type[i][0] = partner_bondtype[i][j];
}
}
if (nshake[i] > 1) {
shake_flag[i] = 1;
shake_atom[i][0] = tag[i];
for (j = 0; j < npartner[i]; j++)
if (partner_shake[i][j]) {
m = shake_flag[i];
shake_atom[i][m] = partner_tag[i][j];
shake_type[i][m-1] = partner_bondtype[i][j];
shake_flag[i]++;
}
}
if (nshake[i] == 2 && angles_allow) {
n = angletype_findset(i,shake_atom[i][1],shake_atom[i][2],0);
if (n <= 0) continue;
if (angle_flag[n]) {
shake_flag[i] = 1;
shake_type[i][2] = n;
}
}
}
// -----------------------------------------------------
// set shake_flag,shake_atom,shake_type for non-central atoms
// requires communication for off-proc atoms
// -----------------------------------------------------
// fill in shake arrays for each bond partner I own
// info to store in buf for each off-proc bond =
// all values from shake_flag, shake_atom, shake_type
// nbufmax = largest buffer needed to hold info from any proc
nbuf = 0;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
for (j = 0; j < npartner[i]; j++) {
if (partner_shake[i][j] == 0) continue;
m = atom->map(partner_tag[i][j]);
if (m >= 0 && m < nlocal) {
shake_flag[m] = shake_flag[i];
shake_atom[m][0] = shake_atom[i][0];
shake_atom[m][1] = shake_atom[i][1];
shake_atom[m][2] = shake_atom[i][2];
shake_atom[m][3] = shake_atom[i][3];
shake_type[m][0] = shake_type[i][0];
shake_type[m][1] = shake_type[i][1];
shake_type[m][2] = shake_type[i][2];
} else nbuf += 9;
}
}
memory->create(buf,nbuf,"shake:buf");
// fill buffer with info
size = 0;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
for (j = 0; j < npartner[i]; j++) {
if (partner_shake[i][j] == 0) continue;
m = atom->map(partner_tag[i][j]);
if (m < 0 || m >= nlocal) {
buf[size] = partner_tag[i][j];
buf[size+1] = shake_flag[i];
buf[size+2] = shake_atom[i][0];
buf[size+3] = shake_atom[i][1];
buf[size+4] = shake_atom[i][2];
buf[size+5] = shake_atom[i][3];
buf[size+6] = shake_type[i][0];
buf[size+7] = shake_type[i][1];
buf[size+8] = shake_type[i][2];
size += 9;
}
}
}
// cycle buffer around ring of procs back to self
fsptr = this;
comm->ring(size,sizeof(tagint),buf,3,ring_shake,NULL);
memory->destroy(buf);
// -----------------------------------------------------
// free local memory
// -----------------------------------------------------
memory->destroy(npartner);
memory->destroy(nshake);
memory->destroy(partner_tag);
memory->destroy(partner_mask);
memory->destroy(partner_type);
memory->destroy(partner_massflag);
memory->destroy(partner_bondtype);
memory->destroy(partner_shake);
memory->destroy(partner_nshake);
// -----------------------------------------------------
// set bond_type and angle_type negative for SHAKE clusters
// must set for all SHAKE bonds and angles stored by each atom
// -----------------------------------------------------
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
else if (shake_flag[i] == 1) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],-1);
angletype_findset(i,shake_atom[i][1],shake_atom[i][2],-1);
} else if (shake_flag[i] == 2) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
} else if (shake_flag[i] == 3) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],-1);
} else if (shake_flag[i] == 4) {
bondtype_findset(i,shake_atom[i][0],shake_atom[i][1],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][2],-1);
bondtype_findset(i,shake_atom[i][0],shake_atom[i][3],-1);
}
}
// -----------------------------------------------------
// print info on SHAKE clusters
// -----------------------------------------------------
int count1,count2,count3,count4;
count1 = count2 = count3 = count4 = 0;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 1) count1++;
else if (shake_flag[i] == 2) count2++;
else if (shake_flag[i] == 3) count3++;
else if (shake_flag[i] == 4) count4++;
}
int tmp;
tmp = count1;
MPI_Allreduce(&tmp,&count1,1,MPI_INT,MPI_SUM,world);
tmp = count2;
MPI_Allreduce(&tmp,&count2,1,MPI_INT,MPI_SUM,world);
tmp = count3;
MPI_Allreduce(&tmp,&count3,1,MPI_INT,MPI_SUM,world);
tmp = count4;
MPI_Allreduce(&tmp,&count4,1,MPI_INT,MPI_SUM,world);
if (me == 0) {
if (screen) {
fprintf(screen," %d = # of size 2 clusters\n",count2/2);
fprintf(screen," %d = # of size 3 clusters\n",count3/3);
fprintf(screen," %d = # of size 4 clusters\n",count4/4);
fprintf(screen," %d = # of frozen angles\n",count1/3);
}
if (logfile) {
fprintf(logfile," %d = # of size 2 clusters\n",count2/2);
fprintf(logfile," %d = # of size 3 clusters\n",count3/3);
fprintf(logfile," %d = # of size 4 clusters\n",count4/4);
fprintf(logfile," %d = # of frozen angles\n",count1/3);
}
}
}
/* ----------------------------------------------------------------------
when receive buffer, scan bond partner IDs for atoms I own
if I own partner:
fill in mask and type and massflag
search for bond with 1st atom and fill in bondtype
------------------------------------------------------------------------- */
void FixShake::ring_bonds(int ndatum, char *cbuf)
{
Atom *atom = fsptr->atom;
double *rmass = atom->rmass;
double *mass = atom->mass;
int *mask = atom->mask;
int *type = atom->type;
int nlocal = atom->nlocal;
int nmass = fsptr->nmass;
tagint *buf = (tagint *) cbuf;
int m,n;
double massone;
for (int i = 0; i < ndatum; i += 6) {
m = atom->map(buf[i+1]);
if (m >= 0 && m < nlocal) {
buf[i+2] = mask[m];
buf[i+3] = type[m];
if (nmass) {
if (rmass) massone = rmass[m];
else massone = mass[type[m]];
buf[i+4] = fsptr->masscheck(massone);
}
if (buf[i+5] == 0) {
n = fsptr->bondtype_findset(m,buf[i],buf[i+1],0);
if (n) buf[i+5] = n;
}
}
}
}
/* ----------------------------------------------------------------------
when receive buffer, scan bond partner IDs for atoms I own
if I own partner, fill in nshake value
------------------------------------------------------------------------- */
void FixShake::ring_nshake(int ndatum, char *cbuf)
{
Atom *atom = fsptr->atom;
int nlocal = atom->nlocal;
int *nshake = fsptr->nshake;
tagint *buf = (tagint *) cbuf;
int m;
for (int i = 0; i < ndatum; i += 3) {
m = atom->map(buf[i+1]);
if (m >= 0 && m < nlocal) buf[i+2] = nshake[m];
}
}
/* ----------------------------------------------------------------------
when receive buffer, scan bond partner IDs for atoms I own
if I own partner, fill in shake_flag, shake_atom, and shake_type values
------------------------------------------------------------------------- */
void FixShake::ring_shake(int ndatum, char *cbuf)
{
Atom *atom = fsptr->atom;
int nlocal = atom->nlocal;
int *shake_flag = fsptr->shake_flag;
tagint **shake_atom = fsptr->shake_atom;
int **shake_type = fsptr->shake_type;
tagint *buf = (tagint *) cbuf;
int m;
for (int i = 0; i < ndatum; i += 9) {
m = atom->map(buf[i]);
if (m >= 0 && m < nlocal) {
shake_flag[m] = buf[i+1];
shake_atom[m][0] = buf[i+2];
shake_atom[m][1] = buf[i+3];
shake_atom[m][2] = buf[i+4];
shake_atom[m][3] = buf[i+5];
shake_type[m][0] = buf[i+6];
shake_type[m][1] = buf[i+7];
shake_type[m][2] = buf[i+8];
}
}
}
/* ----------------------------------------------------------------------
check if massone is within MASSDELTA of any mass in mass_list
return 1 if yes, 0 if not
------------------------------------------------------------------------- */
int FixShake::masscheck(double massone)
{
for (int i = 0; i < nmass; i++)
if (fabs(mass_list[i]-massone) <= MASSDELTA) return 1;
return 0;
}
/* ----------------------------------------------------------------------
update the unconstrained position of each atom
only for SHAKE clusters, else set to 0.0
assumes NVE update, seems to be accurate enough for NVT,NPT,NPH as well
------------------------------------------------------------------------- */
void FixShake::unconstrained_update()
{
double dtfmsq;
if (rmass) {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
dtfmsq = dtfsq / rmass[i];
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
} else {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
dtfmsq = dtfsq / mass[type[i]];
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
}
}
/* ----------------------------------------------------------------------
update the unconstrained position of each atom in a rRESPA step
only for SHAKE clusters, else set to 0.0
assumes NVE update, seems to be accurate enough for NVT,NPT,NPH as well
------------------------------------------------------------------------- */
void FixShake::unconstrained_update_respa(int ilevel)
{
// xshake = atom coords after next x update in innermost loop
// depends on rRESPA level
// for levels > 0 this includes more than one velocity update
// xshake = predicted position from call to this routine at level N =
// x + dt0 (v + dtN/m fN + 1/2 dt(N-1)/m f(N-1) + ... + 1/2 dt0/m f0)
// also set dtfsq = dt0*dtN so that shake,shake3,etc can use it
double ***f_level = ((FixRespa *) modify->fix[ifix_respa])->f_level;
dtfsq = dtf_inner * step_respa[ilevel];
double invmass,dtfmsq;
int jlevel;
if (rmass) {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
invmass = 1.0 / rmass[i];
dtfmsq = dtfsq * invmass;
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
for (jlevel = 0; jlevel < ilevel; jlevel++) {
dtfmsq = dtf_innerhalf * step_respa[jlevel] * invmass;
xshake[i][0] += dtfmsq*f_level[i][jlevel][0];
xshake[i][1] += dtfmsq*f_level[i][jlevel][1];
xshake[i][2] += dtfmsq*f_level[i][jlevel][2];
}
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
} else {
for (int i = 0; i < nlocal; i++) {
if (shake_flag[i]) {
invmass = 1.0 / mass[type[i]];
dtfmsq = dtfsq * invmass;
xshake[i][0] = x[i][0] + dtv*v[i][0] + dtfmsq*f[i][0];
xshake[i][1] = x[i][1] + dtv*v[i][1] + dtfmsq*f[i][1];
xshake[i][2] = x[i][2] + dtv*v[i][2] + dtfmsq*f[i][2];
for (jlevel = 0; jlevel < ilevel; jlevel++) {
dtfmsq = dtf_innerhalf * step_respa[jlevel] * invmass;
xshake[i][0] += dtfmsq*f_level[i][jlevel][0];
xshake[i][1] += dtfmsq*f_level[i][jlevel][1];
xshake[i][2] += dtfmsq*f_level[i][jlevel][2];
}
} else xshake[i][2] = xshake[i][1] = xshake[i][0] = 0.0;
}
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake(int m)
{
int nlist,list[2];
double v[6];
double invmass0,invmass1;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
double bond1 = bond_distance[shake_type[m][0]];
// r01 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
// s01 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
// a,b,c = coeffs in quadratic equation for lamda
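// constraint: |s01 + lamda*(invmass0+invmass1)*r01|^2 = bond1^2
// expanding in lamda gives a*lamda^2 + b*lamda + c = 0 with a,b,c below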
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
}
double a = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double b = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double c = s01sq - bond1*bond1;
// error check
double determ = b*b - 4.0*a*c;
if (determ < 0.0) {
error->warning(FLERR,"Shake determinant < 0.0",0);
determ = 0.0;
}
// exact quadratic solution for lamda
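// keep the root of smaller magnitude: the smaller correction to the
// unconstrained positions is the physically meaningful one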
double lamda,lamda1,lamda2;
lamda1 = (-b+sqrt(determ)) / (2.0*a);
lamda2 = (-b-sqrt(determ)) / (2.0*a);
if (fabs(lamda1) <= fabs(lamda2)) lamda = lamda1;
else lamda = lamda2;
// update forces if atom is owned by this processor
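// lamda was solved in position units; dividing by dtfsq converts it to a
// force coefficient, since the unconstrained update used dx = (dtfsq/m)*f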
lamda /= dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda*r01[0];
f[i0][1] += lamda*r01[1];
f[i0][2] += lamda*r01[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda*r01[0];
f[i1][1] -= lamda*r01[1];
f[i1][2] -= lamda*r01[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
v[0] = lamda*r01[0]*r01[0];
v[1] = lamda*r01[1]*r01[1];
v[2] = lamda*r01[2]*r01[2];
v[3] = lamda*r01[0]*r01[1];
v[4] = lamda*r01[0]*r01[2];
v[5] = lamda*r01[1]*r01[2];
v_tally(nlist,list,2.0,v);
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake3(int m)
{
int nlist,list[3];
double v[6];
double invmass0,invmass1,invmass2;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
int i2 = atom->map(shake_atom[m][2]);
double bond1 = bond_distance[shake_type[m][0]];
double bond2 = bond_distance[shake_type[m][1]];
// r01,r02 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
double r02[3];
r02[0] = x[i0][0] - x[i2][0];
r02[1] = x[i0][1] - x[i2][1];
r02[2] = x[i0][2] - x[i2][2];
domain->minimum_image(r02);
// s01,s02 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
double s02[3];
s02[0] = xshake[i0][0] - xshake[i2][0];
s02[1] = xshake[i0][1] - xshake[i2][1];
s02[2] = xshake[i0][2] - xshake[i2][2];
- domain->minimum_image(s02);
+ domain->minimum_image_once(s02);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double r02sq = r02[0]*r02[0] + r02[1]*r02[1] + r02[2]*r02[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
double s02sq = s02[0]*s02[0] + s02[1]*s02[1] + s02[2]*s02[2];
// matrix coeffs and rhs for lamda equations
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
invmass2 = 1.0/rmass[i2];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
invmass2 = 1.0/mass[type[i2]];
}
double a11 = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double a12 = 2.0 * invmass0 *
(s01[0]*r02[0] + s01[1]*r02[1] + s01[2]*r02[2]);
double a21 = 2.0 * invmass0 *
(s02[0]*r01[0] + s02[1]*r01[1] + s02[2]*r01[2]);
double a22 = 2.0 * (invmass0+invmass2) *
(s02[0]*r02[0] + s02[1]*r02[1] + s02[2]*r02[2]);
// inverse of matrix
double determ = a11*a22 - a12*a21;
if (determ == 0.0) error->one(FLERR,"Shake determinant = 0.0");
double determinv = 1.0/determ;
double a11inv = a22*determinv;
double a12inv = -a12*determinv;
double a21inv = -a21*determinv;
double a22inv = a11*determinv;
// quadratic correction coeffs
double r0102 = (r01[0]*r02[0] + r01[1]*r02[1] + r01[2]*r02[2]);
double quad1_0101 = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double quad1_0202 = invmass0*invmass0 * r02sq;
double quad1_0102 = 2.0 * (invmass0+invmass1)*invmass0 * r0102;
double quad2_0202 = (invmass0+invmass2)*(invmass0+invmass2) * r02sq;
double quad2_0101 = invmass0*invmass0 * r01sq;
double quad2_0102 = 2.0 * (invmass0+invmass2)*invmass0 * r0102;
// iterate until converged
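// the two constraint equations are quadratic in (lamda01,lamda02); each pass
// solves the linearized 2x2 system with the quadratic terms evaluated at the
// previous iterate, until both lamdas change by less than tolerance or
// max_iter is reached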
double lamda01 = 0.0;
double lamda02 = 0.0;
int niter = 0;
int done = 0;
double quad1,quad2,b1,b2,lamda01_new,lamda02_new;
while (!done && niter < max_iter) {
quad1 = quad1_0101 * lamda01*lamda01 + quad1_0202 * lamda02*lamda02 +
quad1_0102 * lamda01*lamda02;
quad2 = quad2_0101 * lamda01*lamda01 + quad2_0202 * lamda02*lamda02 +
quad2_0102 * lamda01*lamda02;
b1 = bond1*bond1 - s01sq - quad1;
b2 = bond2*bond2 - s02sq - quad2;
lamda01_new = a11inv*b1 + a12inv*b2;
lamda02_new = a21inv*b1 + a22inv*b2;
done = 1;
if (fabs(lamda01_new-lamda01) > tolerance) done = 0;
if (fabs(lamda02_new-lamda02) > tolerance) done = 0;
lamda01 = lamda01_new;
lamda02 = lamda02_new;
niter++;
}
// update forces if atom is owned by this processor
lamda01 = lamda01/dtfsq;
lamda02 = lamda02/dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda01*r01[0] + lamda02*r02[0];
f[i0][1] += lamda01*r01[1] + lamda02*r02[1];
f[i0][2] += lamda01*r01[2] + lamda02*r02[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda01*r01[0];
f[i1][1] -= lamda01*r01[1];
f[i1][2] -= lamda01*r01[2];
}
if (i2 < nlocal) {
f[i2][0] -= lamda02*r02[0];
f[i2][1] -= lamda02*r02[1];
f[i2][2] -= lamda02*r02[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
if (i2 < nlocal) list[nlist++] = i2;
v[0] = lamda01*r01[0]*r01[0] + lamda02*r02[0]*r02[0];
v[1] = lamda01*r01[1]*r01[1] + lamda02*r02[1]*r02[1];
v[2] = lamda01*r01[2]*r01[2] + lamda02*r02[2]*r02[2];
v[3] = lamda01*r01[0]*r01[1] + lamda02*r02[0]*r02[1];
v[4] = lamda01*r01[0]*r01[2] + lamda02*r02[0]*r02[2];
v[5] = lamda01*r01[1]*r01[2] + lamda02*r02[1]*r02[2];
v_tally(nlist,list,3.0,v);
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake4(int m)
{
int nlist,list[4];
double v[6];
double invmass0,invmass1,invmass2,invmass3;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
int i2 = atom->map(shake_atom[m][2]);
int i3 = atom->map(shake_atom[m][3]);
double bond1 = bond_distance[shake_type[m][0]];
double bond2 = bond_distance[shake_type[m][1]];
double bond3 = bond_distance[shake_type[m][2]];
// r01,r02,r03 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
double r02[3];
r02[0] = x[i0][0] - x[i2][0];
r02[1] = x[i0][1] - x[i2][1];
r02[2] = x[i0][2] - x[i2][2];
domain->minimum_image(r02);
double r03[3];
r03[0] = x[i0][0] - x[i3][0];
r03[1] = x[i0][1] - x[i3][1];
r03[2] = x[i0][2] - x[i3][2];
domain->minimum_image(r03);
// s01,s02,s03 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
double s02[3];
s02[0] = xshake[i0][0] - xshake[i2][0];
s02[1] = xshake[i0][1] - xshake[i2][1];
s02[2] = xshake[i0][2] - xshake[i2][2];
- domain->minimum_image(s02);
+ domain->minimum_image_once(s02);
double s03[3];
s03[0] = xshake[i0][0] - xshake[i3][0];
s03[1] = xshake[i0][1] - xshake[i3][1];
s03[2] = xshake[i0][2] - xshake[i3][2];
- domain->minimum_image(s03);
+ domain->minimum_image_once(s03);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double r02sq = r02[0]*r02[0] + r02[1]*r02[1] + r02[2]*r02[2];
double r03sq = r03[0]*r03[0] + r03[1]*r03[1] + r03[2]*r03[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
double s02sq = s02[0]*s02[0] + s02[1]*s02[1] + s02[2]*s02[2];
double s03sq = s03[0]*s03[0] + s03[1]*s03[1] + s03[2]*s03[2];
// matrix coeffs and rhs for lamda equations
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
invmass2 = 1.0/rmass[i2];
invmass3 = 1.0/rmass[i3];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
invmass2 = 1.0/mass[type[i2]];
invmass3 = 1.0/mass[type[i3]];
}
double a11 = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double a12 = 2.0 * invmass0 *
(s01[0]*r02[0] + s01[1]*r02[1] + s01[2]*r02[2]);
double a13 = 2.0 * invmass0 *
(s01[0]*r03[0] + s01[1]*r03[1] + s01[2]*r03[2]);
double a21 = 2.0 * invmass0 *
(s02[0]*r01[0] + s02[1]*r01[1] + s02[2]*r01[2]);
double a22 = 2.0 * (invmass0+invmass2) *
(s02[0]*r02[0] + s02[1]*r02[1] + s02[2]*r02[2]);
double a23 = 2.0 * invmass0 *
(s02[0]*r03[0] + s02[1]*r03[1] + s02[2]*r03[2]);
double a31 = 2.0 * invmass0 *
(s03[0]*r01[0] + s03[1]*r01[1] + s03[2]*r01[2]);
double a32 = 2.0 * invmass0 *
(s03[0]*r02[0] + s03[1]*r02[1] + s03[2]*r02[2]);
double a33 = 2.0 * (invmass0+invmass3) *
(s03[0]*r03[0] + s03[1]*r03[1] + s03[2]*r03[2]);
// inverse of matrix
double determ = a11*a22*a33 + a12*a23*a31 + a13*a21*a32 -
a11*a23*a32 - a12*a21*a33 - a13*a22*a31;
if (determ == 0.0) error->one(FLERR,"Shake determinant = 0.0");
double determinv = 1.0/determ;
double a11inv = determinv * (a22*a33 - a23*a32);
double a12inv = -determinv * (a12*a33 - a13*a32);
double a13inv = determinv * (a12*a23 - a13*a22);
double a21inv = -determinv * (a21*a33 - a23*a31);
double a22inv = determinv * (a11*a33 - a13*a31);
double a23inv = -determinv * (a11*a23 - a13*a21);
double a31inv = determinv * (a21*a32 - a22*a31);
double a32inv = -determinv * (a11*a32 - a12*a31);
double a33inv = determinv * (a11*a22 - a12*a21);
// quadratic correction coeffs
double r0102 = (r01[0]*r02[0] + r01[1]*r02[1] + r01[2]*r02[2]);
double r0103 = (r01[0]*r03[0] + r01[1]*r03[1] + r01[2]*r03[2]);
double r0203 = (r02[0]*r03[0] + r02[1]*r03[1] + r02[2]*r03[2]);
double quad1_0101 = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double quad1_0202 = invmass0*invmass0 * r02sq;
double quad1_0303 = invmass0*invmass0 * r03sq;
double quad1_0102 = 2.0 * (invmass0+invmass1)*invmass0 * r0102;
double quad1_0103 = 2.0 * (invmass0+invmass1)*invmass0 * r0103;
double quad1_0203 = 2.0 * invmass0*invmass0 * r0203;
double quad2_0101 = invmass0*invmass0 * r01sq;
double quad2_0202 = (invmass0+invmass2)*(invmass0+invmass2) * r02sq;
double quad2_0303 = invmass0*invmass0 * r03sq;
double quad2_0102 = 2.0 * (invmass0+invmass2)*invmass0 * r0102;
double quad2_0103 = 2.0 * invmass0*invmass0 * r0103;
double quad2_0203 = 2.0 * (invmass0+invmass2)*invmass0 * r0203;
double quad3_0101 = invmass0*invmass0 * r01sq;
double quad3_0202 = invmass0*invmass0 * r02sq;
double quad3_0303 = (invmass0+invmass3)*(invmass0+invmass3) * r03sq;
double quad3_0102 = 2.0 * invmass0*invmass0 * r0102;
double quad3_0103 = 2.0 * (invmass0+invmass3)*invmass0 * r0103;
double quad3_0203 = 2.0 * (invmass0+invmass3)*invmass0 * r0203;
// iterate until converged
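// same fixed-point iteration as in shake3(), now with three coupled bond
// constraints and a 3x3 linear solve per pass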
double lamda01 = 0.0;
double lamda02 = 0.0;
double lamda03 = 0.0;
int niter = 0;
int done = 0;
double quad1,quad2,quad3,b1,b2,b3,lamda01_new,lamda02_new,lamda03_new;
while (!done && niter < max_iter) {
quad1 = quad1_0101 * lamda01*lamda01 +
quad1_0202 * lamda02*lamda02 +
quad1_0303 * lamda03*lamda03 +
quad1_0102 * lamda01*lamda02 +
quad1_0103 * lamda01*lamda03 +
quad1_0203 * lamda02*lamda03;
quad2 = quad2_0101 * lamda01*lamda01 +
quad2_0202 * lamda02*lamda02 +
quad2_0303 * lamda03*lamda03 +
quad2_0102 * lamda01*lamda02 +
quad2_0103 * lamda01*lamda03 +
quad2_0203 * lamda02*lamda03;
quad3 = quad3_0101 * lamda01*lamda01 +
quad3_0202 * lamda02*lamda02 +
quad3_0303 * lamda03*lamda03 +
quad3_0102 * lamda01*lamda02 +
quad3_0103 * lamda01*lamda03 +
quad3_0203 * lamda02*lamda03;
b1 = bond1*bond1 - s01sq - quad1;
b2 = bond2*bond2 - s02sq - quad2;
b3 = bond3*bond3 - s03sq - quad3;
lamda01_new = a11inv*b1 + a12inv*b2 + a13inv*b3;
lamda02_new = a21inv*b1 + a22inv*b2 + a23inv*b3;
lamda03_new = a31inv*b1 + a32inv*b2 + a33inv*b3;
done = 1;
if (fabs(lamda01_new-lamda01) > tolerance) done = 0;
if (fabs(lamda02_new-lamda02) > tolerance) done = 0;
if (fabs(lamda03_new-lamda03) > tolerance) done = 0;
lamda01 = lamda01_new;
lamda02 = lamda02_new;
lamda03 = lamda03_new;
niter++;
}
// update forces if atom is owned by this processor
lamda01 = lamda01/dtfsq;
lamda02 = lamda02/dtfsq;
lamda03 = lamda03/dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda01*r01[0] + lamda02*r02[0] + lamda03*r03[0];
f[i0][1] += lamda01*r01[1] + lamda02*r02[1] + lamda03*r03[1];
f[i0][2] += lamda01*r01[2] + lamda02*r02[2] + lamda03*r03[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda01*r01[0];
f[i1][1] -= lamda01*r01[1];
f[i1][2] -= lamda01*r01[2];
}
if (i2 < nlocal) {
f[i2][0] -= lamda02*r02[0];
f[i2][1] -= lamda02*r02[1];
f[i2][2] -= lamda02*r02[2];
}
if (i3 < nlocal) {
f[i3][0] -= lamda03*r03[0];
f[i3][1] -= lamda03*r03[1];
f[i3][2] -= lamda03*r03[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
if (i2 < nlocal) list[nlist++] = i2;
if (i3 < nlocal) list[nlist++] = i3;
v[0] = lamda01*r01[0]*r01[0]+lamda02*r02[0]*r02[0]+lamda03*r03[0]*r03[0];
v[1] = lamda01*r01[1]*r01[1]+lamda02*r02[1]*r02[1]+lamda03*r03[1]*r03[1];
v[2] = lamda01*r01[2]*r01[2]+lamda02*r02[2]*r02[2]+lamda03*r03[2]*r03[2];
v[3] = lamda01*r01[0]*r01[1]+lamda02*r02[0]*r02[1]+lamda03*r03[0]*r03[1];
v[4] = lamda01*r01[0]*r01[2]+lamda02*r02[0]*r02[2]+lamda03*r03[0]*r03[2];
v[5] = lamda01*r01[1]*r01[2]+lamda02*r02[1]*r02[2]+lamda03*r03[1]*r03[2];
v_tally(nlist,list,4.0,v);
}
}
/* ---------------------------------------------------------------------- */
void FixShake::shake3angle(int m)
{
int nlist,list[3];
double v[6];
double invmass0,invmass1,invmass2;
// local atom IDs and constraint distances
int i0 = atom->map(shake_atom[m][0]);
int i1 = atom->map(shake_atom[m][1]);
int i2 = atom->map(shake_atom[m][2]);
double bond1 = bond_distance[shake_type[m][0]];
double bond2 = bond_distance[shake_type[m][1]];
double bond12 = angle_distance[shake_type[m][2]];
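// bond12 = fixed 1-2 distance used to freeze the angle,
// i.e. the angle constraint is imposed as a third distance constraint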
// r01,r02,r12 = distance vec between atoms, with PBC
double r01[3];
r01[0] = x[i0][0] - x[i1][0];
r01[1] = x[i0][1] - x[i1][1];
r01[2] = x[i0][2] - x[i1][2];
domain->minimum_image(r01);
double r02[3];
r02[0] = x[i0][0] - x[i2][0];
r02[1] = x[i0][1] - x[i2][1];
r02[2] = x[i0][2] - x[i2][2];
domain->minimum_image(r02);
double r12[3];
r12[0] = x[i1][0] - x[i2][0];
r12[1] = x[i1][1] - x[i2][1];
r12[2] = x[i1][2] - x[i2][2];
domain->minimum_image(r12);
// s01,s02,s12 = distance vec after unconstrained update, with PBC
+ // use Domain::minimum_image_once(), not minimum_image()
+ // b/c xshake values might be huge, due to e.g. fix gcmc
double s01[3];
s01[0] = xshake[i0][0] - xshake[i1][0];
s01[1] = xshake[i0][1] - xshake[i1][1];
s01[2] = xshake[i0][2] - xshake[i1][2];
- domain->minimum_image(s01);
+ domain->minimum_image_once(s01);
double s02[3];
s02[0] = xshake[i0][0] - xshake[i2][0];
s02[1] = xshake[i0][1] - xshake[i2][1];
s02[2] = xshake[i0][2] - xshake[i2][2];
- domain->minimum_image(s02);
+ domain->minimum_image_once(s02);
double s12[3];
s12[0] = xshake[i1][0] - xshake[i2][0];
s12[1] = xshake[i1][1] - xshake[i2][1];
s12[2] = xshake[i1][2] - xshake[i2][2];
- domain->minimum_image(s12);
+ domain->minimum_image_once(s12);
// scalar distances between atoms
double r01sq = r01[0]*r01[0] + r01[1]*r01[1] + r01[2]*r01[2];
double r02sq = r02[0]*r02[0] + r02[1]*r02[1] + r02[2]*r02[2];
double r12sq = r12[0]*r12[0] + r12[1]*r12[1] + r12[2]*r12[2];
double s01sq = s01[0]*s01[0] + s01[1]*s01[1] + s01[2]*s01[2];
double s02sq = s02[0]*s02[0] + s02[1]*s02[1] + s02[2]*s02[2];
double s12sq = s12[0]*s12[0] + s12[1]*s12[1] + s12[2]*s12[2];
// matrix coeffs and rhs for lamda equations
if (rmass) {
invmass0 = 1.0/rmass[i0];
invmass1 = 1.0/rmass[i1];
invmass2 = 1.0/rmass[i2];
} else {
invmass0 = 1.0/mass[type[i0]];
invmass1 = 1.0/mass[type[i1]];
invmass2 = 1.0/mass[type[i2]];
}
double a11 = 2.0 * (invmass0+invmass1) *
(s01[0]*r01[0] + s01[1]*r01[1] + s01[2]*r01[2]);
double a12 = 2.0 * invmass0 *
(s01[0]*r02[0] + s01[1]*r02[1] + s01[2]*r02[2]);
double a13 = - 2.0 * invmass1 *
(s01[0]*r12[0] + s01[1]*r12[1] + s01[2]*r12[2]);
double a21 = 2.0 * invmass0 *
(s02[0]*r01[0] + s02[1]*r01[1] + s02[2]*r01[2]);
double a22 = 2.0 * (invmass0+invmass2) *
(s02[0]*r02[0] + s02[1]*r02[1] + s02[2]*r02[2]);
double a23 = 2.0 * invmass2 *
(s02[0]*r12[0] + s02[1]*r12[1] + s02[2]*r12[2]);
double a31 = - 2.0 * invmass1 *
(s12[0]*r01[0] + s12[1]*r01[1] + s12[2]*r01[2]);
double a32 = 2.0 * invmass2 *
(s12[0]*r02[0] + s12[1]*r02[1] + s12[2]*r02[2]);
double a33 = 2.0 * (invmass1+invmass2) *
(s12[0]*r12[0] + s12[1]*r12[1] + s12[2]*r12[2]);
// inverse of matrix
double determ = a11*a22*a33 + a12*a23*a31 + a13*a21*a32 -
a11*a23*a32 - a12*a21*a33 - a13*a22*a31;
if (determ == 0.0) error->one(FLERR,"Shake determinant = 0.0");
double determinv = 1.0/determ;
double a11inv = determinv * (a22*a33 - a23*a32);
double a12inv = -determinv * (a12*a33 - a13*a32);
double a13inv = determinv * (a12*a23 - a13*a22);
double a21inv = -determinv * (a21*a33 - a23*a31);
double a22inv = determinv * (a11*a33 - a13*a31);
double a23inv = -determinv * (a11*a23 - a13*a21);
double a31inv = determinv * (a21*a32 - a22*a31);
double a32inv = -determinv * (a11*a32 - a12*a31);
double a33inv = determinv * (a11*a22 - a12*a21);
// quadratic correction coeffs
double r0102 = (r01[0]*r02[0] + r01[1]*r02[1] + r01[2]*r02[2]);
double r0112 = (r01[0]*r12[0] + r01[1]*r12[1] + r01[2]*r12[2]);
double r0212 = (r02[0]*r12[0] + r02[1]*r12[1] + r02[2]*r12[2]);
double quad1_0101 = (invmass0+invmass1)*(invmass0+invmass1) * r01sq;
double quad1_0202 = invmass0*invmass0 * r02sq;
double quad1_1212 = invmass1*invmass1 * r12sq;
double quad1_0102 = 2.0 * (invmass0+invmass1)*invmass0 * r0102;
double quad1_0112 = - 2.0 * (invmass0+invmass1)*invmass1 * r0112;
double quad1_0212 = - 2.0 * invmass0*invmass1 * r0212;
double quad2_0101 = invmass0*invmass0 * r01sq;
double quad2_0202 = (invmass0+invmass2)*(invmass0+invmass2) * r02sq;
double quad2_1212 = invmass2*invmass2 * r12sq;
double quad2_0102 = 2.0 * (invmass0+invmass2)*invmass0 * r0102;
double quad2_0112 = 2.0 * invmass0*invmass2 * r0112;
double quad2_0212 = 2.0 * (invmass0+invmass2)*invmass2 * r0212;
double quad3_0101 = invmass1*invmass1 * r01sq;
double quad3_0202 = invmass2*invmass2 * r02sq;
double quad3_1212 = (invmass1+invmass2)*(invmass1+invmass2) * r12sq;
double quad3_0102 = - 2.0 * invmass1*invmass2 * r0102;
double quad3_0112 = - 2.0 * (invmass1+invmass2)*invmass1 * r0112;
double quad3_0212 = 2.0 * (invmass1+invmass2)*invmass2 * r0212;
// iterate until converged
double lamda01 = 0.0;
double lamda02 = 0.0;
double lamda12 = 0.0;
int niter = 0;
int done = 0;
double quad1,quad2,quad3,b1,b2,b3,lamda01_new,lamda02_new,lamda12_new;
while (!done && niter < max_iter) {
+
quad1 = quad1_0101 * lamda01*lamda01 +
quad1_0202 * lamda02*lamda02 +
quad1_1212 * lamda12*lamda12 +
quad1_0102 * lamda01*lamda02 +
quad1_0112 * lamda01*lamda12 +
quad1_0212 * lamda02*lamda12;
quad2 = quad2_0101 * lamda01*lamda01 +
quad2_0202 * lamda02*lamda02 +
quad2_1212 * lamda12*lamda12 +
quad2_0102 * lamda01*lamda02 +
quad2_0112 * lamda01*lamda12 +
quad2_0212 * lamda02*lamda12;
quad3 = quad3_0101 * lamda01*lamda01 +
quad3_0202 * lamda02*lamda02 +
quad3_1212 * lamda12*lamda12 +
quad3_0102 * lamda01*lamda02 +
quad3_0112 * lamda01*lamda12 +
quad3_0212 * lamda02*lamda12;
b1 = bond1*bond1 - s01sq - quad1;
b2 = bond2*bond2 - s02sq - quad2;
b3 = bond12*bond12 - s12sq - quad3;
lamda01_new = a11inv*b1 + a12inv*b2 + a13inv*b3;
lamda02_new = a21inv*b1 + a22inv*b2 + a23inv*b3;
lamda12_new = a31inv*b1 + a32inv*b2 + a33inv*b3;
done = 1;
if (fabs(lamda01_new-lamda01) > tolerance) done = 0;
if (fabs(lamda02_new-lamda02) > tolerance) done = 0;
if (fabs(lamda12_new-lamda12) > tolerance) done = 0;
lamda01 = lamda01_new;
lamda02 = lamda02_new;
lamda12 = lamda12_new;
niter++;
}
// update forces if atom is owned by this processor
lamda01 = lamda01/dtfsq;
lamda02 = lamda02/dtfsq;
lamda12 = lamda12/dtfsq;
if (i0 < nlocal) {
f[i0][0] += lamda01*r01[0] + lamda02*r02[0];
f[i0][1] += lamda01*r01[1] + lamda02*r02[1];
f[i0][2] += lamda01*r01[2] + lamda02*r02[2];
}
if (i1 < nlocal) {
f[i1][0] -= lamda01*r01[0] - lamda12*r12[0];
f[i1][1] -= lamda01*r01[1] - lamda12*r12[1];
f[i1][2] -= lamda01*r01[2] - lamda12*r12[2];
}
if (i2 < nlocal) {
f[i2][0] -= lamda02*r02[0] + lamda12*r12[0];
f[i2][1] -= lamda02*r02[1] + lamda12*r12[1];
f[i2][2] -= lamda02*r02[2] + lamda12*r12[2];
}
if (evflag) {
nlist = 0;
if (i0 < nlocal) list[nlist++] = i0;
if (i1 < nlocal) list[nlist++] = i1;
if (i2 < nlocal) list[nlist++] = i2;
v[0] = lamda01*r01[0]*r01[0]+lamda02*r02[0]*r02[0]+lamda12*r12[0]*r12[0];
v[1] = lamda01*r01[1]*r01[1]+lamda02*r02[1]*r02[1]+lamda12*r12[1]*r12[1];
v[2] = lamda01*r01[2]*r01[2]+lamda02*r02[2]*r02[2]+lamda12*r12[2]*r12[2];
v[3] = lamda01*r01[0]*r01[1]+lamda02*r02[0]*r02[1]+lamda12*r12[0]*r12[1];
v[4] = lamda01*r01[0]*r01[2]+lamda02*r02[0]*r02[2]+lamda12*r12[0]*r12[2];
v[5] = lamda01*r01[1]*r01[2]+lamda02*r02[1]*r02[2]+lamda12*r12[1]*r12[2];
v_tally(nlist,list,3.0,v);
}
}
/* ----------------------------------------------------------------------
print-out bond & angle statistics
------------------------------------------------------------------------- */
void FixShake::stats()
{
int i,j,m,n,iatom,jatom,katom;
double delx,dely,delz;
double r,r1,r2,r3,angle;
// zero out accumulators
int nb = atom->nbondtypes + 1;
int na = atom->nangletypes + 1;
for (i = 0; i < nb; i++) {
b_count[i] = 0;
b_ave[i] = b_max[i] = 0.0;
b_min[i] = BIG;
}
for (i = 0; i < na; i++) {
a_count[i] = 0;
a_ave[i] = a_max[i] = 0.0;
a_min[i] = BIG;
}
// log stats for each bond & angle
// OK to double count since we are just averaging
double **x = atom->x;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++) {
if (shake_flag[i] == 0) continue;
// bond stats
n = shake_flag[i];
if (n == 1) n = 3;
iatom = atom->map(shake_atom[i][0]);
for (j = 1; j < n; j++) {
jatom = atom->map(shake_atom[i][j]);
delx = x[iatom][0] - x[jatom][0];
dely = x[iatom][1] - x[jatom][1];
delz = x[iatom][2] - x[jatom][2];
domain->minimum_image(delx,dely,delz);
r = sqrt(delx*delx + dely*dely + delz*delz);
m = shake_type[i][j-1];
b_count[m]++;
b_ave[m] += r;
b_max[m] = MAX(b_max[m],r);
b_min[m] = MIN(b_min[m],r);
}
// angle stats
if (shake_flag[i] == 1) {
iatom = atom->map(shake_atom[i][0]);
jatom = atom->map(shake_atom[i][1]);
katom = atom->map(shake_atom[i][2]);
delx = x[iatom][0] - x[jatom][0];
dely = x[iatom][1] - x[jatom][1];
delz = x[iatom][2] - x[jatom][2];
domain->minimum_image(delx,dely,delz);
r1 = sqrt(delx*delx + dely*dely + delz*delz);
delx = x[iatom][0] - x[katom][0];
dely = x[iatom][1] - x[katom][1];
delz = x[iatom][2] - x[katom][2];
domain->minimum_image(delx,dely,delz);
r2 = sqrt(delx*delx + dely*dely + delz*delz);
delx = x[jatom][0] - x[katom][0];
dely = x[jatom][1] - x[katom][1];
delz = x[jatom][2] - x[katom][2];
domain->minimum_image(delx,dely,delz);
r3 = sqrt(delx*delx + dely*dely + delz*delz);
angle = acos((r1*r1 + r2*r2 - r3*r3) / (2.0*r1*r2));
angle *= 180.0/MY_PI;
m = shake_type[i][2];
a_count[m]++;
a_ave[m] += angle;
a_max[m] = MAX(a_max[m],angle);
a_min[m] = MIN(a_min[m],angle);
}
}
// sum across all procs
MPI_Allreduce(b_count,b_count_all,nb,MPI_INT,MPI_SUM,world);
MPI_Allreduce(b_ave,b_ave_all,nb,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(b_max,b_max_all,nb,MPI_DOUBLE,MPI_MAX,world);
MPI_Allreduce(b_min,b_min_all,nb,MPI_DOUBLE,MPI_MIN,world);
MPI_Allreduce(a_count,a_count_all,na,MPI_INT,MPI_SUM,world);
MPI_Allreduce(a_ave,a_ave_all,na,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(a_max,a_max_all,na,MPI_DOUBLE,MPI_MAX,world);
MPI_Allreduce(a_min,a_min_all,na,MPI_DOUBLE,MPI_MIN,world);
// print stats only for non-zero counts
if (me == 0) {
if (screen) {
fprintf(screen,
"SHAKE stats (type/ave/delta) on step " BIGINT_FORMAT "\n",
update->ntimestep);
for (i = 1; i < nb; i++)
if (b_count_all[i])
fprintf(screen," %d %g %g %d\n",i,
b_ave_all[i]/b_count_all[i],b_max_all[i]-b_min_all[i],
b_count_all[i]);
for (i = 1; i < na; i++)
if (a_count_all[i])
fprintf(screen," %d %g %g\n",i,
a_ave_all[i]/a_count_all[i],a_max_all[i]-a_min_all[i]);
}
if (logfile) {
fprintf(logfile,
"SHAKE stats (type/ave/delta) on step " BIGINT_FORMAT "\n",
update->ntimestep);
for (i = 0; i < nb; i++)
if (b_count_all[i])
fprintf(logfile," %d %g %g\n",i,
b_ave_all[i]/b_count_all[i],b_max_all[i]-b_min_all[i]);
for (i = 0; i < na; i++)
if (a_count_all[i])
fprintf(logfile," %d %g %g\n",i,
a_ave_all[i]/a_count_all[i],a_max_all[i]-a_min_all[i]);
}
}
// next timestep for stats
next_output += output_every;
}
/* ----------------------------------------------------------------------
find a bond between global atom IDs n1 and n2 stored with local atom i
if find it:
if setflag = 0, return bond type
if setflag = -1/1, set bond type to negative/positive and return 0
if do not find it, return 0
------------------------------------------------------------------------- */
int FixShake::bondtype_findset(int i, tagint n1, tagint n2, int setflag)
{
int m,nbonds;
int *btype;
if (molecular == 1) {
tagint *tag = atom->tag;
tagint **bond_atom = atom->bond_atom;
nbonds = atom->num_bond[i];
for (m = 0; m < nbonds; m++) {
if (n1 == tag[i] && n2 == bond_atom[i][m]) break;
if (n1 == bond_atom[i][m] && n2 == tag[i]) break;
}
} else {
int imol = atom->molindex[i];
int iatom = atom->molatom[i];
tagint *tag = atom->tag;
tagint tagprev = tag[i] - iatom - 1;
tagint *batom = atommols[imol]->bond_atom[iatom];
btype = atommols[imol]->bond_type[iatom];
nbonds = atommols[imol]->num_bond[iatom];
for (m = 0; m < nbonds; m++) {
if (n1 == tag[i] && n2 == batom[m]+tagprev) break;
if (n1 == batom[m]+tagprev && n2 == tag[i]) break;
}
}
if (m < nbonds) {
if (setflag == 0) {
if (molecular == 1) return atom->bond_type[i][m];
else return btype[m];
}
if (molecular == 1) {
if ((setflag < 0 && atom->bond_type[i][m] > 0) ||
(setflag > 0 && atom->bond_type[i][m] < 0))
atom->bond_type[i][m] = -atom->bond_type[i][m];
} else {
if ((setflag < 0 && btype[m] > 0) ||
(setflag > 0 && btype[m] < 0)) btype[m] = -btype[m];
}
}
return 0;
}
/* ----------------------------------------------------------------------
find an angle with global end atom IDs n1 and n2 stored with local atom i
if find it:
if setflag = 0, return angle type
if setflag = -1/1, set angle type to negative/positive and return 0
if do not find it, return 0
------------------------------------------------------------------------- */
int FixShake::angletype_findset(int i, tagint n1, tagint n2, int setflag)
{
int m,nangles;
int *atype;
if (molecular == 1) {
tagint **angle_atom1 = atom->angle_atom1;
tagint **angle_atom3 = atom->angle_atom3;
nangles = atom->num_angle[i];
for (m = 0; m < nangles; m++) {
if (n1 == angle_atom1[i][m] && n2 == angle_atom3[i][m]) break;
if (n1 == angle_atom3[i][m] && n2 == angle_atom1[i][m]) break;
}
} else {
int imol = atom->molindex[i];
int iatom = atom->molatom[i];
tagint *tag = atom->tag;
tagint tagprev = tag[i] - iatom - 1;
tagint *aatom1 = atommols[imol]->angle_atom1[iatom];
tagint *aatom3 = atommols[imol]->angle_atom3[iatom];
atype = atommols[imol]->angle_type[iatom];
nangles = atommols[imol]->num_angle[iatom];
for (m = 0; m < nangles; m++) {
if (n1 == aatom1[m]+tagprev && n2 == aatom3[m]+tagprev) break;
if (n1 == aatom3[m]+tagprev && n2 == aatom1[m]+tagprev) break;
}
}
if (m < nangles) {
if (setflag == 0) {
if (molecular == 1) return atom->angle_type[i][m];
else return atype[m];
}
if (molecular == 1) {
if ((setflag < 0 && atom->angle_type[i][m] > 0) ||
(setflag > 0 && atom->angle_type[i][m] < 0))
atom->angle_type[i][m] = -atom->angle_type[i][m];
} else {
if ((setflag < 0 && atype[m] > 0) ||
(setflag > 0 && atype[m] < 0)) atype[m] = -atype[m];
}
}
return 0;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixShake::memory_usage()
{
int nmax = atom->nmax;
double bytes = nmax * sizeof(int);
bytes += nmax*4 * sizeof(int);
bytes += nmax*3 * sizeof(int);
bytes += nmax*3 * sizeof(double);
bytes += maxvatom*6 * sizeof(double);
return bytes;
}
/* ----------------------------------------------------------------------
allocate local atom-based arrays
------------------------------------------------------------------------- */
void FixShake::grow_arrays(int nmax)
{
memory->grow(shake_flag,nmax,"shake:shake_flag");
memory->grow(shake_atom,nmax,4,"shake:shake_atom");
memory->grow(shake_type,nmax,3,"shake:shake_type");
memory->destroy(xshake);
memory->create(xshake,nmax,3,"shake:xshake");
memory->destroy(ftmp);
memory->create(ftmp,nmax,3,"shake:ftmp");
memory->destroy(vtmp);
memory->create(vtmp,nmax,3,"shake:vtmp");
}
/* ----------------------------------------------------------------------
copy values within local atom-based arrays
------------------------------------------------------------------------- */
void FixShake::copy_arrays(int i, int j, int delflag)
{
int flag = shake_flag[j] = shake_flag[i];
if (flag == 1) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_atom[j][2] = shake_atom[i][2];
shake_type[j][0] = shake_type[i][0];
shake_type[j][1] = shake_type[i][1];
shake_type[j][2] = shake_type[i][2];
} else if (flag == 2) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_type[j][0] = shake_type[i][0];
} else if (flag == 3) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_atom[j][2] = shake_atom[i][2];
shake_type[j][0] = shake_type[i][0];
shake_type[j][1] = shake_type[i][1];
} else if (flag == 4) {
shake_atom[j][0] = shake_atom[i][0];
shake_atom[j][1] = shake_atom[i][1];
shake_atom[j][2] = shake_atom[i][2];
shake_atom[j][3] = shake_atom[i][3];
shake_type[j][0] = shake_type[i][0];
shake_type[j][1] = shake_type[i][1];
shake_type[j][2] = shake_type[i][2];
}
}
/* ----------------------------------------------------------------------
initialize one atom's array values, called when atom is created
------------------------------------------------------------------------- */
void FixShake::set_arrays(int i)
{
shake_flag[i] = 0;
}
/* ----------------------------------------------------------------------
update one atom's array values
called when molecule is created from fix gcmc
------------------------------------------------------------------------- */
void FixShake::update_arrays(int i, int atom_offset)
{
int flag = shake_flag[i];
if (flag == 1) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
shake_atom[i][2] += atom_offset;
} else if (flag == 2) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
} else if (flag == 3) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
shake_atom[i][2] += atom_offset;
} else if (flag == 4) {
shake_atom[i][0] += atom_offset;
shake_atom[i][1] += atom_offset;
shake_atom[i][2] += atom_offset;
shake_atom[i][3] += atom_offset;
}
}
/* ----------------------------------------------------------------------
initialize a molecule inserted by another fix, e.g. deposit or pour
called when molecule is created
nlocalprev = # of atoms on this proc before molecule inserted
tagprev = atom ID previous to new atoms in the molecule
xgeom,vcm,quat ignored
------------------------------------------------------------------------- */
void FixShake::set_molecule(int nlocalprev, tagint tagprev, int imol,
double *xgeom, double *vcm, double *quat)
{
int m,flag;
int nlocal = atom->nlocal;
if (nlocalprev == nlocal) return;
tagint *tag = atom->tag;
tagint **mol_shake_atom = onemols[imol]->shake_atom;
int **mol_shake_type = onemols[imol]->shake_type;
for (int i = nlocalprev; i < nlocal; i++) {
m = tag[i] - tagprev-1;
flag = shake_flag[i] = onemols[imol]->shake_flag[m];
if (flag == 1) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_atom[i][2] = mol_shake_atom[m][2] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
shake_type[i][1] = mol_shake_type[m][1];
shake_type[i][2] = mol_shake_type[m][2];
} else if (flag == 2) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
} else if (flag == 3) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_atom[i][2] = mol_shake_atom[m][2] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
shake_type[i][1] = mol_shake_type[m][1];
} else if (flag == 4) {
shake_atom[i][0] = mol_shake_atom[m][0] + tagprev;
shake_atom[i][1] = mol_shake_atom[m][1] + tagprev;
shake_atom[i][2] = mol_shake_atom[m][2] + tagprev;
shake_atom[i][3] = mol_shake_atom[m][3] + tagprev;
shake_type[i][0] = mol_shake_type[m][0];
shake_type[i][1] = mol_shake_type[m][1];
shake_type[i][2] = mol_shake_type[m][2];
}
}
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for exchange with another proc
------------------------------------------------------------------------- */
int FixShake::pack_exchange(int i, double *buf)
{
int m = 0;
buf[m++] = shake_flag[i];
int flag = shake_flag[i];
if (flag == 1) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_atom[i][2];
buf[m++] = shake_type[i][0];
buf[m++] = shake_type[i][1];
buf[m++] = shake_type[i][2];
} else if (flag == 2) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_type[i][0];
} else if (flag == 3) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_atom[i][2];
buf[m++] = shake_type[i][0];
buf[m++] = shake_type[i][1];
} else if (flag == 4) {
buf[m++] = shake_atom[i][0];
buf[m++] = shake_atom[i][1];
buf[m++] = shake_atom[i][2];
buf[m++] = shake_atom[i][3];
buf[m++] = shake_type[i][0];
buf[m++] = shake_type[i][1];
buf[m++] = shake_type[i][2];
}
return m;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based arrays from exchange with another proc
------------------------------------------------------------------------- */
int FixShake::unpack_exchange(int nlocal, double *buf)
{
int m = 0;
int flag = shake_flag[nlocal] = static_cast<int> (buf[m++]);
if (flag == 1) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][2] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
shake_type[nlocal][1] = static_cast<int> (buf[m++]);
shake_type[nlocal][2] = static_cast<int> (buf[m++]);
} else if (flag == 2) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
} else if (flag == 3) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][2] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
shake_type[nlocal][1] = static_cast<int> (buf[m++]);
} else if (flag == 4) {
shake_atom[nlocal][0] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][1] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][2] = static_cast<tagint> (buf[m++]);
shake_atom[nlocal][3] = static_cast<tagint> (buf[m++]);
shake_type[nlocal][0] = static_cast<int> (buf[m++]);
shake_type[nlocal][1] = static_cast<int> (buf[m++]);
shake_type[nlocal][2] = static_cast<int> (buf[m++]);
}
return m;
}
/* ---------------------------------------------------------------------- */
int FixShake::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
double dx,dy,dz;
m = 0;
if (pbc_flag == 0) {
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = xshake[j][0];
buf[m++] = xshake[j][1];
buf[m++] = xshake[j][2];
}
} else {
if (domain->triclinic == 0) {
dx = pbc[0]*domain->xprd;
dy = pbc[1]*domain->yprd;
dz = pbc[2]*domain->zprd;
} else {
dx = pbc[0]*domain->xprd + pbc[5]*domain->xy + pbc[4]*domain->xz;
dy = pbc[1]*domain->yprd + pbc[3]*domain->yz;
dz = pbc[2]*domain->zprd;
}
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = xshake[j][0] + dx;
buf[m++] = xshake[j][1] + dy;
buf[m++] = xshake[j][2] + dz;
}
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixShake::unpack_forward_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
xshake[i][0] = buf[m++];
xshake[i][1] = buf[m++];
xshake[i][2] = buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void FixShake::reset_dt()
{
if (strstr(update->integrate_style,"verlet")) {
dtv = update->dt;
if (rattle) dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
else dtfsq = update->dt * update->dt * force->ftm2v;
} else {
dtv = step_respa[0];
dtf_innerhalf = 0.5 * step_respa[0] * force->ftm2v;
if (rattle) dtf_inner = dtf_innerhalf;
else dtf_inner = step_respa[0] * force->ftm2v;
}
}
/* ----------------------------------------------------------------------
extract Molecule ptr
------------------------------------------------------------------------- */
void *FixShake::extract(const char *str, int &dim)
{
dim = 0;
if (strcmp(str,"onemol") == 0) return onemols;
return NULL;
}
/* ----------------------------------------------------------------------
add coordinate constraining forces
this method is called at the end of a timestep
------------------------------------------------------------------------- */
void FixShake::shake_end_of_step(int vflag) {
if (!respa) {
dtv = update->dt;
dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
FixShake::post_force(vflag);
if (!rattle) dtfsq = update->dt * update->dt * force->ftm2v;
} else {
dtv = step_respa[0];
dtf_innerhalf = 0.5 * step_respa[0] * force->ftm2v;
dtf_inner = dtf_innerhalf;
// apply correction to all rRESPA levels
for (int ilevel = 0; ilevel < nlevels_respa; ilevel++) {
((Respa *) update->integrate)->copy_flevel_f(ilevel);
FixShake::post_force_respa(vflag,ilevel,loop_respa[ilevel]-1);
((Respa *) update->integrate)->copy_f_flevel(ilevel);
}
if (!rattle) dtf_inner = step_respa[0] * force->ftm2v;
}
}
/* ----------------------------------------------------------------------
wrapper method for end_of_step fixes which modify velocities
------------------------------------------------------------------------- */
void FixShake::correct_velocities() {}
/* ----------------------------------------------------------------------
calculate constraining forces based on the current configuration
change coordinates
------------------------------------------------------------------------- */
void FixShake::correct_coordinates(int vflag) {
// save current forces and velocities, then zero them so that
// FixShake::unconstrained_update() leaves the coordinates unchanged (xshake = x)
for (int j=0; j<nlocal; j++) {
for (int k=0; k<3; k++) {
// store current value of forces and velocities
ftmp[j][k] = f[j][k];
vtmp[j][k] = v[j][k];
// set f and v to zero for SHAKE
v[j][k] = 0;
f[j][k] = 0;
}
}
// call SHAKE to correct the coordinates which were updated without constraints
// IMPORTANT: use 1 as argument and thereby enforce velocity Verlet
dtfsq = 0.5 * update->dt * update->dt * force->ftm2v;
FixShake::post_force(vflag);
// integrate coordinates: x' = xnp1 + dt^2/(2 m_i) * f, where f is the constraining force
// NOTE: after this update, the geometry of the molecules will be correct
double dtfmsq;
if (rmass) {
for (int i = 0; i < nlocal; i++) {
dtfmsq = dtfsq/ rmass[i];
x[i][0] = x[i][0] + dtfmsq*f[i][0];
x[i][1] = x[i][1] + dtfmsq*f[i][1];
x[i][2] = x[i][2] + dtfmsq*f[i][2];
}
}
else {
for (int i = 0; i < nlocal; i++) {
dtfmsq = dtfsq / mass[type[i]];
x[i][0] = x[i][0] + dtfmsq*f[i][0];
x[i][1] = x[i][1] + dtfmsq*f[i][1];
x[i][2] = x[i][2] + dtfmsq*f[i][2];
}
}
// copy forces and velocities back
for (int j=0; j<nlocal; j++) {
for (int k=0; k<3; k++) {
f[j][k] = ftmp[j][k];
v[j][k] = vtmp[j][k];
}
}
if (!rattle) dtfsq = update->dt * update->dt * force->ftm2v;
// communicate changes
// NOTE: for compatibility xshake is temporarily set to x, such that pack/unpack_forward
// can be used for communicating the coordinates.
double **xtmp = xshake;
xshake = x;
if (nprocs > 1) {
comm->forward_comm_fix(this);
}
xshake = xtmp;
}
diff --git a/src/SNAP/compute_sna_atom.cpp b/src/SNAP/compute_sna_atom.cpp
index ad934535a..cba6fae9b 100644
--- a/src/SNAP/compute_sna_atom.cpp
+++ b/src/SNAP/compute_sna_atom.cpp
@@ -1,286 +1,301 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "sna.h"
#include <string.h>
#include <stdlib.h>
#include "compute_sna_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "openmp_snap.h"
using namespace LAMMPS_NS;
ComputeSNAAtom::ComputeSNAAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg), cutsq(NULL), list(NULL), sna(NULL),
radelem(NULL), wjelem(NULL)
{
double rmin0, rfac0;
int twojmax, switchflag, bzeroflag;
radelem = NULL;
wjelem = NULL;
int ntypes = atom->ntypes;
int nargmin = 6+2*ntypes;
if (narg < nargmin) error->all(FLERR,"Illegal compute sna/atom command");
// default values
diagonalstyle = 0;
rmin0 = 0.0;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+ quadraticflag = 0;
// offset by 1 to match up with types
memory->create(radelem,ntypes+1,"sna/atom:radelem");
memory->create(wjelem,ntypes+1,"sna/atom:wjelem");
rcutfac = atof(arg[3]);
rfac0 = atof(arg[4]);
twojmax = atoi(arg[5]);
for(int i = 0; i < ntypes; i++)
radelem[i+1] = atof(arg[6+i]);
for(int i = 0; i < ntypes; i++)
wjelem[i+1] = atof(arg[6+ntypes+i]);
// construct cutsq
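// cutoff for a pair of types I,J is (radelem[I]+radelem[J])*rcutfac;
// cutmax is the largest such cutoff, set by the largest per-type radius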
double cut;
cutmax = 0.0;
memory->create(cutsq,ntypes+1,ntypes+1,"sna/atom:cutsq");
for(int i = 1; i <= ntypes; i++) {
cut = 2.0*radelem[i]*rcutfac;
if (cut > cutmax) cutmax = cut;
cutsq[i][i] = cut*cut;
for(int j = i+1; j <= ntypes; j++) {
cut = (radelem[i]+radelem[j])*rcutfac;
cutsq[i][j] = cutsq[j][i] = cut*cut;
}
}
// process optional args
int iarg = nargmin;
while (iarg < narg) {
if (strcmp(arg[iarg],"diagonal") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
diagonalstyle = atoi(arg[iarg+1]);
if (diagonalstyle < 0 || diagonalstyle > 3)
error->all(FLERR,"Illegal compute sna/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"rmin0") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
rmin0 = atof(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"switchflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
switchflag = atoi(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"bzeroflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute sna/atom command");
bzeroflag = atoi(arg[iarg+1]);
iarg += 2;
+ } else if (strcmp(arg[iarg],"quadraticflag") == 0) {
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute sna/atom command");
+ quadraticflag = atoi(arg[iarg+1]);
+ iarg += 2;
} else error->all(FLERR,"Illegal compute sna/atom command");
}
snaptr = new SNA*[comm->nthreads];
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(lmp,rfac0,twojmax,rmin0,switchflag,bzeroflag)
#endif
{
int tid = omp_get_thread_num();
// always unset use_shared_arrays since it does not work with computes
snaptr[tid] = new SNA(lmp,rfac0,twojmax,diagonalstyle,
0 /*use_shared_arrays*/, rmin0,switchflag,bzeroflag);
}
ncoeff = snaptr[0]->ncoeff;
- peratom_flag = 1;
size_peratom_cols = ncoeff;
+ if (quadraticflag) size_peratom_cols += ncoeff*ncoeff;
+ peratom_flag = 1;
nmax = 0;
njmax = 0;
sna = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSNAAtom::~ComputeSNAAtom()
{
memory->destroy(sna);
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(cutsq);
delete [] snaptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAAtom::init()
{
if (force->pair == NULL)
error->all(FLERR,"Compute sna/atom requires a pair style be defined");
if (cutmax > force->pair->cutforce)
error->all(FLERR,"Compute sna/atom cutoff is longer than pairwise cutoff");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"sna/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute sna/atom");
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
snaptr[tid]->init();
}
}
/* ---------------------------------------------------------------------- */
void ComputeSNAAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAAtom::compute_peratom()
{
invoked_peratom = update->ntimestep;
// grow sna array if necessary
if (atom->nmax > nmax) {
memory->destroy(sna);
nmax = atom->nmax;
memory->create(sna,nmax,size_peratom_cols,"sna/atom:sna");
array_atom = sna;
}
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
const int inum = list->inum;
const int* const ilist = list->ilist;
const int* const numneigh = list->numneigh;
int** const firstneigh = list->firstneigh;
int * const type = atom->type;
// compute sna for each atom in group
// use full neighbor list to count atoms less than cutoff
double** const x = atom->x;
const int* const mask = atom->mask;
#if defined(_OPENMP)
#pragma omp parallel for default(none)
#endif
for (int ii = 0; ii < inum; ii++) {
const int tid = omp_get_thread_num();
const int i = ilist[ii];
if (mask[i] & groupbit) {
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const double radi = radelem[itype];
const int* const jlist = firstneigh[i];
const int jnum = numneigh[i];
// ensure rij, inside, and typej are of size jnum
snaptr[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// typej = types of neighbors of I within cutoff
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = xtmp - x[j][0];
const double dely = ytmp - x[j][1];
const double delz = ztmp - x[j][2];
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
if (rsq < cutsq[itype][jtype] && rsq>1e-20) {
snaptr[tid]->rij[ninside][0] = delx;
snaptr[tid]->rij[ninside][1] = dely;
snaptr[tid]->rij[ninside][2] = delz;
snaptr[tid]->inside[ninside] = j;
snaptr[tid]->wj[ninside] = wjelem[jtype];
snaptr[tid]->rcutij[ninside] = (radi+radelem[jtype])*rcutfac;
ninside++;
}
}
snaptr[tid]->compute_ui(ninside);
snaptr[tid]->compute_zi();
snaptr[tid]->compute_bi();
snaptr[tid]->copy_bi2bvec();
for (int icoeff = 0; icoeff < ncoeff; icoeff++)
sna[i][icoeff] = snaptr[tid]->bvec[icoeff];
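+      // quadratic terms: append all ncoeff*ncoeff pairwise products
+      // bvec[i]*bvec[j] of the bispectrum components after the linear terms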
+ if (quadraticflag) {
+ int ncount = ncoeff;
+ for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
+ double bi = snaptr[tid]->bvec[icoeff];
+ for (int jcoeff = 0; jcoeff < ncoeff; jcoeff++)
+ sna[i][ncount++] = bi*snaptr[tid]->bvec[jcoeff];
+ }
+ }
} else {
- for (int icoeff = 0; icoeff < ncoeff; icoeff++)
+ for (int icoeff = 0; icoeff < size_peratom_cols; icoeff++)
sna[i][icoeff] = 0.0;
}
}
}
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double ComputeSNAAtom::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
bytes += 3*njmax*sizeof(double);
bytes += njmax*sizeof(int);
bytes += snaptr[0]->memory_usage()*comm->nthreads;
return bytes;
}
diff --git a/src/SNAP/compute_sna_atom.h b/src/SNAP/compute_sna_atom.h
index af62d7cf3..b22eea71b 100644
--- a/src/SNAP/compute_sna_atom.h
+++ b/src/SNAP/compute_sna_atom.h
@@ -1,75 +1,75 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(sna/atom,ComputeSNAAtom)
#else
#ifndef LMP_COMPUTE_SNA_ATOM_H
#define LMP_COMPUTE_SNA_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeSNAAtom : public Compute {
public:
ComputeSNAAtom(class LAMMPS *, int, char **);
~ComputeSNAAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
double memory_usage();
private:
int nmax, njmax, diagonalstyle;
int ncoeff;
double **cutsq;
class NeighList *list;
double **sna;
double rcutfac;
double *radelem;
double *wjelem;
class SNA** snaptr;
double cutmax;
-
+ int quadraticflag;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute sna/atom requires a pair style be defined
Self-explanatory.
E: Compute sna/atom cutoff is longer than pairwise cutoff
Self-explanatory.
W: More than one compute sna/atom
Self-explanatory.
*/
diff --git a/src/SNAP/compute_snad_atom.cpp b/src/SNAP/compute_snad_atom.cpp
index 73452427b..39f34dd8c 100644
--- a/src/SNAP/compute_snad_atom.cpp
+++ b/src/SNAP/compute_snad_atom.cpp
@@ -1,337 +1,391 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "sna.h"
#include <string.h>
#include <stdlib.h>
#include "compute_snad_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "openmp_snap.h"
using namespace LAMMPS_NS;
ComputeSNADAtom::ComputeSNADAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg), cutsq(NULL), list(NULL), snad(NULL),
radelem(NULL), wjelem(NULL)
{
double rfac0, rmin0;
int twojmax, switchflag, bzeroflag;
radelem = NULL;
wjelem = NULL;
int ntypes = atom->ntypes;
int nargmin = 6+2*ntypes;
if (narg < nargmin) error->all(FLERR,"Illegal compute snad/atom command");
// default values
diagonalstyle = 0;
rmin0 = 0.0;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+ quadraticflag = 0;
// process required arguments
+
memory->create(radelem,ntypes+1,"sna/atom:radelem"); // offset by 1 to match up with types
memory->create(wjelem,ntypes+1,"sna/atom:wjelem");
rcutfac = atof(arg[3]);
rfac0 = atof(arg[4]);
twojmax = atoi(arg[5]);
for(int i = 0; i < ntypes; i++)
radelem[i+1] = atof(arg[6+i]);
for(int i = 0; i < ntypes; i++)
wjelem[i+1] = atof(arg[6+ntypes+i]);
+
// construct cutsq
+
double cut;
+ cutmax = 0.0;
memory->create(cutsq,ntypes+1,ntypes+1,"sna/atom:cutsq");
for(int i = 1; i <= ntypes; i++) {
cut = 2.0*radelem[i]*rcutfac;
+ if (cut > cutmax) cutmax = cut;
cutsq[i][i] = cut*cut;
for(int j = i+1; j <= ntypes; j++) {
cut = (radelem[i]+radelem[j])*rcutfac;
cutsq[i][j] = cutsq[j][i] = cut*cut;
}
}
// process optional args
int iarg = nargmin;
while (iarg < narg) {
if (strcmp(arg[iarg],"diagonal") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snad/atom command");
diagonalstyle = atof(arg[iarg+1]);
if (diagonalstyle < 0 || diagonalstyle > 3)
error->all(FLERR,"Illegal compute snad/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"rmin0") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snad/atom command");
rmin0 = atof(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"switchflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snad/atom command");
switchflag = atoi(arg[iarg+1]);
iarg += 2;
+ } else if (strcmp(arg[iarg],"quadraticflag") == 0) {
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute snad/atom command");
+ quadraticflag = atoi(arg[iarg+1]);
+ iarg += 2;
} else error->all(FLERR,"Illegal compute snad/atom command");
}
snaptr = new SNA*[comm->nthreads];
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(lmp,rfac0,twojmax,rmin0,switchflag,bzeroflag)
#endif
{
int tid = omp_get_thread_num();
// always unset use_shared_arrays since it does not work with computes
snaptr[tid] = new SNA(lmp,rfac0,twojmax,diagonalstyle,
0 /*use_shared_arrays*/, rmin0,switchflag,bzeroflag);
}
ncoeff = snaptr[0]->ncoeff;
- peratom_flag = 1;
- size_peratom_cols = 3*ncoeff*atom->ntypes;
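+ // per-atom columns: 3*ncoeff linear derivative columns (x,y,z blocks) per atom type,
+ // plus 3*ncoeff^2 quadratic columns per type when quadraticflag is set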
+ twoncoeff = 2*ncoeff;
+ threencoeff = 3*ncoeff;
+ size_peratom_cols = threencoeff*atom->ntypes;
+ if (quadraticflag) {
+ ncoeffsq = ncoeff*ncoeff;
+ twoncoeffsq = 2*ncoeffsq;
+ threencoeffsq = 3*ncoeffsq;
+ size_peratom_cols +=
+ threencoeffsq*atom->ntypes;
+ }
comm_reverse = size_peratom_cols;
+ peratom_flag = 1;
+
nmax = 0;
njmax = 0;
snad = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSNADAtom::~ComputeSNADAtom()
{
memory->destroy(snad);
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(cutsq);
delete [] snaptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::init()
{
if (force->pair == NULL)
error->all(FLERR,"Compute snad/atom requires a pair style be defined");
- // TODO: Not sure what to do with this error check since cutoff radius is not
- // a single number
- //if (sqrt(cutsq) > force->pair->cutforce)
- //error->all(FLERR,"Compute snad/atom cutoff is longer than pairwise cutoff");
+
+ if (cutmax > force->pair->cutforce)
+ error->all(FLERR,"Compute sna/atom cutoff is longer than pairwise cutoff");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"snad/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute snad/atom");
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
snaptr[tid]->init();
}
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::compute_peratom()
{
int ntotal = atom->nlocal + atom->nghost;
invoked_peratom = update->ntimestep;
// grow snad array if necessary
if (atom->nmax > nmax) {
memory->destroy(snad);
nmax = atom->nmax;
memory->create(snad,nmax,size_peratom_cols,
"snad/atom:snad");
array_atom = snad;
}
// clear local array
for (int i = 0; i < ntotal; i++)
for (int icoeff = 0; icoeff < size_peratom_cols; icoeff++) {
snad[i][icoeff] = 0.0;
}
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
const int inum = list->inum;
const int* const ilist = list->ilist;
const int* const numneigh = list->numneigh;
int** const firstneigh = list->firstneigh;
int * const type = atom->type;
// compute sna derivatives for each atom in group
// use full neighbor list to count atoms less than cutoff
double** const x = atom->x;
const int* const mask = atom->mask;
#if defined(_OPENMP)
#pragma omp parallel for default(none)
#endif
for (int ii = 0; ii < inum; ii++) {
const int tid = omp_get_thread_num();
const int i = ilist[ii];
if (mask[i] & groupbit) {
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const double radi = radelem[itype];
const int* const jlist = firstneigh[i];
const int jnum = numneigh[i];
- const int typeoffset = 3*ncoeff*(atom->type[i]-1);
+ const int typeoffset = threencoeff*(atom->type[i]-1);
+ const int quadraticoffset = threencoeff*atom->ntypes +
+ threencoeffsq*(atom->type[i]-1);
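+ // typeoffset selects this type's 3*ncoeff linear columns;
+ // quadraticoffset selects its 3*ncoeff^2 quadratic columns, stored after all linear blocks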
// insure rij, inside, and typej are of size jnum
snaptr[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// typej = types of neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = x[j][0] - xtmp;
const double dely = x[j][1] - ytmp;
const double delz = x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
snaptr[tid]->rij[ninside][0] = delx;
snaptr[tid]->rij[ninside][1] = dely;
snaptr[tid]->rij[ninside][2] = delz;
snaptr[tid]->inside[ninside] = j;
snaptr[tid]->wj[ninside] = wjelem[jtype];
snaptr[tid]->rcutij[ninside] = (radi+radelem[jtype])*rcutfac;
ninside++;
}
}
snaptr[tid]->compute_ui(ninside);
snaptr[tid]->compute_zi();
-
+ if (quadraticflag) {
+ snaptr[tid]->compute_bi();
+ snaptr[tid]->copy_bi2bvec();
+ }
+
for (int jj = 0; jj < ninside; jj++) {
const int j = snaptr[tid]->inside[jj];
snaptr[tid]->compute_duidrj(snaptr[tid]->rij[jj],
snaptr[tid]->wj[jj],
snaptr[tid]->rcutij[jj]);
snaptr[tid]->compute_dbidrj();
snaptr[tid]->copy_dbi2dbvec();
// Accumulate -dBi/dRi, -dBi/dRj
double *snadi = snad[i]+typeoffset;
double *snadj = snad[j]+typeoffset;
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
snadi[icoeff] += snaptr[tid]->dbvec[icoeff][0];
snadi[icoeff+ncoeff] += snaptr[tid]->dbvec[icoeff][1];
- snadi[icoeff+2*ncoeff] += snaptr[tid]->dbvec[icoeff][2];
+ snadi[icoeff+twoncoeff] += snaptr[tid]->dbvec[icoeff][2];
snadj[icoeff] -= snaptr[tid]->dbvec[icoeff][0];
snadj[icoeff+ncoeff] -= snaptr[tid]->dbvec[icoeff][1];
- snadj[icoeff+2*ncoeff] -= snaptr[tid]->dbvec[icoeff][2];
+ snadj[icoeff+twoncoeff] -= snaptr[tid]->dbvec[icoeff][2];
}
+
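+ // quadratic contributions: by the product rule, d(B_i*B_j)/dR = B_i*dB_j/dR + B_j*dB_i/dR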
+ if (quadraticflag) {
+ double *snadi = snad[i]+quadraticoffset;
+ double *snadj = snad[j]+quadraticoffset;
+ int ncount = 0;
+ for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
+ double bi = snaptr[tid]->bvec[icoeff];
+ double bix = snaptr[tid]->dbvec[icoeff][0];
+ double biy = snaptr[tid]->dbvec[icoeff][1];
+ double biz = snaptr[tid]->dbvec[icoeff][2];
+ for (int jcoeff = 0; jcoeff < ncoeff; jcoeff++) {
+ double dbxtmp = bi*snaptr[tid]->dbvec[jcoeff][0]
+ + bix*snaptr[tid]->bvec[jcoeff];
+ double dbytmp = bi*snaptr[tid]->dbvec[jcoeff][1]
+ + biy*snaptr[tid]->bvec[jcoeff];
+ double dbztmp = bi*snaptr[tid]->dbvec[jcoeff][2]
+ + biz*snaptr[tid]->bvec[jcoeff];
+ snadi[ncount] += dbxtmp;
+ snadi[ncount+ncoeffsq] += dbytmp;
+ snadi[ncount+twoncoeffsq] += dbztmp;
+ snadj[ncount] -= dbxtmp;
+ snadj[ncount+ncoeffsq] -= dbytmp;
+ snadj[ncount+twoncoeffsq] -= dbztmp;
+ ncount++;
+ }
+ }
+ }
}
}
}
// communicate snad contributions between neighbor procs
comm->reverse_comm_compute(this);
}
/* ---------------------------------------------------------------------- */
int ComputeSNADAtom::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last,icoeff;
m = 0;
last = first + n;
for (i = first; i < last; i++)
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
buf[m++] = snad[i][icoeff];
return comm_reverse;
}
/* ---------------------------------------------------------------------- */
void ComputeSNADAtom::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m,icoeff;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
snad[j][icoeff] += buf[m++];
}
}
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double ComputeSNADAtom::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
bytes += 3*njmax*sizeof(double);
bytes += njmax*sizeof(int);
- bytes += ncoeff*3;
+ bytes += threencoeff*atom->ntypes;
+ if (quadraticflag) bytes += threencoeffsq*atom->ntypes;
bytes += snaptr[0]->memory_usage()*comm->nthreads;
return bytes;
}
diff --git a/src/SNAP/compute_snad_atom.h b/src/SNAP/compute_snad_atom.h
index 31f5bf252..0d5a369ab 100644
--- a/src/SNAP/compute_snad_atom.h
+++ b/src/SNAP/compute_snad_atom.h
@@ -1,76 +1,77 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(snad/atom,ComputeSNADAtom)
#else
#ifndef LMP_COMPUTE_SNAD_ATOM_H
#define LMP_COMPUTE_SNAD_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeSNADAtom : public Compute {
public:
ComputeSNADAtom(class LAMMPS *, int, char **);
~ComputeSNADAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
int pack_reverse_comm(int, int, double *);
void unpack_reverse_comm(int, int *, double *);
double memory_usage();
private:
int nmax, njmax, diagonalstyle;
- int ncoeff;
+ int ncoeff, twoncoeff, threencoeff, ncoeffsq, twoncoeffsq, threencoeffsq;
double **cutsq;
class NeighList *list;
double **snad;
double rcutfac;
double *radelem;
double *wjelem;
class SNA** snaptr;
-
+ double cutmax;
+ int quadraticflag;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute snad/atom requires a pair style be defined
Self-explanatory.
E: Compute snad/atom cutoff is longer than pairwise cutoff
Self-explanatory.
W: More than one compute snad/atom
Self-explanatory.
*/
diff --git a/src/SNAP/compute_snav_atom.cpp b/src/SNAP/compute_snav_atom.cpp
index f75b02fba..0d21d1656 100644
--- a/src/SNAP/compute_snav_atom.cpp
+++ b/src/SNAP/compute_snav_atom.cpp
@@ -1,347 +1,407 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include "sna.h"
#include <string.h>
#include <stdlib.h>
#include "compute_snav_atom.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "force.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "openmp_snap.h"
using namespace LAMMPS_NS;
ComputeSNAVAtom::ComputeSNAVAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg), cutsq(NULL), list(NULL), snav(NULL),
radelem(NULL), wjelem(NULL)
{
double rfac0, rmin0;
int twojmax, switchflag, bzeroflag;
radelem = NULL;
wjelem = NULL;
- nvirial = 6;
-
int ntypes = atom->ntypes;
int nargmin = 6+2*ntypes;
if (narg < nargmin) error->all(FLERR,"Illegal compute snav/atom command");
// default values
diagonalstyle = 0;
rmin0 = 0.0;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+ quadraticflag = 0;
// process required arguments
+
memory->create(radelem,ntypes+1,"sna/atom:radelem"); // offset by 1 to match up with types
memory->create(wjelem,ntypes+1,"sna/atom:wjelem");
rcutfac = atof(arg[3]);
rfac0 = atof(arg[4]);
twojmax = atoi(arg[5]);
for(int i = 0; i < ntypes; i++)
radelem[i+1] = atof(arg[6+i]);
for(int i = 0; i < ntypes; i++)
wjelem[i+1] = atof(arg[6+ntypes+i]);
// construct cutsq
double cut;
memory->create(cutsq,ntypes+1,ntypes+1,"sna/atom:cutsq");
for(int i = 1; i <= ntypes; i++) {
cut = 2.0*radelem[i]*rcutfac;
cutsq[i][i] = cut*cut;
for(int j = i+1; j <= ntypes; j++) {
cut = (radelem[i]+radelem[j])*rcutfac;
cutsq[i][j] = cutsq[j][i] = cut*cut;
}
}
// process optional args
int iarg = nargmin;
while (iarg < narg) {
if (strcmp(arg[iarg],"diagonal") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snav/atom command");
diagonalstyle = atof(arg[iarg+1]);
if (diagonalstyle < 0 || diagonalstyle > 3)
error->all(FLERR,"Illegal compute snav/atom command");
iarg += 2;
} else if (strcmp(arg[iarg],"rmin0") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snav/atom command");
rmin0 = atof(arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"switchflag") == 0) {
if (iarg+2 > narg)
error->all(FLERR,"Illegal compute snav/atom command");
switchflag = atoi(arg[iarg+1]);
iarg += 2;
+ } else if (strcmp(arg[iarg],"quadraticflag") == 0) {
+ if (iarg+2 > narg)
+ error->all(FLERR,"Illegal compute snav/atom command");
+ quadraticflag = atoi(arg[iarg+1]);
+ iarg += 2;
} else error->all(FLERR,"Illegal compute snav/atom command");
}
snaptr = new SNA*[comm->nthreads];
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(lmp,rfac0,twojmax,rmin0,switchflag,bzeroflag)
#endif
{
int tid = omp_get_thread_num();
// always unset use_shared_arrays since it does not work with computes
snaptr[tid] = new SNA(lmp,rfac0,twojmax,diagonalstyle,
0 /*use_shared_arrays*/, rmin0,switchflag,bzeroflag);
}
ncoeff = snaptr[0]->ncoeff;
- peratom_flag = 1;
- size_peratom_cols = nvirial*ncoeff*atom->ntypes;
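+ // per-atom columns: 6*ncoeff linear virial columns (xx,yy,zz,yz,xz,xy blocks) per atom type,
+ // plus 6*ncoeff^2 quadratic columns per type when quadraticflag is set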
+ twoncoeff = 2*ncoeff;
+ threencoeff = 3*ncoeff;
+ fourncoeff = 4*ncoeff;
+ fivencoeff = 5*ncoeff;
+ sixncoeff = 6*ncoeff;
+ size_peratom_cols = sixncoeff*atom->ntypes;
+ if (quadraticflag) {
+ ncoeffsq = ncoeff*ncoeff;
+ twoncoeffsq = 2*ncoeffsq;
+ threencoeffsq = 3*ncoeffsq;
+ fourncoeffsq = 4*ncoeffsq;
+ fivencoeffsq = 5*ncoeffsq;
+ sixncoeffsq = 6*ncoeffsq;
+ size_peratom_cols +=
+ sixncoeffsq*atom->ntypes;
+ }
comm_reverse = size_peratom_cols;
+ peratom_flag = 1;
nmax = 0;
njmax = 0;
snav = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSNAVAtom::~ComputeSNAVAtom()
{
memory->destroy(snav);
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(cutsq);
delete [] snaptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::init()
{
if (force->pair == NULL)
error->all(FLERR,"Compute snav/atom requires a pair style be defined");
// TODO: Not sure what to do with this error check since cutoff radius is not
// a single number
//if (sqrt(cutsq) > force->pair->cutforce)
// error->all(FLERR,"Compute snav/atom cutoff is longer than pairwise cutoff");
// need an occasional full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->compute = 1;
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
neighbor->requests[irequest]->occasional = 1;
int count = 0;
for (int i = 0; i < modify->ncompute; i++)
if (strcmp(modify->compute[i]->style,"snav/atom") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one compute snav/atom");
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
snaptr[tid]->init();
}
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::compute_peratom()
{
int ntotal = atom->nlocal + atom->nghost;
invoked_peratom = update->ntimestep;
// grow snav array if necessary
if (atom->nmax > nmax) {
memory->destroy(snav);
nmax = atom->nmax;
memory->create(snav,nmax,size_peratom_cols,
"snav/atom:snav");
array_atom = snav;
}
// clear local array
for (int i = 0; i < ntotal; i++)
for (int icoeff = 0; icoeff < size_peratom_cols; icoeff++) {
snav[i][icoeff] = 0.0;
}
// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);
const int inum = list->inum;
const int* const ilist = list->ilist;
const int* const numneigh = list->numneigh;
int** const firstneigh = list->firstneigh;
int * const type = atom->type;
// compute sna derivatives for each atom in group
// use full neighbor list to count atoms less than cutoff
double** const x = atom->x;
const int* const mask = atom->mask;
#if defined(_OPENMP)
#pragma omp parallel for default(none)
#endif
for (int ii = 0; ii < inum; ii++) {
const int tid = omp_get_thread_num();
const int i = ilist[ii];
if (mask[i] & groupbit) {
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const double radi = radelem[itype];
const int* const jlist = firstneigh[i];
const int jnum = numneigh[i];
- const int typeoffset = nvirial*ncoeff*(atom->type[i]-1);
+ const int typeoffset = sixncoeff*(atom->type[i]-1);
+ const int quadraticoffset = sixncoeff*atom->ntypes +
+ sixncoeffsq*(atom->type[i]-1);
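+ // typeoffset selects this type's 6*ncoeff linear virial columns;
+ // quadraticoffset selects its 6*ncoeff^2 quadratic columns, stored after all linear blocks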
// insure rij, inside, and typej are of size jnum
snaptr[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// typej = types of neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = x[j][0] - xtmp;
const double dely = x[j][1] - ytmp;
const double delz = x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
snaptr[tid]->rij[ninside][0] = delx;
snaptr[tid]->rij[ninside][1] = dely;
snaptr[tid]->rij[ninside][2] = delz;
snaptr[tid]->inside[ninside] = j;
snaptr[tid]->wj[ninside] = wjelem[jtype];
snaptr[tid]->rcutij[ninside] = (radi+radelem[jtype])*rcutfac;
ninside++;
}
}
snaptr[tid]->compute_ui(ninside);
snaptr[tid]->compute_zi();
+ if (quadraticflag) {
+ snaptr[tid]->compute_bi();
+ snaptr[tid]->copy_bi2bvec();
+ }
for (int jj = 0; jj < ninside; jj++) {
const int j = snaptr[tid]->inside[jj];
snaptr[tid]->compute_duidrj(snaptr[tid]->rij[jj],
snaptr[tid]->wj[jj],
snaptr[tid]->rcutij[jj]);
snaptr[tid]->compute_dbidrj();
snaptr[tid]->copy_dbi2dbvec();
// Accumulate -dBi/dRi*Ri, -dBi/dRj*Rj
double *snavi = snav[i]+typeoffset;
double *snavj = snav[j]+typeoffset;
for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
- snavi[icoeff] += snaptr[tid]->dbvec[icoeff][0]*xtmp;
- snavi[icoeff+ncoeff] += snaptr[tid]->dbvec[icoeff][1]*ytmp;
- snavi[icoeff+2*ncoeff] += snaptr[tid]->dbvec[icoeff][2]*ztmp;
- snavi[icoeff+3*ncoeff] += snaptr[tid]->dbvec[icoeff][1]*ztmp;
- snavi[icoeff+4*ncoeff] += snaptr[tid]->dbvec[icoeff][0]*ztmp;
- snavi[icoeff+5*ncoeff] += snaptr[tid]->dbvec[icoeff][0]*ytmp;
- snavj[icoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][0];
- snavj[icoeff+ncoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][1];
- snavj[icoeff+2*ncoeff] -= snaptr[tid]->dbvec[icoeff][2]*x[j][2];
- snavj[icoeff+3*ncoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][2];
- snavj[icoeff+4*ncoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][2];
- snavj[icoeff+5*ncoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][1];
+ snavi[icoeff] += snaptr[tid]->dbvec[icoeff][0]*xtmp;
+ snavi[icoeff+ncoeff] += snaptr[tid]->dbvec[icoeff][1]*ytmp;
+ snavi[icoeff+twoncoeff] += snaptr[tid]->dbvec[icoeff][2]*ztmp;
+ snavi[icoeff+threencoeff] += snaptr[tid]->dbvec[icoeff][1]*ztmp;
+ snavi[icoeff+fourncoeff] += snaptr[tid]->dbvec[icoeff][0]*ztmp;
+ snavi[icoeff+fivencoeff] += snaptr[tid]->dbvec[icoeff][0]*ytmp;
+ snavj[icoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][0];
+ snavj[icoeff+ncoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][1];
+ snavj[icoeff+twoncoeff] -= snaptr[tid]->dbvec[icoeff][2]*x[j][2];
+ snavj[icoeff+threencoeff] -= snaptr[tid]->dbvec[icoeff][1]*x[j][2];
+ snavj[icoeff+fourncoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][2];
+ snavj[icoeff+fivencoeff] -= snaptr[tid]->dbvec[icoeff][0]*x[j][1];
}
+
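+ // quadratic virial terms: apply the product rule to d(B_i*B_j)/dR,
+ // then contract with atom positions in the same six-component order as above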
+ if (quadraticflag) {
+ double *snavi = snav[i]+quadraticoffset;
+ double *snavj = snav[j]+quadraticoffset;
+ int ncount = 0;
+ for (int icoeff = 0; icoeff < ncoeff; icoeff++) {
+ double bi = snaptr[tid]->bvec[icoeff];
+ double bix = snaptr[tid]->dbvec[icoeff][0];
+ double biy = snaptr[tid]->dbvec[icoeff][1];
+ double biz = snaptr[tid]->dbvec[icoeff][2];
+ for (int jcoeff = 0; jcoeff < ncoeff; jcoeff++) {
+ double dbxtmp = bi*snaptr[tid]->dbvec[jcoeff][0]
+ + bix*snaptr[tid]->bvec[jcoeff];
+ double dbytmp = bi*snaptr[tid]->dbvec[jcoeff][1]
+ + biy*snaptr[tid]->bvec[jcoeff];
+ double dbztmp = bi*snaptr[tid]->dbvec[jcoeff][2]
+ + biz*snaptr[tid]->bvec[jcoeff];
+ snavi[ncount] += dbxtmp*xtmp;
+ snavi[ncount+ncoeffsq] += dbytmp*ytmp;
+ snavi[ncount+twoncoeffsq] += dbztmp*ztmp;
+ snavi[ncount+threencoeffsq] += dbytmp*ztmp;
+ snavi[ncount+fourncoeffsq] += dbxtmp*ztmp;
+ snavi[ncount+fivencoeffsq] += dbxtmp*ytmp;
+ snavj[ncount] -= dbxtmp*x[j][0];
+ snavj[ncount+ncoeffsq] -= dbytmp*x[j][1];
+ snavj[ncount+twoncoeffsq] -= dbztmp*x[j][2];
+ snavj[ncount+threencoeffsq] -= dbytmp*x[j][2];
+ snavj[ncount+fourncoeffsq] -= dbxtmp*x[j][2];
+ snavj[ncount+fivencoeffsq] -= dbxtmp*x[j][1];
+ ncount++;
+ }
+ }
+ }
}
}
}
// communicate snav contributions between neighbor procs
comm->reverse_comm_compute(this);
}
/* ---------------------------------------------------------------------- */
int ComputeSNAVAtom::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last,icoeff;
m = 0;
last = first + n;
for (i = first; i < last; i++)
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
buf[m++] = snav[i][icoeff];
return comm_reverse;
}
/* ---------------------------------------------------------------------- */
void ComputeSNAVAtom::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m,icoeff;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
for (icoeff = 0; icoeff < size_peratom_cols; icoeff++)
snav[j][icoeff] += buf[m++];
}
}
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double ComputeSNAVAtom::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
bytes += 3*njmax*sizeof(double);
bytes += njmax*sizeof(int);
- bytes += ncoeff*nvirial;
+ bytes += sixncoeff*atom->ntypes;
+ if (quadraticflag) bytes += sixncoeffsq*atom->ntypes;
bytes += snaptr[0]->memory_usage()*comm->nthreads;
return bytes;
}
diff --git a/src/SNAP/compute_snav_atom.h b/src/SNAP/compute_snav_atom.h
index 0252be705..33ae4f921 100644
--- a/src/SNAP/compute_snav_atom.h
+++ b/src/SNAP/compute_snav_atom.h
@@ -1,77 +1,78 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef COMPUTE_CLASS
ComputeStyle(snav/atom,ComputeSNAVAtom)
#else
#ifndef LMP_COMPUTE_SNAV_ATOM_H
#define LMP_COMPUTE_SNAV_ATOM_H
#include "compute.h"
namespace LAMMPS_NS {
class ComputeSNAVAtom : public Compute {
public:
ComputeSNAVAtom(class LAMMPS *, int, char **);
~ComputeSNAVAtom();
void init();
void init_list(int, class NeighList *);
void compute_peratom();
int pack_reverse_comm(int, int, double *);
void unpack_reverse_comm(int, int *, double *);
double memory_usage();
private:
int nmax, njmax, diagonalstyle;
- int ncoeff,nvirial;
+ int ncoeff, twoncoeff, threencoeff, fourncoeff, fivencoeff, sixncoeff;
+ int ncoeffsq, twoncoeffsq, threencoeffsq, fourncoeffsq, fivencoeffsq, sixncoeffsq;
double **cutsq;
class NeighList *list;
double **snav;
double rcutfac;
double *radelem;
double *wjelem;
-
class SNA** snaptr;
-
+ double cutmax;
+ int quadraticflag;
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Compute snav/atom requires a pair style be defined
Self-explanatory.
E: Compute snav/atom cutoff is longer than pairwise cutoff
Self-explanatory.
W: More than one compute snav/atom
Self-explanatory.
*/
diff --git a/src/SNAP/pair_snap.cpp b/src/SNAP/pair_snap.cpp
index 06c2e4848..e4ed57b93 100644
--- a/src/SNAP/pair_snap.cpp
+++ b/src/SNAP/pair_snap.cpp
@@ -1,1733 +1,1734 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "pair_snap.h"
#include "atom.h"
#include "atom_vec.h"
#include "force.h"
#include "comm.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "sna.h"
#include "openmp_snap.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include <cmath>
using namespace LAMMPS_NS;
#define MAXLINE 1024
#define MAXWORD 3
/* ---------------------------------------------------------------------- */
PairSNAP::PairSNAP(LAMMPS *lmp) : Pair(lmp)
{
single_enable = 0;
restartinfo = 0;
one_coeff = 1;
manybody_flag = 1;
nelements = 0;
elements = NULL;
radelem = NULL;
wjelem = NULL;
coeffelem = NULL;
nmax = 0;
nthreads = 1;
schedule_user = 0;
schedule_time_guided = -1;
schedule_time_dynamic = -1;
ncalls_neigh =-1;
ilistmask_max = 0;
ilistmask = NULL;
ghostinum = 0;
ghostilist_max = 0;
ghostilist = NULL;
ghostnumneigh_max = 0;
ghostnumneigh = NULL;
ghostneighs = NULL;
ghostfirstneigh = NULL;
ghostneighs_total = 0;
ghostneighs_max = 0;
i_max = 0;
i_neighmax = 0;
i_numpairs = 0;
i_rij = NULL;
i_inside = NULL;
i_wj = NULL;
i_rcutij = NULL;
i_ninside = NULL;
i_pairs = NULL;
i_uarraytot_r = NULL;
i_uarraytot_i = NULL;
i_zarray_r = NULL;
i_zarray_i =NULL;
use_shared_arrays = 0;
#ifdef TIMING_INFO
timers[0] = 0;
timers[1] = 0;
timers[2] = 0;
timers[3] = 0;
#endif
// Need to set this because restart not handled by PairHybrid
sna = NULL;
}
/* ---------------------------------------------------------------------- */
PairSNAP::~PairSNAP()
{
if (nelements) {
for (int i = 0; i < nelements; i++)
delete[] elements[i];
delete[] elements;
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(coeffelem);
}
// Need to set this because restart not handled by PairHybrid
if (sna) {
#ifdef TIMING_INFO
double time[5];
double timeave[5];
double timeave_mpi[5];
double timemax_mpi[5];
for (int i = 0; i < 5; i++) {
time[i] = 0;
timeave[i] = 0;
for (int tid = 0; tid<nthreads; tid++) {
if (sna[tid]->timers[i]>time[i])
time[i] = sna[tid]->timers[i];
timeave[i] += sna[tid]->timers[i];
}
timeave[i] /= nthreads;
}
MPI_Reduce(timeave, timeave_mpi, 5, MPI_DOUBLE, MPI_SUM, 0, world);
MPI_Reduce(time, timemax_mpi, 5, MPI_DOUBLE, MPI_MAX, 0, world);
#endif
for (int tid = 0; tid<nthreads; tid++)
delete sna[tid];
delete [] sna;
}
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(map);
}
}
void PairSNAP::compute(int eflag, int vflag)
{
if (use_optimized)
compute_optimized(eflag, vflag);
else
compute_regular(eflag, vflag);
}
/* ----------------------------------------------------------------------
This version is a straightforward implementation
---------------------------------------------------------------------- */
void PairSNAP::compute_regular(int eflag, int vflag)
{
int i,j,jnum,ninside;
double delx,dely,delz,evdwl,rsq;
double fij[3];
int *jlist,*numneigh,**firstneigh;
evdwl = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
double **x = atom->x;
double **f = atom->f;
int *type = atom->type;
int nlocal = atom->nlocal;
int newton_pair = force->newton_pair;
class SNA* snaptr = sna[0];
numneigh = list->numneigh;
firstneigh = list->firstneigh;
for (int ii = 0; ii < list->inum; ii++) {
i = list->ilist[ii];
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
const int ielem = map[itype];
const double radi = radelem[ielem];
jlist = firstneigh[i];
jnum = numneigh[i];
// insure rij, inside, wj, and rcutij are of size jnum
snaptr->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// wj = weights for neighbors of I within cutoff
// rcutij = cutoffs for neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
j = jlist[jj];
j &= NEIGHMASK;
delx = x[j][0] - xtmp;
dely = x[j][1] - ytmp;
delz = x[j][2] - ztmp;
rsq = delx*delx + dely*dely + delz*delz;
int jtype = type[j];
int jelem = map[jtype];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
snaptr->rij[ninside][0] = delx;
snaptr->rij[ninside][1] = dely;
snaptr->rij[ninside][2] = delz;
snaptr->inside[ninside] = j;
snaptr->wj[ninside] = wjelem[jelem];
snaptr->rcutij[ninside] = (radi + radelem[jelem])*rcutfac;
ninside++;
}
}
// compute Ui, Zi, and Bi for atom I
snaptr->compute_ui(ninside);
snaptr->compute_zi();
if (!gammaoneflag) {
snaptr->compute_bi();
snaptr->copy_bi2bvec();
}
// for neighbors of I within cutoff:
// compute dUi/drj and dBi/drj
// Fij = dEi/dRj = -dEi/dRi => add to Fi, subtract from Fj
double* coeffi = coeffelem[ielem];
for (int jj = 0; jj < ninside; jj++) {
int j = snaptr->inside[jj];
snaptr->compute_duidrj(snaptr->rij[jj],
snaptr->wj[jj],snaptr->rcutij[jj]);
snaptr->compute_dbidrj();
snaptr->copy_dbi2dbvec();
fij[0] = 0.0;
fij[1] = 0.0;
fij[2] = 0.0;
for (int k = 1; k <= ncoeff; k++) {
double bgb;
if (gammaoneflag)
bgb = coeffi[k];
else bgb = coeffi[k]*
gamma*pow(snaptr->bvec[k-1],gamma-1.0);
fij[0] += bgb*snaptr->dbvec[k-1][0];
fij[1] += bgb*snaptr->dbvec[k-1][1];
fij[2] += bgb*snaptr->dbvec[k-1][2];
}
f[i][0] += fij[0];
f[i][1] += fij[1];
f[i][2] += fij[2];
f[j][0] -= fij[0];
f[j][1] -= fij[1];
f[j][2] -= fij[2];
if (evflag)
ev_tally_xyz(i,j,nlocal,newton_pair,0.0,0.0,
fij[0],fij[1],fij[2],
snaptr->rij[jj][0],snaptr->rij[jj][1],
snaptr->rij[jj][2]);
}
if (eflag) {
// evdwl = energy of atom I, sum over coeffs_k * Bi_k
evdwl = coeffi[0];
if (gammaoneflag) {
snaptr->compute_bi();
snaptr->copy_bi2bvec();
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*snaptr->bvec[k-1];
} else
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*pow(snaptr->bvec[k-1],gamma);
ev_tally_full(i,2.0*evdwl,0.0,0.0,delx,dely,delz);
}
}
if (vflag_fdotr) virial_fdotr_compute();
}
/* ----------------------------------------------------------------------
This version is optimized for threading, micro-load balancing
---------------------------------------------------------------------- */
void PairSNAP::compute_optimized(int eflag, int vflag)
{
// if reneighboring took place do load_balance if requested
if (do_load_balance > 0 &&
(neighbor->ncalls != ncalls_neigh)) {
ghostinum = 0;
// reset local ghost neighbor lists
ncalls_neigh = neighbor->ncalls;
if (ilistmask_max < list->inum) {
memory->grow(ilistmask,list->inum,"PairSnap::ilistmask");
ilistmask_max = list->inum;
}
for (int i = 0; i < list->inum; i++)
ilistmask[i] = 1;
//multiple passes for loadbalancing
for (int i = 0; i < do_load_balance; i++)
load_balance();
}
int numpairs = 0;
for (int ii = 0; ii < list->inum; ii++) {
if ((do_load_balance <= 0) || ilistmask[ii]) {
int i = list->ilist[ii];
int jnum = list->numneigh[i];
numpairs += jnum;
}
}
if (do_load_balance)
for (int ii = 0; ii < ghostinum; ii++) {
int i = ghostilist[ii];
int jnum = ghostnumneigh[i];
numpairs += jnum;
}
// optimized schedule setting
int time_dynamic = 0;
int time_guided = 0;
if (schedule_user == 0) schedule_user = 4;
switch (schedule_user) {
case 1:
omp_set_schedule(omp_sched_static,1);
break;
case 2:
omp_set_schedule(omp_sched_dynamic,1);
break;
case 3:
omp_set_schedule(omp_sched_guided,2);
break;
case 4:
omp_set_schedule(omp_sched_auto,0);
break;
case 5:
if (numpairs < 8*nthreads) omp_set_schedule(omp_sched_dynamic,1);
else if (schedule_time_guided < 0.0) {
omp_set_schedule(omp_sched_guided,2);
if (!eflag && !vflag) time_guided = 1;
} else if (schedule_time_dynamic<0.0) {
omp_set_schedule(omp_sched_dynamic,1);
if (!eflag && !vflag) time_dynamic = 1;
} else if (schedule_time_guided<schedule_time_dynamic)
omp_set_schedule(omp_sched_guided,2);
else
omp_set_schedule(omp_sched_dynamic,1);
break;
}
if (use_shared_arrays)
build_per_atom_arrays();
#if defined(_OPENMP)
#pragma omp parallel shared(eflag,vflag,time_dynamic,time_guided) firstprivate(numpairs) default(none)
#endif
{
// begin of pragma omp parallel
int tid = omp_get_thread_num();
int** pairs_tid_unique = NULL;
int** pairs;
if (use_shared_arrays) pairs = i_pairs;
else {
memory->create(pairs_tid_unique,numpairs,4,"numpairs");
pairs = pairs_tid_unique;
}
if (!use_shared_arrays) {
numpairs = 0;
for (int ii = 0; ii < list->inum; ii++) {
if ((do_load_balance <= 0) || ilistmask[ii]) {
int i = list->ilist[ii];
int jnum = list->numneigh[i];
for (int jj = 0; jj<jnum; jj++) {
pairs[numpairs][0] = i;
pairs[numpairs][1] = jj;
pairs[numpairs][2] = -1;
numpairs++;
}
}
}
for (int ii = 0; ii < ghostinum; ii++) {
int i = ghostilist[ii];
int jnum = ghostnumneigh[i];
for (int jj = 0; jj<jnum; jj++) {
pairs[numpairs][0] = i;
pairs[numpairs][1] = jj;
pairs[numpairs][2] = -1;
numpairs++;
}
}
}
int ielem;
int jj,k,jnum,jtype,ninside;
double delx,dely,delz,evdwl,rsq;
double fij[3];
int *jlist,*numneigh,**firstneigh;
evdwl = 0.0;
#if defined(_OPENMP)
#pragma omp master
#endif
{
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = 0;
}
#if defined(_OPENMP)
#pragma omp barrier
{ ; }
#endif
double **x = atom->x;
double **f = atom->f;
int *type = atom->type;
int nlocal = atom->nlocal;
int newton_pair = force->newton_pair;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
#ifdef TIMING_INFO
// only update micro timers after setup
static int count=0;
if (count<2) {
sna[tid]->timers[0] = 0;
sna[tid]->timers[1] = 0;
sna[tid]->timers[2] = 0;
sna[tid]->timers[3] = 0;
sna[tid]->timers[4] = 0;
}
count++;
#endif
// did thread start working on interactions of new atom
int iold = -1;
double starttime, endtime;
if (time_dynamic || time_guided)
starttime = MPI_Wtime();
#if defined(_OPENMP)
#pragma omp for schedule(runtime)
#endif
for (int iijj = 0; iijj < numpairs; iijj++) {
int i = 0;
if (use_shared_arrays) {
i = i_pairs[iijj][0];
if (iold != i) {
set_sna_to_shared(tid,i_pairs[iijj][3]);
ielem = map[type[i]];
}
iold = i;
} else {
i = pairs[iijj][0];
if (iold != i) {
iold = i;
const double xtmp = x[i][0];
const double ytmp = x[i][1];
const double ztmp = x[i][2];
const int itype = type[i];
ielem = map[itype];
const double radi = radelem[ielem];
if (i < nlocal) {
jlist = firstneigh[i];
jnum = numneigh[i];
} else {
jlist = ghostneighs+ghostfirstneigh[i];
jnum = ghostnumneigh[i];
}
// insure rij, inside, wj, and rcutij are of size jnum
sna[tid]->grow_rij(jnum);
// rij[][3] = displacements between atom I and those neighbors
// inside = indices of neighbors of I within cutoff
// wj = weights of neighbors of I within cutoff
// rcutij = cutoffs of neighbors of I within cutoff
// note Rij sign convention => dU/dRij = dU/dRj = -dU/dRi
ninside = 0;
for (jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
delx = x[j][0] - xtmp; //uninitialised
dely = x[j][1] - ytmp;
delz = x[j][2] - ztmp;
rsq = delx*delx + dely*dely + delz*delz;
jtype = type[j];
int jelem = map[jtype];
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) { //uninitialised
sna[tid]->rij[ninside][0] = delx;
sna[tid]->rij[ninside][1] = dely;
sna[tid]->rij[ninside][2] = delz;
sna[tid]->inside[ninside] = j;
sna[tid]->wj[ninside] = wjelem[jelem];
sna[tid]->rcutij[ninside] = (radi + radelem[jelem])*rcutfac;
ninside++;
// update index list with inside index
pairs[iijj + (jj - pairs[iijj][1])][2] =
ninside-1; //uninitialised
}
}
// compute Ui and Zi for atom I
sna[tid]->compute_ui(ninside); //uninitialised
sna[tid]->compute_zi();
}
}
// for neighbors of I within cutoff:
// compute dUi/drj and dBi/drj
// Fij = dEi/dRj = -dEi/dRi => add to Fi, subtract from Fj
// entry into loop if inside index is set
double* coeffi = coeffelem[ielem];
if (pairs[iijj][2] >= 0) {
jj = pairs[iijj][2];
int j = sna[tid]->inside[jj];
sna[tid]->compute_duidrj(sna[tid]->rij[jj],
sna[tid]->wj[jj],sna[tid]->rcutij[jj]);
sna[tid]->compute_dbidrj();
sna[tid]->copy_dbi2dbvec();
if (!gammaoneflag) {
sna[tid]->compute_bi();
sna[tid]->copy_bi2bvec();
}
fij[0] = 0.0;
fij[1] = 0.0;
fij[2] = 0.0;
for (k = 1; k <= ncoeff; k++) {
double bgb;
if (gammaoneflag)
bgb = coeffi[k];
else bgb = coeffi[k]*
gamma*pow(sna[tid]->bvec[k-1],gamma-1.0);
fij[0] += bgb*sna[tid]->dbvec[k-1][0];
fij[1] += bgb*sna[tid]->dbvec[k-1][1];
fij[2] += bgb*sna[tid]->dbvec[k-1][2];
}
#if defined(_OPENMP)
#pragma omp critical
#endif
{
f[i][0] += fij[0];
f[i][1] += fij[1];
f[i][2] += fij[2];
f[j][0] -= fij[0];
f[j][1] -= fij[1];
f[j][2] -= fij[2];
if (evflag)
ev_tally_xyz(i,j,nlocal,newton_pair,0.0,0.0,
fij[0],fij[1],fij[2],
sna[tid]->rij[jj][0],sna[tid]->rij[jj][1],
sna[tid]->rij[jj][2]);
}
}
// evdwl = energy of atom I, sum over coeffs_k * Bi_k
// only call this for first pair of each atom i
// if atom has no pairs, eatom=0, which is wrong
if (eflag&&pairs[iijj][1] == 0) {
evdwl = coeffi[0];
if (gammaoneflag) {
sna[tid]->compute_bi();
sna[tid]->copy_bi2bvec();
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*sna[tid]->bvec[k-1];
} else
for (int k = 1; k <= ncoeff; k++)
evdwl += coeffi[k]*pow(sna[tid]->bvec[k-1],gamma);
#if defined(_OPENMP)
#pragma omp critical
#endif
ev_tally_full(i,2.0*evdwl,0.0,0.0,delx,dely,delz);
}
}
if (time_dynamic || time_guided)
endtime = MPI_Wtime();
if (time_dynamic) schedule_time_dynamic = endtime - starttime;
if (time_guided) schedule_time_guided = endtime - starttime;
if (!use_shared_arrays) memory->destroy(pairs);
}// end of pragma omp parallel
if (vflag_fdotr) virial_fdotr_compute();
}
inline int PairSNAP::equal(double* x,double* y)
{
double dist2 =
(x[0]-y[0])*(x[0]-y[0]) +
(x[1]-y[1])*(x[1]-y[1]) +
(x[2]-y[2])*(x[2]-y[2]);
if (dist2 < 1e-20) return 1;
return 0;
}
inline double PairSNAP::dist2(double* x,double* y)
{
return
(x[0]-y[0])*(x[0]-y[0]) +
(x[1]-y[1])*(x[1]-y[1]) +
(x[2]-y[2])*(x[2]-y[2]);
}
// return extra communication cutoff
// extra_cutoff = max(subdomain_length)
double PairSNAP::extra_cutoff()
{
double sublo[3],subhi[3];
if (domain->triclinic == 0) {
for (int dim = 0 ; dim < 3 ; dim++) {
sublo[dim] = domain->sublo[dim];
subhi[dim] = domain->subhi[dim];
}
} else {
domain->lamda2x(domain->sublo_lamda,sublo);
domain->lamda2x(domain->subhi_lamda,subhi);
}
double sub_size[3];
for (int dim = 0; dim < 3; dim++)
sub_size[dim] = subhi[dim] - sublo[dim];
double max_sub_size = 0;
for (int dim = 0; dim < 3; dim++)
max_sub_size = MAX(max_sub_size,sub_size[dim]);
// note: for triclinic, probably need something different
// see Comm::setup()
return max_sub_size;
}
// micro load_balancer: each MPI process will
// check with each of its 26 neighbors,
// whether an imbalance exists in the number
// of atoms to calculate forces for.
// If it does it will set ilistmask of one of
// its local atoms to zero, and send its Tag
// to the neighbor process. The neighboring process
// will check its ghost list for the
// ghost atom with the same Tag which is closest
// to its domain center, and build a
// neighborlist for this ghost atom. For this to work,
// the communication cutoff has to be
// as large as the neighbor cutoff +
// maximum subdomain length.
// Note that at most one atom is exchanged per processor pair.
// Also note that the local atom assignment
// doesn't change. This load balancer will cause
// some ghost atoms to have full neighborlists
// which are unique to PairSNAP.
// They are not part of the generally accessible neighborlist.
// At the same time corresponding local atoms on
// other MPI processes will not be
// included in the force computation since
// their ilistmask is 0. This does not affect
// any other classes which might
// access the same general neighborlist.
// Reverse communication (newton on) of forces is required.
// Currently the load balancer does two passes,
// since it exchanges atoms upstream and downstream.
void PairSNAP::load_balance()
{
double sublo[3],subhi[3];
if (domain->triclinic == 0) {
double* sublotmp = domain->sublo;
double* subhitmp = domain->subhi;
for (int dim = 0 ; dim<3 ; dim++) {
sublo[dim]=sublotmp[dim];
subhi[dim]=subhitmp[dim];
}
} else {
double* sublotmp = domain->sublo_lamda;
double* subhitmp = domain->subhi_lamda;
domain->lamda2x(sublotmp,sublo);
domain->lamda2x(subhitmp,subhi);
}
//if (list->inum==0) list->grow(atom->nmax);
int nlocal = ghostinum;
for (int i=0; i < list->inum; i++)
if (ilistmask[i]) nlocal++;
int ***grid2proc = comm->grid2proc;
int* procgrid = comm->procgrid;
int nlocal_up,nlocal_down;
MPI_Request request;
double sub_mid[3];
for (int dim=0; dim<3; dim++)
sub_mid[dim] = (subhi[dim] + sublo[dim])/2;
if (comm->cutghostuser <
neighbor->cutneighmax+extra_cutoff())
error->all(FLERR,"Communication cutoff too small for SNAP micro load balancing");
int nrecv = ghostinum;
int totalsend = 0;
int nsend = 0;
int depth = 1;
for (int dx = -depth; dx < depth+1; dx++)
for (int dy = -depth; dy < depth+1; dy++)
for (int dz = -depth; dz < depth+1; dz++) {
if (dx == dy && dy == dz && dz == 0) continue;
int sendloc[3] = {comm->myloc[0],
comm->myloc[1], comm->myloc[2]
};
sendloc[0] += dx;
sendloc[1] += dy;
sendloc[2] += dz;
for (int dim = 0; dim < 3; dim++)
if (sendloc[dim] >= procgrid[dim])
sendloc[dim] = sendloc[dim] - procgrid[dim];
for (int dim = 0; dim < 3; dim++)
if (sendloc[dim] < 0)
sendloc[dim] = procgrid[dim] + sendloc[dim];
int recvloc[3] = {comm->myloc[0],
comm->myloc[1], comm->myloc[2]
};
recvloc[0] -= dx;
recvloc[1] -= dy;
recvloc[2] -= dz;
for (int dim = 0; dim < 3; dim++)
if (recvloc[dim] < 0)
recvloc[dim] = procgrid[dim] + recvloc[dim];
for (int dim = 0; dim < 3; dim++)
if (recvloc[dim] >= procgrid[dim])
recvloc[dim] = recvloc[dim] - procgrid[dim];
int sendproc = grid2proc[sendloc[0]][sendloc[1]][sendloc[2]];
int recvproc = grid2proc[recvloc[0]][recvloc[1]][recvloc[2]];
// two stage process, first upstream movement, then downstream
MPI_Sendrecv(&nlocal,1,MPI_INT,sendproc,0,
&nlocal_up,1,MPI_INT,recvproc,0,world,MPI_STATUS_IGNORE);
MPI_Sendrecv(&nlocal,1,MPI_INT,recvproc,0,
&nlocal_down,1,MPI_INT,sendproc,0,world,MPI_STATUS_IGNORE);
nsend = 0;
// send upstream
if (nlocal > nlocal_up+1) {
int i = totalsend++;
while(i < list->inum && ilistmask[i] == 0)
i = totalsend++;
if (i < list->inum)
MPI_Isend(&atom->tag[i],1,MPI_INT,recvproc,0,world,&request);
else {
int j = -1;
MPI_Isend(&j,1,MPI_INT,recvproc,0,world,&request);
}
if (i < list->inum) {
for (int j = 0; j < list->inum; j++)
if (list->ilist[j] == i)
ilistmask[j] = 0;
nsend = 1;
}
}
// recv downstream
if (nlocal < nlocal_down-1) {
nlocal++;
int get_tag = -1;
MPI_Recv(&get_tag,1,MPI_INT,sendproc,0,world,MPI_STATUS_IGNORE);
// if get_tag is -1 the other process didn't have local atoms to send
if (get_tag >= 0) {
if (ghostinum >= ghostilist_max) {
memory->grow(ghostilist,ghostinum+10,
"PairSnap::ghostilist");
ghostilist_max = ghostinum+10;
}
if (atom->nlocal + atom->nghost >= ghostnumneigh_max) {
ghostnumneigh_max = atom->nlocal+atom->nghost+100;
memory->grow(ghostnumneigh,ghostnumneigh_max,
"PairSnap::ghostnumneigh");
memory->grow(ghostfirstneigh,ghostnumneigh_max,
"PairSnap::ghostfirstneigh");
}
// find closest ghost image of the transferred particle
double mindist = 1e200;
int closestghost = -1;
for (int j = 0; j < atom->nlocal + atom->nghost; j++)
if (atom->tag[j] == get_tag)
if (dist2(sub_mid, atom->x[j]) < mindist) {
closestghost = j;
mindist = dist2(sub_mid, atom->x[j]);
}
// build neighborlist for this particular
// ghost atom, and add it to list->ilist
if (ghostneighs_max - ghostneighs_total <
neighbor->oneatom) {
memory->grow(ghostneighs,
ghostneighs_total + neighbor->oneatom,
"PairSnap::ghostneighs");
ghostneighs_max = ghostneighs_total + neighbor->oneatom;
}
int j = closestghost;
ghostilist[ghostinum] = j;
ghostnumneigh[j] = 0;
ghostfirstneigh[j] = ghostneighs_total;
ghostinum++;
int* jlist = ghostneighs + ghostfirstneigh[j];
// find all neighbors by looping
// over all local and ghost atoms
for (int k = 0; k < atom->nlocal + atom->nghost; k++)
if (dist2(atom->x[j],atom->x[k]) <
neighbor->cutneighmax*neighbor->cutneighmax) {
jlist[ghostnumneigh[j]] = k;
ghostnumneigh[j]++;
ghostneighs_total++;
}
}
if (get_tag >= 0) nrecv++;
}
// decrease nlocal later, so that it is the
// initial number both for receiving and sending
if (nsend) nlocal--;
// second pass through the grid
MPI_Sendrecv(&nlocal,1,MPI_INT,sendproc,0,
&nlocal_up,1,MPI_INT,recvproc,0,world,MPI_STATUS_IGNORE);
MPI_Sendrecv(&nlocal,1,MPI_INT,recvproc,0,
&nlocal_down,1,MPI_INT,sendproc,0,world,MPI_STATUS_IGNORE);
// send downstream
nsend=0;
if (nlocal > nlocal_down+1) {
int i = totalsend++;
while(i < list->inum && ilistmask[i]==0) i = totalsend++;
if (i < list->inum)
MPI_Isend(&atom->tag[i],1,MPI_INT,sendproc,0,world,&request);
else {
int j = -1;
MPI_Isend(&j,1,MPI_INT,sendproc,0,world,&request);
}
if (i < list->inum) {
for (int j=0; j<list->inum; j++)
if (list->ilist[j] == i) ilistmask[j] = 0;
nsend = 1;
}
}
// receive upstream
if (nlocal < nlocal_up-1) {
nlocal++;
int get_tag = -1;
MPI_Recv(&get_tag,1,MPI_INT,recvproc,0,world,MPI_STATUS_IGNORE);
if (get_tag >= 0) {
if (ghostinum >= ghostilist_max) {
memory->grow(ghostilist,ghostinum+10,
"PairSnap::ghostilist");
ghostilist_max = ghostinum+10;
}
if (atom->nlocal + atom->nghost >= ghostnumneigh_max) {
ghostnumneigh_max = atom->nlocal + atom->nghost + 100;
memory->grow(ghostnumneigh,ghostnumneigh_max,
"PairSnap::ghostnumneigh");
memory->grow(ghostfirstneigh,ghostnumneigh_max,
"PairSnap::ghostfirstneigh");
}
// find closest ghost image of the transferred particle
double mindist = 1e200;
int closestghost = -1;
for (int j = 0; j < atom->nlocal + atom->nghost; j++)
if (atom->tag[j] == get_tag)
if (dist2(sub_mid,atom->x[j])<mindist) {
closestghost = j;
mindist = dist2(sub_mid,atom->x[j]);
}
// build neighborlist for this particular ghost atom
if (ghostneighs_max-ghostneighs_total < neighbor->oneatom) {
memory->grow(ghostneighs,ghostneighs_total + neighbor->oneatom,
"PairSnap::ghostneighs");
ghostneighs_max = ghostneighs_total + neighbor->oneatom;
}
int j = closestghost;
ghostilist[ghostinum] = j;
ghostnumneigh[j] = 0;
ghostfirstneigh[j] = ghostneighs_total;
ghostinum++;
int* jlist = ghostneighs + ghostfirstneigh[j];
for (int k = 0; k < atom->nlocal + atom->nghost; k++)
if (dist2(atom->x[j],atom->x[k]) <
neighbor->cutneighmax*neighbor->cutneighmax) {
jlist[ghostnumneigh[j]] = k;
ghostnumneigh[j]++;
ghostneighs_total++;
}
}
if (get_tag >= 0) nrecv++;
}
if (nsend) nlocal--;
}
}
void PairSNAP::set_sna_to_shared(int snaid,int i)
{
sna[snaid]->rij = i_rij[i];
sna[snaid]->inside = i_inside[i];
sna[snaid]->wj = i_wj[i];
sna[snaid]->rcutij = i_rcutij[i];
sna[snaid]->zarray_r = i_zarray_r[i];
sna[snaid]->zarray_i = i_zarray_i[i];
sna[snaid]->uarraytot_r = i_uarraytot_r[i];
sna[snaid]->uarraytot_i = i_uarraytot_i[i];
}
void PairSNAP::build_per_atom_arrays()
{
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
int count = 0;
int neighmax = 0;
for (int ii = 0; ii < list->inum; ii++)
if ((do_load_balance <= 0) || ilistmask[ii]) {
neighmax=MAX(neighmax,list->numneigh[list->ilist[ii]]);
++count;
}
for (int ii = 0; ii < ghostinum; ii++) {
neighmax=MAX(neighmax,ghostnumneigh[ghostilist[ii]]);
++count;
}
if (i_max < count || i_neighmax < neighmax) {
int i_maxt = MAX(count,i_max);
i_neighmax = MAX(neighmax,i_neighmax);
memory->destroy(i_rij);
memory->destroy(i_inside);
memory->destroy(i_wj);
memory->destroy(i_rcutij);
memory->destroy(i_ninside);
memory->destroy(i_pairs);
memory->create(i_rij,i_maxt,i_neighmax,3,"PairSNAP::i_rij");
memory->create(i_inside,i_maxt,i_neighmax,"PairSNAP::i_inside");
memory->create(i_wj,i_maxt,i_neighmax,"PairSNAP::i_wj");
memory->create(i_rcutij,i_maxt,i_neighmax,"PairSNAP::i_rcutij");
memory->create(i_ninside,i_maxt,"PairSNAP::i_ninside");
memory->create(i_pairs,i_maxt*i_neighmax,4,"PairSNAP::i_pairs");
}
if (i_max < count) {
int jdim = sna[0]->twojmax+1;
memory->destroy(i_uarraytot_r);
memory->destroy(i_uarraytot_i);
memory->create(i_uarraytot_r,count,jdim,jdim,jdim,
"PairSNAP::i_uarraytot_r");
memory->create(i_uarraytot_i,count,jdim,jdim,jdim,
"PairSNAP::i_uarraytot_i");
if (i_zarray_r != NULL)
for (int i = 0; i < i_max; i++) {
memory->destroy(i_zarray_r[i]);
memory->destroy(i_zarray_i[i]);
}
delete [] i_zarray_r;
delete [] i_zarray_i;
i_zarray_r = new double*****[count];
i_zarray_i = new double*****[count];
for (int i = 0; i < count; i++) {
memory->create(i_zarray_r[i],jdim,jdim,jdim,jdim,jdim,
"PairSNAP::i_zarray_r");
memory->create(i_zarray_i[i],jdim,jdim,jdim,jdim,jdim,
"PairSNAP::i_zarray_i");
}
}
if (i_max < count)
i_max = count;
count = 0;
i_numpairs = 0;
for (int ii = 0; ii < list->inum; ii++) {
if ((do_load_balance <= 0) || ilistmask[ii]) {
int i = list->ilist[ii];
int jnum = list->numneigh[i];
int* jlist = list->firstneigh[i];
const double xtmp = atom->x[i][0];
const double ytmp = atom->x[i][1];
const double ztmp = atom->x[i][2];
const int itype = atom->type[i];
const int ielem = map[itype];
const double radi = radelem[ielem];
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = atom->x[j][0] - xtmp;
const double dely = atom->x[j][1] - ytmp;
const double delz = atom->x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = atom->type[j];
int jelem = map[jtype];
i_pairs[i_numpairs][0] = i;
i_pairs[i_numpairs][1] = jj;
i_pairs[i_numpairs][2] = -1;
i_pairs[i_numpairs][3] = count;
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
i_rij[count][ninside][0] = delx;
i_rij[count][ninside][1] = dely;
i_rij[count][ninside][2] = delz;
i_inside[count][ninside] = j;
i_wj[count][ninside] = wjelem[jelem];
i_rcutij[count][ninside] = (radi + radelem[jelem])*rcutfac;
// update index list with inside index
i_pairs[i_numpairs][2] = ninside++;
}
i_numpairs++;
}
i_ninside[count] = ninside;
count++;
}
}
for (int ii = 0; ii < ghostinum; ii++) {
int i = ghostilist[ii];
int jnum = ghostnumneigh[i];
int* jlist = ghostneighs+ghostfirstneigh[i];
const double xtmp = atom->x[i][0];
const double ytmp = atom->x[i][1];
const double ztmp = atom->x[i][2];
const int itype = atom->type[i];
const int ielem = map[itype];
const double radi = radelem[ielem];
int ninside = 0;
for (int jj = 0; jj < jnum; jj++) {
int j = jlist[jj];
j &= NEIGHMASK;
const double delx = atom->x[j][0] - xtmp;
const double dely = atom->x[j][1] - ytmp;
const double delz = atom->x[j][2] - ztmp;
const double rsq = delx*delx + dely*dely + delz*delz;
int jtype = atom->type[j];
int jelem = map[jtype];
i_pairs[i_numpairs][0] = i;
i_pairs[i_numpairs][1] = jj;
i_pairs[i_numpairs][2] = -1;
i_pairs[i_numpairs][3] = count;
if (rsq < cutsq[itype][jtype]&&rsq>1e-20) {
i_rij[count][ninside][0] = delx;
i_rij[count][ninside][1] = dely;
i_rij[count][ninside][2] = delz;
i_inside[count][ninside] = j;
i_wj[count][ninside] = wjelem[jelem];
i_rcutij[count][ninside] = (radi + radelem[jelem])*rcutfac;
// update index list with inside index
i_pairs[i_numpairs][2] = ninside++;
}
i_numpairs++;
}
i_ninside[count] = ninside;
count++;
}
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
timers[0]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
#if defined(_OPENMP)
#pragma omp parallel for shared(count) default(none)
#endif
for (int ii=0; ii < count; ii++) {
int tid = omp_get_thread_num();
set_sna_to_shared(tid,ii);
//sna[tid]->compute_ui(i_ninside[ii]);
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
sna[tid]->compute_ui_omp(i_ninside[ii],MAX(int(nthreads/count),1));
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
sna[tid]->timers[0]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
}
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&starttime);
#endif
for (int ii=0; ii < count; ii++) {
int tid = 0;//omp_get_thread_num();
set_sna_to_shared(tid,ii);
sna[tid]->compute_zi_omp(MAX(int(nthreads/count),1));
}
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
sna[0]->timers[1]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
#ifdef TIMING_INFO
clock_gettime(CLOCK_REALTIME,&endtime);
timers[1]+=(endtime.tv_sec-starttime.tv_sec+1.0*
(endtime.tv_nsec-starttime.tv_nsec)/1000000000);
#endif
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairSNAP::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(map,n+1,"pair:map");
}
/* ----------------------------------------------------------------------
global settings
------------------------------------------------------------------------- */
void PairSNAP::settings(int narg, char **arg)
{
// set default values for optional arguments
nthreads = -1;
use_shared_arrays=-1;
do_load_balance = 0;
use_optimized = 1;
// optional arguments
for (int i=0; i < narg; i++) {
if (i+2>narg) error->all(FLERR,"Illegal pair_style command");
if (strcmp(arg[i],"nthreads")==0) {
nthreads=force->inumeric(FLERR,arg[++i]);
#if defined(LMP_USER_OMP)
error->all(FLERR,"Must set number of threads via package omp command");
#else
omp_set_num_threads(nthreads);
comm->nthreads=nthreads;
#endif
continue;
}
if (strcmp(arg[i],"optimized")==0) {
use_optimized=force->inumeric(FLERR,arg[++i]);
continue;
}
if (strcmp(arg[i],"shared")==0) {
use_shared_arrays=force->inumeric(FLERR,arg[++i]);
continue;
}
if (strcmp(arg[i],"loadbalance")==0) {
do_load_balance = force->inumeric(FLERR,arg[++i]);
if (do_load_balance) {
double mincutoff = extra_cutoff() +
rcutmax + neighbor->skin;
if (comm->cutghostuser < mincutoff) {
char buffer[255];
// save the increased cutoff first: mincutoff appears to be clobbered to 0 after the sprintf call below
double tmp = mincutoff + 0.1;
sprintf(buffer, "Communication cutoff is too small "
"for SNAP micro load balancing, increased to %lf",
mincutoff+0.1);
if (comm->me==0)
error->warning(FLERR,buffer);
comm->cutghostuser = tmp;
}
}
continue;
}
if (strcmp(arg[i],"schedule")==0) {
i++;
if (strcmp(arg[i],"static")==0)
schedule_user = 1;
if (strcmp(arg[i],"dynamic")==0)
schedule_user = 2;
if (strcmp(arg[i],"guided")==0)
schedule_user = 3;
if (strcmp(arg[i],"auto")==0)
schedule_user = 4;
if (strcmp(arg[i],"determine")==0)
schedule_user = 5;
if (schedule_user == 0)
error->all(FLERR,"Illegal pair_style command");
continue;
}
error->all(FLERR,"Illegal pair_style command");
}
if (nthreads < 0)
nthreads = comm->nthreads;
if (use_shared_arrays < 0) {
if (nthreads > 1 && atom->nlocal <= 2*nthreads)
use_shared_arrays = 1;
else use_shared_arrays = 0;
}
// check if running non-optimized code with
// optimization flags set
if (!use_optimized)
if (nthreads > 1 ||
use_shared_arrays ||
do_load_balance ||
schedule_user)
error->all(FLERR,"Illegal pair_style command");
}
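/* Illustrative sketch (not part of the original source; the values are made up):
   given the parsing above, each optional keyword takes a single value, so a
   hypothetical pair_style line such as
     pair_style snap nthreads 4 optimized 1 shared 0 loadbalance 1 schedule dynamic
   would be consumed by this loop; any unrecognized keyword, or a keyword
   without a value, triggers the "Illegal pair_style command" error. */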
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairSNAP::coeff(int narg, char **arg)
{
// read SNAP element names between 2 filenames
// nelements = # of SNAP elements
// elements = list of unique element names
if (narg < 6) error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
if (nelements) {
for (int i = 0; i < nelements; i++)
delete[] elements[i];
delete[] elements;
memory->destroy(radelem);
memory->destroy(wjelem);
memory->destroy(coeffelem);
}
nelements = narg - 4 - atom->ntypes;
if (nelements < 1) error->all(FLERR,"Incorrect args for pair coefficients");
char* type1 = arg[0];
char* type2 = arg[1];
char* coefffilename = arg[2];
char** elemlist = &arg[3];
char* paramfilename = arg[3+nelements];
char** elemtypes = &arg[4+nelements];
// insure I,J args are * *
if (strcmp(type1,"*") != 0 || strcmp(type2,"*") != 0)
error->all(FLERR,"Incorrect args for pair coefficients");
elements = new char*[nelements];
for (int i = 0; i < nelements; i++) {
char* elemname = elemlist[i];
int n = strlen(elemname) + 1;
elements[i] = new char[n];
strcpy(elements[i],elemname);
}
// read snapcoeff and snapparam files
read_files(coefffilename,paramfilename);
// read args that map atom types to SNAP elements
// map[i] = which element the Ith atom type is, -1 if not mapped
// map[0] is not used
for (int i = 1; i <= atom->ntypes; i++) {
char* elemname = elemtypes[i-1];
int jelem;
for (jelem = 0; jelem < nelements; jelem++)
if (strcmp(elemname,elements[jelem]) == 0)
break;
if (jelem < nelements)
map[i] = jelem;
else if (strcmp(elemname,"NULL") == 0) map[i] = -1;
else error->all(FLERR,"Incorrect args for pair coefficients");
}
// clear setflag since coeff() called once with I,J = * *
int n = atom->ntypes;
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
// set setflag i,j for type pairs where both are mapped to elements
int count = 0;
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
if (map[i] >= 0 && map[j] >= 0) {
setflag[i][j] = 1;
count++;
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
sna = new SNA*[nthreads];
// allocate memory for per OpenMP thread data which
// is wrapped into the sna class
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
sna[tid] = new SNA(lmp,rfac0,twojmax,
diagonalstyle,use_shared_arrays,
rmin0,switchflag,bzeroflag);
if (!use_shared_arrays)
sna[tid]->grow_rij(nmax);
}
if (ncoeff != sna[0]->ncoeff) {
printf("ncoeff = %d snancoeff = %d \n",ncoeff,sna[0]->ncoeff);
error->all(FLERR,"Incorrect SNAP parameter file");
}
// Calculate maximum cutoff for all elements
rcutmax = 0.0;
for (int ielem = 0; ielem < nelements; ielem++)
rcutmax = MAX(2.0*radelem[ielem]*rcutfac,rcutmax);
}
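/* Illustrative sketch (not part of the original source): coeff() above expects
     pair_coeff * * <coeff file> <elem 1> ... <elem N> <param file> <type-to-element map ...>
   For example, a hypothetical system with two atom types, both mapped to a
   single SNAP element Ta, could use
     pair_coeff * * Ta.snapcoeff Ta Ta.snapparam Ta Ta
   where the two trailing names map atom types 1 and 2 to element Ta
   (the keyword NULL leaves a type unmapped). */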
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairSNAP::init_style()
{
if (force->newton_pair == 0)
error->all(FLERR,"Pair style SNAP requires newton pair on");
// need a full neighbor list
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->half = 0;
neighbor->requests[irequest]->full = 1;
#if defined(_OPENMP)
#pragma omp parallel default(none)
#endif
{
int tid = omp_get_thread_num();
sna[tid]->init();
}
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairSNAP::init_one(int i, int j)
{
if (setflag[i][j] == 0) error->all(FLERR,"All pair coeffs are not set");
return (radelem[map[i]] +
radelem[map[j]])*rcutfac;
}
/* ---------------------------------------------------------------------- */
void PairSNAP::read_files(char *coefffilename, char *paramfilename)
{
// open SNAP coefficient file on proc 0
FILE *fpcoeff;
if (comm->me == 0) {
fpcoeff = force->open_potential(coefffilename);
if (fpcoeff == NULL) {
char str[128];
sprintf(str,"Cannot open SNAP coefficient file %s",coefffilename);
error->one(FLERR,str);
}
}
char line[MAXLINE],*ptr;
int eof = 0;
int n;
int nwords = 0;
while (nwords == 0) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpcoeff);
if (ptr == NULL) {
eof = 1;
fclose(fpcoeff);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof) break;
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// strip comment, skip line if blank
if ((ptr = strchr(line,'#'))) *ptr = '\0';
nwords = atom->count_words(line);
}
if (nwords != 2)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
// words = ptrs to all words in line
// strip single and double quotes from words
char* words[MAXWORD];
int iword = 0;
words[iword] = strtok(line,"' \t\n\r\f");
iword = 1;
words[iword] = strtok(NULL,"' \t\n\r\f");
int nelemfile = atoi(words[0]);
ncoeff = atoi(words[1])-1;
// Set up element lists
memory->create(radelem,nelements,"pair:radelem");
memory->create(wjelem,nelements,"pair:wjelem");
memory->create(coeffelem,nelements,ncoeff+1,"pair:coeffelem");
int *found = new int[nelements];
for (int ielem = 0; ielem < nelements; ielem++)
found[ielem] = 0;
// Loop over elements in the SNAP coefficient file
for (int ielemfile = 0; ielemfile < nelemfile; ielemfile++) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpcoeff);
if (ptr == NULL) {
eof = 1;
fclose(fpcoeff);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
nwords = atom->count_words(line);
if (nwords != 3)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
iword = 0;
words[iword] = strtok(line,"' \t\n\r\f");
iword = 1;
words[iword] = strtok(NULL,"' \t\n\r\f");
iword = 2;
words[iword] = strtok(NULL,"' \t\n\r\f");
char* elemtmp = words[0];
double radtmp = atof(words[1]);
double wjtmp = atof(words[2]);
// skip if element name isn't in element list
int ielem;
for (ielem = 0; ielem < nelements; ielem++)
if (strcmp(elemtmp,elements[ielem]) == 0) break;
if (ielem == nelements) {
if (comm->me == 0)
for (int icoeff = 0; icoeff <= ncoeff; icoeff++)
ptr = fgets(line,MAXLINE,fpcoeff);
continue;
}
// skip if element already appeared
if (found[ielem]) {
if (comm->me == 0)
for (int icoeff = 0; icoeff <= ncoeff; icoeff++)
ptr = fgets(line,MAXLINE,fpcoeff);
continue;
}
found[ielem] = 1;
radelem[ielem] = radtmp;
wjelem[ielem] = wjtmp;
if (comm->me == 0) {
if (screen) fprintf(screen,"SNAP Element = %s, Radius %g, Weight %g \n",
elements[ielem], radelem[ielem], wjelem[ielem]);
if (logfile) fprintf(logfile,"SNAP Element = %s, Radius %g, Weight %g \n",
elements[ielem], radelem[ielem], wjelem[ielem]);
}
for (int icoeff = 0; icoeff <= ncoeff; icoeff++) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpcoeff);
if (ptr == NULL) {
eof = 1;
fclose(fpcoeff);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
nwords = atom->count_words(line);
if (nwords != 1)
error->all(FLERR,"Incorrect format in SNAP coefficient file");
iword = 0;
words[iword] = strtok(line,"' \t\n\r\f");
coeffelem[ielem][icoeff] = atof(words[0]);
}
}
// set flags for required keywords
rcutfacflag = 0;
twojmaxflag = 0;
// Set defaults for optional keywords
gamma = 1.0;
gammaoneflag = 1;
rfac0 = 0.99363;
rmin0 = 0.0;
diagonalstyle = 3;
switchflag = 1;
- bzeroflag = 0;
+ bzeroflag = 1;
+
// open SNAP parameter file on proc 0
FILE *fpparam;
if (comm->me == 0) {
fpparam = force->open_potential(paramfilename);
if (fpparam == NULL) {
char str[128];
sprintf(str,"Cannot open SNAP parameter file %s",paramfilename);
error->one(FLERR,str);
}
}
eof = 0;
while (1) {
if (comm->me == 0) {
ptr = fgets(line,MAXLINE,fpparam);
if (ptr == NULL) {
eof = 1;
fclose(fpparam);
} else n = strlen(line) + 1;
}
MPI_Bcast(&eof,1,MPI_INT,0,world);
if (eof) break;
MPI_Bcast(&n,1,MPI_INT,0,world);
MPI_Bcast(line,n,MPI_CHAR,0,world);
// strip comment, skip line if blank
if ((ptr = strchr(line,'#'))) *ptr = '\0';
nwords = atom->count_words(line);
if (nwords == 0) continue;
if (nwords != 2)
error->all(FLERR,"Incorrect format in SNAP parameter file");
// words = ptrs to all words in line
// strip single and double quotes from words
char* keywd = strtok(line,"' \t\n\r\f");
char* keyval = strtok(NULL,"' \t\n\r\f");
if (comm->me == 0) {
if (screen) fprintf(screen,"SNAP keyword %s %s \n",keywd,keyval);
if (logfile) fprintf(logfile,"SNAP keyword %s %s \n",keywd,keyval);
}
if (strcmp(keywd,"rcutfac") == 0) {
rcutfac = atof(keyval);
rcutfacflag = 1;
} else if (strcmp(keywd,"twojmax") == 0) {
twojmax = atoi(keyval);
twojmaxflag = 1;
} else if (strcmp(keywd,"gamma") == 0)
gamma = atof(keyval);
else if (strcmp(keywd,"rfac0") == 0)
rfac0 = atof(keyval);
else if (strcmp(keywd,"rmin0") == 0)
rmin0 = atof(keyval);
else if (strcmp(keywd,"diagonalstyle") == 0)
diagonalstyle = atoi(keyval);
else if (strcmp(keywd,"switchflag") == 0)
switchflag = atoi(keyval);
else if (strcmp(keywd,"bzeroflag") == 0)
bzeroflag = atoi(keyval);
else
error->all(FLERR,"Incorrect SNAP parameter file");
}
if (rcutfacflag == 0 || twojmaxflag == 0)
error->all(FLERR,"Incorrect SNAP parameter file");
if (gamma == 1.0) gammaoneflag = 1;
else gammaoneflag = 0;
delete[] found;
}
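/* Illustrative sketch (not part of the original source; all numbers are made-up
   placeholders): read_files() above expects a coefficient file of the form

     # comments and blank lines are skipped while reading the header
     1 31                    <- nelements, ncoeff+1
     Ta 0.5 1.0              <- element name, radius, weight
     -2.92477                <- ncoeff+1 coefficients, one per line
     ...

   and a parameter file made of keyword/value lines, e.g.

     rcutfac 4.67637         <- required
     twojmax 6               <- required
     rfac0 0.99363           <- optional (defaults are set in the code above)
     rmin0 0.0
     bzeroflag 1

   Unknown keywords, or a missing rcutfac/twojmax, abort with
   "Incorrect SNAP parameter file". */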
/* ----------------------------------------------------------------------
memory usage
------------------------------------------------------------------------- */
double PairSNAP::memory_usage()
{
double bytes = Pair::memory_usage();
int n = atom->ntypes+1;
bytes += n*n*sizeof(int);
bytes += n*n*sizeof(double);
bytes += 3*nmax*sizeof(double);
bytes += nmax*sizeof(int);
bytes += (2*ncoeff+1)*sizeof(double);
bytes += (ncoeff*3)*sizeof(double);
bytes += sna[0]->memory_usage()*nthreads;
return bytes;
}
diff --git a/src/USER-CG-CMM/Install.sh b/src/USER-CGSDK/Install.sh
similarity index 100%
rename from src/USER-CG-CMM/Install.sh
rename to src/USER-CGSDK/Install.sh
diff --git a/src/USER-CG-CMM/README b/src/USER-CGSDK/README
similarity index 58%
rename from src/USER-CG-CMM/README
rename to src/USER-CGSDK/README
index b37fbd376..535bd43ac 100644
--- a/src/USER-CG-CMM/README
+++ b/src/USER-CGSDK/README
@@ -1,46 +1,38 @@
This package implements 3 commands which can be used in a LAMMPS input
script:
pair_style lj/sdk
pair_style lj/sdk/coul/long
angle_style sdk
These styles allow coarse grained MD simulations with the
parametrization of Shinoda, DeVane, Klein, Mol Sim, 33, 27 (2007)
(SDK), with extensions to simulate ionic liquids, electrolytes,
lipids and charged amino acids.
See the doc pages for these commands for details.
There are example scripts for using this package in
-examples/USER/cg-cmm.
+examples/USER/cgsdk
This is the second generation implementation reducing the clutter
of the previous version. For many systems with long range
electrostatics, it will be faster to use pair_style hybrid/overlay
with lj/sdk and coul/long instead of the combined lj/sdk/coul/long
-style, since the number of charged atom types is usually small. To
-exploit this property, the use of the kspace_style pppm/cg is
-recommended over regular pppm. For all new styles, input file backward
-compatibility is provided. The old implementation is still available
-through appending the /old suffix. These will be discontinued and
-removed after the new implementation has been fully validated.
-
-The current version of this package should be considered beta
-quality. The CG potentials work correctly for "normal" situations, but
-have not been testing with all kinds of potential parameters and
-simuation systems.
+style, since the number of charged atom types is usually small.
+To exploit this property, the use of the kspace_style pppm/cg is
+recommended over regular pppm.
The person who created this package is Axel Kohlmeyer at Temple U
(akohlmey at gmail.com). Contact him directly if you have questions.
---------------------------------
Thanks for contributions, support and testing goes to
-Wataru Shinoda (AIST, Tsukuba)
+Wataru Shinoda (Nagoya University)
Russell DeVane (Procter & Gamble)
-Michael L. Klein (CMM / U Penn, Philadelphia)
+Michael L. Klein (Temple University, Philadelphia)
Balasubramanian Sundaram (JNCASR, Bangalore)
-version: 0.99 / 2011-11-29
+version: 1.0 / 2017-04-26
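As a sketch only (not part of the original README; the coefficients below are
made-up placeholders), an input fragment using these styles could look like:

  pair_style lj/sdk 15.0
  pair_coeff 1 1 lj9_6 0.40 4.5
  angle_style sdk
  angle_coeff 1 4.0 120.0

See the lj/sdk and angle_style sdk doc pages for the exact pair_coeff and
angle_coeff argument lists.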
diff --git a/src/USER-CG-CMM/angle_sdk.cpp b/src/USER-CGSDK/angle_sdk.cpp
similarity index 100%
rename from src/USER-CG-CMM/angle_sdk.cpp
rename to src/USER-CGSDK/angle_sdk.cpp
diff --git a/src/USER-CG-CMM/angle_sdk.h b/src/USER-CGSDK/angle_sdk.h
similarity index 98%
rename from src/USER-CG-CMM/angle_sdk.h
rename to src/USER-CGSDK/angle_sdk.h
index fbd546118..a5d917e57 100644
--- a/src/USER-CG-CMM/angle_sdk.h
+++ b/src/USER-CGSDK/angle_sdk.h
@@ -1,63 +1,62 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef ANGLE_CLASS
AngleStyle(sdk,AngleSDK)
-AngleStyle(cg/cmm,AngleSDK)
#else
#ifndef LMP_ANGLE_SDK_H
#define LMP_ANGLE_SDK_H
#include <stdio.h>
#include "angle.h"
namespace LAMMPS_NS {
class AngleSDK : public Angle {
public:
AngleSDK(class LAMMPS *);
virtual ~AngleSDK();
virtual void compute(int, int);
void coeff(int, char **);
void init_style();
double equilibrium_angle(int);
void write_restart(FILE *);
void read_restart(FILE *);
void write_data(FILE *);
double single(int, int, int, int);
protected:
double *k,*theta0;
// scaling factor for repulsive 1-3 interaction
double *repscale;
// parameters from SDK pair style
int **lj_type;
double **lj1,**lj2, **lj3, **lj4;
double **rminsq,**emin;
int repflag; // 1 if we have to handle 1-3 repulsion
void ev_tally13(int, int, int, int, double, double,
double, double, double);
void allocate();
};
}
#endif
#endif
diff --git a/src/USER-CG-CMM/lj_sdk_common.h b/src/USER-CGSDK/lj_sdk_common.h
similarity index 100%
rename from src/USER-CG-CMM/lj_sdk_common.h
rename to src/USER-CGSDK/lj_sdk_common.h
diff --git a/src/USER-CG-CMM/pair_lj_sdk.cpp b/src/USER-CGSDK/pair_lj_sdk.cpp
similarity index 100%
rename from src/USER-CG-CMM/pair_lj_sdk.cpp
rename to src/USER-CGSDK/pair_lj_sdk.cpp
diff --git a/src/USER-CG-CMM/pair_lj_sdk.h b/src/USER-CGSDK/pair_lj_sdk.h
similarity index 98%
rename from src/USER-CG-CMM/pair_lj_sdk.h
rename to src/USER-CGSDK/pair_lj_sdk.h
index de27485c1..ef0263c06 100644
--- a/src/USER-CG-CMM/pair_lj_sdk.h
+++ b/src/USER-CGSDK/pair_lj_sdk.h
@@ -1,76 +1,75 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk,PairLJSDK)
-PairStyle(cg/cmm,PairLJSDK)
#else
#ifndef LMP_PAIR_LJ_SDK_H
#define LMP_PAIR_LJ_SDK_H
#include "pair.h"
namespace LAMMPS_NS {
class LAMMPS;
class PairLJSDK : public Pair {
public:
PairLJSDK(LAMMPS *);
virtual ~PairLJSDK();
virtual void compute(int, int);
virtual void settings(int, char **);
virtual void coeff(int, char **);
virtual double init_one(int, int);
void write_restart(FILE *);
void read_restart(FILE *);
void write_restart_settings(FILE *);
void read_restart_settings(FILE *);
void write_data(FILE *);
void write_data_all(FILE *);
double single(int, int, int, int, double, double, double, double &);
void *extract(const char *, int &);
virtual double memory_usage();
protected:
int **lj_type; // type of lennard jones potential
double **cut;
double **epsilon,**sigma;
double **lj1,**lj2,**lj3,**lj4,**offset;
// cutoff and offset for minimum of LJ potential
// to be used in SDK angle potential, which
// uses only the repulsive part of the potential
double **rminsq, **emin;
double cut_global;
virtual void allocate();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR> void eval();
};
}
#endif
#endif
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_long.cpp b/src/USER-CGSDK/pair_lj_sdk_coul_long.cpp
similarity index 100%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_long.cpp
rename to src/USER-CGSDK/pair_lj_sdk_coul_long.cpp
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_long.h b/src/USER-CGSDK/pair_lj_sdk_coul_long.h
similarity index 97%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_long.h
rename to src/USER-CGSDK/pair_lj_sdk_coul_long.h
index 508ffe5e6..57779cc0b 100644
--- a/src/USER-CG-CMM/pair_lj_sdk_coul_long.h
+++ b/src/USER-CGSDK/pair_lj_sdk_coul_long.h
@@ -1,77 +1,76 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/long,PairLJSDKCoulLong)
-PairStyle(cg/cmm/coul/long,PairLJSDKCoulLong)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_LONG_H
#define LMP_PAIR_LJ_SDK_COUL_LONG_H
#include "pair.h"
namespace LAMMPS_NS {
class PairLJSDKCoulLong : public Pair {
public:
PairLJSDKCoulLong(class LAMMPS *);
virtual ~PairLJSDKCoulLong();
virtual void compute(int, int);
virtual void settings(int, char **);
void coeff(int, char **);
void init_style();
double init_one(int, int);
void write_restart(FILE *);
void read_restart(FILE *);
void write_data(FILE *);
void write_data_all(FILE *);
virtual void write_restart_settings(FILE *);
virtual void read_restart_settings(FILE *);
virtual double single(int, int, int, int, double, double, double, double &);
virtual void *extract(const char *, int &);
virtual double memory_usage();
protected:
double **cut_lj,**cut_ljsq;
double cut_coul,cut_coulsq;
double **epsilon,**sigma;
double **lj1,**lj2,**lj3,**lj4,**offset;
int **lj_type;
// cutoff and offset for minimum of LJ potential
// to be used in SDK angle potential, which
// uses only the repulsive part of the potential
double **rminsq, **emin;
double cut_lj_global;
double g_ewald;
void allocate();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR> void eval();
};
}
#endif
#endif
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_msm.cpp b/src/USER-CGSDK/pair_lj_sdk_coul_msm.cpp
similarity index 100%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_msm.cpp
rename to src/USER-CGSDK/pair_lj_sdk_coul_msm.cpp
diff --git a/src/USER-CG-CMM/pair_lj_sdk_coul_msm.h b/src/USER-CGSDK/pair_lj_sdk_coul_msm.h
similarity index 97%
rename from src/USER-CG-CMM/pair_lj_sdk_coul_msm.h
rename to src/USER-CGSDK/pair_lj_sdk_coul_msm.h
index be56c0cec..8438ced66 100644
--- a/src/USER-CG-CMM/pair_lj_sdk_coul_msm.h
+++ b/src/USER-CGSDK/pair_lj_sdk_coul_msm.h
@@ -1,57 +1,56 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/msm,PairLJSDKCoulMSM)
-PairStyle(cg/cmm/coul/msm,PairLJSDKCoulMSM)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_MSM_H
#define LMP_PAIR_LJ_SDK_COUL_MSM_H
#include "pair_lj_sdk_coul_long.h"
namespace LAMMPS_NS {
class PairLJSDKCoulMSM : public PairLJSDKCoulLong {
public:
PairLJSDKCoulMSM(class LAMMPS *);
virtual ~PairLJSDKCoulMSM() {};
virtual void compute(int, int);
virtual double single(int, int, int, int, double, double, double, double &);
virtual void *extract(const char *, int &);
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR> void eval_msm();
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Must use 'kspace_modify pressure/scalar no' with Pair style
The kspace scalar pressure option is not (yet) compatible with at least one of
the defined Pair styles.
*/
diff --git a/src/USER-MISC/fix_srp.cpp b/src/USER-MISC/fix_srp.cpp
index fbd8473cb..f3dec42a8 100644
--- a/src/USER-MISC/fix_srp.cpp
+++ b/src/USER-MISC/fix_srp.cpp
@@ -1,631 +1,638 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Timothy Sirk (ARL), Pieter in't Veld (BASF)
------------------------------------------------------------------------- */
#include <string.h>
#include <stdlib.h>
#include "fix_srp.h"
#include "atom.h"
#include "force.h"
#include "domain.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "neighbor.h"
#include "atom_vec.h"
#include "modify.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixSRP::FixSRP(LAMMPS *lmp, int narg, char **arg) : Fix(lmp, narg, arg)
{
// settings
nevery=1;
peratom_freq = 1;
time_integrate = 0;
create_attribute = 0;
comm_border = 2;
// restart settings
restart_global = 1;
restart_peratom = 1;
restart_pbc = 1;
// per-atom array width 2
peratom_flag = 1;
size_peratom_cols = 2;
// initial allocation of atom-based array
// register with Atom class
array = NULL;
grow_arrays(atom->nmax);
// extends pack_exchange()
atom->add_callback(0);
atom->add_callback(1); // restart
atom->add_callback(2);
// initialize to illegal values so we can catch them if they are never set
btype = -1;
bptype = -1;
// zero
for (int i = 0; i < atom->nmax; i++)
for (int m = 0; m < 2; m++)
array[i][m] = 0.0;
}
/* ---------------------------------------------------------------------- */
FixSRP::~FixSRP()
{
// unregister callbacks to this fix from Atom class
atom->delete_callback(id,0);
atom->delete_callback(id,1);
atom->delete_callback(id,2);
memory->destroy(array);
}
/* ---------------------------------------------------------------------- */
int FixSRP::setmask()
{
int mask = 0;
mask |= PRE_FORCE;
mask |= PRE_EXCHANGE;
mask |= POST_RUN;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixSRP::init()
{
if (force->pair_match("hybrid",1) == NULL)
error->all(FLERR,"Cannot use pair srp without pair_style hybrid");
+ int has_rigid = 0;
+ for (int i = 0; i < modify->nfix; i++)
+ if (strncmp(modify->fix[i]->style,"rigid",5) == 0) ++has_rigid;
+
+ if (has_rigid > 0)
+ error->all(FLERR,"Pair srp is not compatible with rigid fixes.");
+
if ((bptype < 1) || (bptype > atom->ntypes))
error->all(FLERR,"Illegal bond particle type");
// fix SRP should be the first fix running at the PRE_EXCHANGE step.
// Otherwise it might conflict with, e.g. fix deform
if (modify->n_pre_exchange > 1) {
char *first = modify->fix[modify->list_pre_exchange[0]]->id;
if ((comm->me == 0) && (strcmp(id,first) != 0))
error->warning(FLERR,"Internal fix for pair srp defined too late."
" May lead to incorrect behavior.");
}
// setup neigh exclusions for diff atom types
// bond particles do not interact with other types
// type bptype only interacts with itself
char* arg1[4];
arg1[0] = (char *) "exclude";
arg1[1] = (char *) "type";
char c0[20];
char c1[20];
for(int z = 1; z < atom->ntypes; z++) {
if(z == bptype)
continue;
sprintf(c0, "%d", z);
arg1[2] = c0;
sprintf(c1, "%d", bptype);
arg1[3] = c1;
neighbor->modify_params(4, arg1);
}
}
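/* Illustrative equivalent (not part of the original source): the exclusion
   setup above amounts to issuing, for each other atom type z, the
   input-script command
     neigh_modify exclude type z bptype
   so that bond particles (type bptype) only interact with their own type. */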
/* ----------------------------------------------------------------------
insert bond particles
------------------------------------------------------------------------- */
void FixSRP::setup_pre_force(int zz)
{
double **x = atom->x;
double **xold;
tagint *tag = atom->tag;
tagint *tagold;
int *type = atom->type;
int* dlist;
AtomVec *avec = atom->avec;
int **bondlist = neighbor->bondlist;
int nlocal, nlocal_old;
nlocal = nlocal_old = atom->nlocal;
bigint nall = atom->nlocal + atom->nghost;
int nbondlist = neighbor->nbondlist;
int i,j,n;
// make a copy of all coordinates and tags
// that is consistent with the bond list as
// atom->x will be affected by creating/deleting atoms.
// also compile list of local atoms to be deleted.
memory->create(xold,nall,3,"fix_srp:xold");
memory->create(tagold,nall,"fix_srp:tagold");
memory->create(dlist,nall,"fix_srp:dlist");
for (i = 0; i < nall; i++){
xold[i][0] = x[i][0];
xold[i][1] = x[i][1];
xold[i][2] = x[i][2];
tagold[i]=tag[i];
dlist[i] = (type[i] == bptype) ? 1 : 0;
for (n = 0; n < 2; n++)
array[i][n] = 0.0;
}
// delete local atoms flagged in dlist
i = 0;
int ndel = 0;
while (i < nlocal) {
if (dlist[i]) {
avec->copy(nlocal-1,i,1);
dlist[i] = dlist[nlocal-1];
nlocal--;
ndel++;
} else i++;
}
atom->nlocal = nlocal;
memory->destroy(dlist);
int nadd = 0;
double rsqold = 0.0;
double delx, dely, delz, rmax, rsq, rsqmax;
double xone[3];
for (n = 0; n < nbondlist; n++) {
// consider only the user defined bond type
// btype of zero considers all bonds
if(btype > 0 && bondlist[n][2] != btype)
continue;
i = bondlist[n][0];
j = bondlist[n][1];
// position of bond i
xone[0] = (xold[i][0] + xold[j][0])*0.5;
xone[1] = (xold[i][1] + xold[j][1])*0.5;
xone[2] = (xold[i][2] + xold[j][2])*0.5;
// record longest bond
// this is used to set the ghost cutoff
delx = xold[j][0] - xold[i][0];
dely = xold[j][1] - xold[i][1];
delz = xold[j][2] - xold[i][2];
rsq = delx*delx + dely*dely + delz*delz;
if(rsq > rsqold) rsqold = rsq;
// make one particle for each bond
// i is local
// if newton bond, always make particle
// if j is local, always make particle
// if j is ghost, decide from tag
if ((force->newton_bond) || (j < nlocal_old) || (tagold[i] > tagold[j])) {
atom->natoms++;
avec->create_atom(bptype,xone);
// pack tag i/j into buffer for comm
array[atom->nlocal-1][0] = static_cast<double>(tagold[i]);
array[atom->nlocal-1][1] = static_cast<double>(tagold[j]);
nadd++;
}
}
bigint nblocal = atom->nlocal;
MPI_Allreduce(&nblocal,&atom->natoms,1,MPI_LMP_BIGINT,MPI_SUM,world);
// free temporary storage
memory->destroy(xold);
memory->destroy(tagold);
char str[128];
int nadd_all = 0, ndel_all = 0;
MPI_Allreduce(&ndel,&ndel_all,1,MPI_INT,MPI_SUM,world);
MPI_Allreduce(&nadd,&nadd_all,1,MPI_INT,MPI_SUM,world);
if(comm->me == 0){
sprintf(str, "Removed/inserted %d/%d bond particles.", ndel_all,nadd_all);
error->message(FLERR,str);
}
// check ghost comm distances
// warn and change if shorter from estimate
// ghost atoms must be present for bonds on edge of neighbor cutoff
// extend cutghost slightly more than half of the longest bond
MPI_Allreduce(&rsqold,&rsqmax,1,MPI_DOUBLE,MPI_MAX,world);
rmax = sqrt(rsqmax);
double cutneighmax_srp = neighbor->cutneighmax + 0.51*rmax;
// find smallest cutghost
double cutghostmin = comm->cutghost[0];
if (cutghostmin > comm->cutghost[1])
cutghostmin = comm->cutghost[1];
if (cutghostmin > comm->cutghost[2])
cutghostmin = comm->cutghost[2];
// stop if cutghost is insufficient
if (cutneighmax_srp > cutghostmin){
sprintf(str, "Communication cutoff too small for fix srp. "
"Need %f, current %f.", cutneighmax_srp, cutghostmin);
error->all(FLERR,str);
}
// assign tags for new atoms, update map
atom->tag_extend();
if (atom->map_style) {
atom->nghost = 0;
atom->map_init();
atom->map_set();
}
// put new particles in the box before exchange
// move owned to new procs
// get ghosts
// build neigh lists again
// if triclinic, lambda coords needed for pbc, exchange, borders
if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
// back to box coords
if (domain->triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
neighbor->build();
neighbor->ncalls = 0;
// new atom counts
nlocal = atom->nlocal;
nall = atom->nlocal + atom->nghost;
// zero all forces
for(i = 0; i < nall; i++)
atom->f[i][0] = atom->f[i][1] = atom->f[i][2] = 0.0;
// do not include bond particles in thermo output
// remove them from all groups. set their velocity to zero.
for(i=0; i< nlocal; i++)
if(atom->type[i] == bptype) {
atom->mask[i] = 0;
atom->v[i][0] = atom->v[i][1] = atom->v[i][2] = 0.0;
}
}
/* ----------------------------------------------------------------------
set position of bond particles
------------------------------------------------------------------------- */
void FixSRP::pre_exchange()
{
// update ghosts
comm->forward_comm();
// reassign bond particle coordinates to midpoint of bonds
// only need to do this before neigh rebuild
double **x=atom->x;
int i,j;
int nlocal = atom->nlocal;
for(int ii = 0; ii < nlocal; ii++){
if(atom->type[ii] != bptype) continue;
i = atom->map(static_cast<tagint>(array[ii][0]));
if(i < 0) error->all(FLERR,"Fix SRP failed to map atom");
i = domain->closest_image(ii,i);
j = atom->map(static_cast<tagint>(array[ii][1]));
if(j < 0) error->all(FLERR,"Fix SRP failed to map atom");
j = domain->closest_image(ii,j);
// position of bond particle ii
atom->x[ii][0] = (x[i][0] + x[j][0])*0.5;
atom->x[ii][1] = (x[i][1] + x[j][1])*0.5;
atom->x[ii][2] = (x[i][2] + x[j][2])*0.5;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double FixSRP::memory_usage()
{
double bytes = atom->nmax*2 * sizeof(double);
return bytes;
}
/* ----------------------------------------------------------------------
allocate atom-based array
------------------------------------------------------------------------- */
void FixSRP::grow_arrays(int nmax)
{
memory->grow(array,nmax,2,"fix_srp:array");
array_atom = array;
}
/* ----------------------------------------------------------------------
copy values within local atom-based array
called when move to new proc
------------------------------------------------------------------------- */
void FixSRP::copy_arrays(int i, int j, int delflag)
{
for (int m = 0; m < 2; m++)
array[j][m] = array[i][m];
}
/* ----------------------------------------------------------------------
initialize one atom's array values
called when atom is created
------------------------------------------------------------------------- */
void FixSRP::set_arrays(int i)
{
array[i][0] = -1;
array[i][1] = -1;
}
/* ----------------------------------------------------------------------
pack values in local atom-based array for exchange with another proc
------------------------------------------------------------------------- */
int FixSRP::pack_exchange(int i, double *buf)
{
for (int m = 0; m < 2; m++) buf[m] = array[i][m];
return 2;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based array from exchange with another proc
------------------------------------------------------------------------- */
int FixSRP::unpack_exchange(int nlocal, double *buf)
{
for (int m = 0; m < 2; m++) array[nlocal][m] = buf[m];
return 2;
}
/* ----------------------------------------------------------------------
pack values for border communication at re-neighboring
------------------------------------------------------------------------- */
int FixSRP::pack_border(int n, int *list, double *buf)
{
// pack buf for border comm
int i,j;
int m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = array[j][0];
buf[m++] = array[j][1];
}
return m;
}
/* ----------------------------------------------------------------------
unpack values for border communication at re-neighboring
------------------------------------------------------------------------- */
int FixSRP::unpack_border(int n, int first, double *buf)
{
// unpack buf into array
int i,last;
int m = 0;
last = first + n;
for (i = first; i < last; i++){
array[i][0] = buf[m++];
array[i][1] = buf[m++];
}
return m;
}
/* ----------------------------------------------------------------------
remove particles after run
------------------------------------------------------------------------- */
void FixSRP::post_run()
{
// all bond particles are removed after each run
// useful for write_data and write_restart commands
// since those commands occur between runs
bigint natoms_previous = atom->natoms;
int nlocal = atom->nlocal;
int* dlist;
memory->create(dlist,nlocal,"fix_srp:dlist");
for (int i = 0; i < nlocal; i++){
if(atom->type[i] == bptype)
dlist[i] = 1;
else
dlist[i] = 0;
}
// delete local atoms flagged in dlist
// reset nlocal
AtomVec *avec = atom->avec;
int i = 0;
while (i < nlocal) {
if (dlist[i]) {
avec->copy(nlocal-1,i,1);
dlist[i] = dlist[nlocal-1];
nlocal--;
} else i++;
}
atom->nlocal = nlocal;
memory->destroy(dlist);
// reset atom->natoms
// reset atom->map if it exists
// set nghost to 0 so old ghosts won't be mapped
bigint nblocal = atom->nlocal;
MPI_Allreduce(&nblocal,&atom->natoms,1,MPI_LMP_BIGINT,MPI_SUM,world);
if (atom->map_style) {
atom->nghost = 0;
atom->map_init();
atom->map_set();
}
// print before and after atom count
bigint ndelete = natoms_previous - atom->natoms;
if (comm->me == 0) {
if (screen) fprintf(screen,"Deleted " BIGINT_FORMAT
" atoms, new total = " BIGINT_FORMAT "\n",
ndelete,atom->natoms);
if (logfile) fprintf(logfile,"Deleted " BIGINT_FORMAT
" atoms, new total = " BIGINT_FORMAT "\n",
ndelete,atom->natoms);
}
// verlet calls box_too_small_check() in post_run
// this check maps all bond partners
// therefore need ghosts
// need to convert to lambda coords before apply pbc
if (domain->triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
comm->setup();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
// change back to box coordinates
if (domain->triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for restart file
------------------------------------------------------------------------- */
int FixSRP::pack_restart(int i, double *buf)
{
int m = 0;
buf[m++] = 3;
buf[m++] = array[i][0];
buf[m++] = array[i][1];
return m;
}
/* ----------------------------------------------------------------------
unpack values from atom->extra array to restart the fix
------------------------------------------------------------------------- */
void FixSRP::unpack_restart(int nlocal, int nth)
{
double **extra = atom->extra;
// skip to Nth set of extra values
int m = 0;
for (int i = 0; i < nth; i++){
m += extra[nlocal][m];
}
m++;
array[nlocal][0] = extra[nlocal][m++];
array[nlocal][1] = extra[nlocal][m++];
}
/* ----------------------------------------------------------------------
maxsize of any atom's restart data
------------------------------------------------------------------------- */
int FixSRP::maxsize_restart()
{
return 3;
}
/* ----------------------------------------------------------------------
size of atom nlocal's restart data
------------------------------------------------------------------------- */
int FixSRP::size_restart(int nlocal)
{
return 3;
}
/* ----------------------------------------------------------------------
pack global state of Fix
------------------------------------------------------------------------- */
void FixSRP::write_restart(FILE *fp)
{
int n = 0;
double list[3];
list[n++] = comm->cutghostuser;
list[n++] = btype;
list[n++] = bptype;
if (comm->me == 0) {
int size = n * sizeof(double);
fwrite(&size,sizeof(int),1,fp);
fwrite(list,sizeof(double),n,fp);
}
}
/* ----------------------------------------------------------------------
use info from restart file to restart the Fix
------------------------------------------------------------------------- */
void FixSRP::restart(char *buf)
{
int n = 0;
double *list = (double *) buf;
comm->cutghostuser = static_cast<double> (list[n++]);
btype = static_cast<int> (list[n++]);
bptype = static_cast<int> (list[n++]);
}
/* ----------------------------------------------------------------------
interface with pair class
pair srp sets the bond type in this fix
------------------------------------------------------------------------- */
int FixSRP::modify_param(int narg, char **arg)
{
if (strcmp(arg[0],"btype") == 0) {
btype = atoi(arg[1]);
return 2;
}
if (strcmp(arg[0],"bptype") == 0) {
bptype = atoi(arg[1]);
return 2;
}
return 0;
}
diff --git a/src/USER-MISC/improper_ring.cpp b/src/USER-MISC/improper_ring.cpp
index 5a7937e4e..adf17ed1d 100644
--- a/src/USER-MISC/improper_ring.cpp
+++ b/src/USER-MISC/improper_ring.cpp
@@ -1,339 +1,339 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Georgios G. Vogiatzis (CoMSE, NTU Athens),
gvog@chemeng.ntua.gr
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Description: This file implements the improper potential introduced
by Destree et al., in Equation 9 of:
- M. Destree, F. Laupretre, A. Lyulin, and J.-P.
Ryckaert, J. Chem. Phys. 112, 9632 (2000),
and subsequently referred to in:
- A.V. Lyulin, M.A.J Michels, Macromolecules, 35, 1463,
(2002)
This potential does not affect small amplitude vibrations
but is used in an ad hoc way to prevent the onset of
accidentally large amplitude fluctuations leading to
the occurrence of a planar conformation of the three
bonds i, i + 1 and i', an intermediate conformation
toward the chiral inversion of a methine carbon.
In the "Impropers" section of data file four atoms:
i, j, k and l are specified with i,j and l lying on the
backbone of the chain and k specifying the chirality
of j.
------------------------------------------------------------------------- */
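/* Sketch of the implemented energy (inferred from compute() below; not part of
   the original description): with theta_1, theta_2, theta_3 the bending angles
   of the triads (1-2-9), (1-2-3) and (9-2-3), and chi the equilibrium angle,

     E = (K/6) * [ (cos(theta_1)-cos(chi)) + (cos(theta_2)-cos(chi)) + (cos(theta_3)-cos(chi)) ]^6

   i.e. "angle_summer" in compute() accumulates cos(theta_i) - cos(chi), and
   coeff() stores cos(chi) rather than chi itself. */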
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include "improper_ring.h"
#include "atom.h"
#include "comm.h"
#include "neighbor.h"
#include "domain.h"
#include "force.h"
#include "update.h"
#include "math_const.h"
#include "math_special.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
using namespace MathSpecial;
#define TOLERANCE 0.05
#define SMALL 0.001
/* ---------------------------------------------------------------------- */
ImproperRing::ImproperRing(LAMMPS *lmp) : Improper(lmp) {}
/* ---------------------------------------------------------------------- */
ImproperRing::~ImproperRing()
{
if (allocated) {
memory->destroy(setflag);
memory->destroy(k);
memory->destroy(chi);
}
}
/* ---------------------------------------------------------------------- */
void ImproperRing::compute(int eflag, int vflag)
{
/* Be careful!: "chi" stores the cosine of the equilibrium angle (see coeff()). */
int i1,i2,i3,i4,n,type;
double eimproper ;
/* Compatibility variables. */
double vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z;
double f1[3], f3[3], f4[3];
/* Actual computation variables. */
int at1[3], at2[3], at3[3], icomb;
double bvec1x[3], bvec1y[3], bvec1z[3],
bvec2x[3], bvec2y[3], bvec2z[3],
bvec1n[3], bvec2n[3], bend_angle[3];
double angle_summer, angfac, cfact1, cfact2, cfact3;
double cjiji, ckjji, ckjkj, fix, fiy, fiz, fjx, fjy, fjz, fkx, fky, fkz;
eimproper = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = 0;
/* References to simulation data. */
double **x = atom->x;
double **f = atom->f;
int **improperlist = neighbor->improperlist;
int nimproperlist = neighbor->nimproperlist;
int nlocal = atom->nlocal;
int newton_bond = force->newton_bond;
/* A description of the potential can be found in
Macromolecules 35, pp. 1463-1472 (2002). */
for (n = 0; n < nimproperlist; n++)
{
/* Take the ids of the atoms contributing to the improper potential. */
i1 = improperlist[n][0]; /* Atom "1" of Figure 1 from the above reference.*/
i2 = improperlist[n][1]; /* Atom "2" ... */
i3 = improperlist[n][2]; /* Atom "3" ... */
i4 = improperlist[n][3]; /* Atom "9" ... */
type = improperlist[n][4];
/* Calculate the necessary variables for LAMMPS implementation.
if (evflag) ev_tally(i1,i2,i3,i4,nlocal,newton_bond,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z);
Although they are irrelevant to the calculation of the potential, we keep
them for maximal compatibility. */
vb1x = x[i1][0] - x[i2][0]; vb1y = x[i1][1] - x[i2][1]; vb1z = x[i1][2] - x[i2][2];
vb2x = x[i3][0] - x[i2][0]; vb2y = x[i3][1] - x[i2][1]; vb2z = x[i3][2] - x[i2][2];
vb3x = x[i4][0] - x[i3][0]; vb3y = x[i4][1] - x[i3][1]; vb3z = x[i4][2] - x[i3][2];
/* Pass the atom tags to form the necessary combinations. */
at1[0] = i1; at2[0] = i2; at3[0] = i4; /* ids: 1-2-9 */
at1[1] = i1; at2[1] = i2; at3[1] = i3; /* ids: 1-2-3 */
at1[2] = i4; at2[2] = i2; at3[2] = i3; /* ids: 9-2-3 */
/* Initialize the sum of the angles differences. */
angle_summer = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++)
{
/* Bond vector connecting the first and the second atom. */
bvec1x[icomb] = x[at2[icomb]][0] - x[at1[icomb]][0];
bvec1y[icomb] = x[at2[icomb]][1] - x[at1[icomb]][1];
bvec1z[icomb] = x[at2[icomb]][2] - x[at1[icomb]][2];
/* also calculate the norm of the vector: */
bvec1n[icomb] = sqrt( bvec1x[icomb]*bvec1x[icomb]
+ bvec1y[icomb]*bvec1y[icomb]
+ bvec1z[icomb]*bvec1z[icomb]);
/* Bond vector connecting the second and the third atom. */
bvec2x[icomb] = x[at3[icomb]][0] - x[at2[icomb]][0];
bvec2y[icomb] = x[at3[icomb]][1] - x[at2[icomb]][1];
bvec2z[icomb] = x[at3[icomb]][2] - x[at2[icomb]][2];
/* also calculate the norm of the vector: */
bvec2n[icomb] = sqrt( bvec2x[icomb]*bvec2x[icomb]
+ bvec2y[icomb]*bvec2y[icomb]
+ bvec2z[icomb]*bvec2z[icomb]);
/* Calculate the bending angle of the atom triad: */
bend_angle[icomb] = ( bvec2x[icomb]*bvec1x[icomb]
+ bvec2y[icomb]*bvec1y[icomb]
+ bvec2z[icomb]*bvec1z[icomb]);
bend_angle[icomb] /= (bvec1n[icomb] * bvec2n[icomb]);
if (bend_angle[icomb] > 1.0) bend_angle[icomb] -= SMALL;
if (bend_angle[icomb] < -1.0) bend_angle[icomb] += SMALL;
/* Append the current angle to the sum of angle differences. */
angle_summer += (bend_angle[icomb] - chi[type]);
}
if (eflag) eimproper = (1.0/6.0) *k[type] * powint(angle_summer,6);
/*
printf("The tags: %d-%d-%d-%d, of type %d .\n",atom->tag[i1],atom->tag[i2],atom->tag[i3],atom->tag[i4],type);
// printf("The coordinates of the first: %f, %f, %f.\n", x[i1][0], x[i1][1], x[i1][2]);
// printf("The coordinates of the second: %f, %f, %f.\n", x[i2][0], x[i2][1], x[i2][2]);
// printf("The coordinates of the third: %f, %f, %f.\n", x[i3][0], x[i3][1], x[i3][2]);
// printf("The coordinates of the fourth: %f, %f, %f.\n", x[i4][0], x[i4][1], x[i4][2]);
printf("The angles are: %f / %f / %f equilibrium: %f.\n", bend_angle[0], bend_angle[1], bend_angle[2],chi[type]);
printf("The energy of the improper: %f with prefactor %f.\n", eimproper,(1.0/6.0)*k[type]);
printf("The sum of the angles: %f.\n", angle_summer);
*/
/* Force calculation acting on all atoms.
Calculate the derivatives of the potential. */
angfac = k[type] * powint(angle_summer,5);
f1[0] = 0.0; f1[1] = 0.0; f1[2] = 0.0;
f3[0] = 0.0; f3[1] = 0.0; f3[2] = 0.0;
f4[0] = 0.0; f4[1] = 0.0; f4[2] = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++)
{
/* Calculate the squares of the distances. */
cjiji = bvec1n[icomb] * bvec1n[icomb]; ckjkj = bvec2n[icomb] * bvec2n[icomb];
ckjji = bvec2x[icomb] * bvec1x[icomb]
+ bvec2y[icomb] * bvec1y[icomb]
+ bvec2z[icomb] * bvec1z[icomb] ;
cfact1 = angfac / (sqrt(ckjkj * cjiji));
cfact2 = ckjji / ckjkj;
cfact3 = ckjji / cjiji;
- /* Calculate the force acted on the thrid atom of the angle. */
+ /* Calculate the force acted on the third atom of the angle. */
fkx = cfact2 * bvec2x[icomb] - bvec1x[icomb];
fky = cfact2 * bvec2y[icomb] - bvec1y[icomb];
fkz = cfact2 * bvec2z[icomb] - bvec1z[icomb];
/* Calculate the force acted on the first atom of the angle. */
fix = bvec2x[icomb] - cfact3 * bvec1x[icomb];
fiy = bvec2y[icomb] - cfact3 * bvec1y[icomb];
fiz = bvec2z[icomb] - cfact3 * bvec1z[icomb];
/* Finally, calculate the force acted on the middle atom of the angle.*/
fjx = - fix - fkx; fjy = - fiy - fky; fjz = - fiz - fkz;
/* Consider the appropriate scaling of the forces: */
fix *= cfact1; fiy *= cfact1; fiz *= cfact1;
fjx *= cfact1; fjy *= cfact1; fjz *= cfact1;
fkx *= cfact1; fky *= cfact1; fkz *= cfact1;
if (at1[icomb] == i1) {f1[0] += fix; f1[1] += fiy; f1[2] += fiz;}
else if (at2[icomb] == i1) {f1[0] += fjx; f1[1] += fjy; f1[2] += fjz;}
else if (at3[icomb] == i1) {f1[0] += fkx; f1[1] += fky; f1[2] += fkz;}
if (at1[icomb] == i3) {f3[0] += fix; f3[1] += fiy; f3[2] += fiz;}
else if (at2[icomb] == i3) {f3[0] += fjx; f3[1] += fjy; f3[2] += fjz;}
else if (at3[icomb] == i3) {f3[0] += fkx; f3[1] += fky; f3[2] += fkz;}
if (at1[icomb] == i4) {f4[0] += fix; f4[1] += fiy; f4[2] += fiz;}
else if (at2[icomb] == i4) {f4[0] += fjx; f4[1] += fjy; f4[2] += fjz;}
else if (at3[icomb] == i4) {f4[0] += fkx; f4[1] += fky; f4[2] += fkz;}
/* Store the contribution to the global arrays: */
/* Take the id of the atom from the at1[icomb] element, i1 = at1[icomb]. */
if (newton_bond || at1[icomb] < nlocal) {
f[at1[icomb]][0] += fix;
f[at1[icomb]][1] += fiy;
f[at1[icomb]][2] += fiz;
}
/* Take the id of the atom from the at2[icomb] element, i2 = at2[icomb]. */
if (newton_bond || at2[icomb] < nlocal) {
f[at2[icomb]][0] += fjx;
f[at2[icomb]][1] += fjy;
f[at2[icomb]][2] += fjz;
}
/* Take the id of the atom from the at3[icomb] element, i3 = at3[icomb]. */
if (newton_bond || at3[icomb] < nlocal) {
f[at3[icomb]][0] += fkx;
f[at3[icomb]][1] += fky;
f[at3[icomb]][2] += fkz;
}
}
if (evflag) ev_tally(i1,i2,i3,i4,nlocal,newton_bond,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z);
}
}
/* ---------------------------------------------------------------------- */
void ImproperRing::allocate()
{
allocated = 1;
int n = atom->nimpropertypes;
memory->create(k,n+1,"improper:k");
memory->create(chi,n+1,"improper:chi");
memory->create(setflag,n+1,"improper:setflag");
for (int i = 1; i <= n; i++) setflag[i] = 0;
}
/* ----------------------------------------------------------------------
set coeffs for one type
------------------------------------------------------------------------- */
void ImproperRing ::coeff(int narg, char **arg)
{
/* Check whether there exist sufficient number of arguments.
0: type of improper to be applied to
1: energetic constant
2: equilibrium angle in degrees */
if (narg != 3) error->all(FLERR,"Incorrect args for RING improper coefficients");
if (!allocated) allocate();
int ilo,ihi;
force->bounds(FLERR,arg[0],atom->nimpropertypes,ilo,ihi);
double k_one = force->numeric(FLERR,arg[1]);
double chi_one = force->numeric(FLERR,arg[2]);
int count = 0;
for (int i = ilo; i <= ihi; i++) {
/* Read the k parameter in kcal/mol. */
k[i] = k_one;
/* "chi_one" stores the equilibrium angle in degrees.
Convert it to radians and store its cosine. */
chi[i] = cos((chi_one/180.0)*MY_PI);
setflag[i] = 1;
count++;
}
if (count == 0) error->all(FLERR,"Incorrect args for improper coefficients");
}
/* ----------------------------------------------------------------------
proc 0 writes out coeffs to restart file
------------------------------------------------------------------------- */
void ImproperRing ::write_restart(FILE *fp)
{
fwrite(&k[1],sizeof(double),atom->nimpropertypes,fp);
fwrite(&chi[1],sizeof(double),atom->nimpropertypes,fp);
}
/* ----------------------------------------------------------------------
proc 0 reads coeffs from restart file, bcasts them
------------------------------------------------------------------------- */
void ImproperRing::read_restart(FILE *fp)
{
allocate();
if (comm->me == 0) {
fread(&k[1],sizeof(double),atom->nimpropertypes,fp);
fread(&chi[1],sizeof(double),atom->nimpropertypes,fp);
}
MPI_Bcast(&k[1],atom->nimpropertypes,MPI_DOUBLE,0,world);
MPI_Bcast(&chi[1],atom->nimpropertypes,MPI_DOUBLE,0,world);
for (int i = 1; i <= atom->nimpropertypes; i++) setflag[i] = 1;
}
diff --git a/src/USER-MOLFILE/README b/src/USER-MOLFILE/README
index f6defed6a..4437b587e 100644
--- a/src/USER-MOLFILE/README
+++ b/src/USER-MOLFILE/README
@@ -1,35 +1,22 @@
This package provides a C++ interface class to the VMD molfile
plugins, http://www.ks.uiuc.edu/Research/vmd/plugins/molfile, and a
set of LAMMPS classes that use this interface.
-Molfile plugins provide a consistent programming interface to read and
-write file formats commonly used in molecular simulations. This
+Molfile plugins provide a consistent programming interface to read
+and write file formats commonly used in molecular simulations. This
package only provides the interface code, not the plugins; these can
be taken as precompiled binaries directly from a VMD installation that
matches the platform of your LAMMPS executable. Using the plugin
interface one can add support for additional file formats to LAMMPS
simply by telling LAMMPS where to find a suitable plugin without
having to recompile or change LAMMPS directly. The plugins bundled
with VMD are usually installed in a directory inside the VMD
installation tree named "plugins/<VMDARCH>/molfile".
To be able to dynamically load and execute the plugins from inside
LAMMPS, you need to link with an appropriate system library, which
is done using the settings in lib/molfile/Makefile.lammps. See
that file and the lib/molfile/README file for more details.
-NOTE: while the programming interface (API) to the molfile plugins is
-backward compatible (i.e. you can expect to be able to compile this
-package for plugins from newer VMD packages), the binary interface
-(ABI) is not. So it is necessary to compile this package with the
-molfile plugin header files (vmdplugin.h and molfile_plugin.h) taken
-from the _same_ VMD installation that the (binary) plugin files are
-taken from. These header files can be found inside the VMD
-installation tree under: "plugins/include".
-
-For convenience, this package includes a set of header files that is
-compatible with VMD 1.9 and 1.9.1 (the current version in June 2012)
-and should be compilable with VMD versions back to about version 1.8.4
-
The person who created this package is Axel Kohlmeyer at Temple U
(akohlmey at gmail.com). Contact him directly if you have questions.
diff --git a/src/USER-NC-DUMP/Install.sh b/src/USER-NETCDF/Install.sh
similarity index 100%
rename from src/USER-NC-DUMP/Install.sh
rename to src/USER-NETCDF/Install.sh
diff --git a/src/USER-NC-DUMP/README b/src/USER-NETCDF/README
similarity index 95%
rename from src/USER-NC-DUMP/README
rename to src/USER-NETCDF/README
index c02e879c6..57dec5e4c 100644
--- a/src/USER-NC-DUMP/README
+++ b/src/USER-NETCDF/README
@@ -1,39 +1,39 @@
-USER-NC-DUMP
+USER-NETCDF
============
-This package provides the nc and (optionally) the nc/mpiio dump styles.
+This package provides the netcdf and netcdf/mpiio dump styles.
See the doc pages for the dump netcdf and dump netcdf/mpiio commands for how to use them.
Compiling these dump styles requires having the netCDF library installed
on your system. See lib/netcdf/README for additional details.
PACKAGE DESCRIPTION
-------------------
This is a LAMMPS (http://lammps.sandia.gov/) dump style for output into a NetCDF
database. The database format follows the AMBER NetCDF trajectory convention
(http://ambermd.org/netcdf/nctraj.xhtml), but includes extensions to this
convention. These extensions are:
* A variable "cell_origin" (of dimension "frame", "cell_spatial") that contains
the bottom left corner of the simulation cell.
* Any number of additional variables corresponding to per atom scalar, vector
or tensor quantities available within LAMMPS. Tensor quantities are written in
Voigt notation. An additional dimension "Voigt" of length 6 is created for
this purpose.
* Possibility to output to an HDF5 database.
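As a minimal illustration (not part of this package), the extension
variables can be read back with the plain netCDF C library; the file name
"traj.nc" below is a placeholder and error handling is reduced to a bare
minimum.

  /* sketch: read the "cell_origin" extension variable for the first frame */
  #include <stdio.h>
  #include <netcdf.h>

  int main(void) {
    int ncid, varid;
    double origin[3];
    size_t start[2] = {0, 0};   /* frame 0, first component */
    size_t count[2] = {1, 3};   /* one frame, three components */
    if (nc_open("traj.nc", NC_NOWRITE, &ncid) != NC_NOERR) return 1;
    if (nc_inq_varid(ncid, "cell_origin", &varid) != NC_NOERR) return 1;
    if (nc_get_vara_double(ncid, varid, start, count, origin) != NC_NOERR) return 1;
    printf("cell origin: %g %g %g\n", origin[0], origin[1], origin[2]);
    nc_close(ncid);
    return 0;
  }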
NetCDF files can be directly visualized with the following tools:
* Ovito (http://www.ovito.org/). Ovito supports the AMBER convention and all of
the above extensions.
* VMD (http://www.ks.uiuc.edu/Research/vmd/).
* AtomEye (http://www.libatoms.org/). The libAtoms version of AtomEye contains
a NetCDF reader that is not present in the standard distribution of AtomEye.
The person who created these files is Lars Pastewka at
Karlsruhe Institute of Technology (lars.pastewka@kit.edu).
Contact him directly if you have questions.
Lars Pastewka
Institute for Applied Materials (IAM)
Karlsruhe Institute of Technology (KIT)
Kaiserstrasse 12, 76131 Karlsruhe
e-mail: lars.pastewka@kit.edu
diff --git a/src/USER-NC-DUMP/dump_nc.cpp b/src/USER-NETCDF/dump_netcdf.cpp
similarity index 97%
rename from src/USER-NC-DUMP/dump_nc.cpp
rename to src/USER-NETCDF/dump_netcdf.cpp
index 7a66eb022..bad90bdef 100644
--- a/src/USER-NC-DUMP/dump_nc.cpp
+++ b/src/USER-NETCDF/dump_netcdf.cpp
@@ -1,1144 +1,1141 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_NETCDF)
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
-
#include <netcdf.h>
-
+#include "dump_netcdf.h"
#include "atom.h"
#include "comm.h"
#include "compute.h"
#include "domain.h"
#include "error.h"
#include "fix.h"
#include "group.h"
#include "input.h"
#include "math_const.h"
#include "memory.h"
#include "modify.h"
#include "update.h"
#include "universe.h"
#include "variable.h"
#include "force.h"
-#include "dump_nc.h"
-
using namespace LAMMPS_NS;
using namespace MathConst;
enum{INT,DOUBLE}; // same as in dump_custom.cpp
const char NC_FRAME_STR[] = "frame";
const char NC_SPATIAL_STR[] = "spatial";
const char NC_VOIGT_STR[] = "Voigt";
const char NC_ATOM_STR[] = "atom";
const char NC_CELL_SPATIAL_STR[] = "cell_spatial";
const char NC_CELL_ANGULAR_STR[] = "cell_angular";
const char NC_LABEL_STR[] = "label";
const char NC_TIME_STR[] = "time";
const char NC_CELL_ORIGIN_STR[] = "cell_origin";
const char NC_CELL_LENGTHS_STR[] = "cell_lengths";
const char NC_CELL_ANGLES_STR[] = "cell_angles";
const char NC_UNITS_STR[] = "units";
const char NC_SCALE_FACTOR_STR[] = "scale_factor";
const int THIS_IS_A_FIX = -1;
const int THIS_IS_A_COMPUTE = -2;
const int THIS_IS_A_VARIABLE = -3;
const int THIS_IS_A_BIGINT = -4;
/* ---------------------------------------------------------------------- */
#define NCERR(x) ncerr(x, NULL, __LINE__)
#define NCERRX(x, descr) ncerr(x, descr, __LINE__)
/* ---------------------------------------------------------------------- */
-DumpNC::DumpNC(LAMMPS *lmp, int narg, char **arg) :
+DumpNetCDF::DumpNetCDF(LAMMPS *lmp, int narg, char **arg) :
DumpCustom(lmp, narg, arg)
{
// arrays for data rearrangement
sort_flag = 1;
sortcol = 0;
binary = 1;
flush_flag = 0;
if (multiproc)
error->all(FLERR,"Multi-processor writes are not supported.");
if (multifile)
error->all(FLERR,"Multiple files are not supported.");
perat = new nc_perat_t[nfield];
for (int i = 0; i < nfield; i++) {
perat[i].dims = 0;
}
n_perat = 0;
for (int iarg = 5; iarg < narg; iarg++) {
int i = iarg-5;
int idim = 0;
int ndims = 1;
char mangled[1024];
bool constant = false;
strcpy(mangled, arg[iarg]);
// name mangling
// in the AMBER specification
if (!strcmp(mangled, "x") || !strcmp(mangled, "y") ||
!strcmp(mangled, "z")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "coordinates");
}
else if (!strcmp(mangled, "vx") || !strcmp(mangled, "vy") ||
!strcmp(mangled, "vz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "velocities");
}
// extensions to the AMBER specification
else if (!strcmp(mangled, "type")) {
strcpy(mangled, "atom_types");
}
else if (!strcmp(mangled, "xs") || !strcmp(mangled, "ys") ||
!strcmp(mangled, "zs")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "scaled_coordinates");
}
else if (!strcmp(mangled, "xu") || !strcmp(mangled, "yu") ||
!strcmp(mangled, "zu")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "unwrapped_coordinates");
}
else if (!strcmp(mangled, "fx") || !strcmp(mangled, "fy") ||
!strcmp(mangled, "fz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "forces");
}
else if (!strcmp(mangled, "mux") || !strcmp(mangled, "muy") ||
!strcmp(mangled, "muz")) {
idim = mangled[2] - 'x';
ndims = 3;
strcpy(mangled, "mu");
}
else if (!strncmp(mangled, "c_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_COMPUTE;
}
}
else if (!strncmp(mangled, "f_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_FIX;
}
}
// find mangled name
int inc = -1;
for (int j = 0; j < n_perat && inc < 0; j++) {
if (!strcmp(perat[j].name, mangled)) {
inc = j;
}
}
if (inc < 0) {
// this has not yet been defined
inc = n_perat;
perat[inc].dims = ndims;
if (ndims < 0) ndims = DUMP_NC_MAX_DIMS;
for (int j = 0; j < DUMP_NC_MAX_DIMS; j++) {
perat[inc].field[j] = -1;
}
strcpy(perat[inc].name, mangled);
n_perat++;
}
perat[inc].constant = constant;
perat[inc].ndumped = 0;
perat[inc].field[idim] = i;
}
n_perframe = 0;
perframe = NULL;
n_buffer = 0;
int_buffer = NULL;
double_buffer = NULL;
double_precision = false;
framei = 0;
}
/* ---------------------------------------------------------------------- */
-DumpNC::~DumpNC()
+DumpNetCDF::~DumpNetCDF()
{
closefile();
delete [] perat;
if (n_perframe > 0)
delete [] perframe;
if (int_buffer) memory->sfree(int_buffer);
if (double_buffer) memory->sfree(double_buffer);
}
/* ---------------------------------------------------------------------- */
-void DumpNC::openfile()
+void DumpNetCDF::openfile()
{
// now the computes and fixes have been initialized, so we can query
// for the size of vector quantities
for (int i = 0; i < n_perat; i++) {
if (perat[i].dims == THIS_IS_A_COMPUTE) {
int j = -1;
for (int k = 0; k < DUMP_NC_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!compute[j]->peratom_flag)
error->all(FLERR,"compute does not provide per atom data");
perat[i].dims = compute[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MAX_DIMS");
}
else if (perat[i].dims == THIS_IS_A_FIX) {
int j = -1;
for (int k = 0; k < DUMP_NC_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!fix[j]->peratom_flag)
error->all(FLERR,"fix does not provide per atom data");
perat[i].dims = fix[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MAX_DIMS");
}
}
// get total number of atoms
ntotalgr = group->count(igroup);
if (filewriter) {
if (append_flag && access(filename, F_OK) != -1) {
// Fixme! Check whether dimensions and variables conform to the
// data structure standard.
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( nc_open(filename, NC_WRITE, &ncid), filename );
// dimensions
NCERRX( nc_inq_dimid(ncid, NC_FRAME_STR, &frame_dim), NC_FRAME_STR );
NCERRX( nc_inq_dimid(ncid, NC_SPATIAL_STR, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( nc_inq_dimid(ncid, NC_VOIGT_STR, &Voigt_dim), NC_VOIGT_STR );
NCERRX( nc_inq_dimid(ncid, NC_ATOM_STR, &atom_dim), NC_ATOM_STR );
NCERRX( nc_inq_dimid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( nc_inq_dimid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( nc_inq_dimid(ncid, NC_LABEL_STR, &label_dim), NC_LABEL_STR );
// default variables
NCERRX( nc_inq_varid(ncid, NC_SPATIAL_STR, &spatial_var),
NC_SPATIAL_STR );
NCERRX( nc_inq_varid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_var),
NC_CELL_SPATIAL_STR);
NCERRX( nc_inq_varid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_var),
NC_CELL_ANGULAR_STR);
NCERRX( nc_inq_varid(ncid, NC_TIME_STR, &time_var), NC_TIME_STR );
NCERRX( nc_inq_varid(ncid, NC_CELL_ORIGIN_STR, &cell_origin_var),
NC_CELL_ORIGIN_STR );
NCERRX( nc_inq_varid(ncid, NC_CELL_LENGTHS_STR, &cell_lengths_var),
NC_CELL_LENGTHS_STR);
NCERRX( nc_inq_varid(ncid, NC_CELL_ANGLES_STR, &cell_angles_var),
NC_CELL_ANGLES_STR);
// variables specified in the input file
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
NCERRX( nc_inq_varid(ncid, perat[i].name, &perat[i].var),
perat[i].name );
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
NCERRX( nc_inq_varid(ncid, perframe[i].name, &perframe[i].var),
perframe[i].name );
}
size_t nframes;
NCERR( nc_inq_dimlen(ncid, frame_dim, &nframes) );
// framei == -1 means append to file, == -2 means override last frame
// Note that in the input file this translates to 'yes', '-1', etc.
if (framei < 0 || (append_flag && framei == 0)) framei = nframes+framei+1;
if (framei < 1) framei = 1;
}
else {
int dims[NC_MAX_VAR_DIMS];
size_t index[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
double d[1];
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( nc_create(filename, NC_64BIT_OFFSET, &ncid),
filename );
// dimensions
NCERRX( nc_def_dim(ncid, NC_FRAME_STR, NC_UNLIMITED, &frame_dim),
NC_FRAME_STR );
NCERRX( nc_def_dim(ncid, NC_SPATIAL_STR, 3, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( nc_def_dim(ncid, NC_VOIGT_STR, 6, &Voigt_dim),
NC_VOIGT_STR );
NCERRX( nc_def_dim(ncid, NC_ATOM_STR, ntotalgr, &atom_dim),
NC_ATOM_STR );
NCERRX( nc_def_dim(ncid, NC_CELL_SPATIAL_STR, 3, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( nc_def_dim(ncid, NC_CELL_ANGULAR_STR, 3, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( nc_def_dim(ncid, NC_LABEL_STR, 10, &label_dim),
NC_LABEL_STR );
// default variables
dims[0] = spatial_dim;
NCERRX( nc_def_var(ncid, NC_SPATIAL_STR, NC_CHAR, 1, dims, &spatial_var),
NC_SPATIAL_STR );
NCERRX( nc_def_var(ncid, NC_CELL_SPATIAL_STR, NC_CHAR, 1, dims,
&cell_spatial_var), NC_CELL_SPATIAL_STR );
dims[0] = spatial_dim;
dims[1] = label_dim;
NCERRX( nc_def_var(ncid, NC_CELL_ANGULAR_STR, NC_CHAR, 2, dims,
&cell_angular_var), NC_CELL_ANGULAR_STR );
dims[0] = frame_dim;
NCERRX( nc_def_var(ncid, NC_TIME_STR, NC_DOUBLE, 1, dims, &time_var),
NC_TIME_STR);
dims[0] = frame_dim;
dims[1] = cell_spatial_dim;
NCERRX( nc_def_var(ncid, NC_CELL_ORIGIN_STR, NC_DOUBLE, 2, dims,
&cell_origin_var), NC_CELL_ORIGIN_STR );
NCERRX( nc_def_var(ncid, NC_CELL_LENGTHS_STR, NC_DOUBLE, 2, dims,
&cell_lengths_var), NC_CELL_LENGTHS_STR );
dims[0] = frame_dim;
dims[1] = cell_angular_dim;
NCERRX( nc_def_var(ncid, NC_CELL_ANGLES_STR, NC_DOUBLE, 2, dims,
&cell_angles_var), NC_CELL_ANGLES_STR );
// variables specified in the input file
dims[0] = frame_dim;
dims[1] = atom_dim;
dims[2] = spatial_dim;
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
if (perat[i].constant) {
// this quantity will only be written once
if (perat[i].dims == 6) {
// this is a tensor in Voigt notation
dims[2] = Voigt_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 2, dims+1,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 3) {
// this is a vector, we need to store x-, y- and z-coordinates
dims[2] = spatial_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 2, dims+1,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 1) {
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 1, dims+1,
&perat[i].var), perat[i].name );
}
else {
char errstr[1024];
sprintf(errstr, "%i dimensions for '%s'. Not sure how to write "
"this to the NetCDF trajectory file.", perat[i].dims,
perat[i].name);
error->all(FLERR,errstr);
}
}
else {
if (perat[i].dims == 6) {
// this is a tensor in Voigt notation
dims[2] = Voigt_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 3) {
// this is a vector, we need to store x-, y- and z-coordinates
dims[2] = spatial_dim;
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 1) {
NCERRX( nc_def_var(ncid, perat[i].name, xtype, 2, dims,
&perat[i].var), perat[i].name );
}
else {
char errstr[1024];
sprintf(errstr, "%i dimensions for '%s'. Not sure how to write "
"this to the NetCDF trajectory file.", perat[i].dims,
perat[i].name);
error->all(FLERR,errstr);
}
}
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
NCERRX( nc_def_var(ncid, perframe[i].name, NC_LONG, 1, dims,
&perframe[i].var), perframe[i].name );
}
else {
NCERRX( nc_def_var(ncid, perframe[i].name, NC_DOUBLE, 1, dims,
&perframe[i].var), perframe[i].name );
}
}
// attributes
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "Conventions",
5, "AMBER") );
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "ConventionVersion",
3, "1.0") );
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "program",
6, "LAMMPS") );
NCERR( nc_put_att_text(ncid, NC_GLOBAL, "programVersion",
strlen(universe->version), universe->version) );
// units
if (!strcmp(update->unit_style, "lj")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
2, "lj") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
2, "lj") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
2, "lj") );
}
else if (!strcmp(update->unit_style, "real")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "metal")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
10, "picosecond") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "si")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
5, "meter") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
5, "meter") );
}
else if (!strcmp(update->unit_style, "cgs")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
10, "centimeter") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
10, "centimeter") );
}
else if (!strcmp(update->unit_style, "electron")) {
NCERR( nc_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( nc_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
4, "Bohr") );
NCERR( nc_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
4, "Bohr") );
}
else {
char errstr[1024];
sprintf(errstr, "Unsupported unit style '%s'", update->unit_style);
error->all(FLERR,errstr);
}
NCERR( nc_put_att_text(ncid, cell_angles_var, NC_UNITS_STR,
6, "degree") );
d[0] = update->dt;
NCERR( nc_put_att_double(ncid, time_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( nc_put_att_double(ncid, cell_origin_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( nc_put_att_double(ncid, cell_lengths_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
/*
* Finished with definition
*/
NCERR( nc_enddef(ncid) );
/*
* Write label variables
*/
NCERR( nc_put_var_text(ncid, spatial_var, "xyz") );
NCERR( nc_put_var_text(ncid, cell_spatial_var, "abc") );
index[0] = 0;
index[1] = 0;
count[0] = 1;
count[1] = 5;
NCERR( nc_put_vara_text(ncid, cell_angular_var, index, count, "alpha") );
index[0] = 1;
count[1] = 4;
NCERR( nc_put_vara_text(ncid, cell_angular_var, index, count, "beta") );
index[0] = 2;
count[1] = 5;
NCERR( nc_put_vara_text(ncid, cell_angular_var, index, count, "gamma") );
framei = 1;
}
}
}
/* ---------------------------------------------------------------------- */
-void DumpNC::closefile()
+void DumpNetCDF::closefile()
{
if (filewriter && singlefile_opened) {
NCERR( nc_close(ncid) );
singlefile_opened = 0;
- // append next time DumpNC::openfile is called
+ // append next time DumpNetCDF::openfile is called
append_flag = 1;
// write to next frame upon next open
framei++;
}
}
/* ---------------------------------------------------------------------- */
-void DumpNC::write()
+void DumpNetCDF::write()
{
// open file
openfile();
// need to write per-frame (global) properties here since they may come
// from computes. write_header below is only called from the writing
// processes, but modify->compute[j]->compute_* must be called from all
// processes.
size_t start[2];
start[0] = framei-1;
start[1] = 0;
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
bigint data;
(this->*perframe[i].compute)((void*) &data);
if (filewriter)
#if defined(LAMMPS_SMALLBIG) || defined(LAMMPS_BIGBIG)
NCERR( nc_put_var1_long(ncid, perframe[i].var, start, &data) );
#else
NCERR( nc_put_var1_int(ncid, perframe[i].var, start, &data) );
#endif
}
else {
double data;
int j = perframe[i].index;
int idim = perframe[i].dim;
if (perframe[i].type == THIS_IS_A_COMPUTE) {
if (idim >= 0) {
modify->compute[j]->compute_vector();
data = modify->compute[j]->vector[idim];
}
else
data = modify->compute[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_FIX) {
if (idim >= 0) {
data = modify->fix[j]->compute_vector(idim);
}
else
data = modify->fix[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_VARIABLE) {
j = input->variable->find(perframe[i].id);
data = input->variable->compute_equal(j);
}
if (filewriter)
NCERR( nc_put_var1_double(ncid, perframe[i].var, start, &data) );
}
}
// call write of superclass
Dump::write();
// close file. this ensures data is flushed and minimizes data corruption
closefile();
}
/* ---------------------------------------------------------------------- */
-void DumpNC::write_header(bigint n)
+void DumpNetCDF::write_header(bigint n)
{
size_t start[2];
start[0] = framei-1;
start[1] = 0;
if (filewriter) {
size_t count[2];
double time, cell_origin[3], cell_lengths[3], cell_angles[3];
time = update->ntimestep;
if (domain->triclinic == 0) {
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = domain->yprd;
cell_lengths[2] = domain->zprd;
cell_angles[0] = 90;
cell_angles[1] = 90;
cell_angles[2] = 90;
}
else {
double cosalpha, cosbeta, cosgamma;
double *h = domain->h;
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = sqrt(h[1]*h[1]+h[5]*h[5]);
cell_lengths[2] = sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosalpha = (h[5]*h[4]+h[1]*h[3])/
sqrt((h[1]*h[1]+h[5]*h[5])*(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]));
cosbeta = h[4]/sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosgamma = h[5]/sqrt(h[1]*h[1]+h[5]*h[5]);
cell_angles[0] = acos(cosalpha)*180.0/MY_PI;
cell_angles[1] = acos(cosbeta)*180.0/MY_PI;
cell_angles[2] = acos(cosgamma)*180.0/MY_PI;
}
// Recent AMBER conventions say that nonperiodic boundaries should have
// 'cell_lengths' set to zero.
for (int dim = 0; dim < 3; dim++) {
if (!domain->periodicity[dim])
cell_lengths[dim] = 0.0;
}
count[0] = 1;
count[1] = 3;
NCERR( nc_put_var1_double(ncid, time_var, start, &time) );
NCERR( nc_put_vara_double(ncid, cell_origin_var, start, count,
cell_origin) );
NCERR( nc_put_vara_double(ncid, cell_lengths_var, start, count,
cell_lengths) );
NCERR( nc_put_vara_double(ncid, cell_angles_var, start, count,
cell_angles) );
}
ndata = n;
blocki = 0;
}
/* ----------------------------------------------------------------------
write data lines to file in a block-by-block style
write head of block (mass & element name) only if it has atoms of the type
------------------------------------------------------------------------- */
-void DumpNC::write_data(int n, double *mybuf)
+void DumpNetCDF::write_data(int n, double *mybuf)
{
size_t start[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
ptrdiff_t stride[NC_MAX_VAR_DIMS];
if (!int_buffer) {
n_buffer = n;
int_buffer = (int *)
- memory->smalloc(n*sizeof(int), "DumpNC::int_buffer");
+ memory->smalloc(n*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
- memory->smalloc(n*sizeof(double), "DumpNC::double_buffer");
+ memory->smalloc(n*sizeof(double),"dump::double_buffer");
}
if (n > n_buffer) {
n_buffer = n;
int_buffer = (int *)
- memory->srealloc(int_buffer, n*sizeof(int), "DumpNC::int_buffer");
+ memory->srealloc(int_buffer, n*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
- memory->srealloc(double_buffer, n*sizeof(double),
- "DumpNC::double_buffer");
+ memory->srealloc(double_buffer, n*sizeof(double),"dump::double_buffer");
}
start[0] = framei-1;
start[1] = blocki;
start[2] = 0;
count[0] = 1;
count[1] = n;
count[2] = 1;
stride[0] = 1;
stride[1] = 1;
stride[2] = 3;
for (int i = 0; i < n_perat; i++) {
int iaux = perat[i].field[0];
if (vtype[iaux] == INT) {
// integers
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
start[2] = idim;
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vars_int(ncid, perat[i].var,
start+1, count+1, stride+1,
int_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vars_int(ncid, perat[i].var, start, count, stride,
int_buffer) );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vara_int(ncid, perat[i].var, start+1, count+1,
int_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vara_int(ncid, perat[i].var, start, count,
int_buffer) );
}
}
else {
// doubles
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
start[2] = idim;
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vars_double(ncid, perat[i].var,
start+1, count+1, stride+1,
double_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vars_double(ncid, perat[i].var, start, count,
stride, double_buffer) );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
if (perat[i].constant) {
if (perat[i].ndumped < ntotalgr) {
NCERR( nc_put_vara_double(ncid, perat[i].var, start+1, count+1,
double_buffer) );
perat[i].ndumped += n;
}
}
else
NCERR( nc_put_vara_double(ncid, perat[i].var, start, count,
double_buffer) );
}
}
}
blocki += n;
}
/* ---------------------------------------------------------------------- */
-int DumpNC::modify_param(int narg, char **arg)
+int DumpNetCDF::modify_param(int narg, char **arg)
{
int iarg = 0;
if (strcmp(arg[iarg],"double") == 0) {
iarg++;
if (iarg >= narg)
error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
if (strcmp(arg[iarg],"yes") == 0) {
double_precision = true;
}
else if (strcmp(arg[iarg],"no") == 0) {
double_precision = false;
}
else error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"at") == 0) {
iarg++;
framei = force->inumeric(FLERR,arg[iarg]);
if (framei < 0) framei--;
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"global") == 0) {
// "perframe" quantities, i.e. not per-atom stuff
iarg++;
n_perframe = narg-iarg;
perframe = new nc_perframe_t[n_perframe];
for (int i = 0; iarg < narg; iarg++, i++) {
int n;
char *suffix=NULL;
if (!strcmp(arg[iarg],"step")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNC::compute_step;
+ perframe[i].compute = &DumpNetCDF::compute_step;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elapsed")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNC::compute_elapsed;
+ perframe[i].compute = &DumpNetCDF::compute_elapsed;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elaplong")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNC::compute_elapsed_long;
+ perframe[i].compute = &DumpNetCDF::compute_elapsed_long;
strcpy(perframe[i].name, arg[iarg]);
}
else {
n = strlen(arg[iarg]);
if (n > 2) {
suffix = new char[n-1];
strcpy(suffix, arg[iarg]+2);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must thermo quantity or "
"compute, fix or variable", arg[iarg]);
error->all(FLERR,errstr);
}
if (!strncmp(arg[iarg], "c_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_compute(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify compute ID");
if (modify->compute[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify compute ID computes per-atom info");
if (idim >= 0 && modify->compute[n]->vector_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute vector");
if (idim < 0 && modify->compute[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute scalar");
perframe[i].type = THIS_IS_A_COMPUTE;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "f_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_fix(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify fix ID");
if (modify->fix[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify fix ID computes per-atom info");
if (idim >= 0 && modify->fix[n]->vector_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
if (idim < 0 && modify->fix[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
perframe[i].type = THIS_IS_A_FIX;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "v_", 2)) {
n = input->variable->find(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify variable ID");
if (!input->variable->equalstyle(n))
error->all(FLERR,"Dump modify variable must be of style equal");
perframe[i].type = THIS_IS_A_VARIABLE;
perframe[i].dim = 1;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
strcpy(perframe[i].id, suffix);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must be compute, fix or "
"variable", arg[iarg]);
error->all(FLERR,errstr);
}
delete [] suffix;
}
}
return narg;
} else return 0;
}
/* ---------------------------------------------------------------------- */
-void DumpNC::write_prmtop()
+void DumpNetCDF::write_prmtop()
{
char fn[1024];
char tmp[81];
FILE *f;
strcpy(fn, filename);
strcat(fn, ".prmtop");
f = fopen(fn, "w");
fprintf(f, "%%VERSION LAMMPS\n");
fprintf(f, "%%FLAG TITLE\n");
fprintf(f, "%%FORMAT(20a4)\n");
memset(tmp, ' ', 76);
tmp[76] = '\0';
fprintf(f, "NASN%s\n", tmp);
fprintf(f, "%%FLAG POINTERS\n");
fprintf(f, "%%FORMAT(10I8)\n");
#if defined(LAMMPS_SMALLBIG) || defined(LAMMPS_BIGBIG)
fprintf(f, "%8li", ntotalgr);
#else
fprintf(f, "%8i", ntotalgr);
#endif
for (int i = 0; i < 11; i++)
fprintf(f, "%8i", 0);
fprintf(f, "\n");
for (int i = 0; i < 12; i++)
fprintf(f, "%8i", 0);
fprintf(f, "\n");
for (int i = 0; i < 6; i++)
fprintf(f, "%8i", 0);
fprintf(f, "\n");
fprintf(f, "%%FLAG ATOM_NAME\n");
fprintf(f, "%%FORMAT(20a4)\n");
for (int i = 0; i < ntotalgr; i++) {
fprintf(f, "%4s", "He");
if ((i+1) % 20 == 0)
fprintf(f, "\n");
}
fprintf(f, "%%FLAG CHARGE\n");
fprintf(f, "%%FORMAT(5E16.5)\n");
for (int i = 0; i < ntotalgr; i++) {
fprintf(f, "%16.5e", 0.0);
if ((i+1) % 5 == 0)
fprintf(f, "\n");
}
fprintf(f, "%%FLAG MASS\n");
fprintf(f, "%%FORMAT(5E16.5)\n");
for (int i = 0; i < ntotalgr; i++) {
fprintf(f, "%16.5e", 1.0);
if ((i+1) % 5 == 0)
fprintf(f, "\n");
}
fclose(f);
}
/* ---------------------------------------------------------------------- */
-void DumpNC::ncerr(int err, const char *descr, int line)
+void DumpNetCDF::ncerr(int err, const char *descr, int line)
{
if (err != NC_NOERR) {
char errstr[1024];
if (descr) {
sprintf(errstr, "NetCDF failed with error '%s' (while accessing '%s') "
" in line %i of %s.", nc_strerror(err), descr, line, __FILE__);
}
else {
sprintf(errstr, "NetCDF failed with error '%s' in line %i of %s.",
nc_strerror(err), line, __FILE__);
}
error->one(FLERR,errstr);
}
}
/* ----------------------------------------------------------------------
one method for every keyword thermo can output
called by compute() or evaluate_keyword()
compute will have already been called
set ivalue/dvalue/bivalue if value is int/double/bigint
customize a new keyword by adding a method
------------------------------------------------------------------------- */
-void DumpNC::compute_step(void *r)
+void DumpNetCDF::compute_step(void *r)
{
*((bigint *) r) = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
-void DumpNC::compute_elapsed(void *r)
+void DumpNetCDF::compute_elapsed(void *r)
{
*((bigint *) r) = update->ntimestep - update->firststep;
}
/* ---------------------------------------------------------------------- */
-void DumpNC::compute_elapsed_long(void *r)
+void DumpNetCDF::compute_elapsed_long(void *r)
{
*((bigint *) r) = update->ntimestep - update->beginstep;
}
#endif /* defined(LMP_HAS_NETCDF) */
diff --git a/src/USER-NC-DUMP/dump_nc.h b/src/USER-NETCDF/dump_netcdf.h
similarity index 94%
rename from src/USER-NC-DUMP/dump_nc.h
rename to src/USER-NETCDF/dump_netcdf.h
index 788a9368f..daf4e9d0d 100644
--- a/src/USER-NC-DUMP/dump_nc.h
+++ b/src/USER-NETCDF/dump_netcdf.h
@@ -1,143 +1,144 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_NETCDF)
#ifdef DUMP_CLASS
-DumpStyle(nc,DumpNC)
+DumpStyle(netcdf,DumpNetCDF)
#else
-#ifndef LMP_DUMP_NC_H
-#define LMP_DUMP_NC_H
+#ifndef LMP_DUMP_NETCDF_H
+#define LMP_DUMP_NETCDF_H
#include "dump_custom.h"
namespace LAMMPS_NS {
const int NC_FIELD_NAME_MAX = 100;
const int DUMP_NC_MAX_DIMS = 100;
-class DumpNC : public DumpCustom {
+class DumpNetCDF : public DumpCustom {
public:
- DumpNC(class LAMMPS *, int, char **);
- virtual ~DumpNC();
+ DumpNetCDF(class LAMMPS *, int, char **);
+ virtual ~DumpNetCDF();
virtual void write();
private:
// per-atoms quantities (positions, velocities, etc.)
struct nc_perat_t {
int dims; // number of dimensions
int field[DUMP_NC_MAX_DIMS]; // field indices corresponding to the dim.
char name[NC_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
bool constant; // is this property per file (not per frame)
int ndumped; // number of entries written for this prop.
};
- typedef void (DumpNC::*funcptr_t)(void *);
+ typedef void (DumpNetCDF::*funcptr_t)(void *);
// per-frame quantities (variables, fixes or computes)
struct nc_perframe_t {
char name[NC_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
int type; // variable, fix, compute or callback
int index; // index in fix/compute list
funcptr_t compute; // compute function
int dim; // dimension
char id[NC_FIELD_NAME_MAX]; // variable id
bigint bigint_data; // actual data
double double_data; // actual data
};
int framei; // current frame index
int blocki; // current block index
int ndata; // number of data blocks to expect
bigint ntotalgr; // # of atoms
int n_perat; // # of netcdf per-atom properties
nc_perat_t *perat; // per-atom properties
int n_perframe; // # of global netcdf (not per-atom) fix props
nc_perframe_t *perframe; // global properties
bool double_precision; // write everything as double precision
bigint n_buffer; // size of buffer
int *int_buffer; // buffer for passing data to netcdf
double *double_buffer; // buffer for passing data to netcdf
int ncid;
int frame_dim;
int spatial_dim;
int Voigt_dim;
int atom_dim;
int cell_spatial_dim;
int cell_angular_dim;
int label_dim;
int spatial_var;
int cell_spatial_var;
int cell_angular_var;
int time_var;
int cell_origin_var;
int cell_lengths_var;
int cell_angles_var;
virtual void openfile();
void closefile();
virtual void write_header(bigint);
virtual void write_data(int, double *);
void write_prmtop();
virtual int modify_param(int, char **);
void ncerr(int, const char *, int);
void compute_step(void *);
void compute_elapsed(void *);
void compute_elapsed_long(void *);
};
}
#endif
#endif
#endif /* defined(LMP_HAS_NETCDF) */
diff --git a/src/USER-NC-DUMP/dump_nc_mpiio.cpp b/src/USER-NETCDF/dump_netcdf_mpiio.cpp
similarity index 96%
rename from src/USER-NC-DUMP/dump_nc_mpiio.cpp
rename to src/USER-NETCDF/dump_netcdf_mpiio.cpp
index 6b2601403..2e9ec274a 100644
--- a/src/USER-NC-DUMP/dump_nc_mpiio.cpp
+++ b/src/USER-NETCDF/dump_netcdf_mpiio.cpp
@@ -1,1077 +1,1074 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_PNETCDF)
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
-
#include <pnetcdf.h>
-
+#include "dump_netcdf_mpiio.h"
#include "atom.h"
#include "comm.h"
#include "compute.h"
#include "domain.h"
#include "error.h"
#include "fix.h"
#include "group.h"
#include "input.h"
#include "math_const.h"
#include "memory.h"
#include "modify.h"
#include "update.h"
#include "universe.h"
#include "variable.h"
#include "force.h"
-#include "dump_nc_mpiio.h"
-
using namespace LAMMPS_NS;
using namespace MathConst;
enum{INT,DOUBLE}; // same as in dump_custom.cpp
const char NC_FRAME_STR[] = "frame";
const char NC_SPATIAL_STR[] = "spatial";
const char NC_VOIGT_STR[] = "Voigt";
const char NC_ATOM_STR[] = "atom";
const char NC_CELL_SPATIAL_STR[] = "cell_spatial";
const char NC_CELL_ANGULAR_STR[] = "cell_angular";
const char NC_LABEL_STR[] = "label";
const char NC_TIME_STR[] = "time";
const char NC_CELL_ORIGIN_STR[] = "cell_origin";
const char NC_CELL_LENGTHS_STR[] = "cell_lengths";
const char NC_CELL_ANGLES_STR[] = "cell_angles";
const char NC_UNITS_STR[] = "units";
const char NC_SCALE_FACTOR_STR[] = "scale_factor";
const int THIS_IS_A_FIX = -1;
const int THIS_IS_A_COMPUTE = -2;
const int THIS_IS_A_VARIABLE = -3;
const int THIS_IS_A_BIGINT = -4;
/* ---------------------------------------------------------------------- */
#define NCERR(x) ncerr(x, NULL, __LINE__)
#define NCERRX(x, descr) ncerr(x, descr, __LINE__)
/* ---------------------------------------------------------------------- */
-DumpNCMPIIO::DumpNCMPIIO(LAMMPS *lmp, int narg, char **arg) :
+DumpNetCDFMPIIO::DumpNetCDFMPIIO(LAMMPS *lmp, int narg, char **arg) :
DumpCustom(lmp, narg, arg)
{
// arrays for data rearrangement
sort_flag = 1;
sortcol = 0;
binary = 1;
flush_flag = 0;
if (multiproc)
error->all(FLERR,"Multi-processor writes are not supported.");
if (multifile)
error->all(FLERR,"Multiple files are not supported.");
perat = new nc_perat_t[nfield];
for (int i = 0; i < nfield; i++) {
perat[i].dims = 0;
}
n_perat = 0;
for (int iarg = 5; iarg < narg; iarg++) {
int i = iarg-5;
int idim = 0;
int ndims = 1;
char mangled[1024];
strcpy(mangled, arg[iarg]);
// name mangling
// in the AMBER specification
if (!strcmp(mangled, "x") || !strcmp(mangled, "y") ||
!strcmp(mangled, "z")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "coordinates");
}
else if (!strcmp(mangled, "vx") || !strcmp(mangled, "vy") ||
!strcmp(mangled, "vz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "velocities");
}
else if (!strcmp(mangled, "xs") || !strcmp(mangled, "ys") ||
!strcmp(mangled, "zs")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "scaled_coordinates");
}
else if (!strcmp(mangled, "xu") || !strcmp(mangled, "yu") ||
!strcmp(mangled, "zu")) {
idim = mangled[0] - 'x';
ndims = 3;
strcpy(mangled, "unwrapped_coordinates");
}
else if (!strcmp(mangled, "fx") || !strcmp(mangled, "fy") ||
!strcmp(mangled, "fz")) {
idim = mangled[1] - 'x';
ndims = 3;
strcpy(mangled, "forces");
}
else if (!strcmp(mangled, "mux") || !strcmp(mangled, "muy") ||
!strcmp(mangled, "muz")) {
idim = mangled[2] - 'x';
ndims = 3;
strcpy(mangled, "mu");
}
else if (!strncmp(mangled, "c_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_COMPUTE;
}
}
else if (!strncmp(mangled, "f_", 2)) {
char *ptr = strchr(mangled, '[');
if (ptr) {
if (mangled[strlen(mangled)-1] != ']')
error->all(FLERR,"Missing ']' in dump command");
*ptr = '\0';
idim = ptr[1] - '1';
ndims = THIS_IS_A_FIX;
}
}
// find mangled name
int inc = -1;
for (int j = 0; j < n_perat && inc < 0; j++) {
if (!strcmp(perat[j].name, mangled)) {
inc = j;
}
}
if (inc < 0) {
// this has not yet been defined
inc = n_perat;
perat[inc].dims = ndims;
if (ndims < 0) ndims = DUMP_NC_MPIIO_MAX_DIMS;
for (int j = 0; j < DUMP_NC_MPIIO_MAX_DIMS; j++) {
perat[inc].field[j] = -1;
}
strcpy(perat[inc].name, mangled);
n_perat++;
}
perat[inc].field[idim] = i;
}
n_perframe = 0;
perframe = NULL;
n_buffer = 0;
int_buffer = NULL;
double_buffer = NULL;
double_precision = false;
framei = 0;
}
/* ---------------------------------------------------------------------- */
-DumpNCMPIIO::~DumpNCMPIIO()
+DumpNetCDFMPIIO::~DumpNetCDFMPIIO()
{
closefile();
delete [] perat;
if (n_perframe > 0)
delete [] perframe;
if (int_buffer) memory->sfree(int_buffer);
if (double_buffer) memory->sfree(double_buffer);
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::openfile()
+void DumpNetCDFMPIIO::openfile()
{
// now the computes and fixes have been initialized, so we can query
// for the size of vector quantities
for (int i = 0; i < n_perat; i++) {
if (perat[i].dims == THIS_IS_A_COMPUTE) {
int j = -1;
for (int k = 0; k < DUMP_NC_MPIIO_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!compute[j]->peratom_flag)
error->all(FLERR,"compute does not provide per atom data");
perat[i].dims = compute[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS");
}
else if (perat[i].dims == THIS_IS_A_FIX) {
int j = -1;
for (int k = 0; k < DUMP_NC_MPIIO_MAX_DIMS; k++) {
if (perat[i].field[k] >= 0) {
j = field2index[perat[i].field[0]];
}
}
if (j < 0)
error->all(FLERR,"Internal error.");
if (!fix[j]->peratom_flag)
error->all(FLERR,"fix does not provide per atom data");
perat[i].dims = fix[j]->size_peratom_cols;
if (perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS)
error->all(FLERR,"perat[i].dims > DUMP_NC_MPIIO_MAX_DIMS");
}
}
// get total number of atoms
ntotalgr = group->count(igroup);
if (append_flag && access(filename, F_OK) != -1) {
// Fixme! Check whether dimensions and variables conform to the
// data structure standard.
MPI_Offset index[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
double d[1];
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( ncmpi_open(MPI_COMM_WORLD, filename, NC_WRITE, MPI_INFO_NULL,
&ncid), filename );
// dimensions
NCERRX( ncmpi_inq_dimid(ncid, NC_FRAME_STR, &frame_dim), NC_FRAME_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_SPATIAL_STR, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_VOIGT_STR, &Voigt_dim), NC_VOIGT_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_ATOM_STR, &atom_dim), NC_ATOM_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( ncmpi_inq_dimid(ncid, NC_LABEL_STR, &label_dim), NC_LABEL_STR );
// default variables
NCERRX( ncmpi_inq_varid(ncid, NC_SPATIAL_STR, &spatial_var),
NC_SPATIAL_STR );
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_SPATIAL_STR, &cell_spatial_var),
NC_CELL_SPATIAL_STR);
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_ANGULAR_STR, &cell_angular_var),
NC_CELL_ANGULAR_STR);
NCERRX( ncmpi_inq_varid(ncid, NC_TIME_STR, &time_var), NC_TIME_STR );
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_ORIGIN_STR, &cell_origin_var),
NC_CELL_ORIGIN_STR );
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_LENGTHS_STR, &cell_lengths_var),
NC_CELL_LENGTHS_STR);
NCERRX( ncmpi_inq_varid(ncid, NC_CELL_ANGLES_STR, &cell_angles_var),
NC_CELL_ANGLES_STR);
// variables specified in the input file
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
NCERRX( ncmpi_inq_varid(ncid, perat[i].name, &perat[i].var),
perat[i].name );
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
NCERRX( ncmpi_inq_varid(ncid, perframe[i].name, &perframe[i].var),
perframe[i].name );
}
MPI_Offset nframes;
NCERR( ncmpi_inq_dimlen(ncid, frame_dim, &nframes) );
// framei == -1 means append to file, == -2 means override last frame
// Note that in the input file this translates to 'yes', '-1', etc.
if (framei < 0 || (append_flag && framei == 0)) framei = nframes+framei+1;
if (framei < 1) framei = 1;
}
else {
int dims[NC_MAX_VAR_DIMS];
MPI_Offset index[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
double d[1];
if (singlefile_opened) return;
singlefile_opened = 1;
NCERRX( ncmpi_create(MPI_COMM_WORLD, filename, NC_64BIT_OFFSET,
MPI_INFO_NULL, &ncid), filename );
// dimensions
NCERRX( ncmpi_def_dim(ncid, NC_FRAME_STR, NC_UNLIMITED, &frame_dim),
NC_FRAME_STR );
NCERRX( ncmpi_def_dim(ncid, NC_SPATIAL_STR, 3, &spatial_dim),
NC_SPATIAL_STR );
NCERRX( ncmpi_def_dim(ncid, NC_VOIGT_STR, 6, &Voigt_dim),
NC_VOIGT_STR );
NCERRX( ncmpi_def_dim(ncid, NC_ATOM_STR, ntotalgr, &atom_dim),
NC_ATOM_STR );
NCERRX( ncmpi_def_dim(ncid, NC_CELL_SPATIAL_STR, 3, &cell_spatial_dim),
NC_CELL_SPATIAL_STR );
NCERRX( ncmpi_def_dim(ncid, NC_CELL_ANGULAR_STR, 3, &cell_angular_dim),
NC_CELL_ANGULAR_STR );
NCERRX( ncmpi_def_dim(ncid, NC_LABEL_STR, 10, &label_dim),
NC_LABEL_STR );
// default variables
dims[0] = spatial_dim;
NCERRX( ncmpi_def_var(ncid, NC_SPATIAL_STR, NC_CHAR, 1, dims, &spatial_var),
NC_SPATIAL_STR );
NCERRX( ncmpi_def_var(ncid, NC_CELL_SPATIAL_STR, NC_CHAR, 1, dims,
&cell_spatial_var), NC_CELL_SPATIAL_STR );
dims[0] = spatial_dim;
dims[1] = label_dim;
NCERRX( ncmpi_def_var(ncid, NC_CELL_ANGULAR_STR, NC_CHAR, 2, dims,
&cell_angular_var), NC_CELL_ANGULAR_STR );
dims[0] = frame_dim;
NCERRX( ncmpi_def_var(ncid, NC_TIME_STR, NC_DOUBLE, 1, dims, &time_var),
NC_TIME_STR);
dims[0] = frame_dim;
dims[1] = cell_spatial_dim;
NCERRX( ncmpi_def_var(ncid, NC_CELL_ORIGIN_STR, NC_DOUBLE, 2, dims,
&cell_origin_var), NC_CELL_ORIGIN_STR );
NCERRX( ncmpi_def_var(ncid, NC_CELL_LENGTHS_STR, NC_DOUBLE, 2, dims,
&cell_lengths_var), NC_CELL_LENGTHS_STR );
dims[0] = frame_dim;
dims[1] = cell_angular_dim;
NCERRX( ncmpi_def_var(ncid, NC_CELL_ANGLES_STR, NC_DOUBLE, 2, dims,
&cell_angles_var), NC_CELL_ANGLES_STR );
// variables specified in the input file
dims[0] = frame_dim;
dims[1] = atom_dim;
dims[2] = spatial_dim;
for (int i = 0; i < n_perat; i++) {
nc_type xtype;
// Type mangling
if (vtype[perat[i].field[0]] == INT) {
xtype = NC_INT;
}
else {
if (double_precision)
xtype = NC_DOUBLE;
else
xtype = NC_FLOAT;
}
if (perat[i].dims == 6) {
// this is a tensor in Voigt notation
dims[2] = Voigt_dim;
NCERRX( ncmpi_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 3) {
// this is a vector, we need to store x-, y- and z-coordinates
dims[2] = spatial_dim;
NCERRX( ncmpi_def_var(ncid, perat[i].name, xtype, 3, dims,
&perat[i].var), perat[i].name );
}
else if (perat[i].dims == 1) {
NCERRX( ncmpi_def_var(ncid, perat[i].name, xtype, 2, dims,
&perat[i].var), perat[i].name );
}
else {
char errstr[1024];
sprintf(errstr, "%i dimensions for '%s'. Not sure how to write "
"this to the NetCDF trajectory file.", perat[i].dims,
perat[i].name);
error->all(FLERR,errstr);
}
}
// perframe variables
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
NCERRX( ncmpi_def_var(ncid, perframe[i].name, NC_INT, 1, dims,
&perframe[i].var), perframe[i].name );
}
else {
NCERRX( ncmpi_def_var(ncid, perframe[i].name, NC_DOUBLE, 1, dims,
&perframe[i].var), perframe[i].name );
}
}
// attributes
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "Conventions",
5, "AMBER") );
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "ConventionVersion",
3, "1.0") );
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "program",
6, "LAMMPS") );
NCERR( ncmpi_put_att_text(ncid, NC_GLOBAL, "programVersion",
strlen(universe->version), universe->version) );
// units
if (!strcmp(update->unit_style, "lj")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
2, "lj") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
2, "lj") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
2, "lj") );
}
else if (!strcmp(update->unit_style, "real")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "metal")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
10, "picosecond") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
8, "Angstrom") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
8, "Angstrom") );
}
else if (!strcmp(update->unit_style, "si")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
5, "meter") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
5, "meter") );
}
else if (!strcmp(update->unit_style, "cgs")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
6, "second") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
10, "centimeter") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
10, "centimeter") );
}
else if (!strcmp(update->unit_style, "electron")) {
NCERR( ncmpi_put_att_text(ncid, time_var, NC_UNITS_STR,
11, "femtosecond") );
NCERR( ncmpi_put_att_text(ncid, cell_origin_var, NC_UNITS_STR,
4, "Bohr") );
NCERR( ncmpi_put_att_text(ncid, cell_lengths_var, NC_UNITS_STR,
4, "Bohr") );
}
else {
char errstr[1024];
sprintf(errstr, "Unsupported unit style '%s'", update->unit_style);
error->all(FLERR,errstr);
}
NCERR( ncmpi_put_att_text(ncid, cell_angles_var, NC_UNITS_STR,
6, "degree") );
d[0] = update->dt;
NCERR( ncmpi_put_att_double(ncid, time_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( ncmpi_put_att_double(ncid, cell_origin_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
d[0] = 1.0;
NCERR( ncmpi_put_att_double(ncid, cell_lengths_var, NC_SCALE_FACTOR_STR,
NC_DOUBLE, 1, d) );
/*
* Finished with definition
*/
NCERR( ncmpi_enddef(ncid) );
/*
* Write label variables
*/
NCERR( ncmpi_begin_indep_data(ncid) );
if (filewriter) {
NCERR( ncmpi_put_var_text(ncid, spatial_var, "xyz") );
NCERR( ncmpi_put_var_text(ncid, cell_spatial_var, "abc") );
index[0] = 0;
index[1] = 0;
count[0] = 1;
count[1] = 5;
NCERR( ncmpi_put_vara_text(ncid, cell_angular_var, index, count,
"alpha") );
index[0] = 1;
count[1] = 4;
NCERR( ncmpi_put_vara_text(ncid, cell_angular_var, index, count,
"beta") );
index[0] = 2;
count[1] = 5;
NCERR( ncmpi_put_vara_text(ncid, cell_angular_var, index, count,
"gamma") );
}
NCERR( ncmpi_end_indep_data(ncid) );
framei = 1;
}
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::closefile()
+void DumpNetCDFMPIIO::closefile()
{
if (singlefile_opened) {
NCERR( ncmpi_close(ncid) );
singlefile_opened = 0;
- // append next time DumpNCMPIIO::openfile is called
+ // append next time DumpNetCDFMPIIO::openfile is called
append_flag = 1;
// write to next frame upon next open
framei++;
}
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::write()
+void DumpNetCDFMPIIO::write()
{
// open file
openfile();
// need to write per-frame (global) properties here since they may come
// from computes. write_header below is only called from the writing
// processes, but modify->compute[j]->compute_* must be called from all
// processes.
MPI_Offset start[2];
start[0] = framei-1;
start[1] = 0;
NCERR( ncmpi_begin_indep_data(ncid) );
for (int i = 0; i < n_perframe; i++) {
if (perframe[i].type == THIS_IS_A_BIGINT) {
bigint data;
(this->*perframe[i].compute)((void*) &data);
if (filewriter)
#if defined(LAMMPS_SMALLBIG) || defined(LAMMPS_BIGBIG)
NCERR( ncmpi_put_var1_long(ncid, perframe[i].var, start, &data) );
#else
NCERR( ncmpi_put_var1_int(ncid, perframe[i].var, start, &data) );
#endif
}
else {
double data;
int j = perframe[i].index;
int idim = perframe[i].dim;
if (perframe[i].type == THIS_IS_A_COMPUTE) {
if (idim >= 0) {
modify->compute[j]->compute_vector();
data = modify->compute[j]->vector[idim];
}
else
data = modify->compute[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_FIX) {
if (idim >= 0) {
data = modify->fix[j]->compute_vector(idim);
}
else
data = modify->fix[j]->compute_scalar();
}
else if (perframe[i].type == THIS_IS_A_VARIABLE) {
j = input->variable->find(perframe[i].id);
data = input->variable->compute_equal(j);
}
if (filewriter)
NCERR( ncmpi_put_var1_double(ncid, perframe[i].var, start, &data) );
}
}
// write timestep header
write_time_and_cell();
NCERR( ncmpi_end_indep_data(ncid) );
// nme = # of dump lines this proc contributes to dump
nme = count();
int *block_sizes = new int[comm->nprocs];
MPI_Allgather(&nme, 1, MPI_INT, block_sizes, 1, MPI_INT, MPI_COMM_WORLD);
blocki = 0;
for (int i = 0; i < comm->me; i++) blocki += block_sizes[i];
delete [] block_sizes;
// insure buf is sized for packing and communicating
// use nme to insure filewriter proc can receive info from others
// limit nme*size_one to int since used as arg in MPI calls
if (nme > maxbuf) {
if ((bigint) nme * size_one > MAXSMALLINT)
error->all(FLERR,"Too much per-proc info for dump");
maxbuf = nme;
memory->destroy(buf);
memory->create(buf,maxbuf*size_one,"dump:buf");
}
// pack my data into buf
pack(NULL);
// each process writes its data
write_data(nme, buf);
// close file. this ensures data is flushed and minimizes data corruption
closefile();
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::write_time_and_cell()
+void DumpNetCDFMPIIO::write_time_and_cell()
{
MPI_Offset start[2];
start[0] = framei-1;
start[1] = 0;
MPI_Offset count[2];
double time, cell_origin[3], cell_lengths[3], cell_angles[3];
time = update->ntimestep;
if (domain->triclinic == 0) {
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = domain->yprd;
cell_lengths[2] = domain->zprd;
cell_angles[0] = 90;
cell_angles[1] = 90;
cell_angles[2] = 90;
}
else {
double cosalpha, cosbeta, cosgamma;
double *h = domain->h;
cell_origin[0] = domain->boxlo[0];
cell_origin[1] = domain->boxlo[1];
cell_origin[2] = domain->boxlo[2];
cell_lengths[0] = domain->xprd;
cell_lengths[1] = sqrt(h[1]*h[1]+h[5]*h[5]);
cell_lengths[2] = sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosalpha = (h[5]*h[4]+h[1]*h[3])/
sqrt((h[1]*h[1]+h[5]*h[5])*(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]));
cosbeta = h[4]/sqrt(h[2]*h[2]+h[3]*h[3]+h[4]*h[4]);
cosgamma = h[5]/sqrt(h[1]*h[1]+h[5]*h[5]);
cell_angles[0] = acos(cosalpha)*180.0/MY_PI;
cell_angles[1] = acos(cosbeta)*180.0/MY_PI;
cell_angles[2] = acos(cosgamma)*180.0/MY_PI;
}
// Recent AMBER conventions say that nonperiodic boundaries should have
// 'cell_lengths' set to zero.
for (int dim = 0; dim < 3; dim++) {
if (!domain->periodicity[dim])
cell_lengths[dim] = 0.0;
}
count[0] = 1;
count[1] = 3;
if (filewriter) {
NCERR( ncmpi_put_var1_double(ncid, time_var, start, &time) );
NCERR( ncmpi_put_vara_double(ncid, cell_origin_var, start, count,
cell_origin) );
NCERR( ncmpi_put_vara_double(ncid, cell_lengths_var, start, count,
cell_lengths) );
NCERR( ncmpi_put_vara_double(ncid, cell_angles_var, start, count,
cell_angles) );
}
}
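For a triclinic box the expressions above are the standard conversion from the LAMMPS upper-triangular cell matrix to crystallographic cell parameters. A compact restatement (a sketch, assuming the usual ordering h = {lx, ly, lz, yz, xz, xy}):

  a = lx
  b = sqrt(ly^2 + xy^2)
  c = sqrt(lz^2 + xz^2 + yz^2)
  cos(alpha) = (xy*xz + ly*yz) / (b*c)
  cos(beta)  = xz / c
  cos(gamma) = xy / b

with the angles then converted to degrees via acos()*180.0/MY_PI, matching write_time_and_cell() above.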
/* ----------------------------------------------------------------------
write data lines to file in a block-by-block style
   write head of block (mass & element name) only if it has atoms of that type
------------------------------------------------------------------------- */
-void DumpNCMPIIO::write_data(int n, double *mybuf)
+void DumpNetCDFMPIIO::write_data(int n, double *mybuf)
{
MPI_Offset start[NC_MAX_VAR_DIMS], count[NC_MAX_VAR_DIMS];
MPI_Offset stride[NC_MAX_VAR_DIMS];
if (!int_buffer) {
n_buffer = std::max(1, n);
int_buffer = (int *)
- memory->smalloc(n_buffer*sizeof(int), "DumpNCMPIIO::int_buffer");
+ memory->smalloc(n_buffer*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
- memory->smalloc(n_buffer*sizeof(double), "DumpNCMPIIO::double_buffer");
+ memory->smalloc(n_buffer*sizeof(double),"dump::double_buffer");
}
if (n > n_buffer) {
n_buffer = std::max(1, n);
int_buffer = (int *)
- memory->srealloc(int_buffer, n_buffer*sizeof(int),
- "DumpNCMPIIO::int_buffer");
+ memory->srealloc(int_buffer, n_buffer*sizeof(int),"dump::int_buffer");
double_buffer = (double *)
memory->srealloc(double_buffer, n_buffer*sizeof(double),
- "DumpNCMPIIO::double_buffer");
+ "dump::double_buffer");
}
start[0] = framei-1;
start[1] = blocki;
start[2] = 0;
if (n == 0) {
/* If there is no data, we need to make sure the start values don't exceed
dimension bounds. Just set them to zero. */
start[1] = 0;
}
count[0] = 1;
count[1] = n;
count[2] = 1;
stride[0] = 1;
stride[1] = 1;
stride[2] = 3;
for (int i = 0; i < n_perat; i++) {
int iaux = perat[i].field[0];
if (iaux < 0 || iaux >= size_one) {
char errmsg[1024];
sprintf(errmsg, "Internal error: name = %s, iaux = %i, "
"size_one = %i", perat[i].name, iaux, size_one);
error->one(FLERR,errmsg);
}
if (vtype[iaux] == INT) {
// integers
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
if (iaux >= size_one) {
char errmsg[1024];
sprintf(errmsg, "Internal error: name = %s, iaux = %i, "
"size_one = %i", perat[i].name, iaux, size_one);
error->one(FLERR,errmsg);
}
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
start[2] = idim;
NCERRX( ncmpi_put_vars_int_all(ncid, perat[i].var, start, count,
stride, int_buffer), perat[i].name );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
int_buffer[j] = mybuf[iaux];
}
NCERRX( ncmpi_put_vara_int_all(ncid, perat[i].var, start, count,
int_buffer), perat[i].name );
}
}
else {
// doubles
if (perat[i].dims > 1) {
for (int idim = 0; idim < perat[i].dims; idim++) {
iaux = perat[i].field[idim];
if (iaux >= 0) {
if (iaux >= size_one) {
char errmsg[1024];
sprintf(errmsg, "Internal error: name = %s, iaux = %i, "
"size_one = %i", perat[i].name, iaux, size_one);
error->one(FLERR,errmsg);
}
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
start[2] = idim;
NCERRX( ncmpi_put_vars_double_all(ncid, perat[i].var, start, count,
stride, double_buffer), perat[i].name );
}
}
}
else {
for (int j = 0; j < n; j++, iaux+=size_one) {
double_buffer[j] = mybuf[iaux];
}
NCERRX( ncmpi_put_vara_double_all(ncid, perat[i].var, start, count,
double_buffer), perat[i].name );
}
}
}
}
/* ---------------------------------------------------------------------- */
-int DumpNCMPIIO::modify_param(int narg, char **arg)
+int DumpNetCDFMPIIO::modify_param(int narg, char **arg)
{
int iarg = 0;
if (strcmp(arg[iarg],"double") == 0) {
iarg++;
if (iarg >= narg)
error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
if (strcmp(arg[iarg],"yes") == 0) {
double_precision = true;
}
else if (strcmp(arg[iarg],"no") == 0) {
double_precision = false;
}
else error->all(FLERR,"expected 'yes' or 'no' after 'double' keyword.");
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"at") == 0) {
iarg++;
framei = force->inumeric(FLERR,arg[iarg]);
if (framei < 0) framei--;
iarg++;
return 2;
}
else if (strcmp(arg[iarg],"global") == 0) {
// "perframe" quantities, i.e. not per-atom stuff
iarg++;
n_perframe = narg-iarg;
perframe = new nc_perframe_t[n_perframe];
for (int i = 0; iarg < narg; iarg++, i++) {
int n;
char *suffix;
if (!strcmp(arg[iarg],"step")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNCMPIIO::compute_step;
+ perframe[i].compute = &DumpNetCDFMPIIO::compute_step;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elapsed")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNCMPIIO::compute_elapsed;
+ perframe[i].compute = &DumpNetCDFMPIIO::compute_elapsed;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strcmp(arg[iarg],"elaplong")) {
perframe[i].type = THIS_IS_A_BIGINT;
- perframe[i].compute = &DumpNCMPIIO::compute_elapsed_long;
+ perframe[i].compute = &DumpNetCDFMPIIO::compute_elapsed_long;
strcpy(perframe[i].name, arg[iarg]);
}
else {
n = strlen(arg[iarg]);
if (n > 2) {
suffix = new char[n-1];
strcpy(suffix, arg[iarg]+2);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must thermo quantity or "
"compute, fix or variable", arg[iarg]);
error->all(FLERR,errstr);
}
if (!strncmp(arg[iarg], "c_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_compute(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify compute ID");
if (modify->compute[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify compute ID computes per-atom info");
if (idim >= 0 && modify->compute[n]->vector_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute vector");
if (idim < 0 && modify->compute[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify compute ID does not compute scalar");
perframe[i].type = THIS_IS_A_COMPUTE;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "f_", 2)) {
int idim = -1;
char *ptr = strchr(suffix, '[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Missing ']' in dump modify command");
*ptr = '\0';
idim = ptr[1] - '1';
}
n = modify->find_fix(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify fix ID");
if (modify->fix[n]->peratom_flag != 0)
error->all(FLERR,"Dump modify fix ID computes per-atom info");
if (idim >= 0 && modify->fix[n]->vector_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
if (idim < 0 && modify->fix[n]->scalar_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute vector");
perframe[i].type = THIS_IS_A_FIX;
perframe[i].dim = idim;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
}
else if (!strncmp(arg[iarg], "v_", 2)) {
n = input->variable->find(suffix);
if (n < 0)
error->all(FLERR,"Could not find dump modify variable ID");
if (!input->variable->equalstyle(n))
error->all(FLERR,"Dump modify variable must be of style equal");
perframe[i].type = THIS_IS_A_VARIABLE;
perframe[i].dim = 1;
perframe[i].index = n;
strcpy(perframe[i].name, arg[iarg]);
strcpy(perframe[i].id, suffix);
}
else {
char errstr[1024];
sprintf(errstr, "perframe quantity '%s' must be compute, fix or "
"variable", arg[iarg]);
error->all(FLERR,errstr);
}
delete [] suffix;
}
}
return narg;
} else return 0;
}
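The keywords handled above map onto dump_modify options of the netcdf/mpiio dump style. A hypothetical input-script fragment exercising them (the dump ID, file name, and the IDs c_mypress and v_myvar are placeholders; v_myvar must be an equal-style variable and c_mypress a global scalar or vector compute):

  dump        nc1 all netcdf/mpiio 100 traj.nc id type x y z
  dump_modify nc1 double yes
  dump_modify nc1 at -1
  dump_modify nc1 global step elapsed c_mypress v_myvar

Here 'double yes/no' selects double-precision output, 'at N' selects the frame index to write to (negative values appear to be interpreted relative to the last frame of an existing file), and 'global' registers per-frame quantities taken from thermo keywords, computes, fixes or equal-style variables.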
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::ncerr(int err, const char *descr, int line)
+void DumpNetCDFMPIIO::ncerr(int err, const char *descr, int line)
{
if (err != NC_NOERR) {
char errstr[1024];
if (descr) {
sprintf(errstr, "NetCDF failed with error '%s' (while accessing '%s') "
" in line %i of %s.", ncmpi_strerror(err), descr, line, __FILE__);
}
else {
sprintf(errstr, "NetCDF failed with error '%s' in line %i of %s.",
ncmpi_strerror(err), line, __FILE__);
}
error->one(FLERR,errstr);
}
}
/* ----------------------------------------------------------------------
one method for every keyword thermo can output
called by compute() or evaluate_keyword()
compute will have already been called
set ivalue/dvalue/bivalue if value is int/double/bigint
customize a new keyword by adding a method
------------------------------------------------------------------------- */
-void DumpNCMPIIO::compute_step(void *r)
+void DumpNetCDFMPIIO::compute_step(void *r)
{
*((bigint *) r) = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::compute_elapsed(void *r)
+void DumpNetCDFMPIIO::compute_elapsed(void *r)
{
*((bigint *) r) = update->ntimestep - update->firststep;
}
/* ---------------------------------------------------------------------- */
-void DumpNCMPIIO::compute_elapsed_long(void *r)
+void DumpNetCDFMPIIO::compute_elapsed_long(void *r)
{
*((bigint *) r) = update->ntimestep - update->beginstep;
}
#endif /* defined(LMP_HAS_PNETCDF) */
diff --git a/src/USER-NC-DUMP/dump_nc_mpiio.h b/src/USER-NETCDF/dump_netcdf_mpiio.h
similarity index 95%
rename from src/USER-NC-DUMP/dump_nc_mpiio.h
rename to src/USER-NETCDF/dump_netcdf_mpiio.h
index 5e36335e6..6f5b00b03 100644
--- a/src/USER-NC-DUMP/dump_nc_mpiio.h
+++ b/src/USER-NETCDF/dump_netcdf_mpiio.h
@@ -1,140 +1,141 @@
/* ======================================================================
LAMMPS NetCDF dump style
https://github.com/pastewka/lammps-netcdf
Lars Pastewka, lars.pastewka@kit.edu
Copyright (2011-2013) Fraunhofer IWM
Copyright (2014) Karlsruhe Institute of Technology
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
====================================================================== */
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
+
#if defined(LMP_HAS_PNETCDF)
#ifdef DUMP_CLASS
-DumpStyle(nc/mpiio,DumpNCMPIIO)
+DumpStyle(netcdf/mpiio,DumpNetCDFMPIIO)
#else
-#ifndef LMP_DUMP_NC_MPIIO_H
-#define LMP_DUMP_NC_MPIIO_H
+#ifndef LMP_DUMP_NETCDF_MPIIO_H
+#define LMP_DUMP_NETCDF_MPIIO_H
#include "dump_custom.h"
namespace LAMMPS_NS {
const int NC_MPIIO_FIELD_NAME_MAX = 100;
const int DUMP_NC_MPIIO_MAX_DIMS = 100;
-class DumpNCMPIIO : public DumpCustom {
+class DumpNetCDFMPIIO : public DumpCustom {
public:
- DumpNCMPIIO(class LAMMPS *, int, char **);
- virtual ~DumpNCMPIIO();
+ DumpNetCDFMPIIO(class LAMMPS *, int, char **);
+ virtual ~DumpNetCDFMPIIO();
virtual void write();
private:
  // per-atom quantities (positions, velocities, etc.)
struct nc_perat_t {
int dims; // number of dimensions
int field[DUMP_NC_MPIIO_MAX_DIMS]; // field indices corresponding to the dim.
char name[NC_MPIIO_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
};
-  typedef void (DumpNCMPIIO::*funcptr_t)(void *);
+  typedef void (DumpNetCDFMPIIO::*funcptr_t)(void *);
// per-frame quantities (variables, fixes or computes)
struct nc_perframe_t {
char name[NC_MPIIO_FIELD_NAME_MAX]; // field name
int var; // NetCDF variable
int type; // variable, fix, compute or callback
int index; // index in fix/compute list
funcptr_t compute; // compute function
int dim; // dimension
char id[NC_MPIIO_FIELD_NAME_MAX]; // variable id
bigint bigint_data; // actual data
double double_data; // actual data
};
int framei; // current frame index
int blocki; // current block index
int ndata; // number of data blocks to expect
bigint ntotalgr; // # of atoms
int n_perat; // # of netcdf per-atom properties
nc_perat_t *perat; // per-atom properties
int n_perframe; // # of global netcdf (not per-atom) fix props
nc_perframe_t *perframe; // global properties
bool double_precision; // write everything as double precision
bigint n_buffer; // size of buffer
int *int_buffer; // buffer for passing data to netcdf
double *double_buffer; // buffer for passing data to netcdf
int ncid;
int frame_dim;
int spatial_dim;
int Voigt_dim;
int atom_dim;
int cell_spatial_dim;
int cell_angular_dim;
int label_dim;
int spatial_var;
int cell_spatial_var;
int cell_angular_var;
int time_var;
int cell_origin_var;
int cell_lengths_var;
int cell_angles_var;
virtual void openfile();
void closefile();
void write_time_and_cell();
virtual void write_data(int, double *);
void write_prmtop();
virtual int modify_param(int, char **);
void ncerr(int, const char *, int);
void compute_step(void *);
void compute_elapsed(void *);
void compute_elapsed_long(void *);
};
}
#endif
#endif
#endif /* defined(LMP_HAS_PNETCDF) */
diff --git a/src/USER-OMP/angle_sdk_omp.h b/src/USER-OMP/angle_sdk_omp.h
index 9ab75904c..c041c2ecc 100644
--- a/src/USER-OMP/angle_sdk_omp.h
+++ b/src/USER-OMP/angle_sdk_omp.h
@@ -1,47 +1,46 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef ANGLE_CLASS
AngleStyle(sdk/omp,AngleSDKOMP)
-AngleStyle(cg/cmm/omp,AngleSDKOMP)
#else
#ifndef LMP_ANGLE_SDK_OMP_H
#define LMP_ANGLE_SDK_OMP_H
#include "angle_sdk.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class AngleSDKOMP : public AngleSDK, public ThrOMP {
public:
AngleSDKOMP(class LAMMPS *lmp);
virtual void compute(int, int);
private:
template <int EVFLAG, int EFLAG, int NEWTON_BOND>
void eval(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
diff --git a/src/USER-OMP/improper_ring_omp.cpp b/src/USER-OMP/improper_ring_omp.cpp
index bd7593c51..4eadc8318 100644
--- a/src/USER-OMP/improper_ring_omp.cpp
+++ b/src/USER-OMP/improper_ring_omp.cpp
@@ -1,266 +1,266 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#include <math.h>
#include "improper_ring_omp.h"
#include "atom.h"
#include "comm.h"
#include "neighbor.h"
#include "domain.h"
#include "force.h"
#include "update.h"
#include "error.h"
#include "math_special.h"
#include "suffix.h"
using namespace LAMMPS_NS;
using namespace MathSpecial;
#define TOLERANCE 0.05
#define SMALL 0.001
/* ---------------------------------------------------------------------- */
ImproperRingOMP::ImproperRingOMP(class LAMMPS *lmp)
: ImproperRing(lmp), ThrOMP(lmp,THR_IMPROPER)
{
suffix_flag |= Suffix::OMP;
}
/* ---------------------------------------------------------------------- */
void ImproperRingOMP::compute(int eflag, int vflag)
{
if (eflag || vflag) {
ev_setup(eflag,vflag);
} else evflag = 0;
const int nall = atom->nlocal + atom->nghost;
const int nthreads = comm->nthreads;
const int inum = neighbor->nimproperlist;
#if defined(_OPENMP)
#pragma omp parallel default(none) shared(eflag,vflag)
#endif
{
int ifrom, ito, tid;
loop_setup_thr(ifrom, ito, tid, inum, nthreads);
ThrData *thr = fix->get_thr(tid);
thr->timer(Timer::START);
ev_setup_thr(eflag, vflag, nall, eatom, vatom, thr);
if (inum > 0) {
if (evflag) {
if (eflag) {
if (force->newton_bond) eval<1,1,1>(ifrom, ito, thr);
else eval<1,1,0>(ifrom, ito, thr);
} else {
if (force->newton_bond) eval<1,0,1>(ifrom, ito, thr);
else eval<1,0,0>(ifrom, ito, thr);
}
} else {
if (force->newton_bond) eval<0,0,1>(ifrom, ito, thr);
else eval<0,0,0>(ifrom, ito, thr);
}
thr->timer(Timer::BOND);
reduce_thr(this, eflag, vflag, thr);
}
} // end of omp parallel region
}
template <int EVFLAG, int EFLAG, int NEWTON_BOND>
void ImproperRingOMP::eval(int nfrom, int nto, ThrData * const thr)
{
/* Be careful!: "chi" is the equilibrium angle in radians. */
int i1,i2,i3,i4,n,type;
double eimproper;
/* Compatibility variables. */
double vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z;
double f1[3], f3[3], f4[3];
/* Actual computation variables. */
int at1[3], at2[3], at3[3], icomb;
double bvec1x[3], bvec1y[3], bvec1z[3],
bvec2x[3], bvec2y[3], bvec2z[3],
bvec1n[3], bvec2n[3], bend_angle[3];
double angle_summer, angfac, cfact1, cfact2, cfact3;
double cjiji, ckjji, ckjkj, fix, fiy, fiz, fjx, fjy, fjz, fkx, fky, fkz;
eimproper = 0.0;
const double * const * const x = atom->x;
double * const * const f = thr->get_f();
const int * const * const improperlist = neighbor->improperlist;
const int nlocal = atom->nlocal;
/* A description of the potential can be found in
Macromolecules 35, pp. 1463-1472 (2002). */
for (n = nfrom; n < nto; n++) {
/* Take the ids of the atoms contributing to the improper potential. */
i1 = improperlist[n][0]; /* Atom "1" of Figure 1 from the above reference.*/
i2 = improperlist[n][1]; /* Atom "2" ... */
i3 = improperlist[n][2]; /* Atom "3" ... */
i4 = improperlist[n][3]; /* Atom "9" ... */
type = improperlist[n][4];
/* Calculate the necessary variables for LAMMPS implementation.
if (evflag) ev_tally(i1,i2,i3,i4,nlocal,newton_bond,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z);
       Although they are irrelevant to the calculation of the potential, we keep
them for maximal compatibility. */
vb1x = x[i1][0] - x[i2][0]; vb1y = x[i1][1] - x[i2][1]; vb1z = x[i1][2] - x[i2][2];
vb2x = x[i3][0] - x[i2][0]; vb2y = x[i3][1] - x[i2][1]; vb2z = x[i3][2] - x[i2][2];
vb3x = x[i4][0] - x[i3][0]; vb3y = x[i4][1] - x[i3][1]; vb3z = x[i4][2] - x[i3][2];
/* Pass the atom tags to form the necessary combinations. */
at1[0] = i1; at2[0] = i2; at3[0] = i4; /* ids: 1-2-9 */
at1[1] = i1; at2[1] = i2; at3[1] = i3; /* ids: 1-2-3 */
at1[2] = i4; at2[2] = i2; at3[2] = i3; /* ids: 9-2-3 */
/* Initialize the sum of the angles differences. */
angle_summer = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++) {
/* Bond vector connecting the first and the second atom. */
bvec1x[icomb] = x[at2[icomb]][0] - x[at1[icomb]][0];
bvec1y[icomb] = x[at2[icomb]][1] - x[at1[icomb]][1];
bvec1z[icomb] = x[at2[icomb]][2] - x[at1[icomb]][2];
/* also calculate the norm of the vector: */
bvec1n[icomb] = sqrt( bvec1x[icomb]*bvec1x[icomb]
+ bvec1y[icomb]*bvec1y[icomb]
+ bvec1z[icomb]*bvec1z[icomb]);
/* Bond vector connecting the second and the third atom. */
bvec2x[icomb] = x[at3[icomb]][0] - x[at2[icomb]][0];
bvec2y[icomb] = x[at3[icomb]][1] - x[at2[icomb]][1];
bvec2z[icomb] = x[at3[icomb]][2] - x[at2[icomb]][2];
/* also calculate the norm of the vector: */
bvec2n[icomb] = sqrt( bvec2x[icomb]*bvec2x[icomb]
+ bvec2y[icomb]*bvec2y[icomb]
+ bvec2z[icomb]*bvec2z[icomb]);
/* Calculate the bending angle of the atom triad: */
bend_angle[icomb] = ( bvec2x[icomb]*bvec1x[icomb]
+ bvec2y[icomb]*bvec1y[icomb]
+ bvec2z[icomb]*bvec1z[icomb]);
bend_angle[icomb] /= (bvec1n[icomb] * bvec2n[icomb]);
if (bend_angle[icomb] > 1.0) bend_angle[icomb] -= SMALL;
if (bend_angle[icomb] < -1.0) bend_angle[icomb] += SMALL;
/* Append the current angle to the sum of angle differences. */
angle_summer += (bend_angle[icomb] - chi[type]);
}
if (EFLAG) eimproper = (1.0/6.0) *k[type] * powint(angle_summer,6);
/*
printf("The tags: %d-%d-%d-%d, of type %d .\n",atom->tag[i1],atom->tag[i2],atom->tag[i3],atom->tag[i4],type);
// printf("The coordinates of the first: %f, %f, %f.\n", x[i1][0], x[i1][1], x[i1][2]);
// printf("The coordinates of the second: %f, %f, %f.\n", x[i2][0], x[i2][1], x[i2][2]);
// printf("The coordinates of the third: %f, %f, %f.\n", x[i3][0], x[i3][1], x[i3][2]);
// printf("The coordinates of the fourth: %f, %f, %f.\n", x[i4][0], x[i4][1], x[i4][2]);
printf("The angles are: %f / %f / %f equilibrium: %f.\n", bend_angle[0], bend_angle[1], bend_angle[2],chi[type]);
printf("The energy of the improper: %f with prefactor %f.\n", eimproper,(1.0/6.0)*k[type]);
printf("The sum of the angles: %f.\n", angle_summer);
*/
/* Force calculation acting on all atoms.
Calculate the derivatives of the potential. */
angfac = k[type] * powint(angle_summer,5);
f1[0] = 0.0; f1[1] = 0.0; f1[2] = 0.0;
f3[0] = 0.0; f3[1] = 0.0; f3[2] = 0.0;
f4[0] = 0.0; f4[1] = 0.0; f4[2] = 0.0;
/* Take a loop over the three angles, defined by each triad: */
for (icomb = 0; icomb < 3; icomb ++)
{
/* Calculate the squares of the distances. */
cjiji = bvec1n[icomb] * bvec1n[icomb]; ckjkj = bvec2n[icomb] * bvec2n[icomb];
ckjji = bvec2x[icomb] * bvec1x[icomb]
+ bvec2y[icomb] * bvec1y[icomb]
+ bvec2z[icomb] * bvec1z[icomb] ;
cfact1 = angfac / (sqrt(ckjkj * cjiji));
cfact2 = ckjji / ckjkj;
cfact3 = ckjji / cjiji;
- /* Calculate the force acted on the thrid atom of the angle. */
+ /* Calculate the force acted on the third atom of the angle. */
fkx = cfact2 * bvec2x[icomb] - bvec1x[icomb];
fky = cfact2 * bvec2y[icomb] - bvec1y[icomb];
fkz = cfact2 * bvec2z[icomb] - bvec1z[icomb];
/* Calculate the force acted on the first atom of the angle. */
fix = bvec2x[icomb] - cfact3 * bvec1x[icomb];
fiy = bvec2y[icomb] - cfact3 * bvec1y[icomb];
fiz = bvec2z[icomb] - cfact3 * bvec1z[icomb];
/* Finally, calculate the force acted on the middle atom of the angle.*/
fjx = - fix - fkx; fjy = - fiy - fky; fjz = - fiz - fkz;
/* Consider the appropriate scaling of the forces: */
fix *= cfact1; fiy *= cfact1; fiz *= cfact1;
fjx *= cfact1; fjy *= cfact1; fjz *= cfact1;
fkx *= cfact1; fky *= cfact1; fkz *= cfact1;
if (at1[icomb] == i1) {f1[0] += fix; f1[1] += fiy; f1[2] += fiz;}
else if (at2[icomb] == i1) {f1[0] += fjx; f1[1] += fjy; f1[2] += fjz;}
else if (at3[icomb] == i1) {f1[0] += fkx; f1[1] += fky; f1[2] += fkz;}
if (at1[icomb] == i3) {f3[0] += fix; f3[1] += fiy; f3[2] += fiz;}
else if (at2[icomb] == i3) {f3[0] += fjx; f3[1] += fjy; f3[2] += fjz;}
else if (at3[icomb] == i3) {f3[0] += fkx; f3[1] += fky; f3[2] += fkz;}
if (at1[icomb] == i4) {f4[0] += fix; f4[1] += fiy; f4[2] += fiz;}
else if (at2[icomb] == i4) {f4[0] += fjx; f4[1] += fjy; f4[2] += fjz;}
else if (at3[icomb] == i4) {f4[0] += fkx; f4[1] += fky; f4[2] += fkz;}
/* Store the contribution to the global arrays: */
/* Take the id of the atom from the at1[icomb] element, i1 = at1[icomb]. */
if (NEWTON_BOND || at1[icomb] < nlocal) {
f[at1[icomb]][0] += fix;
f[at1[icomb]][1] += fiy;
f[at1[icomb]][2] += fiz;
}
/* Take the id of the atom from the at2[icomb] element, i2 = at2[icomb]. */
if (NEWTON_BOND || at2[icomb] < nlocal) {
f[at2[icomb]][0] += fjx;
f[at2[icomb]][1] += fjy;
f[at2[icomb]][2] += fjz;
}
/* Take the id of the atom from the at3[icomb] element, i3 = at3[icomb]. */
if (NEWTON_BOND || at3[icomb] < nlocal) {
f[at3[icomb]][0] += fkx;
f[at3[icomb]][1] += fky;
f[at3[icomb]][2] += fkz;
}
}
if (EVFLAG)
ev_tally_thr(this,i1,i2,i3,i4,nlocal,NEWTON_BOND,eimproper,f1,f3,f4,
vb1x,vb1y,vb1z,vb2x,vb2y,vb2z,vb3x,vb3y,vb3z,thr);
}
}
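Read off the code above, the quantity accumulated per improper is A = sum over the three triads of (bend_angle[icomb] - chi[type]), where bend_angle holds the cosine of each triad's bending angle. The energy and the force prefactor are then

  E = (k/6) * A^6,   angfac = dE/dA = k * A^5

consistent with the (1.0/6.0)*k[type]*powint(angle_summer,6) and k[type]*powint(angle_summer,5) expressions used here; see the reference cited in the comment above for the full derivation of the per-atom forces.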
diff --git a/src/USER-OMP/pair_lj_sdk_coul_long_omp.h b/src/USER-OMP/pair_lj_sdk_coul_long_omp.h
index a615efb50..1886d2c7b 100644
--- a/src/USER-OMP/pair_lj_sdk_coul_long_omp.h
+++ b/src/USER-OMP/pair_lj_sdk_coul_long_omp.h
@@ -1,49 +1,48 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/long/omp,PairLJSDKCoulLongOMP)
-PairStyle(cg/cmm/coul/long/omp,PairLJSDKCoulLongOMP)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_LONG_OMP_H
#define LMP_PAIR_LJ_SDK_COUL_LONG_OMP_H
#include "pair_lj_sdk_coul_long.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class PairLJSDKCoulLongOMP : public PairLJSDKCoulLong, public ThrOMP {
public:
PairLJSDKCoulLongOMP(class LAMMPS *);
virtual void compute(int, int);
virtual double memory_usage();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR>
void eval_thr(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
diff --git a/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h b/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h
index 9e4a922c3..9841408b8 100644
--- a/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h
+++ b/src/USER-OMP/pair_lj_sdk_coul_msm_omp.h
@@ -1,57 +1,56 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/coul/msm/omp,PairLJSDKCoulMSMOMP)
-PairStyle(cg/cmm/coul/msm/omp,PairLJSDKCoulMSMOMP)
#else
#ifndef LMP_PAIR_LJ_SDK_COUL_MSM_OMP_H
#define LMP_PAIR_LJ_SDK_COUL_MSM_OMP_H
#include "pair_lj_sdk_coul_msm.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class PairLJSDKCoulMSMOMP : public PairLJSDKCoulMSM, public ThrOMP {
public:
PairLJSDKCoulMSMOMP(class LAMMPS *);
virtual void compute(int, int);
virtual double memory_usage();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR>
void eval_msm_thr(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Must use 'kspace_modify pressure/scalar no' with OMP MSM Pair styles
The kspace scalar pressure option is not (yet) compatible with OMP MSM Pair styles.
-*/
\ No newline at end of file
+*/
diff --git a/src/USER-OMP/pair_lj_sdk_omp.h b/src/USER-OMP/pair_lj_sdk_omp.h
index c3837fb68..36c913252 100644
--- a/src/USER-OMP/pair_lj_sdk_omp.h
+++ b/src/USER-OMP/pair_lj_sdk_omp.h
@@ -1,49 +1,48 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Axel Kohlmeyer (Temple U)
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(lj/sdk/omp,PairLJSDKOMP)
-PairStyle(cg/cmm/omp,PairLJSDKOMP)
#else
#ifndef LMP_PAIR_LJ_SDK_OMP_H
#define LMP_PAIR_LJ_SDK_OMP_H
#include "pair_lj_sdk.h"
#include "thr_omp.h"
namespace LAMMPS_NS {
class PairLJSDKOMP : public PairLJSDK, public ThrOMP {
public:
PairLJSDKOMP(class LAMMPS *);
virtual void compute(int, int);
virtual double memory_usage();
private:
template <int EVFLAG, int EFLAG, int NEWTON_PAIR>
void eval_thr(int ifrom, int ito, ThrData * const thr);
};
}
#endif
#endif
diff --git a/src/USER-REAXC/compute_spec_atom.cpp b/src/USER-REAXC/compute_spec_atom.cpp
index 4af8efcae..164ce8720 100644
--- a/src/USER-REAXC/compute_spec_atom.cpp
+++ b/src/USER-REAXC/compute_spec_atom.cpp
@@ -1,648 +1,648 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
   http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <string.h>
#include "compute_spec_atom.h"
#include "math_extra.h"
#include "atom.h"
#include "update.h"
#include "force.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include "reaxc_defs.h"
#include "reaxc_types.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
using namespace LAMMPS_NS;
enum{KEYWORD,COMPUTE,FIX,VARIABLE};
/* ---------------------------------------------------------------------- */
ComputeSpecAtom::ComputeSpecAtom(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute reax/c/atom command");
peratom_flag = 1;
nvalues = narg - 3;
if (nvalues == 1) size_peratom_cols = 0;
else size_peratom_cols = nvalues;
  // Initialize reaxc
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
if (reaxc == NULL)
reaxc = (PairReaxC *) force->pair_match("reax/c/kk",1);
pack_choice = new FnPtrPack[nvalues];
int i;
for (int iarg = 3; iarg < narg; iarg++) {
i = iarg-3;
// standard lammps attributes
if (strcmp(arg[iarg],"q") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_q;
} else if (strcmp(arg[iarg],"x") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_x;
} else if (strcmp(arg[iarg],"y") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_y;
} else if (strcmp(arg[iarg],"z") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_z;
} else if (strcmp(arg[iarg],"vx") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_vx;
} else if (strcmp(arg[iarg],"vy") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_vy;
} else if (strcmp(arg[iarg],"vz") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_vz;
- // from pair_reax_c
+ // from pair_reaxc
} else if (strcmp(arg[iarg],"abo01") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo01;
} else if (strcmp(arg[iarg],"abo02") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo02;
} else if (strcmp(arg[iarg],"abo03") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo03;
} else if (strcmp(arg[iarg],"abo04") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo04;
} else if (strcmp(arg[iarg],"abo05") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo05;
} else if (strcmp(arg[iarg],"abo06") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo06;
} else if (strcmp(arg[iarg],"abo07") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo07;
} else if (strcmp(arg[iarg],"abo08") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo08;
} else if (strcmp(arg[iarg],"abo09") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo09;
} else if (strcmp(arg[iarg],"abo10") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo10;
} else if (strcmp(arg[iarg],"abo11") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo11;
} else if (strcmp(arg[iarg],"abo12") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo12;
} else if (strcmp(arg[iarg],"abo13") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo13;
} else if (strcmp(arg[iarg],"abo14") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo14;
} else if (strcmp(arg[iarg],"abo15") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo15;
} else if (strcmp(arg[iarg],"abo16") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo16;
} else if (strcmp(arg[iarg],"abo17") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo17;
} else if (strcmp(arg[iarg],"abo18") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo18;
} else if (strcmp(arg[iarg],"abo19") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo19;
} else if (strcmp(arg[iarg],"abo20") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo20;
} else if (strcmp(arg[iarg],"abo21") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo21;
} else if (strcmp(arg[iarg],"abo22") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo22;
} else if (strcmp(arg[iarg],"abo23") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo23;
} else if (strcmp(arg[iarg],"abo24") == 0) {
pack_choice[i] = &ComputeSpecAtom::pack_abo24;
} else error->all(FLERR,"Invalid keyword in compute reax/c/atom command");
}
nmax = 0;
vector = NULL;
array = NULL;
}
/* ---------------------------------------------------------------------- */
ComputeSpecAtom::~ComputeSpecAtom()
{
delete [] pack_choice;
memory->destroy(vector);
memory->destroy(array);
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::compute_peratom()
{
invoked_peratom = update->ntimestep;
// grow vector or array if necessary
if (atom->nmax > nmax) {
nmax = atom->nmax;
if (nvalues == 1) {
memory->destroy(vector);
memory->create(vector,nmax,"property/atom:vector");
vector_atom = vector;
} else {
memory->destroy(array);
memory->create(array,nmax,nvalues,"property/atom:array");
array_atom = array;
}
}
// fill vector or array with per-atom values
if (nvalues == 1) {
buf = vector;
(this->*pack_choice[0])(0);
} else {
if (nmax > 0) {
buf = &array[0][0];
for (int n = 0; n < nvalues; n++)
(this->*pack_choice[n])(n);
}
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeSpecAtom::memory_usage()
{
double bytes = nmax*nvalues * sizeof(double);
return bytes;
}
/* ----------------------------------------------------------------------
one method for every keyword compute property/atom can output
the atom property is packed into buf starting at n with stride nvalues
customize a new keyword by adding a method
------------------------------------------------------------------------- */
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_q(int n)
{
double *q = atom->q;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = q[i];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_x(int n)
{
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = x[i][0];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_y(int n)
{
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = x[i][1];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_z(int n)
{
double **x = atom->x;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = x[i][2];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_vx(int n)
{
double **v = atom->v;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = v[i][0];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_vy(int n)
{
double **v = atom->v;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = v[i][1];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_vz(int n)
{
double **v = atom->v;
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = v[i][2];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo01(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][0];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo02(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][1];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo03(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][2];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo04(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][3];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo05(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][4];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo06(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][5];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo07(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][6];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo08(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][7];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo09(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][8];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo10(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][9];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo11(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][10];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo12(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][11];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo13(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][12];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo14(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][13];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo15(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][14];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo16(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][15];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo17(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][16];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo18(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][17];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo19(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][18];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo20(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][19];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo21(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][20];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo22(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][21];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo23(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][22];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
void ComputeSpecAtom::pack_abo24(int n)
{
int *mask = atom->mask;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) buf[n] = reaxc->tmpbo[i][23];
else buf[n] = 0.0;
n += nvalues;
}
}
/* ---------------------------------------------------------------------- */
diff --git a/src/USER-REAXC/fix_qeq_reax.cpp b/src/USER-REAXC/fix_qeq_reax.cpp
index 26cf03f60..01ecd9d39 100644
--- a/src/USER-REAXC/fix_qeq_reax.cpp
+++ b/src/USER-REAXC/fix_qeq_reax.cpp
@@ -1,1041 +1,1041 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Hasan Metin Aktulga, Purdue University
(now at Lawrence Berkeley National Laboratory, hmaktulga@lbl.gov)
Hybrid and sub-group capabilities: Ray Shan (Sandia)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fix_qeq_reax.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "atom.h"
#include "comm.h"
#include "domain.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "update.h"
#include "force.h"
#include "group.h"
#include "pair.h"
#include "respa.h"
#include "memory.h"
#include "citeme.h"
#include "error.h"
#include "reaxc_defs.h"
using namespace LAMMPS_NS;
using namespace FixConst;
#define EV_TO_KCAL_PER_MOL 14.4
//#define DANGER_ZONE 0.95
//#define LOOSE_ZONE 0.7
#define SQR(x) ((x)*(x))
#define CUBE(x) ((x)*(x)*(x))
#define MIN_NBRS 100
static const char cite_fix_qeq_reax[] =
"fix qeq/reax command:\n\n"
"@Article{Aktulga12,\n"
" author = {H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama},\n"
" title = {Parallel reactive molecular dynamics: Numerical methods and algorithmic techniques},\n"
" journal = {Parallel Computing},\n"
" year = 2012,\n"
" volume = 38,\n"
" pages = {245--259}\n"
"}\n\n";
/* ---------------------------------------------------------------------- */
FixQEqReax::FixQEqReax(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg)
{
if (lmp->citeme) lmp->citeme->add(cite_fix_qeq_reax);
if (narg != 8) error->all(FLERR,"Illegal fix qeq/reax command");
nevery = force->inumeric(FLERR,arg[3]);
if (nevery <= 0) error->all(FLERR,"Illegal fix qeq/reax command");
swa = force->numeric(FLERR,arg[4]);
swb = force->numeric(FLERR,arg[5]);
tolerance = force->numeric(FLERR,arg[6]);
pertype_parameters(arg[7]);
shld = NULL;
n = n_cap = 0;
N = nmax = 0;
m_fill = m_cap = 0;
pack_flag = 0;
s = NULL;
t = NULL;
nprev = 5;
Hdia_inv = NULL;
b_s = NULL;
b_t = NULL;
b_prc = NULL;
b_prm = NULL;
// CG
p = NULL;
q = NULL;
r = NULL;
d = NULL;
// H matrix
H.firstnbr = NULL;
H.numnbrs = NULL;
H.jlist = NULL;
H.val = NULL;
comm_forward = comm_reverse = 1;
// perform initial allocation of atom-based arrays
// register with Atom class
s_hist = t_hist = NULL;
grow_arrays(atom->nmax);
atom->add_callback(0);
for( int i = 0; i < atom->nmax; i++ )
for (int j = 0; j < nprev; ++j )
s_hist[i][j] = t_hist[i][j] = 0;
reaxc = NULL;
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
}
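For reference, the argument layout parsed above corresponds to an input line of the form (fix ID, group and numeric values are placeholders):

  fix qeq all qeq/reax 1 0.0 10.0 1.0e-6 reax/c

i.e. arg[3] = Nevery, arg[4]/arg[5] = lower/upper Taper radii (swa/swb), arg[6] = solver tolerance, and arg[7] is either the literal 'reax/c' (chi, eta and gamma are extracted from the pair style) or the name of a file with per-type parameters.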
/* ---------------------------------------------------------------------- */
FixQEqReax::~FixQEqReax()
{
// unregister callbacks to this fix from Atom class
if (copymode) return;
atom->delete_callback(id,0);
memory->destroy(s_hist);
memory->destroy(t_hist);
deallocate_storage();
deallocate_matrix();
memory->destroy(shld);
if (!reaxflag) {
memory->destroy(chi);
memory->destroy(eta);
memory->destroy(gamma);
}
}
/* ---------------------------------------------------------------------- */
int FixQEqReax::setmask()
{
int mask = 0;
mask |= PRE_FORCE;
mask |= PRE_FORCE_RESPA;
mask |= MIN_PRE_FORCE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::pertype_parameters(char *arg)
{
if (strcmp(arg,"reax/c") == 0) {
reaxflag = 1;
Pair *pair = force->pair_match("reax/c",1);
if (pair == NULL)
pair = force->pair_match("reax/c/kk",1);
if (pair == NULL) error->all(FLERR,"No pair reax/c for fix qeq/reax");
int tmp;
chi = (double *) pair->extract("chi",tmp);
eta = (double *) pair->extract("eta",tmp);
gamma = (double *) pair->extract("gamma",tmp);
if (chi == NULL || eta == NULL || gamma == NULL)
error->all(FLERR,
"Fix qeq/reax could not extract params from pair reax/c");
return;
}
int i,itype,ntypes;
double v1,v2,v3;
FILE *pf;
reaxflag = 0;
ntypes = atom->ntypes;
memory->create(chi,ntypes+1,"qeq/reax:chi");
memory->create(eta,ntypes+1,"qeq/reax:eta");
memory->create(gamma,ntypes+1,"qeq/reax:gamma");
if (comm->me == 0) {
if ((pf = fopen(arg,"r")) == NULL)
error->one(FLERR,"Fix qeq/reax parameter file could not be found");
for (i = 1; i <= ntypes && !feof(pf); i++) {
fscanf(pf,"%d %lg %lg %lg",&itype,&v1,&v2,&v3);
if (itype < 1 || itype > ntypes)
error->one(FLERR,"Fix qeq/reax invalid atom type in param file");
chi[itype] = v1;
eta[itype] = v2;
gamma[itype] = v3;
}
if (i <= ntypes) error->one(FLERR,"Invalid param file for fix qeq/reax");
fclose(pf);
}
MPI_Bcast(&chi[1],ntypes,MPI_DOUBLE,0,world);
MPI_Bcast(&eta[1],ntypes,MPI_DOUBLE,0,world);
MPI_Bcast(&gamma[1],ntypes,MPI_DOUBLE,0,world);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::allocate_storage()
{
nmax = atom->nmax;
memory->create(s,nmax,"qeq:s");
memory->create(t,nmax,"qeq:t");
memory->create(Hdia_inv,nmax,"qeq:Hdia_inv");
memory->create(b_s,nmax,"qeq:b_s");
memory->create(b_t,nmax,"qeq:b_t");
memory->create(b_prc,nmax,"qeq:b_prc");
memory->create(b_prm,nmax,"qeq:b_prm");
memory->create(p,nmax,"qeq:p");
memory->create(q,nmax,"qeq:q");
memory->create(r,nmax,"qeq:r");
memory->create(d,nmax,"qeq:d");
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::deallocate_storage()
{
memory->destroy(s);
memory->destroy(t);
memory->destroy( Hdia_inv );
memory->destroy( b_s );
memory->destroy( b_t );
memory->destroy( b_prc );
memory->destroy( b_prm );
memory->destroy( p );
memory->destroy( q );
memory->destroy( r );
memory->destroy( d );
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::reallocate_storage()
{
deallocate_storage();
allocate_storage();
init_storage();
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::allocate_matrix()
{
int i,ii,inum,m;
int *ilist, *numneigh;
int mincap;
double safezone;
if( reaxflag ) {
mincap = reaxc->system->mincap;
safezone = reaxc->system->safezone;
} else {
mincap = MIN_CAP;
safezone = SAFE_ZONE;
}
n = atom->nlocal;
n_cap = MAX( (int)(n * safezone), mincap );
// determine the total space for the H matrix
if (reaxc) {
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
numneigh = reaxc->list->numneigh;
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
}
m = 0;
for( ii = 0; ii < inum; ii++ ) {
i = ilist[ii];
m += numneigh[i];
}
m_cap = MAX( (int)(m * safezone), mincap * MIN_NBRS );
H.n = n_cap;
H.m = m_cap;
memory->create(H.firstnbr,n_cap,"qeq:H.firstnbr");
memory->create(H.numnbrs,n_cap,"qeq:H.numnbrs");
memory->create(H.jlist,m_cap,"qeq:H.jlist");
memory->create(H.val,m_cap,"qeq:H.val");
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::deallocate_matrix()
{
memory->destroy( H.firstnbr );
memory->destroy( H.numnbrs );
memory->destroy( H.jlist );
memory->destroy( H.val );
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::reallocate_matrix()
{
deallocate_matrix();
allocate_matrix();
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init()
{
if (!atom->q_flag) error->all(FLERR,"Fix qeq/reax requires atom attribute q");
ngroup = group->count(igroup);
if (ngroup == 0) error->all(FLERR,"Fix qeq/reax group has no atoms");
/*
if (reaxc)
if (ngroup != reaxc->ngroup)
error->all(FLERR,"Fix qeq/reax group and pair reax/c have "
"different numbers of atoms");
*/
// need a half neighbor list w/ Newton off and ghost neighbors
// built whenever re-neighboring occurs
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->pair = 0;
neighbor->requests[irequest]->fix = 1;
neighbor->requests[irequest]->newton = 2;
neighbor->requests[irequest]->ghost = 1;
init_shielding();
init_taper();
if (strstr(update->integrate_style,"respa"))
nlevels_respa = ((Respa *) update->integrate)->nlevels;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_shielding()
{
int i,j;
int ntypes;
ntypes = atom->ntypes;
if (shld == NULL)
- memory->create(shld,ntypes+1,ntypes+1,"qeq:shileding");
+ memory->create(shld,ntypes+1,ntypes+1,"qeq:shielding");
for( i = 1; i <= ntypes; ++i )
for( j = 1; j <= ntypes; ++j )
shld[i][j] = pow( gamma[i] * gamma[j], -1.5 );
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_taper()
{
double d7, swa2, swa3, swb2, swb3;
if (fabs(swa) > 0.01 && comm->me == 0)
error->warning(FLERR,"Fix qeq/reax has non-zero lower Taper radius cutoff");
if (swb < 0)
error->all(FLERR, "Fix qeq/reax has negative upper Taper radius cutoff");
else if (swb < 5 && comm->me == 0)
error->warning(FLERR,"Fix qeq/reax has very low Taper radius cutoff");
d7 = pow( swb - swa, 7 );
swa2 = SQR( swa );
swa3 = CUBE( swa );
swb2 = SQR( swb );
swb3 = CUBE( swb );
Tap[7] = 20.0 / d7;
Tap[6] = -70.0 * (swa + swb) / d7;
Tap[5] = 84.0 * (swa2 + 3.0*swa*swb + swb2) / d7;
Tap[4] = -35.0 * (swa3 + 9.0*swa2*swb + 9.0*swa*swb2 + swb3 ) / d7;
Tap[3] = 140.0 * (swa3*swb + 3.0*swa2*swb2 + swa*swb3 ) / d7;
Tap[2] =-210.0 * (swa3*swb2 + swa2*swb3) / d7;
Tap[1] = 140.0 * swa3 * swb3 / d7;
Tap[0] = (-35.0*swa3*swb2*swb2 + 21.0*swa2*swb3*swb2 +
7.0*swa*swb3*swb3 + swb3*swb3*swb ) / d7;
}
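As a quick check on these coefficients (stated with the usual ReaxFF 7th-order taper in mind, which switches smoothly from 1 at r = swa to 0 at r = swb with vanishing first, second and third derivatives at both ends): for the common case swa = 0 the expressions reduce to

  Tap[7..0] = { 20/swb^7, -70/swb^6, 84/swb^5, -35/swb^4, 0, 0, 0, 1 }

so Tap(0) = 1 and Tap(swb) = 20 - 70 + 84 - 35 + 1 = 0, as expected.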
/* ---------------------------------------------------------------------- */
void FixQEqReax::setup_pre_force(int vflag)
{
// should not be needed
// neighbor->build_one(list);
deallocate_storage();
allocate_storage();
init_storage();
deallocate_matrix();
allocate_matrix();
pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::setup_pre_force_respa(int vflag, int ilevel)
{
if (ilevel < nlevels_respa-1) return;
setup_pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::init_storage()
{
int NN;
if (reaxc)
NN = reaxc->list->inum + reaxc->list->gnum;
else
NN = list->inum + list->gnum;
for( int i = 0; i < NN; i++ ) {
Hdia_inv[i] = 1. / eta[atom->type[i]];
b_s[i] = -chi[atom->type[i]];
b_t[i] = -1.0;
b_prc[i] = 0;
b_prm[i] = 0;
s[i] = t[i] = 0;
}
}
/* ---------------------------------------------------------------------- */
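// Entry point invoked before each force evaluation, every nevery steps:
// grow per-atom storage and the sparse matrix if needed, rebuild H and the
// right-hand sides, solve the two CG systems for the s and t vectors, and
// combine them into atomic charges; the wall time of the whole QEq solve is
// stored in qeq_time on rank 0.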
void FixQEqReax::pre_force(int vflag)
{
double t_start, t_end;
if (update->ntimestep % nevery) return;
if( comm->me == 0 ) t_start = MPI_Wtime();
n = atom->nlocal;
N = atom->nlocal + atom->nghost;
// grow arrays if necessary
// need to be atom->nmax in length
if( atom->nmax > nmax ) reallocate_storage();
if( n > n_cap*DANGER_ZONE || m_fill > m_cap*DANGER_ZONE )
reallocate_matrix();
init_matvec();
matvecs = CG(b_s, s); // CG on s - parallel
matvecs += CG(b_t, t); // CG on t - parallel
calculate_Q();
if( comm->me == 0 ) {
t_end = MPI_Wtime();
qeq_time = t_end - t_start;
}
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::pre_force_respa(int vflag, int ilevel, int iloop)
{
if (ilevel == nlevels_respa-1) pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::min_pre_force(int vflag)
{
pre_force(vflag);
}
/* ---------------------------------------------------------------------- */
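// Set up one QEq solve: fill the H matrix, load the diagonal (Jacobi)
// preconditioner 1/eta and the right-hand sides b_s = -chi, b_t = -1 for
// atoms in the group, and build initial guesses for s and t by
// extrapolating from the stored history (cubic for s, quadratic for t in
// this version); the forward communications below copy the guesses to
// ghost atoms.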
void FixQEqReax::init_matvec()
{
/* fill-in H matrix */
compute_H();
int nn, ii, i;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
ilist = list->ilist;
}
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
/* init pre-conditioner for H and init solution vectors */
Hdia_inv[i] = 1. / eta[ atom->type[i] ];
b_s[i] = -chi[ atom->type[i] ];
b_t[i] = -1.0;
/* linear extrapolation for s & t from previous solutions */
//s[i] = 2 * s_hist[i][0] - s_hist[i][1];
//t[i] = 2 * t_hist[i][0] - t_hist[i][1];
/* quadratic extrapolation for s & t from previous solutions */
//s[i] = s_hist[i][2] + 3 * ( s_hist[i][0] - s_hist[i][1] );
t[i] = t_hist[i][2] + 3 * ( t_hist[i][0] - t_hist[i][1] );
/* cubic extrapolation for s & t from previous solutions */
s[i] = 4*(s_hist[i][0]+s_hist[i][2])-(6*s_hist[i][1]+s_hist[i][3]);
//t[i] = 4*(t_hist[i][0]+t_hist[i][2])-(6*t_hist[i][1]+t_hist[i][3]);
}
}
pack_flag = 2;
comm->forward_comm_fix(this); //Dist_vector( s );
pack_flag = 3;
comm->forward_comm_fix(this); //Dist_vector( t );
}
/* ---------------------------------------------------------------------- */
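// Build the sparse QEq matrix H in a CSR-like layout (firstnbr / numnbrs /
// jlist / val) from the half neighbor list. The tag and coordinate tests
// below act as a tie-breaker so that each pair within the cutoff swb is
// stored only once, including pairs involving ghost atoms and periodic
// images that share the same tag.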
void FixQEqReax::compute_H()
{
int inum, jnum, *ilist, *jlist, *numneigh, **firstneigh;
int i, j, ii, jj, flag;
double **x, SMALL = 0.0001;
double dx, dy, dz, r_sqr;
int *type = atom->type;
tagint *tag = atom->tag;
x = atom->x;
int *mask = atom->mask;
if (reaxc) {
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
numneigh = reaxc->list->numneigh;
firstneigh = reaxc->list->firstneigh;
} else {
inum = list->inum;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
}
// fill in the H matrix
m_fill = 0;
r_sqr = 0;
for( ii = 0; ii < inum; ii++ ) {
i = ilist[ii];
if (mask[i] & groupbit) {
jlist = firstneigh[i];
jnum = numneigh[i];
H.firstnbr[i] = m_fill;
for( jj = 0; jj < jnum; jj++ ) {
j = jlist[jj];
dx = x[j][0] - x[i][0];
dy = x[j][1] - x[i][1];
dz = x[j][2] - x[i][2];
r_sqr = SQR(dx) + SQR(dy) + SQR(dz);
flag = 0;
if (r_sqr <= SQR(swb)) {
if (j < n) flag = 1;
else if (tag[i] < tag[j]) flag = 1;
else if (tag[i] == tag[j]) {
if (dz > SMALL) flag = 1;
else if (fabs(dz) < SMALL) {
if (dy > SMALL) flag = 1;
else if (fabs(dy) < SMALL && dx > SMALL)
flag = 1;
}
}
}
if( flag ) {
H.jlist[m_fill] = j;
H.val[m_fill] = calculate_H( sqrt(r_sqr), shld[type[i]][type[j]] );
m_fill++;
}
}
H.numnbrs[i] = m_fill - H.firstnbr[i];
}
}
if (m_fill >= H.m) {
char str[128];
sprintf(str,"H matrix size has been exceeded: m_fill=%d H.m=%d\n",
m_fill, H.m );
error->warning(FLERR,str);
error->all(FLERR,"Fix qeq/reax has insufficient QEq matrix size");
}
}
/* ---------------------------------------------------------------------- */
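// Matrix element for one pair: the tapered, shielded Coulomb kernel
// Tap(r) / (r^3 + gamma_ij^-3)^(1/3), where the gamma passed in is
// shld[type_i][type_j] = (gamma_i*gamma_j)^(-3/2); the shielding removes
// the 1/r singularity at short range, and the result is converted from eV
// to kcal/mol.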
double FixQEqReax::calculate_H( double r, double gamma )
{
double Taper, denom;
Taper = Tap[7] * r + Tap[6];
Taper = Taper * r + Tap[5];
Taper = Taper * r + Tap[4];
Taper = Taper * r + Tap[3];
Taper = Taper * r + Tap[2];
Taper = Taper * r + Tap[1];
Taper = Taper * r + Tap[0];
denom = r * r * r + gamma;
denom = pow(denom,0.3333333333333);
return Taper * EV_TO_KCAL_PER_MOL / denom;
}
/* ---------------------------------------------------------------------- */
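// Jacobi-preconditioned conjugate gradient solve of H x = b, restricted to
// atoms in the fix group. Iterates until the relative residual
// sqrt(sig_new)/|b| drops below the user tolerance or imax = 200 iterations
// are reached; forward communication distributes the search direction d to
// ghosts before each sparse_matvec(), and reverse communication collects
// ghost contributions to q afterwards.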
int FixQEqReax::CG( double *b, double *x )
{
int i, j, imax;
double tmp, alpha, beta, b_norm;
double sig_old, sig_new;
int nn, jj;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
ilist = list->ilist;
}
imax = 200;
pack_flag = 1;
sparse_matvec( &H, x, q );
comm->reverse_comm_fix( this ); //Coll_Vector( q );
vector_sum( r , 1., b, -1., q, nn );
for( jj = 0; jj < nn; ++jj ) {
j = ilist[jj];
if (atom->mask[j] & groupbit)
d[j] = r[j] * Hdia_inv[j]; //pre-condition
}
b_norm = parallel_norm( b, nn );
sig_new = parallel_dot( r, d, nn);
for( i = 1; i < imax && sqrt(sig_new) / b_norm > tolerance; ++i ) {
comm->forward_comm_fix(this); //Dist_vector( d );
sparse_matvec( &H, d, q );
comm->reverse_comm_fix(this); //Coll_vector( q );
tmp = parallel_dot( d, q, nn);
alpha = sig_new / tmp;
vector_add( x, alpha, d, nn );
vector_add( r, -alpha, q, nn );
// pre-conditioning
for( jj = 0; jj < nn; ++jj ) {
j = ilist[jj];
if (atom->mask[j] & groupbit)
p[j] = r[j] * Hdia_inv[j];
}
sig_old = sig_new;
sig_new = parallel_dot( r, p, nn);
beta = sig_new / sig_old;
vector_sum( d, 1., p, beta, d, nn );
}
if (i >= imax && comm->me == 0) {
char str[128];
sprintf(str,"Fix qeq/reax CG convergence failed after %d iterations "
"at " BIGINT_FORMAT " step",i,update->ntimestep);
error->warning(FLERR,str);
}
return i;
}
/* ---------------------------------------------------------------------- */
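// b = A*x for the half-stored sparse matrix: the diagonal contribution is
// eta[type[i]]*x[i] for owned atoms, ghost entries of b are zeroed, and
// each stored off-diagonal element A_ij is applied symmetrically to both
// b[i] and b[j]; the caller's reverse communication then adds the ghost
// rows back onto their owners.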
void FixQEqReax::sparse_matvec( sparse_matrix *A, double *x, double *b )
{
int i, j, itr_j;
int nn, NN, ii;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
NN = reaxc->list->inum + reaxc->list->gnum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
NN = list->inum + list->gnum;
ilist = list->ilist;
}
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
b[i] = eta[ atom->type[i] ] * x[i];
}
for( ii = nn; ii < NN; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
b[i] = 0;
}
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
for( itr_j=A->firstnbr[i]; itr_j<A->firstnbr[i]+A->numnbrs[i]; itr_j++) {
j = A->jlist[itr_j];
b[i] += A->val[itr_j] * x[j];
b[j] += A->val[itr_j] * x[i];
}
}
}
}
/* ---------------------------------------------------------------------- */
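// Combine the two CG solutions into charges: q_i = s_i - u*t_i with
// u = sum(s)/sum(t), which makes the group's net charge sum to zero.
// The s and t histories are then shifted so init_matvec() can extrapolate
// the initial guesses for the next solve, and the new charges are pushed
// to ghost atoms.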
void FixQEqReax::calculate_Q()
{
int i, k;
double u, s_sum, t_sum;
double *q = atom->q;
int nn, ii;
int *ilist;
if (reaxc) {
nn = reaxc->list->inum;
ilist = reaxc->list->ilist;
} else {
nn = list->inum;
ilist = list->ilist;
}
s_sum = parallel_vector_acc( s, nn );
t_sum = parallel_vector_acc( t, nn);
u = s_sum / t_sum;
for( ii = 0; ii < nn; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit) {
q[i] = s[i] - u * t[i];
/* backup s & t */
for( k = 4; k > 0; --k ) {
s_hist[i][k] = s_hist[i][k-1];
t_hist[i][k] = t_hist[i][k-1];
}
s_hist[i][0] = s[i];
t_hist[i][0] = t[i];
}
}
pack_flag = 4;
comm->forward_comm_fix( this ); //Dist_vector( atom->q );
}
/* ---------------------------------------------------------------------- */
int FixQEqReax::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int m;
if( pack_flag == 1)
for(m = 0; m < n; m++) buf[m] = d[list[m]];
else if( pack_flag == 2 )
for(m = 0; m < n; m++) buf[m] = s[list[m]];
else if( pack_flag == 3 )
for(m = 0; m < n; m++) buf[m] = t[list[m]];
else if( pack_flag == 4 )
for(m = 0; m < n; m++) buf[m] = atom->q[list[m]];
return n;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::unpack_forward_comm(int n, int first, double *buf)
{
int i, m;
if( pack_flag == 1)
for(m = 0, i = first; m < n; m++, i++) d[i] = buf[m];
else if( pack_flag == 2)
for(m = 0, i = first; m < n; m++, i++) s[i] = buf[m];
else if( pack_flag == 3)
for(m = 0, i = first; m < n; m++, i++) t[i] = buf[m];
else if( pack_flag == 4)
for(m = 0, i = first; m < n; m++, i++) atom->q[i] = buf[m];
}
/* ---------------------------------------------------------------------- */
int FixQEqReax::pack_reverse_comm(int n, int first, double *buf)
{
int i, m;
for(m = 0, i = first; m < n; m++, i++) buf[m] = q[i];
return n;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::unpack_reverse_comm(int n, int *list, double *buf)
{
for(int m = 0; m < n; m++) q[list[m]] += buf[m];
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixQEqReax::memory_usage()
{
double bytes;
bytes = atom->nmax*nprev*2 * sizeof(double); // s_hist & t_hist
bytes += atom->nmax*11 * sizeof(double); // storage
bytes += n_cap*2 * sizeof(int); // H.firstnbr & H.numnbrs
bytes += m_cap * sizeof(int); // H.jlist
bytes += m_cap * sizeof(double); // H.val
return bytes;
}
/* ----------------------------------------------------------------------
allocate fictitious charge arrays
------------------------------------------------------------------------- */
void FixQEqReax::grow_arrays(int nmax)
{
memory->grow(s_hist,nmax,nprev,"qeq:s_hist");
memory->grow(t_hist,nmax,nprev,"qeq:t_hist");
}
/* ----------------------------------------------------------------------
copy values within fictitious charge arrays
------------------------------------------------------------------------- */
void FixQEqReax::copy_arrays(int i, int j, int delflag)
{
for (int m = 0; m < nprev; m++) {
s_hist[j][m] = s_hist[i][m];
t_hist[j][m] = t_hist[i][m];
}
}
/* ----------------------------------------------------------------------
pack values in local atom-based array for exchange with another proc
------------------------------------------------------------------------- */
int FixQEqReax::pack_exchange(int i, double *buf)
{
for (int m = 0; m < nprev; m++) buf[m] = s_hist[i][m];
for (int m = 0; m < nprev; m++) buf[nprev+m] = t_hist[i][m];
return nprev*2;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based array from exchange with another proc
------------------------------------------------------------------------- */
int FixQEqReax::unpack_exchange(int nlocal, double *buf)
{
for (int m = 0; m < nprev; m++) s_hist[nlocal][m] = buf[m];
for (int m = 0; m < nprev; m++) t_hist[nlocal][m] = buf[nprev+m];
return nprev*2;
}
/* ---------------------------------------------------------------------- */
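// The helpers below (parallel_norm, parallel_dot, parallel_vector_acc)
// accumulate group-masked sums over owned atoms and reduce them with
// MPI_Allreduce over the world communicator, so every rank sees the same
// scalar result.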
double FixQEqReax::parallel_norm( double *v, int n )
{
int i;
double my_sum, norm_sqr;
int ii;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
my_sum = 0.0;
norm_sqr = 0.0;
for( ii = 0; ii < n; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
my_sum += SQR( v[i] );
}
MPI_Allreduce( &my_sum, &norm_sqr, 1, MPI_DOUBLE, MPI_SUM, world );
return sqrt( norm_sqr );
}
/* ---------------------------------------------------------------------- */
double FixQEqReax::parallel_dot( double *v1, double *v2, int n)
{
int i;
double my_dot, res;
int ii;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
my_dot = 0.0;
res = 0.0;
for( ii = 0; ii < n; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
my_dot += v1[i] * v2[i];
}
MPI_Allreduce( &my_dot, &res, 1, MPI_DOUBLE, MPI_SUM, world );
return res;
}
/* ---------------------------------------------------------------------- */
double FixQEqReax::parallel_vector_acc( double *v, int n )
{
int i;
double my_acc, res;
int ii;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
my_acc = 0.0;
res = 0.0;
for( ii = 0; ii < n; ++ii ) {
i = ilist[ii];
if (atom->mask[i] & groupbit)
my_acc += v[i];
}
MPI_Allreduce( &my_acc, &res, 1, MPI_DOUBLE, MPI_SUM, world );
return res;
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::vector_sum( double* dest, double c, double* v,
double d, double* y, int k )
{
int kk;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
for( --k; k>=0; --k ) {
kk = ilist[k];
if (atom->mask[kk] & groupbit)
dest[kk] = c * v[kk] + d * y[kk];
}
}
/* ---------------------------------------------------------------------- */
void FixQEqReax::vector_add( double* dest, double c, double* v, int k )
{
int kk;
int *ilist;
if (reaxc)
ilist = reaxc->list->ilist;
else
ilist = list->ilist;
for( --k; k>=0; --k ) {
kk = ilist[k];
if (atom->mask[kk] & groupbit)
dest[kk] += c * v[kk];
}
}
diff --git a/src/USER-REAXC/fix_reax_c.cpp b/src/USER-REAXC/fix_reaxc.cpp
similarity index 99%
rename from src/USER-REAXC/fix_reax_c.cpp
rename to src/USER-REAXC/fix_reaxc.cpp
index e1cc4e340..df0621799 100644
--- a/src/USER-REAXC/fix_reax_c.cpp
+++ b/src/USER-REAXC/fix_reaxc.cpp
@@ -1,161 +1,161 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Hasan Metin Aktulga, Purdue University
(now at Lawrence Berkeley National Laboratory, hmaktulga@lbl.gov)
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
------------------------------------------------------------------------- */
-#include "fix_reax_c.h"
+#include "fix_reaxc.h"
#include "atom.h"
#include "pair.h"
#include "comm.h"
#include "memory.h"
using namespace LAMMPS_NS;
using namespace FixConst;
#define MAX_REAX_BONDS 30
#define MIN_REAX_BONDS 15
#define MIN_REAX_HBONDS 25
/* ---------------------------------------------------------------------- */
FixReaxC::FixReaxC(LAMMPS *lmp,int narg, char **arg) :
Fix(lmp, narg, arg)
{
// perform initial allocation of atom-based arrays
// register with atom class
num_bonds = NULL;
num_hbonds = NULL;
grow_arrays(atom->nmax);
atom->add_callback(0);
// initialize arrays to MIN so atom migration is OK the 1st time
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++)
num_bonds[i] = num_hbonds[i] = MIN_REAX_BONDS;
// set comm sizes needed by this fix
comm_forward = 1;
}
/* ---------------------------------------------------------------------- */
FixReaxC::~FixReaxC()
{
// unregister this fix so atom class doesn't invoke it any more
atom->delete_callback(id,0);
// delete locally stored arrays
memory->destroy(num_bonds);
memory->destroy(num_hbonds);
}
/* ---------------------------------------------------------------------- */
int FixReaxC::setmask()
{
int mask = 0;
return mask;
}
/* ----------------------------------------------------------------------
memory usage of local atom-based arrays
------------------------------------------------------------------------- */
double FixReaxC::memory_usage()
{
int nmax = atom->nmax;
double bytes = nmax * 2 * sizeof(int);
return bytes;
}
/* ----------------------------------------------------------------------
allocate local atom-based arrays
------------------------------------------------------------------------- */
void FixReaxC::grow_arrays(int nmax)
{
memory->grow(num_bonds,nmax,"reaxc:num_bonds");
memory->grow(num_hbonds,nmax,"reaxc:num_hbonds");
}
/* ----------------------------------------------------------------------
copy values within local atom-based arrays
------------------------------------------------------------------------- */
void FixReaxC::copy_arrays(int i, int j, int delflag)
{
num_bonds[j] = num_bonds[i];
num_hbonds[j] = num_hbonds[i];
}
/* ----------------------------------------------------------------------
pack values in local atom-based arrays for exchange with another proc
------------------------------------------------------------------------- */
int FixReaxC::pack_exchange(int i, double *buf)
{
buf[0] = num_bonds[i];
buf[1] = num_hbonds[i];
return 2;
}
/* ----------------------------------------------------------------------
unpack values in local atom-based arrays from exchange with another proc
------------------------------------------------------------------------- */
int FixReaxC::unpack_exchange(int nlocal, double *buf)
{
num_bonds[nlocal] = static_cast<int> (buf[0]);
num_hbonds[nlocal] = static_cast<int> (buf[1]);
return 2;
}
/* ---------------------------------------------------------------------- */
int FixReaxC::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m++] = num_bonds[j];
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixReaxC::unpack_forward_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++)
num_bonds[i] = static_cast<int> (buf[m++]);
}
diff --git a/src/USER-REAXC/fix_reax_c.h b/src/USER-REAXC/fix_reaxc.h
similarity index 100%
rename from src/USER-REAXC/fix_reax_c.h
rename to src/USER-REAXC/fix_reaxc.h
diff --git a/src/USER-REAXC/fix_reaxc_bonds.cpp b/src/USER-REAXC/fix_reaxc_bonds.cpp
index 543669de7..cf9e4789c 100644
--- a/src/USER-REAXC/fix_reaxc_bonds.cpp
+++ b/src/USER-REAXC/fix_reaxc_bonds.cpp
@@ -1,359 +1,359 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Ray Shan (Sandia, tnshan@sandia.gov)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_bonds.h"
#include "atom.h"
#include "update.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
#include "reaxc_types.h"
#include "reaxc_defs.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCBonds::FixReaxCBonds(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg)
{
if (narg != 5) error->all(FLERR,"Illegal fix reax/c/bonds command");
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
ntypes = atom->ntypes;
nmax = atom->nmax;
nevery = force->inumeric(FLERR,arg[3]);
if (nevery <= 0 )
error->all(FLERR,"Illegal fix reax/c/bonds command");
if (me == 0) {
fp = fopen(arg[4],"w");
if (fp == NULL) {
char str[128];
sprintf(str,"Cannot open fix reax/c/bonds file %s",arg[4]);
error->one(FLERR,str);
}
}
if (atom->tag_consecutive() == 0)
error->all(FLERR,"Atom IDs must be consecutive for fix reax/c bonds");
abo = NULL;
neighid = NULL;
numneigh = NULL;
allocate();
}
/* ---------------------------------------------------------------------- */
FixReaxCBonds::~FixReaxCBonds()
{
MPI_Comm_rank(world,&me);
destroy();
if (me == 0) fclose(fp);
}
/* ---------------------------------------------------------------------- */
int FixReaxCBonds::setmask()
{
int mask = 0;
mask |= END_OF_STEP;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::setup(int vflag)
{
end_of_step();
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::init()
{
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
if (reaxc == NULL)
reaxc = (PairReaxC *) force->pair_match("reax/c/kk",1);
if (reaxc == NULL) error->all(FLERR,"Cannot use fix reax/c/bonds without "
"pair_style reax/c");
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::end_of_step()
{
Output_ReaxC_Bonds(update->ntimestep,fp);
if (me == 0) fflush(fp);
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::Output_ReaxC_Bonds(bigint ntimestep, FILE *fp)
{
int i, j;
int nbuf, nbuf_local;
int nlocal_max, numbonds, numbonds_max;
double *buf;
int nlocal = atom->nlocal;
int nlocal_tot = static_cast<int> (atom->natoms);
if (atom->nmax > nmax) {
destroy();
nmax = atom->nmax;
allocate();
}
for (i = 0; i < nmax; i++) {
numneigh[i] = 0;
for (j = 0; j < MAXREAXBOND; j++) {
neighid[i][j] = 0;
abo[i][j] = 0.0;
}
}
numbonds = 0;
FindBond(lists, numbonds);
// allocate a temporary buffer for the snapshot info
MPI_Allreduce(&numbonds,&numbonds_max,1,MPI_INT,MPI_MAX,world);
MPI_Allreduce(&nlocal,&nlocal_max,1,MPI_INT,MPI_MAX,world);
nbuf = 1+(numbonds_max*2+10)*nlocal_max;
memory->create(buf,nbuf,"reax/c/bonds:buf");
for (i = 0; i < nbuf; i ++) buf[i] = 0.0;
// Pass information to buffer
PassBuffer(buf, nbuf_local);
// Receive information from buffer for output
RecvBuffer(buf, nbuf, nbuf_local, nlocal_tot, numbonds_max);
memory->destroy(buf);
}
/* ---------------------------------------------------------------------- */
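// Walk pair reax/c's internal bond list for each owned atom and record the
// tags and bond orders of all neighbors whose bond order exceeds the
// control-file cutoff bg_cut; numbonds returns the largest per-atom count,
// which is later used to size the output buffer.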
void FixReaxCBonds::FindBond(struct _reax_list *lists, int &numbonds)
{
int *ilist, i, ii, inum;
int j, pj, nj;
tagint jtag;
double bo_tmp,bo_cut;
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
bond_data *bo_ij;
bo_cut = reaxc->control->bg_cut;
tagint *tag = atom->tag;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
nj = 0;
for( pj = Start_Index(i, reaxc->lists); pj < End_Index(i, reaxc->lists); ++pj ) {
bo_ij = &( reaxc->lists->select.bond_list[pj] );
j = bo_ij->nbr;
jtag = tag[j];
bo_tmp = bo_ij->bo_data.BO;
if (bo_tmp > bo_cut) {
neighid[i][nj] = jtag;
abo[i][nj] = bo_tmp;
nj ++;
}
}
numneigh[i] = nj;
if (nj > numbonds) numbonds = nj;
}
}
/* ---------------------------------------------------------------------- */
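// Pack this rank's snapshot into buf. Layout per atom: tag, type, total
// bond order, number of lone pairs, charge, number of bonds, then the
// neighbor tags, then the molecule ID (0 if molecules are not defined),
// then the bond orders; buf[0] holds the local atom count and nbuf_local
// returns the packed length.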
void FixReaxCBonds::PassBuffer(double *buf, int &nbuf_local)
{
int i, j, k, numbonds;
int nlocal = atom->nlocal;
j = 2;
buf[0] = nlocal;
for (i = 0; i < nlocal; i++) {
buf[j-1] = atom->tag[i];
buf[j+0] = atom->type[i];
buf[j+1] = reaxc->workspace->total_bond_order[i];
buf[j+2] = reaxc->workspace->nlp[i];
buf[j+3] = atom->q[i];
buf[j+4] = numneigh[i];
numbonds = nint(buf[j+4]);
for (k = 5; k < 5+numbonds; k++) {
buf[j+k] = neighid[i][k-5];
}
j += (5+numbonds);
if (atom->molecule == NULL ) buf[j] = 0.0;
else buf[j] = atom->molecule[i];
j ++;
for (k = 0; k < numbonds; k++) {
buf[j+k] = abo[i][k];
}
j += (1+numbonds);
}
nbuf_local = j - 1;
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::RecvBuffer(double *buf, int nbuf, int nbuf_local,
int natoms, int maxnum)
{
int i, j, k, itype;
int inode, nlocal_tmp, numbonds;
tagint itag,jtag;
int nlocal = atom->nlocal;
bigint ntimestep = update->ntimestep;
double sbotmp, nlptmp, avqtmp, abotmp;
double cutof3 = reaxc->control->bg_cut;
MPI_Request irequest, irequest2;
if (me == 0 ){
fprintf(fp,"# Timestep " BIGINT_FORMAT " \n",ntimestep);
fprintf(fp,"# \n");
fprintf(fp,"# Number of particles %d \n",natoms);
fprintf(fp,"# \n");
fprintf(fp,"# Max number of bonds per atom %d with "
"coarse bond order cutoff %5.3f \n",maxnum,cutof3);
fprintf(fp,"# Particle connection table and bond orders \n");
fprintf(fp,"# id type nb id_1...id_nb mol bo_1...bo_nb abo nlp q \n");
}
j = 2;
if (me == 0) {
for (inode = 0; inode < nprocs; inode ++) {
if (inode == 0) {
nlocal_tmp = nlocal;
} else {
MPI_Irecv(&buf[0],nbuf,MPI_DOUBLE,inode,0,world,&irequest);
MPI_Wait(&irequest,MPI_STATUS_IGNORE);
nlocal_tmp = nint(buf[0]);
}
j = 2;
for (i = 0; i < nlocal_tmp; i ++) {
itag = static_cast<tagint> (buf[j-1]);
itype = nint(buf[j+0]);
sbotmp = buf[j+1];
nlptmp = buf[j+2];
avqtmp = buf[j+3];
numbonds = nint(buf[j+4]);
fprintf(fp," " TAGINT_FORMAT " %d %d",itag,itype,numbonds);
for (k = 5; k < 5+numbonds; k++) {
jtag = static_cast<tagint> (buf[j+k]);
fprintf(fp," " TAGINT_FORMAT,jtag);
}
j += (5+numbonds);
fprintf(fp," " TAGINT_FORMAT,static_cast<tagint> (buf[j]));
j ++;
for (k = 0; k < numbonds; k++) {
abotmp = buf[j+k];
fprintf(fp,"%14.3f",abotmp);
}
j += (1+numbonds);
fprintf(fp,"%14.3f%14.3f%14.3f\n",sbotmp,nlptmp,avqtmp);
}
}
} else {
MPI_Isend(&buf[0],nbuf_local,MPI_DOUBLE,0,0,world,&irequest2);
MPI_Wait(&irequest2,MPI_STATUS_IGNORE);
}
if(me ==0) fprintf(fp,"# \n");
}
/* ---------------------------------------------------------------------- */
int FixReaxCBonds::nint(const double &r)
{
int i = 0;
if (r>0.0) i = static_cast<int>(r+0.5);
else if (r<0.0) i = static_cast<int>(r-0.5);
return i;
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::destroy()
{
memory->destroy(abo);
memory->destroy(neighid);
memory->destroy(numneigh);
}
/* ---------------------------------------------------------------------- */
void FixReaxCBonds::allocate()
{
memory->create(abo,nmax,MAXREAXBOND,"reax/c/bonds:abo");
memory->create(neighid,nmax,MAXREAXBOND,"reax/c/bonds:neighid");
memory->create(numneigh,nmax,"reax/c/bonds:numneigh");
}
/* ---------------------------------------------------------------------- */
double FixReaxCBonds::memory_usage()
{
double bytes;
bytes = 3.0*nmax*sizeof(double);
bytes += nmax*sizeof(int);
bytes += 1.0*nmax*MAXREAXBOND*sizeof(double);
bytes += 1.0*nmax*MAXREAXBOND*sizeof(int);
return bytes;
}
diff --git a/src/USER-REAXC/fix_reaxc_species.cpp b/src/USER-REAXC/fix_reaxc_species.cpp
index ead73f02a..d291903fa 100644
--- a/src/USER-REAXC/fix_reaxc_species.cpp
+++ b/src/USER-REAXC/fix_reaxc_species.cpp
@@ -1,985 +1,985 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors: Ray Shan (Sandia, tnshan@sandia.gov)
Oleg Sergeev (VNIIA, sergeev@vniia.ru)
------------------------------------------------------------------------- */
#include <stdlib.h>
#include <math.h>
#include "atom.h"
#include <string.h>
#include "fix_ave_atom.h"
#include "fix_reaxc_species.h"
#include "domain.h"
#include "update.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "modify.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "comm.h"
#include "force.h"
#include "compute.h"
#include "input.h"
#include "variable.h"
#include "memory.h"
#include "error.h"
#include "reaxc_list.h"
using namespace LAMMPS_NS;
using namespace FixConst;
/* ---------------------------------------------------------------------- */
FixReaxCSpecies::FixReaxCSpecies(LAMMPS *lmp, int narg, char **arg) :
Fix(lmp, narg, arg)
{
if (narg < 7) error->all(FLERR,"Illegal fix reax/c/species command");
force_reneighbor = 0;
vector_flag = 1;
size_vector = 2;
extvector = 0;
peratom_flag = 1;
size_peratom_cols = 0;
peratom_freq = 1;
nvalid = -1;
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
ntypes = atom->ntypes;
nevery = atoi(arg[3]);
nrepeat = atoi(arg[4]);
global_freq = nfreq = atoi(arg[5]);
comm_forward = 5;
if (nevery <= 0 || nrepeat <= 0 || nfreq <= 0)
error->all(FLERR,"Illegal fix reax/c/species command");
if (nfreq % nevery || nrepeat*nevery > nfreq)
error->all(FLERR,"Illegal fix reax/c/species command");
// Neighbor lists must stay unchanged during averaging of bonds,
// but may be updated when no averaging is performed.
int rene_flag = 0;
if (nevery * nrepeat != 1 && (nfreq % neighbor->every != 0 || neighbor->every < nevery * nrepeat)) {
int newneighborevery = nevery * nrepeat;
while (nfreq % newneighborevery != 0 && newneighborevery <= nfreq / 2)
newneighborevery++;
if (nfreq % newneighborevery != 0)
newneighborevery = nfreq;
neighbor->every = newneighborevery;
rene_flag = 1;
}
if (nevery * nrepeat != 1 && (neighbor->delay != 0 || neighbor->dist_check != 0)) {
neighbor->delay = 0;
neighbor->dist_check = 0;
rene_flag = 1;
}
if (me == 0 && rene_flag) {
char str[128];
sprintf(str,"Resetting reneighboring criteria for fix reax/c/species");
error->warning(FLERR,str);
}
tmparg = NULL;
memory->create(tmparg,4,4,"reax/c/species:tmparg");
strcpy(tmparg[0],arg[3]);
strcpy(tmparg[1],arg[4]);
strcpy(tmparg[2],arg[5]);
if (me == 0) {
fp = fopen(arg[6],"w");
if (fp == NULL) {
char str[128];
sprintf(str,"Cannot open fix reax/c/species file %s",arg[6]);
error->one(FLERR,str);
}
}
x0 = NULL;
PBCconnected = NULL;
clusterID = NULL;
int ntmp = 1;
memory->create(x0,ntmp,"reax/c/species:x0");
memory->create(PBCconnected,ntmp,"reax/c/species:PBCconnected");
memory->create(clusterID,ntmp,"reax/c/species:clusterID");
vector_atom = clusterID;
BOCut = NULL;
Name = NULL;
MolName = NULL;
MolType = NULL;
NMol = NULL;
nd = NULL;
molmap = NULL;
nmax = 0;
setupflag = 0;
// set default bond order cutoff
int n, i, j, itype, jtype;
double bo_cut;
bg_cut = 0.30;
n = ntypes+1;
memory->create(BOCut,n,n,"reax/c/species:BOCut");
for (i = 1; i < n; i ++)
for (j = 1; j < n; j ++)
BOCut[i][j] = bg_cut;
// optional args
eletype = NULL;
ele = filepos = NULL;
eleflag = posflag = padflag = 0;
singlepos_opened = multipos_opened = 0;
multipos = 0;
posfreq = 0;
int iarg = 7;
while (iarg < narg) {
// set BO cutoff
if (strcmp(arg[iarg],"cutoff") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal fix reax/c/species command");
itype = atoi(arg[iarg+1]);
jtype = atoi(arg[iarg+2]);
bo_cut = atof(arg[iarg+3]);
if (itype > ntypes || jtype > ntypes)
error->all(FLERR,"Illegal fix reax/c/species command");
if (itype <= 0 || jtype <= 0)
error->all(FLERR,"Illegal fix reax/c/species command");
if (bo_cut > 1.0 || bo_cut < 0.0)
error->all(FLERR,"Illegal fix reax/c/species command");
BOCut[itype][jtype] = bo_cut;
BOCut[jtype][itype] = bo_cut;
iarg += 4;
// modify element type names
} else if (strcmp(arg[iarg],"element") == 0) {
if (iarg+ntypes+1 > narg) error->all(FLERR,"Illegal fix reax/c/species command");
eletype = (char**) malloc(ntypes*sizeof(char*));
for (int i = 0; i < ntypes; i ++) {
eletype[i] = (char*) malloc(2*sizeof(char));
strcpy(eletype[i],arg[iarg+1+i]);
}
eleflag = 1;
iarg += ntypes + 1;
// position of molecules
} else if (strcmp(arg[iarg],"position") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal fix reax/c/species command");
posflag = 1;
posfreq = atoi(arg[iarg+1]);
if (posfreq < nfreq || (posfreq%nfreq != 0))
error->all(FLERR,"Illegal fix reax/c/species command");
filepos = new char[255];
strcpy(filepos,arg[iarg+2]);
if (strchr(filepos,'*')) {
multipos = 1;
} else {
if (me == 0) {
pos = fopen(filepos, "w");
if (pos == NULL) error->one(FLERR,"Cannot open fix reax/c/species position file");
}
singlepos_opened = 1;
multipos = 0;
}
iarg += 3;
} else error->all(FLERR,"Illegal fix reax/c/species command");
}
if (!eleflag) {
memory->create(ele,ntypes+1,"reax/c/species:ele");
ele[0]='C';
if (ntypes > 1)
ele[1]='H';
if (ntypes > 2)
ele[2]='O';
if (ntypes > 3)
ele[3]='N';
}
vector_nmole = 0;
vector_nspec = 0;
}
/* ---------------------------------------------------------------------- */
FixReaxCSpecies::~FixReaxCSpecies()
{
memory->destroy(ele);
memory->destroy(BOCut);
memory->destroy(clusterID);
memory->destroy(PBCconnected);
memory->destroy(x0);
memory->destroy(nd);
memory->destroy(Name);
memory->destroy(NMol);
memory->destroy(MolType);
memory->destroy(MolName);
memory->destroy(tmparg);
if (filepos)
delete [] filepos;
if (me == 0) fclose(fp);
if (me == 0 && posflag && multipos_opened) fclose(pos);
modify->delete_compute("SPECATOM");
modify->delete_fix("SPECBOND");
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::setmask()
{
int mask = 0;
mask |= POST_INTEGRATE;
return mask;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::setup(int vflag)
{
ntotal = static_cast<int> (atom->natoms);
if (Name == NULL)
memory->create(Name,ntypes,"reax/c/species:Name");
post_integrate();
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::init()
{
if (atom->tag_enable == 0)
error->all(FLERR,"Cannot use fix reax/c/species unless atoms have IDs");
reaxc = (PairReaxC *) force->pair_match("reax/c",1);
if (reaxc == NULL)
reaxc = (PairReaxC *) force->pair_match("reax/c/kk",1);
if (reaxc == NULL) error->all(FLERR,"Cannot use fix reax/c/species without "
"pair_style reax/c");
reaxc->fixspecies_flag = 1;
// reset next output timestep if not yet set or timestep has been reset
if (nvalid != update->ntimestep)
nvalid = update->ntimestep+nfreq;
// check if this fix has been called twice
int count = 0;
for (int i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"reax/c/species") == 0) count++;
if (count > 1 && comm->me == 0)
error->warning(FLERR,"More than one fix reax/c/species");
if (!setupflag) {
// create a compute to store properties
create_compute();
// create a fix to point to fix_ave_atom for averaging stored properties
create_fix();
setupflag = 1;
}
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::create_compute()
{
int narg;
char **args;
narg = 34;
args = new char*[narg];
args[0] = (char *) "SPECATOM";
args[1] = (char *) "all";
args[2] = (char *) "SPEC/ATOM";
args[3] = (char *) "q";
args[4] = (char *) "x";
args[5] = (char *) "y";
args[6] = (char *) "z";
args[7] = (char *) "vx";
args[8] = (char *) "vy";
args[9] = (char *) "vz";
args[10] = (char *) "abo01";
args[11] = (char *) "abo02";
args[12] = (char *) "abo03";
args[13] = (char *) "abo04";
args[14] = (char *) "abo05";
args[15] = (char *) "abo06";
args[16] = (char *) "abo07";
args[17] = (char *) "abo08";
args[18] = (char *) "abo09";
args[19] = (char *) "abo10";
args[20] = (char *) "abo11";
args[21] = (char *) "abo12";
args[22] = (char *) "abo13";
args[23] = (char *) "abo14";
args[24] = (char *) "abo15";
args[25] = (char *) "abo16";
args[26] = (char *) "abo17";
args[27] = (char *) "abo18";
args[28] = (char *) "abo19";
args[29] = (char *) "abo20";
args[30] = (char *) "abo21";
args[31] = (char *) "abo22";
args[32] = (char *) "abo23";
args[33] = (char *) "abo24";
modify->add_compute(narg,args);
delete [] args;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::create_fix()
{
int narg;
char **args;
narg = 37;
args = new char*[narg];
args[0] = (char *) "SPECBOND";
args[1] = (char *) "all";
args[2] = (char *) "ave/atom";
args[3] = tmparg[0];
args[4] = tmparg[1];
args[5] = tmparg[2];
args[6] = (char *) "c_SPECATOM[1]"; // q, array_atoms[i][0]
args[7] = (char *) "c_SPECATOM[2]"; // x, 1
args[8] = (char *) "c_SPECATOM[3]"; // y, 2
args[9] = (char *) "c_SPECATOM[4]"; // z, 3
args[10] = (char *) "c_SPECATOM[5]"; // vx, 4
args[11] = (char *) "c_SPECATOM[6]"; // vy, 5
args[12] = (char *) "c_SPECATOM[7]"; // vz, 6
args[13] = (char *) "c_SPECATOM[8]"; // abo01, 7
args[14] = (char *) "c_SPECATOM[9]";
args[15] = (char *) "c_SPECATOM[10]";
args[16] = (char *) "c_SPECATOM[11]";
args[17] = (char *) "c_SPECATOM[12]";
args[18] = (char *) "c_SPECATOM[13]";
args[19] = (char *) "c_SPECATOM[14]";
args[20] = (char *) "c_SPECATOM[15]";
args[21] = (char *) "c_SPECATOM[16]";
args[22] = (char *) "c_SPECATOM[17]";
args[23] = (char *) "c_SPECATOM[18]";
args[24] = (char *) "c_SPECATOM[19]"; // abo12, 18
args[25] = (char *) "c_SPECATOM[20]";
args[26] = (char *) "c_SPECATOM[21]";
args[27] = (char *) "c_SPECATOM[22]";
args[28] = (char *) "c_SPECATOM[23]";
args[29] = (char *) "c_SPECATOM[24]";
args[30] = (char *) "c_SPECATOM[25]";
args[31] = (char *) "c_SPECATOM[26]";
args[32] = (char *) "c_SPECATOM[27]";
args[33] = (char *) "c_SPECATOM[28]";
args[34] = (char *) "c_SPECATOM[29]";
args[35] = (char *) "c_SPECATOM[30]";
args[36] = (char *) "c_SPECATOM[31]";
modify->add_fix(narg,args);
f_SPECBOND = (FixAveAtom *) modify->fix[modify->nfix-1];
delete [] args;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::init_list(int id, NeighList *ptr)
{
list = ptr;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::post_integrate()
{
Output_ReaxC_Bonds(update->ntimestep,fp);
if (me == 0) fflush(fp);
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::Output_ReaxC_Bonds(bigint ntimestep, FILE *fp)
{
int Nmole, Nspec;
// point to fix_ave_atom
f_SPECBOND->end_of_step();
if (ntimestep != nvalid) return;
nlocal = atom->nlocal;
if (atom->nmax > nmax) {
nmax = atom->nmax;
memory->destroy(x0);
memory->destroy(PBCconnected);
memory->destroy(clusterID);
memory->create(x0,nmax,"reax/c/species:x0");
memory->create(PBCconnected,nmax,"reax/c/species:PBCconnected");
memory->create(clusterID,nmax,"reax/c/species:clusterID");
vector_atom = clusterID;
}
for (int i = 0; i < nmax; i++) {
PBCconnected[i] = 0;
x0[i].x = x0[i].y = x0[i].z = 0.0;
}
Nmole = Nspec = 0;
FindMolecule();
SortMolecule (Nmole);
FindSpecies(Nmole, Nspec);
vector_nmole = Nmole;
vector_nspec = Nspec;
if (me == 0 && ntimestep >= 0)
WriteFormulas (Nmole, Nspec);
if (posflag && ((ntimestep)%posfreq==0)) {
WritePos(Nmole, Nspec);
if (me == 0) fflush(pos);
}
nvalid += nfreq;
}
/* ---------------------------------------------------------------------- */
AtomCoord FixReaxCSpecies::chAnchor(AtomCoord in1, AtomCoord in2)
{
if (in1.x < in2.x)
return in1;
return in2;
}
/* ---------------------------------------------------------------------- */
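// Parallel molecule (cluster) detection: every atom in the group starts
// with clusterID equal to its own tag, then repeatedly adopts the smallest
// clusterID among neighbors bonded above the per-type BOCut threshold.
// Forward communication keeps ghost copies in sync between sweeps; an
// anchor coordinate and a periodic-connection flag are propagated alongside
// for later unwrapping of molecule positions. The outer loop stops when no
// rank reports a change, or after a safety cap of about 400 sweeps.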
void FixReaxCSpecies::FindMolecule ()
{
int i,j,ii,jj,inum,itype,jtype,loop,looptot;
int change,done,anychange;
int *mask = atom->mask;
int *ilist;
double bo_tmp,bo_cut;
double **spec_atom = f_SPECBOND->array_atom;
inum = reaxc->list->inum;
ilist = reaxc->list->ilist;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (mask[i] & groupbit) {
clusterID[i] = atom->tag[i];
x0[i].x = spec_atom[i][1];
x0[i].y = spec_atom[i][2];
x0[i].z = spec_atom[i][3];
}
else clusterID[i] = 0.0;
}
loop = 0;
while (1) {
comm->forward_comm_fix(this);
loop ++;
change = 0;
while (1) {
done = 1;
for (ii = 0; ii < inum; ii++) {
i = ilist[ii];
if (!(mask[i] & groupbit)) continue;
itype = atom->type[i];
for (jj = 0; jj < MAXSPECBOND; jj++) {
j = reaxc->tmpid[i][jj];
if (j < i) continue;
if (!(mask[j] & groupbit)) continue;
if (clusterID[i] == clusterID[j] && PBCconnected[i] == PBCconnected[j]
&& x0[i].x == x0[j].x && x0[i].y == x0[j].y && x0[i].z == x0[j].z) continue;
jtype = atom->type[j];
bo_cut = BOCut[itype][jtype];
bo_tmp = spec_atom[i][jj+7];
if (bo_tmp > bo_cut) {
clusterID[i] = clusterID[j] = MIN(clusterID[i], clusterID[j]);
PBCconnected[i] = PBCconnected[j] = MAX(PBCconnected[i], PBCconnected[j]);
x0[i] = x0[j] = chAnchor(x0[i], x0[j]);
if ((fabs(spec_atom[i][1] - spec_atom[j][1]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][2] - spec_atom[j][2]) > reaxc->control->bond_cut)
|| (fabs(spec_atom[i][3] - spec_atom[j][3]) > reaxc->control->bond_cut))
PBCconnected[i] = PBCconnected[j] = 1;
done = 0;
}
}
}
if (!done) change = 1;
if (done) break;
}
MPI_Allreduce(&change,&anychange,1,MPI_INT,MPI_MAX,world);
if (!anychange) break;
MPI_Allreduce(&loop,&looptot,1,MPI_INT,MPI_SUM,world);
if (looptot >= 400*nprocs) break;
}
}
/* ---------------------------------------------------------------------- */
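// Compress the sparse cluster IDs (originally atom tags) into consecutive
// molecule numbers 1..Nmole: build a presence map over the observed ID
// range, reduce it across ranks, and renumber each atom's clusterID
// accordingly; warnings flag atoms with cluster ID 0 and clusters that
// extend outside the fix group.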
void FixReaxCSpecies::SortMolecule(int &Nmole)
{
memory->destroy(molmap);
molmap = NULL;
int n, idlo, idhi;
int *mask =atom->mask;
int lo = ntotal;
int hi = -ntotal;
int flag = 0;
for (n = 0; n < nlocal; n++) {
if (!(mask[n] & groupbit)) continue;
if (clusterID[n] == 0.0) flag = 1;
lo = MIN(lo,nint(clusterID[n]));
hi = MAX(hi,nint(clusterID[n]));
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && me == 0)
error->warning(FLERR,"Atom with cluster ID = 0 included in "
"fix reax/c/species group");
MPI_Allreduce(&lo,&idlo,1,MPI_INT,MPI_MIN,world);
MPI_Allreduce(&hi,&idhi,1,MPI_INT,MPI_MAX,world);
if (idlo == ntotal)
if (me == 0)
error->warning(FLERR,"Atom with cluster ID = maxmol "
"included in fix reax/c/species group");
int nlen = idhi - idlo + 1;
memory->create(molmap,nlen,"reax/c/species:molmap");
for (n = 0; n < nlen; n++) molmap[n] = 0;
for (n = 0; n < nlocal; n++) {
if (!(mask[n] & groupbit)) continue;
molmap[nint(clusterID[n])-idlo] = 1;
}
int *molmapall;
memory->create(molmapall,nlen,"reax/c/species:molmapall");
MPI_Allreduce(molmap,molmapall,nlen,MPI_INT,MPI_MAX,world);
Nmole = 0;
for (n = 0; n < nlen; n++) {
if (molmapall[n]) molmap[n] = Nmole++;
else molmap[n] = -1;
}
memory->destroy(molmapall);
flag = 0;
for (n = 0; n < nlocal; n++) {
if (mask[n] & groupbit) continue;
if (nint(clusterID[n]) < idlo || nint(clusterID[n]) > idhi) continue;
if (molmap[nint(clusterID[n])-idlo] >= 0) flag = 1;
}
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"One or more cluster has atoms not in group");
for (n = 0; n < nlocal; n++) {
if (!(mask[n] & groupbit)) continue;
clusterID[n] = molmap[nint(clusterID[n])-idlo] + 1;
}
memory->destroy(molmap);
molmap = NULL;
}
/* ---------------------------------------------------------------------- */
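// Classify molecules into species by chemical formula alone: for each
// molecule number, count the atoms of each type across all ranks, then
// compare that composition vector against the species found so far; a new
// composition becomes a new species, otherwise the matching species' count
// in NMol is incremented. Connectivity (i.e. isomers) is not distinguished.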
void FixReaxCSpecies::FindSpecies(int Nmole, int &Nspec)
{
int k, l, m, n, itype, cid;
int flag_identity, flag_mol, flag_spec;
int flag_tmp;
int *mask =atom->mask;
int *Nameall, *NMolall;
memory->destroy(MolName);
MolName = NULL;
memory->create(MolName,Nmole*(ntypes+1),"reax/c/species:MolName");
memory->destroy(NMol);
NMol = NULL;
memory->create(NMol,Nmole,"reax/c/species:NMol");
for (m = 0; m < Nmole; m ++)
NMol[m] = 1;
memory->create(Nameall,ntypes,"reax/c/species:Nameall");
memory->create(NMolall,Nmole,"reax/c/species:NMolall");
for (m = 1, Nspec = 0; m <= Nmole; m ++) {
for (n = 0; n < ntypes; n ++) Name[n] = 0;
for (n = 0, flag_mol = 0; n < nlocal; n ++) {
if (!(mask[n] & groupbit)) continue;
cid = nint(clusterID[n]);
if (cid == m) {
itype = atom->type[n]-1;
Name[itype] ++;
flag_mol = 1;
}
}
MPI_Allreduce(&flag_mol,&flag_tmp,1,MPI_INT,MPI_MAX,world);
flag_mol = flag_tmp;
MPI_Allreduce(Name,Nameall,ntypes,MPI_INT,MPI_SUM,world);
for (n = 0; n < ntypes; n++) Name[n] = Nameall[n];
if (flag_mol == 1) {
flag_identity = 1;
for (k = 0; k < Nspec; k ++) {
flag_spec=0;
for (l = 0; l < ntypes; l ++)
if (MolName[ntypes*k+l] != Name[l]) flag_spec = 1;
if (flag_spec == 0) NMol[k] ++;
flag_identity *= flag_spec;
}
if (Nspec == 0 || flag_identity == 1) {
for (l = 0; l < ntypes; l ++)
MolName[ntypes*Nspec+l] = Name[l];
Nspec ++;
}
}
}
memory->destroy(NMolall);
memory->destroy(Nameall);
memory->destroy(nd);
nd = NULL;
memory->create(nd,Nspec,"reax/c/species:nd");
memory->destroy(MolType);
MolType = NULL;
memory->create(MolType,Nspec*(ntypes+2),"reax/c/species:MolType");
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::CheckExistence(int id, int ntypes)
{
int i, j, molid, flag;
for (i = 0; i < Nmoltype; i ++) {
flag = 0;
for (j = 0; j < ntypes; j ++) {
molid = MolType[ntypes * i + j];
if (molid != MolName[ntypes * id + j]) flag = 1;
}
if (flag == 0) return i;
}
for (i = 0; i < ntypes; i ++)
MolType[ntypes * Nmoltype + i] = MolName[ntypes *id + i];
Nmoltype ++;
return Nmoltype - 1;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::WriteFormulas(int Nmole, int Nspec)
{
int i, j, itemp;
bigint ntimestep = update->ntimestep;
fprintf(fp,"# Timestep No_Moles No_Specs ");
Nmoltype = 0;
for (i = 0; i < Nspec; i ++)
nd[i] = CheckExistence(i, ntypes);
for (i = 0; i < Nmoltype; i ++) {
for (j = 0;j < ntypes; j ++) {
itemp = MolType[ntypes * i + j];
if (itemp != 0) {
if (eletype) fprintf(fp,"%s",eletype[j]);
else fprintf(fp,"%c",ele[j]);
if (itemp != 1) fprintf(fp,"%d",itemp);
}
}
fprintf(fp,"\t");
}
fprintf(fp,"\n");
fprintf(fp,BIGINT_FORMAT,ntimestep);
fprintf(fp,"%11d%11d\t",Nmole,Nspec);
for (i = 0; i < Nmoltype; i ++)
fprintf(fp," %d\t",NMol[i]);
fprintf(fp,"\n");
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::OpenPos()
{
char *filecurrent;
bigint ntimestep = update->ntimestep;
filecurrent = (char*) malloc((strlen(filepos)+16)*sizeof(char));
char *ptr = strchr(filepos,'*');
*ptr = '\0';
if (padflag == 0)
sprintf(filecurrent,"%s" BIGINT_FORMAT "%s",
filepos,ntimestep,ptr+1);
else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(filecurrent,pad,filepos,ntimestep,ptr+1);
}
*ptr = '*';
if (me == 0) {
pos = fopen(filecurrent, "w");
if (pos == NULL) error->one(FLERR,"Cannot open fix reax/c/species position file");
} else pos = NULL;
multipos_opened = 1;
free(filecurrent);
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::WritePos(int Nmole, int Nspec)
{
int i, itype, cid;
int count, count_tmp, m, n, k;
int *Nameall;
int *mask =atom->mask;
double avq, avq_tmp, avx[3], avx_tmp, box[3], halfbox[3];
double **spec_atom = f_SPECBOND->array_atom;
if (multipos) OpenPos();
box[0] = domain->boxhi[0] - domain->boxlo[0];
box[1] = domain->boxhi[1] - domain->boxlo[1];
box[2] = domain->boxhi[2] - domain->boxlo[2];
for (int j = 0; j < 3; j++)
halfbox[j] = box[j] / 2;
if (me == 0) {
fprintf(pos,"Timestep " BIGINT_FORMAT " NMole %d NSpec %d xlo %f "
"xhi %f ylo %f yhi %f zlo %f zhi %f\n",
update->ntimestep,Nmole, Nspec,
domain->boxlo[0],domain->boxhi[0],
domain->boxlo[1],domain->boxhi[1],
domain->boxlo[2],domain->boxhi[2]);
fprintf(pos,"ID\tAtom_Count\tType\tAve_q\t\tCoM_x\t\tCoM_y\t\tCoM_z\n");
}
Nameall = NULL;
memory->create(Nameall,ntypes,"reax/c/species:Nameall");
for (m = 1; m <= Nmole; m ++) {
count = 0;
avq = 0.0;
for (n = 0; n < 3; n++)
avx[n] = 0.0;
for (n = 0; n < ntypes; n ++)
Name[n] = 0;
for (i = 0; i < nlocal; i ++) {
if (!(mask[i] & groupbit)) continue;
cid = nint(clusterID[i]);
if (cid == m) {
itype = atom->type[i]-1;
Name[itype] ++;
count ++;
avq += spec_atom[i][0];
if (PBCconnected[i]) {
if ((x0[i].x - spec_atom[i][1]) > halfbox[0])
spec_atom[i][1] += box[0];
if ((spec_atom[i][1] - x0[i].x) > halfbox[0])
spec_atom[i][1] -= box[0];
if ((x0[i].y - spec_atom[i][2]) > halfbox[1])
spec_atom[i][2] += box[1];
if ((spec_atom[i][2] - x0[i].y) > halfbox[1])
spec_atom[i][2] -= box[1];
if ((x0[i].z - spec_atom[i][3]) > halfbox[2])
spec_atom[i][3] += box[2];
if ((spec_atom[i][3] - x0[i].z) > halfbox[2])
spec_atom[i][3] -= box[2];
}
for (n = 0; n < 3; n++)
avx[n] += spec_atom[i][n+1];
}
}
avq_tmp = 0.0;
MPI_Allreduce(&avq,&avq_tmp,1,MPI_DOUBLE,MPI_SUM,world);
avq = avq_tmp;
for (n = 0; n < 3; n++) {
avx_tmp = 0.0;
MPI_Reduce(&avx[n],&avx_tmp,1,MPI_DOUBLE,MPI_SUM,0,world);
avx[n] = avx_tmp;
}
MPI_Reduce(&count,&count_tmp,1,MPI_INT,MPI_SUM,0,world);
count = count_tmp;
MPI_Reduce(Name,Nameall,ntypes,MPI_INT,MPI_SUM,0,world);
for (n = 0; n < ntypes; n++) Name[n] = Nameall[n];
if (me == 0) {
fprintf(pos,"%d\t%d\t",m,count);
for (n = 0; n < ntypes; n++) {
if (Name[n] != 0) {
if (eletype) fprintf(pos,"%s",eletype[n]);
else fprintf(pos,"%c",ele[n]);
if (Name[n] != 1) fprintf(pos,"%d",Name[n]);
}
}
if (count > 0) {
avq /= count;
for (k = 0; k < 3; k++) {
avx[k] /= count;
if (avx[k] >= domain->boxhi[k])
avx[k] -= box[k];
if (avx[k] < domain->boxlo[k])
avx[k] += box[k];
avx[k] -= domain->boxlo[k];
avx[k] /= box[k];
}
fprintf(pos,"\t%.8f \t%.8f \t%.8f \t%.8f",
avq,avx[0],avx[1],avx[2]);
}
fprintf(pos,"\n");
}
}
if (me == 0 && !multipos) fprintf(pos,"#\n");
memory->destroy(Nameall);
}
/* ---------------------------------------------------------------------- */
double FixReaxCSpecies::compute_vector(int n)
{
if (n == 0)
return vector_nmole;
if (n == 1)
return vector_nspec;
return 0.0;
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::nint(const double &r)
{
int i = 0;
if (r>0.0) i = static_cast<int>(r+0.5);
else if (r<0.0) i = static_cast<int>(r-0.5);
return i;
}
/* ---------------------------------------------------------------------- */
int FixReaxCSpecies::pack_forward_comm(int n, int *list, double *buf,
int pbc_flag, int *pbc)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
buf[m] = clusterID[j];
buf[m+1] = (double)PBCconnected[j];
buf[m+2] = x0[j].x;
buf[m+3] = x0[j].y;
buf[m+4] = x0[j].z;
m += 5;
}
return m;
}
/* ---------------------------------------------------------------------- */
void FixReaxCSpecies::unpack_forward_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
clusterID[i] = buf[m];
PBCconnected[i] = (int)buf[m+1];
x0[i].x = buf[m+2];
x0[i].y = buf[m+3];
x0[i].z = buf[m+4];
m += 5;
}
}
/* ---------------------------------------------------------------------- */
double FixReaxCSpecies::memory_usage()
{
double bytes;
bytes = 5*nmax*sizeof(double); // clusterID + PBCconnected + x0
return bytes;
}
/* ---------------------------------------------------------------------- */
diff --git a/src/USER-REAXC/fix_reaxc_species.h b/src/USER-REAXC/fix_reaxc_species.h
index 872ea2528..563a10f39 100644
--- a/src/USER-REAXC/fix_reaxc_species.h
+++ b/src/USER-REAXC/fix_reaxc_species.h
@@ -1,94 +1,94 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef FIX_CLASS
FixStyle(reax/c/species,FixReaxCSpecies)
#else
#ifndef LMP_FIX_REAXC_SPECIES_H
#define LMP_FIX_REAXC_SPECIES_H
#include "fix.h"
#include "pointers.h"
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_types.h"
#include "reaxc_defs.h"
#define BUFLEN 1000
namespace LAMMPS_NS {
typedef struct {
double x, y, z;
} AtomCoord;
class FixReaxCSpecies : public Fix {
public:
FixReaxCSpecies(class LAMMPS *, int, char **);
virtual ~FixReaxCSpecies();
int setmask();
virtual void init();
void init_list(int, class NeighList *);
void setup(int);
void post_integrate();
double compute_vector(int);
protected:
int me, nprocs, nmax, nlocal, ntypes, ntotal;
int nrepeat, nfreq, posfreq;
int Nmoltype, vector_nmole, vector_nspec;
int *Name, *MolName, *NMol, *nd, *MolType, *molmap;
double *clusterID;
int *PBCconnected;
AtomCoord *x0;
double bg_cut;
double **BOCut;
char **tmparg;
FILE *fp, *pos;
int eleflag, posflag, multipos, padflag, setupflag;
int singlepos_opened, multipos_opened;
char *ele, **eletype, *filepos;
void Output_ReaxC_Bonds(bigint, FILE *);
void create_compute();
void create_fix();
AtomCoord chAnchor(AtomCoord, AtomCoord);
virtual void FindMolecule();
void SortMolecule(int &);
void FindSpecies(int, int &);
void WriteFormulas(int, int);
int CheckExistence(int, int);
int nint(const double &);
int pack_forward_comm(int, int *, double *, int, int *);
void unpack_forward_comm(int, int, double *);
void OpenPos();
void WritePos(int, int);
double memory_usage();
bigint nvalid;
class NeighList *list;
class FixAveAtom *f_SPECBOND;
class PairReaxC *reaxc;
};
}
#endif
#endif
diff --git a/src/USER-REAXC/pair_reax_c.cpp b/src/USER-REAXC/pair_reaxc.cpp
similarity index 97%
rename from src/USER-REAXC/pair_reax_c.cpp
rename to src/USER-REAXC/pair_reaxc.cpp
index 4933c90f0..d51b0fc2f 100644
--- a/src/USER-REAXC/pair_reax_c.cpp
+++ b/src/USER-REAXC/pair_reaxc.cpp
@@ -1,826 +1,833 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Hasan Metin Aktulga, Purdue University
(now at Lawrence Berkeley National Laboratory, hmaktulga@lbl.gov)
Per-atom energy/virial added by Ray Shan (Sandia)
Fix reax/c/bonds and fix reax/c/species for pair_style reax/c added by
Ray Shan (Sandia)
Hybrid and hybrid/overlay compatibility added by Ray Shan (Sandia)
------------------------------------------------------------------------- */
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "atom.h"
#include "update.h"
#include "force.h"
#include "comm.h"
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "modify.h"
#include "fix.h"
-#include "fix_reax_c.h"
+#include "fix_reaxc.h"
#include "citeme.h"
#include "memory.h"
#include "error.h"
#include "reaxc_types.h"
#include "reaxc_allocate.h"
#include "reaxc_control.h"
#include "reaxc_ffield.h"
#include "reaxc_forces.h"
#include "reaxc_init_md.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_lookup.h"
#include "reaxc_reset_tools.h"
#include "reaxc_traj.h"
#include "reaxc_vector.h"
#include "fix_reaxc_bonds.h"
using namespace LAMMPS_NS;
static const char cite_pair_reax_c[] =
"pair reax/c command:\n\n"
"@Article{Aktulga12,\n"
" author = {H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama},\n"
" title = {Parallel reactive molecular dynamics: Numerical methods and algorithmic techniques},\n"
" journal = {Parallel Computing},\n"
" year = 2012,\n"
" volume = 38,\n"
" pages = {245--259}\n"
"}\n\n";
/* ---------------------------------------------------------------------- */
PairReaxC::PairReaxC(LAMMPS *lmp) : Pair(lmp)
{
if (lmp->citeme) lmp->citeme->add(cite_pair_reax_c);
single_enable = 0;
restartinfo = 0;
one_coeff = 1;
manybody_flag = 1;
ghostneigh = 1;
system = (reax_system *)
memory->smalloc(sizeof(reax_system),"reax:system");
control = (control_params *)
memory->smalloc(sizeof(control_params),"reax:control");
data = (simulation_data *)
memory->smalloc(sizeof(simulation_data),"reax:data");
workspace = (storage *)
memory->smalloc(sizeof(storage),"reax:storage");
lists = (reax_list *)
memory->smalloc(LIST_N * sizeof(reax_list),"reax:lists");
out_control = (output_controls *)
memory->smalloc(sizeof(output_controls),"reax:out_control");
mpi_data = (mpi_datatypes *)
memory->smalloc(sizeof(mpi_datatypes),"reax:mpi");
MPI_Comm_rank(world,&system->my_rank);
system->my_coords[0] = 0;
system->my_coords[1] = 0;
system->my_coords[2] = 0;
system->num_nbrs = 0;
system->n = 0; // my atoms
system->N = 0; // mine + ghosts
system->bigN = 0; // all atoms in the system
system->local_cap = 0;
system->total_cap = 0;
system->gcell_cap = 0;
system->bndry_cuts.ghost_nonb = 0;
system->bndry_cuts.ghost_hbond = 0;
system->bndry_cuts.ghost_bond = 0;
system->bndry_cuts.ghost_cutoff = 0;
system->my_atoms = NULL;
system->pair_ptr = this;
fix_reax = NULL;
tmpid = NULL;
tmpbo = NULL;
nextra = 14;
pvector = new double[nextra];
setup_flag = 0;
fixspecies_flag = 0;
nmax = 0;
}
/* ---------------------------------------------------------------------- */
PairReaxC::~PairReaxC()
{
if (copymode) return;
if (fix_reax) modify->delete_fix("REAXC");
if (setup_flag) {
Close_Output_Files( system, control, out_control, mpi_data );
// deallocate reax data-structures
if( control->tabulate ) Deallocate_Lookup_Tables( system );
if( control->hbond_cut > 0 ) Delete_List( lists+HBONDS, world );
Delete_List( lists+BONDS, world );
Delete_List( lists+THREE_BODIES, world );
Delete_List( lists+FAR_NBRS, world );
DeAllocate_Workspace( control, workspace );
DeAllocate_System( system );
}
memory->destroy( system );
memory->destroy( control );
memory->destroy( data );
memory->destroy( workspace );
memory->destroy( lists );
memory->destroy( out_control );
memory->destroy( mpi_data );
// deallocate interface storage
if( allocated ) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cutghost);
delete [] map;
delete [] chi;
delete [] eta;
delete [] gamma;
}
memory->destroy(tmpid);
memory->destroy(tmpbo);
delete [] pvector;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::allocate( )
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(cutghost,n+1,n+1,"pair:cutghost");
map = new int[n+1];
chi = new double[n+1];
eta = new double[n+1];
gamma = new double[n+1];
}
/* ---------------------------------------------------------------------- */
void PairReaxC::settings(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Illegal pair_style command");
// read name of control file or use default controls
if (strcmp(arg[0],"NULL") == 0) {
strcpy( control->sim_name, "simulate" );
control->ensemble = 0;
out_control->energy_update_freq = 0;
control->tabulate = 0;
control->reneighbor = 1;
control->vlist_cut = control->nonb_cut;
control->bond_cut = 5.;
control->hbond_cut = 7.50;
control->thb_cut = 0.001;
control->thb_cutsq = 0.00001;
control->bg_cut = 0.3;
out_control->write_steps = 0;
out_control->traj_method = 0;
strcpy( out_control->traj_title, "default_title" );
out_control->atom_info = 0;
out_control->bond_info = 0;
out_control->angle_info = 0;
} else Read_Control_File(arg[0], control, out_control);
// default values
qeqflag = 1;
control->lgflag = 0;
+ control->enobondsflag = 1;
system->mincap = MIN_CAP;
system->safezone = SAFE_ZONE;
system->saferzone = SAFER_ZONE;
-
+
// process optional keywords
int iarg = 1;
while (iarg < narg) {
if (strcmp(arg[iarg],"checkqeq") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
if (strcmp(arg[iarg+1],"yes") == 0) qeqflag = 1;
else if (strcmp(arg[iarg+1],"no") == 0) qeqflag = 0;
else error->all(FLERR,"Illegal pair_style reax/c command");
iarg += 2;
- } else if (strcmp(arg[iarg],"lgvdw") == 0) {
+ } else if (strcmp(arg[iarg],"enobonds") == 0) {
+ if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
+ if (strcmp(arg[iarg+1],"yes") == 0) control->enobondsflag = 1;
+ else if (strcmp(arg[iarg+1],"no") == 0) control->enobondsflag = 0;
+ else error->all(FLERR,"Illegal pair_style reax/c command");
+ iarg += 2;
+ } else if (strcmp(arg[iarg],"lgvdw") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
if (strcmp(arg[iarg+1],"yes") == 0) control->lgflag = 1;
else if (strcmp(arg[iarg+1],"no") == 0) control->lgflag = 0;
else error->all(FLERR,"Illegal pair_style reax/c command");
iarg += 2;
} else if (strcmp(arg[iarg],"safezone") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
system->safezone = force->numeric(FLERR,arg[iarg+1]);
if (system->safezone < 0.0)
error->all(FLERR,"Illegal pair_style reax/c safezone command");
system->saferzone = system->safezone*1.2;
iarg += 2;
} else if (strcmp(arg[iarg],"mincap") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal pair_style reax/c command");
system->mincap = force->inumeric(FLERR,arg[iarg+1]);
if (system->mincap < 0)
error->all(FLERR,"Illegal pair_style reax/c mincap command");
iarg += 2;
} else error->all(FLERR,"Illegal pair_style reax/c command");
}
// LAMMPS is responsible for generating nbrs
control->reneighbor = 1;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::coeff( int nargs, char **args )
{
if (!allocated) allocate();
if (nargs != 3 + atom->ntypes)
error->all(FLERR,"Incorrect args for pair coefficients");
// ensure I,J args are * *
if (strcmp(args[0],"*") != 0 || strcmp(args[1],"*") != 0)
error->all(FLERR,"Incorrect args for pair coefficients");
// read ffield file
char *file = args[2];
FILE *fp;
fp = force->open_potential(file);
if (fp != NULL)
Read_Force_Field(fp, &(system->reax_param), control);
else {
char str[128];
sprintf(str,"Cannot open ReaxFF potential file %s",file);
error->all(FLERR,str);
}
// read args that map atom types to elements in potential file
// map[i] = which element the Ith atom type is, -1 if NULL
int itmp = 0;
int nreax_types = system->reax_param.num_atom_types;
for (int i = 3; i < nargs; i++) {
if (strcmp(args[i],"NULL") == 0) {
map[i-2] = -1;
itmp ++;
continue;
}
}
int n = atom->ntypes;
// pair_coeff element map
for (int i = 3; i < nargs; i++)
for (int j = 0; j < nreax_types; j++)
if (strcasecmp(args[i],system->reax_param.sbp[j].name) == 0) {
map[i-2] = j;
itmp ++;
}
// error check
if (itmp != n)
error->all(FLERR,"Non-existent ReaxFF type");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
// set setflag i,j for type pairs where both are mapped to elements
int count = 0;
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
if (map[i] >= 0 && map[j] >= 0) {
setflag[i][j] = 1;
count++;
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ---------------------------------------------------------------------- */
void PairReaxC::init_style( )
{
if (!atom->q_flag)
error->all(FLERR,"Pair style reax/c requires atom attribute q");
// firstwarn = 1;
int iqeq;
for (iqeq = 0; iqeq < modify->nfix; iqeq++)
if (strstr(modify->fix[iqeq]->style,"qeq/reax")) break;
if (iqeq == modify->nfix && qeqflag == 1)
error->all(FLERR,"Pair reax/c requires use of fix qeq/reax");
system->n = atom->nlocal; // my atoms
system->N = atom->nlocal + atom->nghost; // mine + ghosts
system->bigN = static_cast<int> (atom->natoms); // all atoms in the system
system->wsize = comm->nprocs;
system->big_box.V = 0;
system->big_box.box_norms[0] = 0;
system->big_box.box_norms[1] = 0;
system->big_box.box_norms[2] = 0;
if (atom->tag_enable == 0)
error->all(FLERR,"Pair style reax/c requires atom IDs");
if (force->newton_pair == 0)
error->all(FLERR,"Pair style reax/c requires newton pair on");
// need a half neighbor list w/ Newton off and ghost neighbors
// built whenever re-neighboring occurs
int irequest = neighbor->request(this,instance_me);
neighbor->requests[irequest]->newton = 2;
neighbor->requests[irequest]->ghost = 1;
cutmax = MAX3(control->nonb_cut, control->hbond_cut, 2*control->bond_cut);
for( int i = 0; i < LIST_N; ++i )
lists[i].allocated = 0;
if (fix_reax == NULL) {
char **fixarg = new char*[3];
fixarg[0] = (char *) "REAXC";
fixarg[1] = (char *) "all";
fixarg[2] = (char *) "REAXC";
modify->add_fix(3,fixarg);
delete [] fixarg;
fix_reax = (FixReaxC *) modify->fix[modify->nfix-1];
}
}
/* ---------------------------------------------------------------------- */
void PairReaxC::setup( )
{
int oldN;
int mincap = system->mincap;
double safezone = system->safezone;
system->n = atom->nlocal; // my atoms
system->N = atom->nlocal + atom->nghost; // mine + ghosts
oldN = system->N;
system->bigN = static_cast<int> (atom->natoms); // all atoms in the system
if (setup_flag == 0) {
setup_flag = 1;
int *num_bonds = fix_reax->num_bonds;
int *num_hbonds = fix_reax->num_hbonds;
control->vlist_cut = neighbor->cutneighmax;
// determine the local and total capacity
system->local_cap = MAX( (int)(system->n * safezone), mincap );
system->total_cap = MAX( (int)(system->N * safezone), mincap );
// initialize my data structures
PreAllocate_Space( system, control, workspace, world );
write_reax_atoms();
int num_nbrs = estimate_reax_lists();
if(!Make_List(system->total_cap, num_nbrs, TYP_FAR_NEIGHBOR,
lists+FAR_NBRS, world))
error->all(FLERR,"Pair reax/c problem in far neighbor list");
write_reax_lists();
Initialize( system, control, data, workspace, &lists, out_control,
mpi_data, world );
for( int k = 0; k < system->N; ++k ) {
num_bonds[k] = system->my_atoms[k].num_bonds;
num_hbonds[k] = system->my_atoms[k].num_hbonds;
}
} else {
// fill in reax datastructures
write_reax_atoms();
// reset the bond list info for new atoms
for(int k = oldN; k < system->N; ++k)
Set_End_Index( k, Start_Index( k, lists+BONDS ), lists+BONDS );
// check if I need to shrink/extend my data-structs
ReAllocate( system, control, data, workspace, &lists, mpi_data );
}
}
/* ---------------------------------------------------------------------- */
double PairReaxC::init_one(int i, int j)
{
if (setflag[i][j] == 0) error->all(FLERR,"All pair coeffs are not set");
cutghost[i][j] = cutghost[j][i] = cutmax;
return cutmax;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::compute(int eflag, int vflag)
{
double evdwl,ecoul;
double t_start, t_end;
// communicate num_bonds once every reneighboring
// 2 num arrays stored by fix, grab ptr to them
if (neighbor->ago == 0) comm->forward_comm_fix(fix_reax);
int *num_bonds = fix_reax->num_bonds;
int *num_hbonds = fix_reax->num_hbonds;
evdwl = ecoul = 0.0;
if (eflag || vflag) ev_setup(eflag,vflag);
else ev_unset();
if (vflag_global) control->virial = 1;
else control->virial = 0;
system->n = atom->nlocal; // my atoms
system->N = atom->nlocal + atom->nghost; // mine + ghosts
system->bigN = static_cast<int> (atom->natoms); // all atoms in the system
system->big_box.V = 0;
system->big_box.box_norms[0] = 0;
system->big_box.box_norms[1] = 0;
system->big_box.box_norms[2] = 0;
if( comm->me == 0 ) t_start = MPI_Wtime();
// setup data structures
setup();
Reset( system, control, data, workspace, &lists, world );
workspace->realloc.num_far = write_reax_lists();
// timing for filling in the reax lists
if( comm->me == 0 ) {
t_end = MPI_Wtime();
data->timing.nbrs = t_end - t_start;
}
// forces
Compute_Forces(system,control,data,workspace,&lists,out_control,mpi_data);
read_reax_forces(vflag);
for(int k = 0; k < system->N; ++k) {
num_bonds[k] = system->my_atoms[k].num_bonds;
num_hbonds[k] = system->my_atoms[k].num_hbonds;
}
// energies and pressure
if (eflag_global) {
evdwl += data->my_en.e_bond;
evdwl += data->my_en.e_ov;
evdwl += data->my_en.e_un;
evdwl += data->my_en.e_lp;
evdwl += data->my_en.e_ang;
evdwl += data->my_en.e_pen;
evdwl += data->my_en.e_coa;
evdwl += data->my_en.e_hb;
evdwl += data->my_en.e_tor;
evdwl += data->my_en.e_con;
evdwl += data->my_en.e_vdW;
ecoul += data->my_en.e_ele;
ecoul += data->my_en.e_pol;
// eng_vdwl += evdwl;
// eng_coul += ecoul;
// Store the different parts of the energy
// in a list for output by compute pair command
pvector[0] = data->my_en.e_bond;
pvector[1] = data->my_en.e_ov + data->my_en.e_un;
pvector[2] = data->my_en.e_lp;
pvector[3] = 0.0;
pvector[4] = data->my_en.e_ang;
pvector[5] = data->my_en.e_pen;
pvector[6] = data->my_en.e_coa;
pvector[7] = data->my_en.e_hb;
pvector[8] = data->my_en.e_tor;
pvector[9] = data->my_en.e_con;
pvector[10] = data->my_en.e_vdW;
pvector[11] = data->my_en.e_ele;
pvector[12] = 0.0;
pvector[13] = data->my_en.e_pol;
}
if (vflag_fdotr) virial_fdotr_compute();
// Set internal timestep counter to that of LAMMPS
data->step = update->ntimestep;
Output_Results( system, control, data, &lists, out_control, mpi_data );
// populate tmpid and tmpbo arrays for fix reax/c/species
int i, j;
if(fixspecies_flag) {
if (system->N > nmax) {
memory->destroy(tmpid);
memory->destroy(tmpbo);
nmax = system->N;
memory->create(tmpid,nmax,MAXSPECBOND,"pair:tmpid");
memory->create(tmpbo,nmax,MAXSPECBOND,"pair:tmpbo");
}
for (i = 0; i < system->N; i ++)
for (j = 0; j < MAXSPECBOND; j ++) {
tmpbo[i][j] = 0.0;
tmpid[i][j] = 0;
}
FindBond();
}
}
/* ---------------------------------------------------------------------- */
void PairReaxC::write_reax_atoms()
{
int *num_bonds = fix_reax->num_bonds;
int *num_hbonds = fix_reax->num_hbonds;
if (system->N > system->total_cap)
error->all(FLERR,"Too many ghost atoms");
for( int i = 0; i < system->N; ++i ){
system->my_atoms[i].orig_id = atom->tag[i];
system->my_atoms[i].type = map[atom->type[i]];
system->my_atoms[i].x[0] = atom->x[i][0];
system->my_atoms[i].x[1] = atom->x[i][1];
system->my_atoms[i].x[2] = atom->x[i][2];
system->my_atoms[i].q = atom->q[i];
system->my_atoms[i].num_bonds = num_bonds[i];
system->my_atoms[i].num_hbonds = num_hbonds[i];
}
}
/* ---------------------------------------------------------------------- */
void PairReaxC::get_distance( rvec xj, rvec xi, double *d_sqr, rvec *dvec )
{
(*dvec)[0] = xj[0] - xi[0];
(*dvec)[1] = xj[1] - xi[1];
(*dvec)[2] = xj[2] - xi[2];
*d_sqr = SQR((*dvec)[0]) + SQR((*dvec)[1]) + SQR((*dvec)[2]);
}
/* ---------------------------------------------------------------------- */
void PairReaxC::set_far_nbr( far_neighbor_data *fdest,
int j, double d, rvec dvec )
{
fdest->nbr = j;
fdest->d = d;
rvec_Copy( fdest->dvec, dvec );
ivec_MakeZero( fdest->rel_box );
}
/* ---------------------------------------------------------------------- */
int PairReaxC::estimate_reax_lists()
{
int itr_i, itr_j, i, j;
int num_nbrs, num_marked;
int *ilist, *jlist, *numneigh, **firstneigh, *marked;
double d_sqr;
rvec dvec;
double **x;
int mincap = system->mincap;
double safezone = system->safezone;
x = atom->x;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
num_nbrs = 0;
num_marked = 0;
marked = (int*) calloc( system->N, sizeof(int) );
int numall = list->inum + list->gnum;
for( itr_i = 0; itr_i < numall; ++itr_i ){
i = ilist[itr_i];
marked[i] = 1;
++num_marked;
jlist = firstneigh[i];
for( itr_j = 0; itr_j < numneigh[i]; ++itr_j ){
j = jlist[itr_j];
j &= NEIGHMASK;
get_distance( x[j], x[i], &d_sqr, &dvec );
if( d_sqr <= SQR(control->nonb_cut) )
++num_nbrs;
}
}
free( marked );
return static_cast<int> (MAX( num_nbrs*safezone, mincap*MIN_NBRS ));
}
/* ---------------------------------------------------------------------- */
int PairReaxC::write_reax_lists()
{
int itr_i, itr_j, i, j;
int num_nbrs;
int *ilist, *jlist, *numneigh, **firstneigh;
double d_sqr;
rvec dvec;
double *dist, **x;
reax_list *far_nbrs;
far_neighbor_data *far_list;
x = atom->x;
ilist = list->ilist;
numneigh = list->numneigh;
firstneigh = list->firstneigh;
far_nbrs = lists + FAR_NBRS;
far_list = far_nbrs->select.far_nbr_list;
num_nbrs = 0;
dist = (double*) calloc( system->N, sizeof(double) );
int numall = list->inum + list->gnum;
for( itr_i = 0; itr_i < numall; ++itr_i ){
i = ilist[itr_i];
jlist = firstneigh[i];
Set_Start_Index( i, num_nbrs, far_nbrs );
for( itr_j = 0; itr_j < numneigh[i]; ++itr_j ){
j = jlist[itr_j];
j &= NEIGHMASK;
get_distance( x[j], x[i], &d_sqr, &dvec );
if( d_sqr <= (control->nonb_cut*control->nonb_cut) ){
dist[j] = sqrt( d_sqr );
set_far_nbr( &far_list[num_nbrs], j, dist[j], dvec );
++num_nbrs;
}
}
Set_End_Index( i, num_nbrs, far_nbrs );
}
free( dist );
return num_nbrs;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::read_reax_forces(int vflag)
{
for( int i = 0; i < system->N; ++i ) {
system->my_atoms[i].f[0] = workspace->f[i][0];
system->my_atoms[i].f[1] = workspace->f[i][1];
system->my_atoms[i].f[2] = workspace->f[i][2];
atom->f[i][0] += -workspace->f[i][0];
atom->f[i][1] += -workspace->f[i][1];
atom->f[i][2] += -workspace->f[i][2];
}
}
/* ---------------------------------------------------------------------- */
void *PairReaxC::extract(const char *str, int &dim)
{
dim = 1;
if (strcmp(str,"chi") == 0 && chi) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) chi[i] = system->reax_param.sbp[map[i]].chi;
else chi[i] = 0.0;
return (void *) chi;
}
if (strcmp(str,"eta") == 0 && eta) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) eta[i] = system->reax_param.sbp[map[i]].eta;
else eta[i] = 0.0;
return (void *) eta;
}
if (strcmp(str,"gamma") == 0 && gamma) {
for (int i = 1; i <= atom->ntypes; i++)
if (map[i] >= 0) gamma[i] = system->reax_param.sbp[map[i]].gamma;
else gamma[i] = 0.0;
return (void *) gamma;
}
return NULL;
}
/* ---------------------------------------------------------------------- */
double PairReaxC::memory_usage()
{
double bytes = 0.0;
// From pair_reax_c
bytes += 1.0 * system->N * sizeof(int);
bytes += 1.0 * system->N * sizeof(double);
// From reaxc_allocate: BO
bytes += 1.0 * system->total_cap * sizeof(reax_atom);
bytes += 19.0 * system->total_cap * sizeof(double);
bytes += 3.0 * system->total_cap * sizeof(int);
// From reaxc_lists
bytes += 2.0 * lists->n * sizeof(int);
bytes += lists->num_intrs * sizeof(three_body_interaction_data);
bytes += lists->num_intrs * sizeof(bond_data);
bytes += lists->num_intrs * sizeof(dbond_data);
bytes += lists->num_intrs * sizeof(dDelta_data);
bytes += lists->num_intrs * sizeof(far_neighbor_data);
bytes += lists->num_intrs * sizeof(hbond_data);
if(fixspecies_flag)
bytes += 2 * nmax * MAXSPECBOND * sizeof(double);
return bytes;
}
/* ---------------------------------------------------------------------- */
void PairReaxC::FindBond()
{
int i, j, pj, nj;
double bo_tmp, bo_cut;
bond_data *bo_ij;
bo_cut = 0.10;
for (i = 0; i < system->n; i++) {
nj = 0;
for( pj = Start_Index(i, lists); pj < End_Index(i, lists); ++pj ) {
bo_ij = &( lists->select.bond_list[pj] );
j = bo_ij->nbr;
if (j < i) continue;
bo_tmp = bo_ij->bo_data.BO;
if (bo_tmp >= bo_cut ) {
tmpid[i][nj] = j;
tmpbo[i][nj] = bo_tmp;
nj ++;
if (nj > MAXSPECBOND) error->all(FLERR,"Increase MAXSPECBOND in reaxc_defs.h");
}
}
}
}
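
For reference, the new enobonds keyword added to PairReaxC::settings() above is toggled from the pair_style command line, alongside the existing checkqeq, lgvdw, safezone and mincap keywords. A minimal input sketch (the data file and force-field file names below are placeholders, not part of this patch):

  units           real
  atom_style      charge
  read_data       data.CHO                        # placeholder data file
  pair_style      reax/c NULL enobonds yes checkqeq yes
  pair_coeff      * * ffield.reax.cho C H O       # placeholder force field
  fix             1 all qeq/reax 1 0.0 10.0 1e-6 reax/c

The default is enobonds yes, matching the control->enobondsflag = 1 default set in settings() above.
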
diff --git a/src/USER-REAXC/pair_reax_c.h b/src/USER-REAXC/pair_reaxc.h
similarity index 100%
rename from src/USER-REAXC/pair_reax_c.h
rename to src/USER-REAXC/pair_reaxc.h
diff --git a/src/USER-REAXC/reaxc_allocate.cpp b/src/USER-REAXC/reaxc_allocate.cpp
index dc8545e00..969912e08 100644
--- a/src/USER-REAXC/reaxc_allocate.cpp
+++ b/src/USER-REAXC/reaxc_allocate.cpp
@@ -1,461 +1,461 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_allocate.h"
#include "reaxc_list.h"
#include "reaxc_reset_tools.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
/* allocate space for my_atoms
important: we cannot know the exact number of atoms that will fall into a
process's box throughout the whole simulation. therefore
we need to make upper bound estimates for various data structures */
int PreAllocate_Space( reax_system *system, control_params *control,
storage *workspace, MPI_Comm comm )
{
int mincap = system->mincap;
double safezone = system->safezone;
// determine the local and total capacity
system->local_cap = MAX( (int)(system->n * safezone), mincap );
system->total_cap = MAX( (int)(system->N * safezone), mincap );
system->my_atoms = (reax_atom*)
scalloc( system->total_cap, sizeof(reax_atom), "my_atoms", comm );
return SUCCESS;
}
/************* system *************/
int Allocate_System( reax_system *system, int local_cap, int total_cap,
char *msg )
{
system->my_atoms = (reax_atom*)
realloc( system->my_atoms, total_cap*sizeof(reax_atom) );
return SUCCESS;
}
void DeAllocate_System( reax_system *system )
{
int i, j, k;
int ntypes;
reax_interaction *ff_params;
// deallocate the atom list
sfree( system->my_atoms, "system->my_atoms" );
// deallocate the ffield parameters storage
ff_params = &(system->reax_param);
ntypes = ff_params->num_atom_types;
sfree( ff_params->gp.l, "ff:globals" );
for( i = 0; i < ntypes; ++i ) {
for( j = 0; j < ntypes; ++j ) {
for( k = 0; k < ntypes; ++k ) {
sfree( ff_params->fbp[i][j][k], "ff:fbp[i,j,k]" );
}
sfree( ff_params->fbp[i][j], "ff:fbp[i,j]" );
sfree( ff_params->thbp[i][j], "ff:thbp[i,j]" );
sfree( ff_params->hbp[i][j], "ff:hbp[i,j]" );
}
sfree( ff_params->fbp[i], "ff:fbp[i]" );
sfree( ff_params->thbp[i], "ff:thbp[i]" );
sfree( ff_params->hbp[i], "ff:hbp[i]" );
sfree( ff_params->tbp[i], "ff:tbp[i]" );
}
sfree( ff_params->fbp, "ff:fbp" );
sfree( ff_params->thbp, "ff:thbp" );
sfree( ff_params->hbp, "ff:hbp" );
sfree( ff_params->tbp, "ff:tbp" );
sfree( ff_params->sbp, "ff:sbp" );
}
/************* workspace *************/
void DeAllocate_Workspace( control_params *control, storage *workspace )
{
int i;
if( !workspace->allocated )
return;
workspace->allocated = 0;
/* communication storage */
for( i = 0; i < MAX_NBRS; ++i ) {
sfree( workspace->tmp_dbl[i], "tmp_dbl[i]" );
sfree( workspace->tmp_rvec[i], "tmp_rvec[i]" );
sfree( workspace->tmp_rvec2[i], "tmp_rvec2[i]" );
}
/* bond order storage */
sfree( workspace->within_bond_box, "skin" );
sfree( workspace->total_bond_order, "total_bo" );
sfree( workspace->Deltap, "Deltap" );
sfree( workspace->Deltap_boc, "Deltap_boc" );
sfree( workspace->dDeltap_self, "dDeltap_self" );
sfree( workspace->Delta, "Delta" );
sfree( workspace->Delta_lp, "Delta_lp" );
sfree( workspace->Delta_lp_temp, "Delta_lp_temp" );
sfree( workspace->dDelta_lp, "dDelta_lp" );
sfree( workspace->dDelta_lp_temp, "dDelta_lp_temp" );
sfree( workspace->Delta_e, "Delta_e" );
sfree( workspace->Delta_boc, "Delta_boc" );
sfree( workspace->Delta_val, "Delta_val" );
sfree( workspace->nlp, "nlp" );
sfree( workspace->nlp_temp, "nlp_temp" );
sfree( workspace->Clp, "Clp" );
sfree( workspace->vlpex, "vlpex" );
sfree( workspace->bond_mark, "bond_mark" );
sfree( workspace->done_after, "done_after" );
/* QEq storage */
sfree( workspace->Hdia_inv, "Hdia_inv" );
sfree( workspace->b_s, "b_s" );
sfree( workspace->b_t, "b_t" );
sfree( workspace->b_prc, "b_prc" );
sfree( workspace->b_prm, "b_prm" );
sfree( workspace->s, "s" );
sfree( workspace->t, "t" );
sfree( workspace->droptol, "droptol" );
sfree( workspace->b, "b" );
sfree( workspace->x, "x" );
/* GMRES storage */
for( i = 0; i < RESTART+1; ++i ) {
sfree( workspace->h[i], "h[i]" );
sfree( workspace->v[i], "v[i]" );
}
sfree( workspace->h, "h" );
sfree( workspace->v, "v" );
sfree( workspace->y, "y" );
sfree( workspace->z, "z" );
sfree( workspace->g, "g" );
sfree( workspace->hs, "hs" );
sfree( workspace->hc, "hc" );
/* CG storage */
sfree( workspace->r, "r" );
sfree( workspace->d, "d" );
sfree( workspace->q, "q" );
sfree( workspace->p, "p" );
sfree( workspace->r2, "r2" );
sfree( workspace->d2, "d2" );
sfree( workspace->q2, "q2" );
sfree( workspace->p2, "p2" );
/* integrator */
sfree( workspace->v_const, "v_const" );
/* force related storage */
sfree( workspace->f, "f" );
sfree( workspace->CdDelta, "CdDelta" );
}
int Allocate_Workspace( reax_system *system, control_params *control,
storage *workspace, int local_cap, int total_cap,
MPI_Comm comm, char *msg )
{
int i, total_real, total_rvec, local_rvec;
workspace->allocated = 1;
total_real = total_cap * sizeof(double);
total_rvec = total_cap * sizeof(rvec);
local_rvec = local_cap * sizeof(rvec);
/* communication storage */
for( i = 0; i < MAX_NBRS; ++i ) {
workspace->tmp_dbl[i] = (double*)
scalloc( total_cap, sizeof(double), "tmp_dbl", comm );
workspace->tmp_rvec[i] = (rvec*)
scalloc( total_cap, sizeof(rvec), "tmp_rvec", comm );
workspace->tmp_rvec2[i] = (rvec2*)
scalloc( total_cap, sizeof(rvec2), "tmp_rvec2", comm );
}
/* bond order related storage */
workspace->within_bond_box = (int*)
scalloc( total_cap, sizeof(int), "skin", comm );
workspace->total_bond_order = (double*) smalloc( total_real, "total_bo", comm );
workspace->Deltap = (double*) smalloc( total_real, "Deltap", comm );
workspace->Deltap_boc = (double*) smalloc( total_real, "Deltap_boc", comm );
workspace->dDeltap_self = (rvec*) smalloc( total_rvec, "dDeltap_self", comm );
workspace->Delta = (double*) smalloc( total_real, "Delta", comm );
workspace->Delta_lp = (double*) smalloc( total_real, "Delta_lp", comm );
workspace->Delta_lp_temp = (double*)
smalloc( total_real, "Delta_lp_temp", comm );
workspace->dDelta_lp = (double*) smalloc( total_real, "dDelta_lp", comm );
workspace->dDelta_lp_temp = (double*)
smalloc( total_real, "dDelta_lp_temp", comm );
workspace->Delta_e = (double*) smalloc( total_real, "Delta_e", comm );
workspace->Delta_boc = (double*) smalloc( total_real, "Delta_boc", comm );
workspace->Delta_val = (double*) smalloc( total_real, "Delta_val", comm );
workspace->nlp = (double*) smalloc( total_real, "nlp", comm );
workspace->nlp_temp = (double*) smalloc( total_real, "nlp_temp", comm );
workspace->Clp = (double*) smalloc( total_real, "Clp", comm );
workspace->vlpex = (double*) smalloc( total_real, "vlpex", comm );
workspace->bond_mark = (int*)
scalloc( total_cap, sizeof(int), "bond_mark", comm );
workspace->done_after = (int*)
scalloc( total_cap, sizeof(int), "done_after", comm );
/* QEq storage */
workspace->Hdia_inv = (double*)
scalloc( total_cap, sizeof(double), "Hdia_inv", comm );
workspace->b_s = (double*) scalloc( total_cap, sizeof(double), "b_s", comm );
workspace->b_t = (double*) scalloc( total_cap, sizeof(double), "b_t", comm );
workspace->b_prc = (double*) scalloc( total_cap, sizeof(double), "b_prc", comm );
workspace->b_prm = (double*) scalloc( total_cap, sizeof(double), "b_prm", comm );
workspace->s = (double*) scalloc( total_cap, sizeof(double), "s", comm );
workspace->t = (double*) scalloc( total_cap, sizeof(double), "t", comm );
workspace->droptol = (double*)
scalloc( total_cap, sizeof(double), "droptol", comm );
workspace->b = (rvec2*) scalloc( total_cap, sizeof(rvec2), "b", comm );
workspace->x = (rvec2*) scalloc( total_cap, sizeof(rvec2), "x", comm );
/* GMRES storage */
workspace->y = (double*) scalloc( RESTART+1, sizeof(double), "y", comm );
workspace->z = (double*) scalloc( RESTART+1, sizeof(double), "z", comm );
workspace->g = (double*) scalloc( RESTART+1, sizeof(double), "g", comm );
workspace->h = (double**) scalloc( RESTART+1, sizeof(double*), "h", comm );
workspace->hs = (double*) scalloc( RESTART+1, sizeof(double), "hs", comm );
workspace->hc = (double*) scalloc( RESTART+1, sizeof(double), "hc", comm );
workspace->v = (double**) scalloc( RESTART+1, sizeof(double*), "v", comm );
for( i = 0; i < RESTART+1; ++i ) {
workspace->h[i] = (double*) scalloc( RESTART+1, sizeof(double), "h[i]", comm );
workspace->v[i] = (double*) scalloc( total_cap, sizeof(double), "v[i]", comm );
}
/* CG storage */
workspace->r = (double*) scalloc( total_cap, sizeof(double), "r", comm );
workspace->d = (double*) scalloc( total_cap, sizeof(double), "d", comm );
workspace->q = (double*) scalloc( total_cap, sizeof(double), "q", comm );
workspace->p = (double*) scalloc( total_cap, sizeof(double), "p", comm );
workspace->r2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "r2", comm );
workspace->d2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "d2", comm );
workspace->q2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "q2", comm );
workspace->p2 = (rvec2*) scalloc( total_cap, sizeof(rvec2), "p2", comm );
/* integrator storage */
workspace->v_const = (rvec*) smalloc( local_rvec, "v_const", comm );
// /* force related storage */
workspace->f = (rvec*) scalloc( total_cap, sizeof(rvec), "f", comm );
workspace->CdDelta = (double*)
scalloc( total_cap, sizeof(double), "CdDelta", comm );
return SUCCESS;
}
static void Reallocate_Neighbor_List( reax_list *far_nbrs, int n,
int num_intrs, MPI_Comm comm )
{
Delete_List( far_nbrs, comm );
if(!Make_List( n, num_intrs, TYP_FAR_NEIGHBOR, far_nbrs, comm )){
fprintf(stderr, "Problem in initializing far nbrs list. Terminating!\n");
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
static int Reallocate_HBonds_List( reax_system *system, reax_list *hbonds,
MPI_Comm comm )
{
int i, id, total_hbonds;
int mincap = system->mincap;
double saferzone = system->saferzone;
total_hbonds = 0;
for( i = 0; i < system->n; ++i )
if( (id = system->my_atoms[i].Hindex) >= 0 ) {
total_hbonds += system->my_atoms[i].num_hbonds;
}
total_hbonds = (int)(MAX( total_hbonds*saferzone, mincap*MIN_HBONDS ));
Delete_List( hbonds, comm );
if( !Make_List( system->Hcap, total_hbonds, TYP_HBOND, hbonds, comm ) ) {
fprintf( stderr, "not enough space for hbonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return total_hbonds;
}
static int Reallocate_Bonds_List( reax_system *system, reax_list *bonds,
int *total_bonds, int *est_3body,
MPI_Comm comm )
{
int i;
int mincap = system->mincap;
double safezone = system->safezone;
*total_bonds = 0;
*est_3body = 0;
for( i = 0; i < system->N; ++i ){
*est_3body += SQR(system->my_atoms[i].num_bonds);
*total_bonds += system->my_atoms[i].num_bonds;
}
*total_bonds = (int)(MAX( *total_bonds * safezone, mincap*MIN_BONDS ));
Delete_List( bonds, comm );
if(!Make_List(system->total_cap, *total_bonds, TYP_BOND, bonds, comm)) {
fprintf( stderr, "not enough space for bonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return SUCCESS;
}
void ReAllocate( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
mpi_datatypes *mpi_data )
{
int num_bonds, est_3body, Hflag, ret;
int renbr, newsize;
reallocate_data *realloc;
reax_list *far_nbrs;
MPI_Comm comm;
char msg[200];
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
realloc = &(workspace->realloc);
comm = mpi_data->world;
if( system->n >= DANGER_ZONE * system->local_cap ||
(0 && system->n <= LOOSE_ZONE * system->local_cap) ) {
system->local_cap = MAX( (int)(system->n * safezone), mincap );
}
int Nflag = 0;
if( system->N >= DANGER_ZONE * system->total_cap ||
(0 && system->N <= LOOSE_ZONE * system->total_cap) ) {
Nflag = 1;
system->total_cap = MAX( (int)(system->N * safezone), mincap );
}
if( Nflag ) {
/* system */
ret = Allocate_System( system, system->local_cap, system->total_cap, msg );
if( ret != SUCCESS ) {
fprintf( stderr, "not enough space for atom_list: total_cap=%d",
system->total_cap );
fprintf( stderr, "terminating...\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
/* workspace */
DeAllocate_Workspace( control, workspace );
ret = Allocate_Workspace( system, control, workspace, system->local_cap,
system->total_cap, comm, msg );
if( ret != SUCCESS ) {
fprintf( stderr, "no space for workspace: local_cap=%d total_cap=%d",
system->local_cap, system->total_cap );
fprintf( stderr, "terminating...\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
renbr = (data->step - data->prev_steps) % control->reneighbor == 0;
/* far neighbors */
if( renbr ) {
far_nbrs = *lists + FAR_NBRS;
if( Nflag || realloc->num_far >= far_nbrs->num_intrs * DANGER_ZONE ) {
if( realloc->num_far > far_nbrs->num_intrs ) {
fprintf( stderr, "step%d-ran out of space on far_nbrs: top=%d, max=%d",
data->step, realloc->num_far, far_nbrs->num_intrs );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
newsize = static_cast<int>
(MAX( realloc->num_far*safezone, mincap*MIN_NBRS ));
Reallocate_Neighbor_List( far_nbrs, system->total_cap, newsize, comm );
realloc->num_far = 0;
}
}
/* hydrogen bonds list */
if( control->hbond_cut > 0 ) {
Hflag = 0;
if( system->numH >= DANGER_ZONE * system->Hcap ||
(0 && system->numH <= LOOSE_ZONE * system->Hcap) ) {
Hflag = 1;
system->Hcap = int(MAX( system->numH * saferzone, mincap ));
}
if( Hflag || realloc->hbonds ) {
ret = Reallocate_HBonds_List( system, (*lists)+HBONDS, comm );
realloc->hbonds = 0;
}
}
/* bonds list */
num_bonds = est_3body = -1;
if( Nflag || realloc->bonds ){
Reallocate_Bonds_List( system, (*lists)+BONDS, &num_bonds,
&est_3body, comm );
realloc->bonds = 0;
realloc->num_3body = MAX( realloc->num_3body, est_3body );
}
/* 3-body list */
if( realloc->num_3body > 0 ) {
Delete_List( (*lists)+THREE_BODIES, comm );
if( num_bonds == -1 )
num_bonds = ((*lists)+BONDS)->num_intrs;
realloc->num_3body = (int)(MAX(realloc->num_3body*safezone, MIN_3BODIES));
if( !Make_List( num_bonds, realloc->num_3body, TYP_THREE_BODY,
(*lists)+THREE_BODIES, comm ) ) {
fprintf( stderr, "Problem in initializing angles list. Terminating!\n" );
MPI_Abort( comm, CANNOT_INITIALIZE );
}
realloc->num_3body = -1;
}
}
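
As the comment above PreAllocate_Space() explains, the number of atoms that will land in a given process's box cannot be known in advance, so the local and total capacities are padded by safezone and floored at mincap. A minimal sketch of that arithmetic with illustrative numbers (the real SAFE_ZONE and MIN_CAP constants live in reaxc_defs.h and may differ):

  #include <algorithm>

  // capacity estimate used for my_atoms and the workspace arrays:
  // pad the current count by the safety factor, never drop below mincap
  int estimate_capacity(int natoms, double safezone, int mincap)
  {
    return std::max(static_cast<int>(natoms * safezone), mincap);
  }

  // e.g. estimate_capacity(1000, 1.2, 50) == 1200   (padding dominates)
  //      estimate_capacity(10,   1.2, 50) == 50     (floor dominates)
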
diff --git a/src/USER-REAXC/reaxc_bond_orders.cpp b/src/USER-REAXC/reaxc_bond_orders.cpp
index 0b4ca21ad..04cedf18a 100644
--- a/src/USER-REAXC/reaxc_bond_orders.cpp
+++ b/src/USER-REAXC/reaxc_bond_orders.cpp
@@ -1,592 +1,592 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_types.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
void Add_dBond_to_Forces_NPT( int i, int pj, simulation_data *data,
storage *workspace, reax_list **lists )
{
reax_list *bonds = (*lists) + BONDS;
bond_data *nbr_j, *nbr_k;
bond_order_data *bo_ij, *bo_ji;
dbond_coefficients coef;
rvec temp, ext_press;
ivec rel_box;
int pk, k, j;
/* Initializations */
nbr_j = &(bonds->select.bond_list[pj]);
j = nbr_j->nbr;
bo_ij = &(nbr_j->bo_data);
bo_ji = &(bonds->select.bond_list[ nbr_j->sym_index ].bo_data);
coef.C1dbo = bo_ij->C1dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C2dbo = bo_ij->C2dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C3dbo = bo_ij->C3dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C1dbopi = bo_ij->C1dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C2dbopi = bo_ij->C2dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C3dbopi = bo_ij->C3dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C4dbopi = bo_ij->C4dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C1dbopi2 = bo_ij->C1dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C2dbopi2 = bo_ij->C2dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C3dbopi2 = bo_ij->C3dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C4dbopi2 = bo_ij->C4dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C1dDelta = bo_ij->C1dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C2dDelta = bo_ij->C2dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C3dDelta = bo_ij->C3dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
for( pk = Start_Index(i, bonds); pk < End_Index(i, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale(temp, -coef.C2dbo, nbr_k->bo_data.dBOp); /*2nd, dBO*/
rvec_ScaledAdd(temp, -coef.C2dDelta, nbr_k->bo_data.dBOp);/*dDelta*/
rvec_ScaledAdd(temp, -coef.C3dbopi, nbr_k->bo_data.dBOp); /*3rd, dBOpi*/
rvec_ScaledAdd(temp, -coef.C3dbopi2, nbr_k->bo_data.dBOp);/*3rd, dBOpi2*/
/* force */
rvec_Add( workspace->f[k], temp );
/* pressure */
rvec_iMultiply( ext_press, nbr_k->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
/* then atom i itself */
rvec_Scale( temp, coef.C1dbo, bo_ij->dBOp ); /*1st,dBO*/
rvec_ScaledAdd( temp, coef.C2dbo, workspace->dDeltap_self[i] ); /*2nd,dBO*/
rvec_ScaledAdd( temp, coef.C1dDelta, bo_ij->dBOp ); /*1st,dBO*/
rvec_ScaledAdd( temp, coef.C2dDelta, workspace->dDeltap_self[i] );/*2nd,dBO*/
rvec_ScaledAdd( temp, coef.C1dbopi, bo_ij->dln_BOp_pi ); /*1st,dBOpi*/
rvec_ScaledAdd( temp, coef.C2dbopi, bo_ij->dBOp ); /*2nd,dBOpi*/
rvec_ScaledAdd( temp, coef.C3dbopi, workspace->dDeltap_self[i]);/*3rd,dBOpi*/
rvec_ScaledAdd( temp, coef.C1dbopi2, bo_ij->dln_BOp_pi2 ); /*1st,dBO_pi2*/
rvec_ScaledAdd( temp, coef.C2dbopi2, bo_ij->dBOp ); /*2nd,dBO_pi2*/
rvec_ScaledAdd( temp, coef.C3dbopi2, workspace->dDeltap_self[i] );/*3rd*/
/* force */
rvec_Add( workspace->f[i], temp );
for( pk = Start_Index(j, bonds); pk < End_Index(j, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale( temp, -coef.C3dbo, nbr_k->bo_data.dBOp ); /*3rd,dBO*/
rvec_ScaledAdd( temp, -coef.C3dDelta, nbr_k->bo_data.dBOp);/*dDelta*/
rvec_ScaledAdd( temp, -coef.C4dbopi, nbr_k->bo_data.dBOp); /*4th,dBOpi*/
rvec_ScaledAdd( temp, -coef.C4dbopi2, nbr_k->bo_data.dBOp);/*4th,dBOpi2*/
/* force */
rvec_Add( workspace->f[k], temp );
/* pressure */
if( k != i ) {
ivec_Sum( rel_box, nbr_k->rel_box, nbr_j->rel_box ); //rel_box(k, i)
rvec_iMultiply( ext_press, rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
}
/* then atom j itself */
rvec_Scale( temp, -coef.C1dbo, bo_ij->dBOp ); /*1st, dBO*/
rvec_ScaledAdd( temp, coef.C3dbo, workspace->dDeltap_self[j] ); /*2nd, dBO*/
rvec_ScaledAdd( temp, -coef.C1dDelta, bo_ij->dBOp ); /*1st, dBO*/
rvec_ScaledAdd( temp, coef.C3dDelta, workspace->dDeltap_self[j]);/*2nd, dBO*/
rvec_ScaledAdd( temp, -coef.C1dbopi, bo_ij->dln_BOp_pi ); /*1st,dBOpi*/
rvec_ScaledAdd( temp, -coef.C2dbopi, bo_ij->dBOp ); /*2nd,dBOpi*/
rvec_ScaledAdd( temp, coef.C4dbopi, workspace->dDeltap_self[j]);/*3rd,dBOpi*/
rvec_ScaledAdd( temp, -coef.C1dbopi2, bo_ij->dln_BOp_pi2 ); /*1st,dBOpi2*/
rvec_ScaledAdd( temp, -coef.C2dbopi2, bo_ij->dBOp ); /*2nd,dBOpi2*/
rvec_ScaledAdd( temp,coef.C4dbopi2,workspace->dDeltap_self[j]);/*3rd,dBOpi2*/
/* force */
rvec_Add( workspace->f[j], temp );
/* pressure */
rvec_iMultiply( ext_press, nbr_j->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
void Add_dBond_to_Forces( reax_system *system, int i, int pj,
storage *workspace, reax_list **lists )
{
reax_list *bonds = (*lists) + BONDS;
bond_data *nbr_j, *nbr_k;
bond_order_data *bo_ij, *bo_ji;
dbond_coefficients coef;
int pk, k, j;
/* Virial Tallying variables */
rvec fi_tmp, fj_tmp, fk_tmp, delij, delji, delki, delkj, temp;
/* Initializations */
nbr_j = &(bonds->select.bond_list[pj]);
j = nbr_j->nbr;
bo_ij = &(nbr_j->bo_data);
bo_ji = &(bonds->select.bond_list[ nbr_j->sym_index ].bo_data);
coef.C1dbo = bo_ij->C1dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C2dbo = bo_ij->C2dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C3dbo = bo_ij->C3dbo * (bo_ij->Cdbo + bo_ji->Cdbo);
coef.C1dbopi = bo_ij->C1dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C2dbopi = bo_ij->C2dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C3dbopi = bo_ij->C3dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C4dbopi = bo_ij->C4dbopi * (bo_ij->Cdbopi + bo_ji->Cdbopi);
coef.C1dbopi2 = bo_ij->C1dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C2dbopi2 = bo_ij->C2dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C3dbopi2 = bo_ij->C3dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C4dbopi2 = bo_ij->C4dbopi2 * (bo_ij->Cdbopi2 + bo_ji->Cdbopi2);
coef.C1dDelta = bo_ij->C1dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C2dDelta = bo_ij->C2dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
coef.C3dDelta = bo_ij->C3dbo * (workspace->CdDelta[i]+workspace->CdDelta[j]);
// forces on i
rvec_Scale( temp, coef.C1dbo, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C2dbo, workspace->dDeltap_self[i] );
rvec_ScaledAdd( temp, coef.C1dDelta, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C2dDelta, workspace->dDeltap_self[i] );
rvec_ScaledAdd( temp, coef.C1dbopi, bo_ij->dln_BOp_pi );
rvec_ScaledAdd( temp, coef.C2dbopi, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dbopi, workspace->dDeltap_self[i]);
rvec_ScaledAdd( temp, coef.C1dbopi2, bo_ij->dln_BOp_pi2 );
rvec_ScaledAdd( temp, coef.C2dbopi2, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dbopi2, workspace->dDeltap_self[i] );
rvec_Add( workspace->f[i], temp );
if( system->pair_ptr->vflag_atom) {
rvec_Scale(fi_tmp, -1.0, temp);
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,-1., system->my_atoms[j].x );
system->pair_ptr->v_tally(i,fi_tmp,delij);
}
// forces on j
rvec_Scale( temp, -coef.C1dbo, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dbo, workspace->dDeltap_self[j] );
rvec_ScaledAdd( temp, -coef.C1dDelta, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C3dDelta, workspace->dDeltap_self[j]);
rvec_ScaledAdd( temp, -coef.C1dbopi, bo_ij->dln_BOp_pi );
rvec_ScaledAdd( temp, -coef.C2dbopi, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C4dbopi, workspace->dDeltap_self[j]);
rvec_ScaledAdd( temp, -coef.C1dbopi2, bo_ij->dln_BOp_pi2 );
rvec_ScaledAdd( temp, -coef.C2dbopi2, bo_ij->dBOp );
rvec_ScaledAdd( temp, coef.C4dbopi2, workspace->dDeltap_self[j]);
rvec_Add( workspace->f[j], temp );
if( system->pair_ptr->vflag_atom) {
rvec_Scale(fj_tmp, -1.0, temp);
rvec_ScaledSum( delji, 1., system->my_atoms[j].x,-1., system->my_atoms[i].x );
system->pair_ptr->v_tally(j,fj_tmp,delji);
}
// forces on k: i neighbor
for( pk = Start_Index(i, bonds); pk < End_Index(i, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale( temp, -coef.C2dbo, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C2dDelta, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C3dbopi, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C3dbopi2, nbr_k->bo_data.dBOp);
rvec_Add( workspace->f[k], temp );
if( system->pair_ptr->vflag_atom ) {
rvec_Scale(fk_tmp, -1.0, temp);
rvec_ScaledSum(delki,1.,system->my_atoms[k].x,-1.,system->my_atoms[i].x);
system->pair_ptr->v_tally(k,fk_tmp,delki);
rvec_ScaledSum(delkj,1.,system->my_atoms[k].x,-1.,system->my_atoms[j].x);
system->pair_ptr->v_tally(k,fk_tmp,delkj);
}
}
// forces on k: j neighbor
for( pk = Start_Index(j, bonds); pk < End_Index(j, bonds); ++pk ) {
nbr_k = &(bonds->select.bond_list[pk]);
k = nbr_k->nbr;
rvec_Scale( temp, -coef.C3dbo, nbr_k->bo_data.dBOp );
rvec_ScaledAdd( temp, -coef.C3dDelta, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C4dbopi, nbr_k->bo_data.dBOp);
rvec_ScaledAdd( temp, -coef.C4dbopi2, nbr_k->bo_data.dBOp);
rvec_Add( workspace->f[k], temp );
if( system->pair_ptr->vflag_atom ) {
rvec_Scale(fk_tmp, -1.0, temp);
rvec_ScaledSum(delki,1.,system->my_atoms[k].x,-1.,system->my_atoms[i].x);
system->pair_ptr->v_tally(k,fk_tmp,delki);
rvec_ScaledSum(delkj,1.,system->my_atoms[k].x,-1.,system->my_atoms[j].x);
system->pair_ptr->v_tally(k,fk_tmp,delkj);
}
}
}
int BOp( storage *workspace, reax_list *bonds, double bo_cut,
int i, int btop_i, far_neighbor_data *nbr_pj,
single_body_parameters *sbp_i, single_body_parameters *sbp_j,
two_body_parameters *twbp ) {
int j, btop_j;
double r2, C12, C34, C56;
double Cln_BOp_s, Cln_BOp_pi, Cln_BOp_pi2;
double BO, BO_s, BO_pi, BO_pi2;
bond_data *ibond, *jbond;
bond_order_data *bo_ij, *bo_ji;
j = nbr_pj->nbr;
r2 = SQR(nbr_pj->d);
if( sbp_i->r_s > 0.0 && sbp_j->r_s > 0.0 ) {
C12 = twbp->p_bo1 * pow( nbr_pj->d / twbp->r_s, twbp->p_bo2 );
BO_s = (1.0 + bo_cut) * exp( C12 );
}
else BO_s = C12 = 0.0;
if( sbp_i->r_pi > 0.0 && sbp_j->r_pi > 0.0 ) {
C34 = twbp->p_bo3 * pow( nbr_pj->d / twbp->r_p, twbp->p_bo4 );
BO_pi = exp( C34 );
}
else BO_pi = C34 = 0.0;
if( sbp_i->r_pi_pi > 0.0 && sbp_j->r_pi_pi > 0.0 ) {
C56 = twbp->p_bo5 * pow( nbr_pj->d / twbp->r_pp, twbp->p_bo6 );
BO_pi2= exp( C56 );
}
else BO_pi2 = C56 = 0.0;
/* Initially BO values are the uncorrected ones, page 1 */
BO = BO_s + BO_pi + BO_pi2;
if( BO >= bo_cut ) {
/****** bonds i-j and j-i ******/
ibond = &( bonds->select.bond_list[btop_i] );
btop_j = End_Index( j, bonds );
jbond = &(bonds->select.bond_list[btop_j]);
ibond->nbr = j;
jbond->nbr = i;
ibond->d = nbr_pj->d;
jbond->d = nbr_pj->d;
rvec_Copy( ibond->dvec, nbr_pj->dvec );
rvec_Scale( jbond->dvec, -1, nbr_pj->dvec );
ivec_Copy( ibond->rel_box, nbr_pj->rel_box );
ivec_Scale( jbond->rel_box, -1, nbr_pj->rel_box );
ibond->dbond_index = btop_i;
jbond->dbond_index = btop_i;
ibond->sym_index = btop_j;
jbond->sym_index = btop_i;
Set_End_Index( j, btop_j+1, bonds );
bo_ij = &( ibond->bo_data );
bo_ji = &( jbond->bo_data );
bo_ji->BO = bo_ij->BO = BO;
bo_ji->BO_s = bo_ij->BO_s = BO_s;
bo_ji->BO_pi = bo_ij->BO_pi = BO_pi;
bo_ji->BO_pi2 = bo_ij->BO_pi2 = BO_pi2;
/* Bond Order page2-3, derivative of total bond order prime */
Cln_BOp_s = twbp->p_bo2 * C12 / r2;
Cln_BOp_pi = twbp->p_bo4 * C34 / r2;
Cln_BOp_pi2 = twbp->p_bo6 * C56 / r2;
/* Only dln_BOp_xx wrt. dr_i is stored here, note that
dln_BOp_xx/dr_i = -dln_BOp_xx/dr_j and all others are 0 */
rvec_Scale(bo_ij->dln_BOp_s,-bo_ij->BO_s*Cln_BOp_s,ibond->dvec);
rvec_Scale(bo_ij->dln_BOp_pi,-bo_ij->BO_pi*Cln_BOp_pi,ibond->dvec);
rvec_Scale(bo_ij->dln_BOp_pi2,
-bo_ij->BO_pi2*Cln_BOp_pi2,ibond->dvec);
rvec_Scale(bo_ji->dln_BOp_s, -1., bo_ij->dln_BOp_s);
rvec_Scale(bo_ji->dln_BOp_pi, -1., bo_ij->dln_BOp_pi );
rvec_Scale(bo_ji->dln_BOp_pi2, -1., bo_ij->dln_BOp_pi2 );
rvec_Scale( bo_ij->dBOp,
-(bo_ij->BO_s * Cln_BOp_s +
bo_ij->BO_pi * Cln_BOp_pi +
bo_ij->BO_pi2 * Cln_BOp_pi2), ibond->dvec );
rvec_Scale( bo_ji->dBOp, -1., bo_ij->dBOp );
rvec_Add( workspace->dDeltap_self[i], bo_ij->dBOp );
rvec_Add( workspace->dDeltap_self[j], bo_ji->dBOp );
bo_ij->BO_s -= bo_cut;
bo_ij->BO -= bo_cut;
bo_ji->BO_s -= bo_cut;
bo_ji->BO -= bo_cut;
workspace->total_bond_order[i] += bo_ij->BO; //currently total_BOp
workspace->total_bond_order[j] += bo_ji->BO; //currently total_BOp
bo_ij->Cdbo = bo_ij->Cdbopi = bo_ij->Cdbopi2 = 0.0;
bo_ji->Cdbo = bo_ji->Cdbopi = bo_ji->Cdbopi2 = 0.0;
return 1;
}
return 0;
}
void BO( reax_system *system, control_params *control, simulation_data *data,
storage *workspace, reax_list **lists, output_controls *out_control )
{
int i, j, pj, type_i, type_j;
int start_i, end_i, sym_index;
double val_i, Deltap_i, Deltap_boc_i;
double val_j, Deltap_j, Deltap_boc_j;
double f1, f2, f3, f4, f5, f4f5, exp_f4, exp_f5;
double exp_p1i, exp_p2i, exp_p1j, exp_p2j;
double temp, u1_ij, u1_ji, Cf1A_ij, Cf1B_ij, Cf1_ij, Cf1_ji;
double Cf45_ij, Cf45_ji, p_lp1; //u_ij, u_ji
double A0_ij, A1_ij, A2_ij, A2_ji, A3_ij, A3_ji;
double explp1, p_boc1, p_boc2;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
bond_order_data *bo_ij, *bo_ji;
reax_list *bonds = (*lists) + BONDS;
p_boc1 = system->reax_param.gp.l[0];
p_boc2 = system->reax_param.gp.l[1];
/* Calculate Deltaprime, Deltaprime_boc values */
for( i = 0; i < system->N; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[type_i]);
workspace->Deltap[i] = workspace->total_bond_order[i] - sbp_i->valency;
workspace->Deltap_boc[i] =
workspace->total_bond_order[i] - sbp_i->valency_val;
workspace->total_bond_order[i] = 0;
}
/* Corrected Bond Order calculations */
for( i = 0; i < system->N; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[type_i]);
val_i = sbp_i->valency;
Deltap_i = workspace->Deltap[i];
Deltap_boc_i = workspace->Deltap_boc[i];
start_i = Start_Index(i, bonds);
end_i = End_Index(i, bonds);
for( pj = start_i; pj < end_i; ++pj ) {
j = bonds->select.bond_list[pj].nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
bo_ij = &( bonds->select.bond_list[pj].bo_data );
// fprintf( stderr, "\tj:%d - ubo: %8.3f\n", j+1, bo_ij->BO );
if( i < j || workspace->bond_mark[j] > 3 ) {
twbp = &( system->reax_param.tbp[type_i][type_j] );
if( twbp->ovc < 0.001 && twbp->v13cor < 0.001 ) {
bo_ij->C1dbo = 1.000000;
bo_ij->C2dbo = 0.000000;
bo_ij->C3dbo = 0.000000;
bo_ij->C1dbopi = bo_ij->BO_pi;
bo_ij->C2dbopi = 0.000000;
bo_ij->C3dbopi = 0.000000;
bo_ij->C4dbopi = 0.000000;
bo_ij->C1dbopi2 = bo_ij->BO_pi2;
bo_ij->C2dbopi2 = 0.000000;
bo_ij->C3dbopi2 = 0.000000;
bo_ij->C4dbopi2 = 0.000000;
}
else {
val_j = system->reax_param.sbp[type_j].valency;
Deltap_j = workspace->Deltap[j];
Deltap_boc_j = workspace->Deltap_boc[j];
/* on page 1 */
if( twbp->ovc >= 0.001 ) {
/* Correction for overcoordination */
exp_p1i = exp( -p_boc1 * Deltap_i );
exp_p2i = exp( -p_boc2 * Deltap_i );
exp_p1j = exp( -p_boc1 * Deltap_j );
exp_p2j = exp( -p_boc2 * Deltap_j );
f2 = exp_p1i + exp_p1j;
f3 = -1.0 / p_boc2 * log( 0.5 * ( exp_p2i + exp_p2j ) );
f1 = 0.5 * ( ( val_i + f2 )/( val_i + f2 + f3 ) +
( val_j + f2 )/( val_j + f2 + f3 ) );
temp = f2 + f3;
u1_ij = val_i + temp;
u1_ji = val_j + temp;
Cf1A_ij = 0.5 * f3 * (1.0 / SQR( u1_ij ) +
1.0 / SQR( u1_ji ));
Cf1B_ij = -0.5 * (( u1_ij - f3 ) / SQR( u1_ij ) +
( u1_ji - f3 ) / SQR( u1_ji ));
Cf1_ij = 0.50 * ( -p_boc1 * exp_p1i / u1_ij -
((val_i+f2) / SQR(u1_ij)) *
( -p_boc1 * exp_p1i +
exp_p2i / ( exp_p2i + exp_p2j ) ) +
-p_boc1 * exp_p1i / u1_ji -
((val_j+f2) / SQR(u1_ji)) *
( -p_boc1 * exp_p1i +
exp_p2i / ( exp_p2i + exp_p2j ) ));
Cf1_ji = -Cf1A_ij * p_boc1 * exp_p1j +
Cf1B_ij * exp_p2j / ( exp_p2i + exp_p2j );
}
else {
/* No overcoordination correction! */
f1 = 1.0;
Cf1_ij = Cf1_ji = 0.0;
}
if( twbp->v13cor >= 0.001 ) {
/* Correction for 1-3 bond orders */
exp_f4 =exp(-(twbp->p_boc4 * SQR( bo_ij->BO ) -
Deltap_boc_i) * twbp->p_boc3 + twbp->p_boc5);
exp_f5 =exp(-(twbp->p_boc4 * SQR( bo_ij->BO ) -
Deltap_boc_j) * twbp->p_boc3 + twbp->p_boc5);
f4 = 1. / (1. + exp_f4);
f5 = 1. / (1. + exp_f5);
f4f5 = f4 * f5;
/* Bond Order pages 8-9, derivative of f4 and f5 */
Cf45_ij = -f4 * exp_f4;
Cf45_ji = -f5 * exp_f5;
}
else {
f4 = f5 = f4f5 = 1.0;
Cf45_ij = Cf45_ji = 0.0;
}
/* Bond Order page 10, derivative of total bond order */
A0_ij = f1 * f4f5;
A1_ij = -2 * twbp->p_boc3 * twbp->p_boc4 * bo_ij->BO *
(Cf45_ij + Cf45_ji);
A2_ij = Cf1_ij / f1 + twbp->p_boc3 * Cf45_ij;
A2_ji = Cf1_ji / f1 + twbp->p_boc3 * Cf45_ji;
A3_ij = A2_ij + Cf1_ij / f1;
A3_ji = A2_ji + Cf1_ji / f1;
/* find corrected bond orders and their derivative coef */
bo_ij->BO = bo_ij->BO * A0_ij;
bo_ij->BO_pi = bo_ij->BO_pi * A0_ij *f1;
bo_ij->BO_pi2= bo_ij->BO_pi2* A0_ij *f1;
bo_ij->BO_s = bo_ij->BO - ( bo_ij->BO_pi + bo_ij->BO_pi2 );
bo_ij->C1dbo = A0_ij + bo_ij->BO * A1_ij;
bo_ij->C2dbo = bo_ij->BO * A2_ij;
bo_ij->C3dbo = bo_ij->BO * A2_ji;
bo_ij->C1dbopi = f1*f1*f4*f5;
bo_ij->C2dbopi = bo_ij->BO_pi * A1_ij;
bo_ij->C3dbopi = bo_ij->BO_pi * A3_ij;
bo_ij->C4dbopi = bo_ij->BO_pi * A3_ji;
bo_ij->C1dbopi2 = f1*f1*f4*f5;
bo_ij->C2dbopi2 = bo_ij->BO_pi2 * A1_ij;
bo_ij->C3dbopi2 = bo_ij->BO_pi2 * A3_ij;
bo_ij->C4dbopi2 = bo_ij->BO_pi2 * A3_ji;
}
/* neglect bonds that are < 1e-10 */
if( bo_ij->BO < 1e-10 )
bo_ij->BO = 0.0;
if( bo_ij->BO_s < 1e-10 )
bo_ij->BO_s = 0.0;
if( bo_ij->BO_pi < 1e-10 )
bo_ij->BO_pi = 0.0;
if( bo_ij->BO_pi2 < 1e-10 )
bo_ij->BO_pi2 = 0.0;
workspace->total_bond_order[i] += bo_ij->BO; //now keeps total_BO
}
else {
/* We only need to update bond orders from bo_ji;
everything else is set in uncorrected_bo calculations */
sym_index = bonds->select.bond_list[pj].sym_index;
bo_ji = &(bonds->select.bond_list[ sym_index ].bo_data);
bo_ij->BO = bo_ji->BO;
bo_ij->BO_s = bo_ji->BO_s;
bo_ij->BO_pi = bo_ji->BO_pi;
bo_ij->BO_pi2 = bo_ji->BO_pi2;
workspace->total_bond_order[i] += bo_ij->BO;// now keeps total_BO
}
}
}
p_lp1 = system->reax_param.gp.l[15];
for( j = 0; j < system->N; ++j ){
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
sbp_j = &(system->reax_param.sbp[ type_j ]);
workspace->Delta[j] = workspace->total_bond_order[j] - sbp_j->valency;
workspace->Delta_e[j] = workspace->total_bond_order[j] - sbp_j->valency_e;
workspace->Delta_boc[j] = workspace->total_bond_order[j] -
sbp_j->valency_boc;
workspace->Delta_val[j] = workspace->total_bond_order[j] -
sbp_j->valency_val;
workspace->vlpex[j] = workspace->Delta_e[j] -
2.0 * (int)(workspace->Delta_e[j]/2.0);
explp1 = exp(-p_lp1 * SQR(2.0 + workspace->vlpex[j]));
workspace->nlp[j] = explp1 - (int)(workspace->Delta_e[j] / 2.0);
workspace->Delta_lp[j] = sbp_j->nlp_opt - workspace->nlp[j];
workspace->Clp[j] = 2.0 * p_lp1 * explp1 * (2.0 + workspace->vlpex[j]);
workspace->dDelta_lp[j] = workspace->Clp[j];
if( sbp_j->mass > 21.0 ) {
workspace->nlp_temp[j] = 0.5 * (sbp_j->valency_e - sbp_j->valency);
workspace->Delta_lp_temp[j] = sbp_j->nlp_opt - workspace->nlp_temp[j];
workspace->dDelta_lp_temp[j] = 0.;
}
else {
workspace->nlp_temp[j] = workspace->nlp[j];
workspace->Delta_lp_temp[j] = sbp_j->nlp_opt - workspace->nlp_temp[j];
workspace->dDelta_lp_temp[j] = workspace->Clp[j];
}
}
}
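
BOp() above evaluates the uncorrected sigma, pi, and double-pi bond-order contributions from the pair distance before the overcoordination and 1-3 corrections are applied in BO(). A condensed sketch of that first step, using the same parameter names (an illustration only; the full routine also fills the bond list and the derivative terms):

  #include <cmath>

  // Uncorrected bond-order terms for one pair at distance d.
  // p_bo1..p_bo6 and the radii come from the two-body parameters (twbp),
  // bo_cut is the bond-order cutoff; radii <= 0 switch a contribution off.
  void uncorrected_bo(double d, double bo_cut,
                      double p_bo1, double p_bo2, double r_s,
                      double p_bo3, double p_bo4, double r_p,
                      double p_bo5, double p_bo6, double r_pp,
                      double &BO_s, double &BO_pi, double &BO_pi2)
  {
    BO_s   = (r_s  > 0.0) ? (1.0 + bo_cut) * exp(p_bo1 * pow(d / r_s,  p_bo2)) : 0.0;
    BO_pi  = (r_p  > 0.0) ? exp(p_bo3 * pow(d / r_p,  p_bo4)) : 0.0;
    BO_pi2 = (r_pp > 0.0) ? exp(p_bo5 * pow(d / r_pp, p_bo6)) : 0.0;
    // a bond is stored only if BO_s + BO_pi + BO_pi2 >= bo_cut
  }
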
diff --git a/src/USER-REAXC/reaxc_bonds.cpp b/src/USER-REAXC/reaxc_bonds.cpp
index e0ef38ba0..a8a129816 100644
--- a/src/USER-REAXC/reaxc_bonds.cpp
+++ b/src/USER-REAXC/reaxc_bonds.cpp
@@ -1,137 +1,137 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_bonds.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
void Bonds( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
output_controls *out_control )
{
int i, j, pj, natoms;
int start_i, end_i;
int type_i, type_j;
double ebond, pow_BOs_be2, exp_be12, CEbo;
double gp3, gp4, gp7, gp10, gp37;
double exphu, exphua1, exphub1, exphuov, hulpov, estriph;
double decobdbo, decobdboua, decobdboub;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
bond_order_data *bo_ij;
reax_list *bonds;
bonds = (*lists) + BONDS;
gp3 = system->reax_param.gp.l[3];
gp4 = system->reax_param.gp.l[4];
gp7 = system->reax_param.gp.l[7];
gp10 = system->reax_param.gp.l[10];
gp37 = (int) system->reax_param.gp.l[37];
natoms = system->n;
for( i = 0; i < natoms; ++i ) {
start_i = Start_Index(i, bonds);
end_i = End_Index(i, bonds);
for( pj = start_i; pj < end_i; ++pj ) {
j = bonds->select.bond_list[pj].nbr;
if( system->my_atoms[i].orig_id > system->my_atoms[j].orig_id )
continue;
if( system->my_atoms[i].orig_id == system->my_atoms[j].orig_id ) {
if (system->my_atoms[j].x[2] < system->my_atoms[i].x[2]) continue;
if (system->my_atoms[j].x[2] == system->my_atoms[i].x[2] &&
system->my_atoms[j].x[1] < system->my_atoms[i].x[1]) continue;
if (system->my_atoms[j].x[2] == system->my_atoms[i].x[2] &&
system->my_atoms[j].x[1] == system->my_atoms[i].x[1] &&
system->my_atoms[j].x[0] < system->my_atoms[i].x[0]) continue;
}
/* set the pointers */
type_i = system->my_atoms[i].type;
type_j = system->my_atoms[j].type;
sbp_i = &( system->reax_param.sbp[type_i] );
sbp_j = &( system->reax_param.sbp[type_j] );
twbp = &( system->reax_param.tbp[type_i][type_j] );
bo_ij = &( bonds->select.bond_list[pj].bo_data );
/* calculate the constants */
pow_BOs_be2 = pow( bo_ij->BO_s, twbp->p_be2 );
exp_be12 = exp( twbp->p_be1 * ( 1.0 - pow_BOs_be2 ) );
CEbo = -twbp->De_s * exp_be12 *
( 1.0 - twbp->p_be1 * twbp->p_be2 * pow_BOs_be2 );
/* calculate the Bond Energy */
data->my_en.e_bond += ebond =
-twbp->De_s * bo_ij->BO_s * exp_be12
-twbp->De_p * bo_ij->BO_pi
-twbp->De_pp * bo_ij->BO_pi2;
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,j,natoms,1,ebond,0.0,0.0,0.0,0.0,0.0);
/* calculate derivatives of Bond Orders */
bo_ij->Cdbo += CEbo;
bo_ij->Cdbopi -= (CEbo + twbp->De_p);
bo_ij->Cdbopi2 -= (CEbo + twbp->De_pp);
      /* stabilization of terminal triple bonds */
if( bo_ij->BO >= 1.00 ) {
if( gp37 == 2 ||
(sbp_i->mass == 12.0000 && sbp_j->mass == 15.9990) ||
(sbp_j->mass == 12.0000 && sbp_i->mass == 15.9990) ) {
exphu = exp( -gp7 * SQR(bo_ij->BO - 2.50) );
exphua1 = exp(-gp3 * (workspace->total_bond_order[i]-bo_ij->BO));
exphub1 = exp(-gp3 * (workspace->total_bond_order[j]-bo_ij->BO));
exphuov = exp(gp4 * (workspace->Delta[i] + workspace->Delta[j]));
hulpov = 1.0 / (1.0 + 25.0 * exphuov);
estriph = gp10 * exphu * hulpov * (exphua1 + exphub1);
data->my_en.e_bond += estriph;
decobdbo = gp10 * exphu * hulpov * (exphua1 + exphub1) *
( gp3 - 2.0 * gp7 * (bo_ij->BO-2.50) );
decobdboua = -gp10 * exphu * hulpov *
(gp3*exphua1 + 25.0*gp4*exphuov*hulpov*(exphua1+exphub1));
decobdboub = -gp10 * exphu * hulpov *
(gp3*exphub1 + 25.0*gp4*exphuov*hulpov*(exphua1+exphub1));
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,j,natoms,1,estriph,0.0,0.0,0.0,0.0,0.0);
bo_ij->Cdbo += decobdbo;
workspace->CdDelta[i] += decobdboua;
workspace->CdDelta[j] += decobdboub;
}
}
}
}
}
diff --git a/src/USER-REAXC/reaxc_control.cpp b/src/USER-REAXC/reaxc_control.cpp
index 3753360c6..4def41bc8 100644
--- a/src/USER-REAXC/reaxc_control.cpp
+++ b/src/USER-REAXC/reaxc_control.cpp
@@ -1,385 +1,385 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_control.h"
#include "reaxc_tool_box.h"
char Read_Control_File( char *control_file, control_params* control,
output_controls *out_control )
{
FILE *fp;
char *s, **tmp;
int i,ival;
double val;
/* open control file */
if ( (fp = fopen( control_file, "r" ) ) == NULL ) {
fprintf( stderr, "error opening the control file! terminating...\n" );
MPI_Abort( MPI_COMM_WORLD, FILE_NOT_FOUND );
}
/* assign default values */
strcpy( control->sim_name, "simulate" );
control->ensemble = NVE;
control->nsteps = 0;
control->dt = 0.25;
control->nprocs = 1;
control->procs_by_dim[0] = 1;
control->procs_by_dim[1] = 1;
control->procs_by_dim[2] = 1;
control->geo_format = 1;
control->restart = 0;
out_control->restart_format = WRITE_BINARY;
out_control->restart_freq = 0;
control->reposition_atoms = 0;
control->restrict_bonds = 0;
control->remove_CoM_vel = 25;
out_control->debug_level = 0;
out_control->energy_update_freq = 0;
control->reneighbor = 1;
control->vlist_cut = control->nonb_cut;
control->bond_cut = 5.0;
control->bg_cut = 0.3;
control->thb_cut = 0.001;
control->thb_cutsq = 0.00001;
control->hbond_cut = 7.5;
control->tabulate = 0;
control->qeq_freq = 1;
control->q_err = 1e-6;
control->refactor = 100;
  control->droptol = 1e-2;
control->T_init = 0.;
control->T_final = 300.;
control->Tau_T = 500.0;
control->T_mode = 0;
control->T_rate = 1.;
control->T_freq = 1.;
control->P[0] = control->P[1] = control->P[2] = 0.000101325;
control->Tau_P[0] = control->Tau_P[1] = control->Tau_P[2] = 500.0;
control->Tau_PT[0] = control->Tau_PT[1] = control->Tau_PT[2] = 500.0;
control->compressibility = 1.0;
control->press_mode = 0;
control->virial = 0;
out_control->write_steps = 0;
out_control->traj_compress = 0;
out_control->traj_method = REG_TRAJ;
strcpy( out_control->traj_title, "default_title" );
out_control->atom_info = 0;
out_control->bond_info = 0;
out_control->angle_info = 0;
control->molecular_analysis = 0;
control->dipole_anal = 0;
control->freq_dipole_anal = 0;
control->diffusion_coef = 0;
control->freq_diffusion_coef = 0;
control->restrict_type = 0;
/* memory allocations */
s = (char*) malloc(sizeof(char)*MAX_LINE);
tmp = (char**) malloc(sizeof(char*)*MAX_TOKENS);
for (i=0; i < MAX_TOKENS; i++)
tmp[i] = (char*) malloc(sizeof(char)*MAX_LINE);
/* read control parameters file */
while (!feof(fp)) {
fgets( s, MAX_LINE, fp );
Tokenize( s, &tmp );
if( strcmp(tmp[0], "simulation_name") == 0 ) {
strcpy( control->sim_name, tmp[1] );
}
else if( strcmp(tmp[0], "ensemble_type") == 0 ) {
ival = atoi(tmp[1]);
control->ensemble = ival;
if( ival == iNPT || ival ==sNPT || ival == NPT )
control->virial = 1;
}
else if( strcmp(tmp[0], "nsteps") == 0 ) {
ival = atoi(tmp[1]);
control->nsteps = ival;
}
else if( strcmp(tmp[0], "dt") == 0) {
val = atof(tmp[1]);
control->dt = val * 1.e-3; // convert dt from fs to ps!
}
else if( strcmp(tmp[0], "proc_by_dim") == 0 ) {
ival = atoi(tmp[1]);
control->procs_by_dim[0] = ival;
ival = atoi(tmp[2]);
control->procs_by_dim[1] = ival;
ival = atoi(tmp[3]);
control->procs_by_dim[2] = ival;
control->nprocs = control->procs_by_dim[0]*control->procs_by_dim[1]*
control->procs_by_dim[2];
}
else if( strcmp(tmp[0], "random_vel") == 0 ) {
ival = atoi(tmp[1]);
control->random_vel = ival;
}
else if( strcmp(tmp[0], "restart_format") == 0 ) {
ival = atoi(tmp[1]);
out_control->restart_format = ival;
}
else if( strcmp(tmp[0], "restart_freq") == 0 ) {
ival = atoi(tmp[1]);
out_control->restart_freq = ival;
}
else if( strcmp(tmp[0], "reposition_atoms") == 0 ) {
ival = atoi(tmp[1]);
control->reposition_atoms = ival;
}
else if( strcmp(tmp[0], "restrict_bonds") == 0 ) {
ival = atoi( tmp[1] );
control->restrict_bonds = ival;
}
else if( strcmp(tmp[0], "remove_CoM_vel") == 0 ) {
ival = atoi(tmp[1]);
control->remove_CoM_vel = ival;
}
else if( strcmp(tmp[0], "debug_level") == 0 ) {
ival = atoi(tmp[1]);
out_control->debug_level = ival;
}
else if( strcmp(tmp[0], "energy_update_freq") == 0 ) {
ival = atoi(tmp[1]);
out_control->energy_update_freq = ival;
}
else if( strcmp(tmp[0], "reneighbor") == 0 ) {
ival = atoi( tmp[1] );
control->reneighbor = ival;
}
else if( strcmp(tmp[0], "vlist_buffer") == 0 ) {
val = atof(tmp[1]);
control->vlist_cut= val + control->nonb_cut;
}
else if( strcmp(tmp[0], "nbrhood_cutoff") == 0 ) {
val = atof(tmp[1]);
control->bond_cut = val;
}
else if( strcmp(tmp[0], "bond_graph_cutoff") == 0 ) {
val = atof(tmp[1]);
control->bg_cut = val;
}
else if( strcmp(tmp[0], "thb_cutoff") == 0 ) {
val = atof(tmp[1]);
control->thb_cut = val;
}
else if( strcmp(tmp[0], "thb_cutoff_sq") == 0 ) {
val = atof(tmp[1]);
control->thb_cutsq = val;
}
else if( strcmp(tmp[0], "hbond_cutoff") == 0 ) {
val = atof( tmp[1] );
control->hbond_cut = val;
}
else if( strcmp(tmp[0], "ghost_cutoff") == 0 ) {
val = atof(tmp[1]);
control->user_ghost_cut = val;
}
else if( strcmp(tmp[0], "tabulate_long_range") == 0 ) {
ival = atoi( tmp[1] );
control->tabulate = ival;
}
else if( strcmp(tmp[0], "qeq_freq") == 0 ) {
ival = atoi( tmp[1] );
control->qeq_freq = ival;
}
else if( strcmp(tmp[0], "q_err") == 0 ) {
val = atof( tmp[1] );
control->q_err = val;
}
else if( strcmp(tmp[0], "ilu_refactor") == 0 ) {
ival = atoi( tmp[1] );
control->refactor = ival;
}
else if( strcmp(tmp[0], "ilu_droptol") == 0 ) {
val = atof( tmp[1] );
control->droptol = val;
}
else if( strcmp(tmp[0], "temp_init") == 0 ) {
val = atof(tmp[1]);
control->T_init = val;
if( control->T_init < 0.1 )
control->T_init = 0.1;
}
else if( strcmp(tmp[0], "temp_final") == 0 ) {
val = atof(tmp[1]);
control->T_final = val;
if( control->T_final < 0.1 )
control->T_final = 0.1;
}
else if( strcmp(tmp[0], "t_mass") == 0 ) {
val = atof(tmp[1]);
control->Tau_T = val * 1.e-3; // convert t_mass from fs to ps
}
else if( strcmp(tmp[0], "t_mode") == 0 ) {
ival = atoi(tmp[1]);
control->T_mode = ival;
}
else if( strcmp(tmp[0], "t_rate") == 0 ) {
val = atof(tmp[1]);
control->T_rate = val;
}
else if( strcmp(tmp[0], "t_freq") == 0 ) {
val = atof(tmp[1]);
control->T_freq = val;
}
else if( strcmp(tmp[0], "pressure") == 0 ) {
if( control->ensemble == iNPT ) {
control->P[0] = control->P[1] = control->P[2] = atof(tmp[1]);
}
else if( control->ensemble == sNPT ) {
control->P[0] = atof(tmp[1]);
control->P[1] = atof(tmp[2]);
control->P[2] = atof(tmp[3]);
}
}
else if( strcmp(tmp[0], "p_mass") == 0 ) {
// convert p_mass from fs to ps
if( control->ensemble == iNPT ) {
control->Tau_P[0] = control->Tau_P[1] = control->Tau_P[2] =
atof(tmp[1]) * 1.e-3;
}
else if( control->ensemble == sNPT ) {
control->Tau_P[0] = atof(tmp[1]) * 1.e-3;
control->Tau_P[1] = atof(tmp[2]) * 1.e-3;
control->Tau_P[2] = atof(tmp[3]) * 1.e-3;
}
}
else if( strcmp(tmp[0], "pt_mass") == 0 ) {
val = atof(tmp[1]);
control->Tau_PT[0] = control->Tau_PT[1] = control->Tau_PT[2] =
val * 1.e-3; // convert pt_mass from fs to ps
}
else if( strcmp(tmp[0], "compress") == 0 ) {
val = atof(tmp[1]);
control->compressibility = val;
}
else if( strcmp(tmp[0], "press_mode") == 0 ) {
ival = atoi(tmp[1]);
control->press_mode = ival;
}
else if( strcmp(tmp[0], "geo_format") == 0 ) {
ival = atoi( tmp[1] );
control->geo_format = ival;
}
else if( strcmp(tmp[0], "write_freq") == 0 ) {
ival = atoi(tmp[1]);
out_control->write_steps = ival;
}
else if( strcmp(tmp[0], "traj_compress") == 0 ) {
ival = atoi(tmp[1]);
out_control->traj_compress = ival;
}
else if( strcmp(tmp[0], "traj_method") == 0 ) {
ival = atoi(tmp[1]);
out_control->traj_method = ival;
}
else if( strcmp(tmp[0], "traj_title") == 0 ) {
strcpy( out_control->traj_title, tmp[1] );
}
else if( strcmp(tmp[0], "atom_info") == 0 ) {
ival = atoi(tmp[1]);
out_control->atom_info += ival * 4;
}
else if( strcmp(tmp[0], "atom_velocities") == 0 ) {
ival = atoi(tmp[1]);
out_control->atom_info += ival * 2;
}
else if( strcmp(tmp[0], "atom_forces") == 0 ) {
ival = atoi(tmp[1]);
out_control->atom_info += ival * 1;
}
else if( strcmp(tmp[0], "bond_info") == 0 ) {
ival = atoi(tmp[1]);
out_control->bond_info = ival;
}
else if( strcmp(tmp[0], "angle_info") == 0 ) {
ival = atoi(tmp[1]);
out_control->angle_info = ival;
}
else if( strcmp(tmp[0], "molecular_analysis") == 0 ) {
ival = atoi(tmp[1]);
control->molecular_analysis = ival;
}
else if( strcmp(tmp[0], "ignore") == 0 ) {
control->num_ignored = atoi(tmp[1]);
for( i = 0; i < control->num_ignored; ++i )
control->ignore[atoi(tmp[i+2])] = 1;
}
else if( strcmp(tmp[0], "dipole_anal") == 0 ) {
ival = atoi(tmp[1]);
control->dipole_anal = ival;
}
else if( strcmp(tmp[0], "freq_dipole_anal") == 0 ) {
ival = atoi(tmp[1]);
control->freq_dipole_anal = ival;
}
else if( strcmp(tmp[0], "diffusion_coef") == 0 ) {
ival = atoi(tmp[1]);
control->diffusion_coef = ival;
}
else if( strcmp(tmp[0], "freq_diffusion_coef") == 0 ) {
ival = atoi(tmp[1]);
control->freq_diffusion_coef = ival;
}
else if( strcmp(tmp[0], "restrict_type") == 0 ) {
ival = atoi(tmp[1]);
control->restrict_type = ival;
}
else {
fprintf( stderr, "WARNING: unknown parameter %s\n", tmp[0] );
MPI_Abort( MPI_COMM_WORLD, 15 );
}
}
/* determine target T */
if( control->T_mode == 0 )
control->T = control->T_final;
else control->T = control->T_init;
/* free memory allocations at the top */
for( i = 0; i < MAX_TOKENS; i++ )
free( tmp[i] );
free( tmp );
free( s );
fclose(fp);
return SUCCESS;
}
diff --git a/src/USER-REAXC/reaxc_defs.h b/src/USER-REAXC/reaxc_defs.h
index d0a75d431..101b554fb 100644
--- a/src/USER-REAXC/reaxc_defs.h
+++ b/src/USER-REAXC/reaxc_defs.h
@@ -1,159 +1,159 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
#ifndef REAX_DEFS_H
#define REAX_DEFS_H
#if defined(__IBMC__)
#define inline __inline__
#endif /*IBMC*/
#ifndef SUCCESS
#define SUCCESS 1
#endif
#ifndef FAILURE
#define FAILURE 0
#endif
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif
#define SQR(x) ((x)*(x))
#define CUBE(x) ((x)*(x)*(x))
#define DEG2RAD(a) ((a)*constPI/180.0)
#define RAD2DEG(a) ((a)*180.0/constPI)
// #define MAX(x,y) (((x) > (y)) ? (x) : (y))
// #define MIN(x,y) (((x) < (y)) ? (x) : (y))
#define MAX3(x,y,z) MAX( MAX(x,y), z)
#define constPI 3.14159265
#define C_ele 332.06371
//#define K_B 503.398008 // kcal/mol/K
#define K_B 0.831687 // amu A^2 / ps^2 / K
#define F_CONV 1e6 / 48.88821291 / 48.88821291 // --> amu A / ps^2
#define E_CONV 0.002391 // amu A^2 / ps^2 --> kcal/mol
#define EV_to_KCALpMOL 14.400000 // ElectronVolt --> KCAL per MOLe
#define KCALpMOL_to_EV 23.02 // 23.060549 //KCAL per MOLe --> ElectronVolt
#define ECxA_to_DEBYE 4.803204 // elem. charge * Ang -> debye
#define CAL_to_JOULES 4.184000 // CALories --> JOULES
#define JOULES_to_CAL 1/4.184000 // JOULES --> CALories
#define AMU_to_GRAM 1.6605e-24
#define ANG_to_CM 1e-8
#define AVOGNR 6.0221367e23
#define P_CONV 1e-24 * AVOGNR * JOULES_to_CAL
#define MAX_STR 1024
#define MAX_LINE 1024
#define MAX_TOKENS 1024
#define MAX_TOKEN_LEN 1024
#define MAX_ATOM_ID 100000
#define MAX_RESTRICT 15
#define MAX_MOLECULE_SIZE 20
#define MAX_ATOM_TYPES 25
#define NUM_INTRS 10
#define ALMOST_ZERO 1e-10
#define NEG_INF -1e10
#define NO_BOND 1e-3 // 0.001
#define HB_THRESHOLD 1e-2 // 0.01
#define MIN_CAP 50
#define MIN_NBRS 100
#define MIN_HENTRIES 100
#define MAX_BONDS 30
#define MIN_BONDS 25
#define MIN_HBONDS 25
#define MIN_3BODIES 1000
#define MIN_GCELL_POPL 50
#define MIN_SEND 100
#define SAFE_ZONE 1.2
#define SAFER_ZONE 1.4
#define DANGER_ZONE 0.90
#define LOOSE_ZONE 0.75
#define MAX_3BODY_PARAM 5
#define MAX_4BODY_PARAM 5
#define MAX_dV 1.01
#define MIN_dV 0.99
#define MAX_dT 4.00
#define MIN_dT 0.00
#define MASTER_NODE 0
#define MAX_NBRS 6 //27
#define MYSELF 13 // encoding of relative coordinate (0,0,0)
#define MAX_ITR 10
#define RESTART 30
#define MAX_BOND 20
-#define MAXREAXBOND 24 /* used in fix_reaxc_bonds.cpp and pair_reax_c.cpp */
-#define MAXSPECBOND 24 /* used in fix_reaxc_species.cpp and pair_reax_c.cpp */
+#define MAXREAXBOND 24 /* used in fix_reaxc_bonds.cpp and pair_reaxc.cpp */
+#define MAXSPECBOND 24 /* used in fix_reaxc_species.cpp and pair_reaxc.cpp */
/******************* ENUMERATIONS *************************/
enum geo_formats { CUSTOM, PDB, ASCII_RESTART, BINARY_RESTART, GF_N };
enum restart_formats { WRITE_ASCII, WRITE_BINARY, RF_N };
enum ensembles { NVE, bNVT, nhNVT, sNPT, iNPT, NPT, ens_N };
enum lists { BONDS, OLD_BONDS, THREE_BODIES,
HBONDS, FAR_NBRS, DBOS, DDELTAS, LIST_N };
enum interactions { TYP_VOID, TYP_BOND, TYP_THREE_BODY,
TYP_HBOND, TYP_FAR_NEIGHBOR, TYP_DBO, TYP_DDELTA, TYP_N };
enum message_tags { INIT, UPDATE, BNDRY, UPDATE_BNDRY,
EXC_VEC1, EXC_VEC2, DIST_RVEC2, COLL_RVEC2,
DIST_RVECS, COLL_RVECS, INIT_DESCS, ATOM_LINES,
BOND_LINES, ANGLE_LINES, RESTART_ATOMS, TAGS_N };
enum errors { FILE_NOT_FOUND = -10, UNKNOWN_ATOM_TYPE = -11,
CANNOT_OPEN_FILE = -12, CANNOT_INITIALIZE = -13,
INSUFFICIENT_MEMORY = -14, UNKNOWN_OPTION = -15,
INVALID_INPUT = -16, INVALID_GEO = -17 };
enum exchanges { NONE, NEAR_EXCH, FULL_EXCH };
enum gcell_types { NO_NBRS=0, NEAR_ONLY=1, HBOND_ONLY=2, FAR_ONLY=4,
NEAR_HBOND=3, NEAR_FAR=5, HBOND_FAR=6, FULL_NBRS=7,
NATIVE=8 };
enum atoms { C_ATOM = 0, H_ATOM = 1, O_ATOM = 2, N_ATOM = 3,
S_ATOM = 4, SI_ATOM = 5, GE_ATOM = 6, X_ATOM = 7 };
enum traj_methods { REG_TRAJ, MPI_TRAJ, TF_N };
enum molecules { UNKNOWN, WATER };
#endif
diff --git a/src/USER-REAXC/reaxc_ffield.cpp b/src/USER-REAXC/reaxc_ffield.cpp
index fda284140..58a347ebf 100644
--- a/src/USER-REAXC/reaxc_ffield.cpp
+++ b/src/USER-REAXC/reaxc_ffield.cpp
@@ -1,699 +1,699 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "error.h"
#include "reaxc_ffield.h"
#include "reaxc_tool_box.h"
char Read_Force_Field( FILE *fp, reax_interaction *reax,
control_params *control )
{
char *s;
char **tmp;
char ****tor_flag;
int c, i, j, k, l, m, n, o, p, cnt;
int lgflag = control->lgflag;
int errorflag = 1;
double val;
MPI_Comm comm;
comm = MPI_COMM_WORLD;
s = (char*) malloc(sizeof(char)*MAX_LINE);
tmp = (char**) malloc(sizeof(char*)*MAX_TOKENS);
for (i=0; i < MAX_TOKENS; i++)
tmp[i] = (char*) malloc(sizeof(char)*MAX_TOKEN_LEN);
/* reading first header comment */
fgets( s, MAX_LINE, fp );
/* line 2 is number of global parameters */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
/* reading the number of global parameters */
n = atoi(tmp[0]);
if (n < 1) {
fprintf( stderr, "WARNING: number of globals in ffield file is 0!\n" );
fclose(fp);
free(s);
free(tmp);
return 1;
}
reax->gp.n_global = n;
reax->gp.l = (double*) malloc(sizeof(double)*n);
/* see reax_types.h for mapping between l[i] and the lambdas used in ff */
for (i=0; i < n; i++) {
fgets(s,MAX_LINE,fp);
c = Tokenize(s,&tmp);
val = (double) atof(tmp[0]);
reax->gp.l[i] = val;
}
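  /* derived cutoffs from the global parameters: the bond-order cutoff is
     scaled by 0.01 as read from the force field file; nonb_low and nonb_cut
     are the lower and upper non-bonded interaction cutoffs */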
control->bo_cut = 0.01 * reax->gp.l[29];
control->nonb_low = reax->gp.l[11];
control->nonb_cut = reax->gp.l[12];
/* next line is number of atom types and some comments */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
reax->num_atom_types = atoi(tmp[0]);
/* 3 lines of comments */
fgets(s,MAX_LINE,fp);
fgets(s,MAX_LINE,fp);
fgets(s,MAX_LINE,fp);
/* Allocating structures in reax_interaction */
reax->sbp = (single_body_parameters*)
scalloc( reax->num_atom_types, sizeof(single_body_parameters), "sbp",
comm );
reax->tbp = (two_body_parameters**)
scalloc( reax->num_atom_types, sizeof(two_body_parameters*), "tbp", comm );
reax->thbp= (three_body_header***)
scalloc( reax->num_atom_types, sizeof(three_body_header**), "thbp", comm );
reax->hbp = (hbond_parameters***)
scalloc( reax->num_atom_types, sizeof(hbond_parameters**), "hbp", comm );
reax->fbp = (four_body_header****)
scalloc( reax->num_atom_types, sizeof(four_body_header***), "fbp", comm );
tor_flag = (char****)
scalloc( reax->num_atom_types, sizeof(char***), "tor_flag", comm );
for( i = 0; i < reax->num_atom_types; i++ ) {
reax->tbp[i] = (two_body_parameters*)
scalloc( reax->num_atom_types, sizeof(two_body_parameters), "tbp[i]",
comm );
reax->thbp[i]= (three_body_header**)
scalloc( reax->num_atom_types, sizeof(three_body_header*), "thbp[i]",
comm );
reax->hbp[i] = (hbond_parameters**)
scalloc( reax->num_atom_types, sizeof(hbond_parameters*), "hbp[i]",
comm );
reax->fbp[i] = (four_body_header***)
scalloc( reax->num_atom_types, sizeof(four_body_header**), "fbp[i]",
comm );
tor_flag[i] = (char***)
scalloc( reax->num_atom_types, sizeof(char**), "tor_flag[i]", comm );
for( j = 0; j < reax->num_atom_types; j++ ) {
reax->thbp[i][j]= (three_body_header*)
scalloc( reax->num_atom_types, sizeof(three_body_header), "thbp[i,j]",
comm );
reax->hbp[i][j] = (hbond_parameters*)
scalloc( reax->num_atom_types, sizeof(hbond_parameters), "hbp[i,j]",
comm );
reax->fbp[i][j] = (four_body_header**)
scalloc( reax->num_atom_types, sizeof(four_body_header*), "fbp[i,j]",
comm );
tor_flag[i][j] = (char**)
scalloc( reax->num_atom_types, sizeof(char*), "tor_flag[i,j]", comm );
for (k=0; k < reax->num_atom_types; k++) {
reax->fbp[i][j][k] = (four_body_header*)
scalloc( reax->num_atom_types, sizeof(four_body_header), "fbp[i,j,k]",
comm );
tor_flag[i][j][k] = (char*)
scalloc( reax->num_atom_types, sizeof(char), "tor_flag[i,j,k]",
comm );
}
}
}
reax->gp.vdw_type = 0;
for( i = 0; i < reax->num_atom_types; i++ ) {
/* line one */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
for( j = 0; j < (int)(strlen(tmp[0])); ++j )
reax->sbp[i].name[j] = toupper( tmp[0][j] );
val = atof(tmp[1]); reax->sbp[i].r_s = val;
val = atof(tmp[2]); reax->sbp[i].valency = val;
val = atof(tmp[3]); reax->sbp[i].mass = val;
val = atof(tmp[4]); reax->sbp[i].r_vdw = val;
val = atof(tmp[5]); reax->sbp[i].epsilon = val;
val = atof(tmp[6]); reax->sbp[i].gamma = val;
val = atof(tmp[7]); reax->sbp[i].r_pi = val;
val = atof(tmp[8]); reax->sbp[i].valency_e = val;
reax->sbp[i].nlp_opt = 0.5 * (reax->sbp[i].valency_e-reax->sbp[i].valency);
/* line two */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
val = atof(tmp[0]); reax->sbp[i].alpha = val;
val = atof(tmp[1]); reax->sbp[i].gamma_w = val;
val = atof(tmp[2]); reax->sbp[i].valency_boc= val;
val = atof(tmp[3]); reax->sbp[i].p_ovun5 = val;
val = atof(tmp[4]);
val = atof(tmp[5]); reax->sbp[i].chi = val;
val = atof(tmp[6]); reax->sbp[i].eta = 2.0 * val;
val = atof(tmp[7]); reax->sbp[i].p_hbond = (int) val;
/* line 3 */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
val = atof(tmp[0]); reax->sbp[i].r_pi_pi = val;
val = atof(tmp[1]); reax->sbp[i].p_lp2 = val;
val = atof(tmp[2]);
val = atof(tmp[3]); reax->sbp[i].b_o_131 = val;
val = atof(tmp[4]); reax->sbp[i].b_o_132 = val;
val = atof(tmp[5]); reax->sbp[i].b_o_133 = val;
val = atof(tmp[6]);
val = atof(tmp[7]);
/* line 4 */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
/* Sanity check */
if (c < 3) {
fprintf(stderr, "Inconsistent ffield file (reaxc_ffield.cpp) \n");
MPI_Abort( comm, FILE_NOT_FOUND );
}
val = atof(tmp[0]); reax->sbp[i].p_ovun2 = val;
val = atof(tmp[1]); reax->sbp[i].p_val3 = val;
val = atof(tmp[2]);
val = atof(tmp[3]); reax->sbp[i].valency_val= val;
val = atof(tmp[4]); reax->sbp[i].p_val5 = val;
val = atof(tmp[5]); reax->sbp[i].rcore2 = val;
val = atof(tmp[6]); reax->sbp[i].ecore2 = val;
val = atof(tmp[7]); reax->sbp[i].acore2 = val;
/* line 5, only if lgvdw is yes */
if (lgflag) {
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
/* Sanity check */
if (c > 3) {
fprintf(stderr, "Inconsistent ffield file (reaxc_ffield.cpp) \n");
MPI_Abort( comm, FILE_NOT_FOUND );
}
val = atof(tmp[0]); reax->sbp[i].lgcij = val;
val = atof(tmp[1]); reax->sbp[i].lgre = val;
}
if( reax->sbp[i].rcore2>0.01 && reax->sbp[i].acore2>0.01 ){ // Inner-wall
if( reax->sbp[i].gamma_w>0.5 ){ // Shielding vdWaals
if( reax->gp.vdw_type != 0 && reax->gp.vdw_type != 3 ) {
if (errorflag)
fprintf( stderr, "Warning: inconsistent vdWaals-parameters\n" \
"Force field parameters for element %s\n" \
"indicate inner wall+shielding, but earlier\n" \
"atoms indicate different vdWaals-method.\n" \
"This may cause division-by-zero errors.\n" \
"Keeping vdWaals-setting for earlier atoms.\n",
reax->sbp[i].name );
errorflag = 0;
}
else{
reax->gp.vdw_type = 3;
}
}
else { // No shielding vdWaals parameters present
if( reax->gp.vdw_type != 0 && reax->gp.vdw_type != 2 )
fprintf( stderr, "Warning: inconsistent vdWaals-parameters\n" \
"Force field parameters for element %s\n" \
"indicate inner wall without shielding, but earlier\n" \
"atoms indicate different vdWaals-method.\n" \
"This may cause division-by-zero errors.\n" \
"Keeping vdWaals-setting for earlier atoms.\n",
reax->sbp[i].name );
else{
reax->gp.vdw_type = 2;
}
}
}
else{ // No Inner wall parameters present
if( reax->sbp[i].gamma_w>0.5 ){ // Shielding vdWaals
if( reax->gp.vdw_type != 0 && reax->gp.vdw_type != 1 )
fprintf( stderr, "Warning: inconsistent vdWaals-parameters\n" \
"Force field parameters for element %s\n" \
"indicate shielding without inner wall, but earlier\n" \
"atoms indicate different vdWaals-method.\n" \
"This may cause division-by-zero errors.\n" \
"Keeping vdWaals-setting for earlier atoms.\n",
reax->sbp[i].name );
else{
reax->gp.vdw_type = 1;
}
}
else{
fprintf( stderr, "Error: inconsistent vdWaals-parameters\n"\
"No shielding or inner-wall set for element %s\n",
reax->sbp[i].name );
MPI_Abort( comm, INVALID_INPUT );
}
}
}
/* Equate vval3 to valf for first-row elements (25/10/2004) */
for( i = 0; i < reax->num_atom_types; i++ )
if( reax->sbp[i].mass < 21 &&
reax->sbp[i].valency_val != reax->sbp[i].valency_boc ){
fprintf( stderr, "Warning: changed valency_val to valency_boc for %s\n",
reax->sbp[i].name );
reax->sbp[i].valency_val = reax->sbp[i].valency_boc;
}
/* next line is number of two body combination and some comments */
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
l = atoi(tmp[0]);
/* a line of comments */
fgets(s,MAX_LINE,fp);
for (i=0; i < l; i++) {
/* line 1 */
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
if (j < reax->num_atom_types && k < reax->num_atom_types) {
val = atof(tmp[2]); reax->tbp[j][k].De_s = val;
reax->tbp[k][j].De_s = val;
val = atof(tmp[3]); reax->tbp[j][k].De_p = val;
reax->tbp[k][j].De_p = val;
val = atof(tmp[4]); reax->tbp[j][k].De_pp = val;
reax->tbp[k][j].De_pp = val;
val = atof(tmp[5]); reax->tbp[j][k].p_be1 = val;
reax->tbp[k][j].p_be1 = val;
val = atof(tmp[6]); reax->tbp[j][k].p_bo5 = val;
reax->tbp[k][j].p_bo5 = val;
val = atof(tmp[7]); reax->tbp[j][k].v13cor = val;
reax->tbp[k][j].v13cor = val;
val = atof(tmp[8]); reax->tbp[j][k].p_bo6 = val;
reax->tbp[k][j].p_bo6 = val;
val = atof(tmp[9]); reax->tbp[j][k].p_ovun1 = val;
reax->tbp[k][j].p_ovun1 = val;
/* line 2 */
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
val = atof(tmp[0]); reax->tbp[j][k].p_be2 = val;
reax->tbp[k][j].p_be2 = val;
val = atof(tmp[1]); reax->tbp[j][k].p_bo3 = val;
reax->tbp[k][j].p_bo3 = val;
val = atof(tmp[2]); reax->tbp[j][k].p_bo4 = val;
reax->tbp[k][j].p_bo4 = val;
val = atof(tmp[3]);
val = atof(tmp[4]); reax->tbp[j][k].p_bo1 = val;
reax->tbp[k][j].p_bo1 = val;
val = atof(tmp[5]); reax->tbp[j][k].p_bo2 = val;
reax->tbp[k][j].p_bo2 = val;
val = atof(tmp[6]); reax->tbp[j][k].ovc = val;
reax->tbp[k][j].ovc = val;
val = atof(tmp[7]);
}
}
for (i=0; i < reax->num_atom_types; i++)
for (j=i; j < reax->num_atom_types; j++) {
reax->tbp[i][j].r_s = 0.5 *
(reax->sbp[i].r_s + reax->sbp[j].r_s);
reax->tbp[j][i].r_s = 0.5 *
(reax->sbp[j].r_s + reax->sbp[i].r_s);
reax->tbp[i][j].r_p = 0.5 *
(reax->sbp[i].r_pi + reax->sbp[j].r_pi);
reax->tbp[j][i].r_p = 0.5 *
(reax->sbp[j].r_pi + reax->sbp[i].r_pi);
reax->tbp[i][j].r_pp = 0.5 *
(reax->sbp[i].r_pi_pi + reax->sbp[j].r_pi_pi);
reax->tbp[j][i].r_pp = 0.5 *
(reax->sbp[j].r_pi_pi + reax->sbp[i].r_pi_pi);
reax->tbp[i][j].p_boc3 =
sqrt(reax->sbp[i].b_o_132 *
reax->sbp[j].b_o_132);
reax->tbp[j][i].p_boc3 =
sqrt(reax->sbp[j].b_o_132 *
reax->sbp[i].b_o_132);
reax->tbp[i][j].p_boc4 =
sqrt(reax->sbp[i].b_o_131 *
reax->sbp[j].b_o_131);
reax->tbp[j][i].p_boc4 =
sqrt(reax->sbp[j].b_o_131 *
reax->sbp[i].b_o_131);
reax->tbp[i][j].p_boc5 =
sqrt(reax->sbp[i].b_o_133 *
reax->sbp[j].b_o_133);
reax->tbp[j][i].p_boc5 =
sqrt(reax->sbp[j].b_o_133 *
reax->sbp[i].b_o_133);
reax->tbp[i][j].D =
sqrt(reax->sbp[i].epsilon *
reax->sbp[j].epsilon);
reax->tbp[j][i].D =
sqrt(reax->sbp[j].epsilon *
reax->sbp[i].epsilon);
reax->tbp[i][j].alpha =
sqrt(reax->sbp[i].alpha *
reax->sbp[j].alpha);
reax->tbp[j][i].alpha =
sqrt(reax->sbp[j].alpha *
reax->sbp[i].alpha);
reax->tbp[i][j].r_vdW =
2.0 * sqrt(reax->sbp[i].r_vdw * reax->sbp[j].r_vdw);
reax->tbp[j][i].r_vdW =
2.0 * sqrt(reax->sbp[j].r_vdw * reax->sbp[i].r_vdw);
reax->tbp[i][j].gamma_w =
sqrt(reax->sbp[i].gamma_w *
reax->sbp[j].gamma_w);
reax->tbp[j][i].gamma_w =
sqrt(reax->sbp[j].gamma_w *
reax->sbp[i].gamma_w);
reax->tbp[i][j].gamma =
pow(reax->sbp[i].gamma *
reax->sbp[j].gamma,-1.5);
reax->tbp[j][i].gamma =
pow(reax->sbp[j].gamma *
reax->sbp[i].gamma,-1.5);
// additions for additional vdWaals interaction types - inner core
reax->tbp[i][j].rcore = reax->tbp[j][i].rcore =
sqrt( reax->sbp[i].rcore2 * reax->sbp[j].rcore2 );
reax->tbp[i][j].ecore = reax->tbp[j][i].ecore =
sqrt( reax->sbp[i].ecore2 * reax->sbp[j].ecore2 );
reax->tbp[i][j].acore = reax->tbp[j][i].acore =
sqrt( reax->sbp[i].acore2 * reax->sbp[j].acore2 );
      // additions for additional vdWaals interaction types - lg correction
reax->tbp[i][j].lgcij = reax->tbp[j][i].lgcij =
sqrt( reax->sbp[i].lgcij * reax->sbp[j].lgcij );
reax->tbp[i][j].lgre = reax->tbp[j][i].lgre = 2.0 * reax->gp.l[35] *
sqrt( reax->sbp[i].lgre*reax->sbp[j].lgre );
}
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
l = atoi(tmp[0]);
for (i=0; i < l; i++) {
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
if (j < reax->num_atom_types && k < reax->num_atom_types) {
val = atof(tmp[2]);
if (val > 0.0) {
reax->tbp[j][k].D = val;
reax->tbp[k][j].D = val;
}
val = atof(tmp[3]);
if (val > 0.0) {
reax->tbp[j][k].r_vdW = 2 * val;
reax->tbp[k][j].r_vdW = 2 * val;
}
val = atof(tmp[4]);
if (val > 0.0) {
reax->tbp[j][k].alpha = val;
reax->tbp[k][j].alpha = val;
}
val = atof(tmp[5]);
if (val > 0.0) {
reax->tbp[j][k].r_s = val;
reax->tbp[k][j].r_s = val;
}
val = atof(tmp[6]);
if (val > 0.0) {
reax->tbp[j][k].r_p = val;
reax->tbp[k][j].r_p = val;
}
val = atof(tmp[7]);
if (val > 0.0) {
reax->tbp[j][k].r_pp = val;
reax->tbp[k][j].r_pp = val;
}
val = atof(tmp[8]);
if (val >= 0.0) {
reax->tbp[j][k].lgcij = val;
reax->tbp[k][j].lgcij = val;
}
}
}
for( i = 0; i < reax->num_atom_types; ++i )
for( j = 0; j < reax->num_atom_types; ++j )
for( k = 0; k < reax->num_atom_types; ++k )
reax->thbp[i][j][k].cnt = 0;
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
l = atoi( tmp[0] );
for( i = 0; i < l; i++ ) {
fgets(s,MAX_LINE,fp);
c=Tokenize(s,&tmp);
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
m = atoi(tmp[2]) - 1;
if (j < reax->num_atom_types && k < reax->num_atom_types &&
m < reax->num_atom_types) {
cnt = reax->thbp[j][k][m].cnt;
reax->thbp[j][k][m].cnt++;
reax->thbp[m][k][j].cnt++;
val = atof(tmp[3]);
reax->thbp[j][k][m].prm[cnt].theta_00 = val;
reax->thbp[m][k][j].prm[cnt].theta_00 = val;
val = atof(tmp[4]);
reax->thbp[j][k][m].prm[cnt].p_val1 = val;
reax->thbp[m][k][j].prm[cnt].p_val1 = val;
val = atof(tmp[5]);
reax->thbp[j][k][m].prm[cnt].p_val2 = val;
reax->thbp[m][k][j].prm[cnt].p_val2 = val;
val = atof(tmp[6]);
reax->thbp[j][k][m].prm[cnt].p_coa1 = val;
reax->thbp[m][k][j].prm[cnt].p_coa1 = val;
val = atof(tmp[7]);
reax->thbp[j][k][m].prm[cnt].p_val7 = val;
reax->thbp[m][k][j].prm[cnt].p_val7 = val;
val = atof(tmp[8]);
reax->thbp[j][k][m].prm[cnt].p_pen1 = val;
reax->thbp[m][k][j].prm[cnt].p_pen1 = val;
val = atof(tmp[9]);
reax->thbp[j][k][m].prm[cnt].p_val4 = val;
reax->thbp[m][k][j].prm[cnt].p_val4 = val;
}
}
/* clear all entries first */
for( i = 0; i < reax->num_atom_types; ++i )
for( j = 0; j < reax->num_atom_types; ++j )
for( k = 0; k < reax->num_atom_types; ++k )
for( m = 0; m < reax->num_atom_types; ++m ) {
reax->fbp[i][j][k][m].cnt = 0;
tor_flag[i][j][k][m] = 0;
}
/* next line is number of 4-body params and some comments */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
l = atoi( tmp[0] );
for( i = 0; i < l; i++ ) {
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
m = atoi(tmp[2]) - 1;
n = atoi(tmp[3]) - 1;
if (j >= 0 && n >= 0) { // this means the entry is not in compact form
if (j < reax->num_atom_types && k < reax->num_atom_types &&
m < reax->num_atom_types && n < reax->num_atom_types) {
tor_flag[j][k][m][n] = 1;
tor_flag[n][m][k][j] = 1;
reax->fbp[j][k][m][n].cnt = 1;
reax->fbp[n][m][k][j].cnt = 1;
val = atof(tmp[4]);
reax->fbp[j][k][m][n].prm[0].V1 = val;
reax->fbp[n][m][k][j].prm[0].V1 = val;
val = atof(tmp[5]);
reax->fbp[j][k][m][n].prm[0].V2 = val;
reax->fbp[n][m][k][j].prm[0].V2 = val;
val = atof(tmp[6]);
reax->fbp[j][k][m][n].prm[0].V3 = val;
reax->fbp[n][m][k][j].prm[0].V3 = val;
val = atof(tmp[7]);
reax->fbp[j][k][m][n].prm[0].p_tor1 = val;
reax->fbp[n][m][k][j].prm[0].p_tor1 = val;
val = atof(tmp[8]);
reax->fbp[j][k][m][n].prm[0].p_cot1 = val;
reax->fbp[n][m][k][j].prm[0].p_cot1 = val;
}
}
else { /* This means the entry is of the form 0-X-Y-0 */
if( k < reax->num_atom_types && m < reax->num_atom_types )
for( p = 0; p < reax->num_atom_types; p++ )
for( o = 0; o < reax->num_atom_types; o++ ) {
reax->fbp[p][k][m][o].cnt = 1;
reax->fbp[o][m][k][p].cnt = 1;
if (tor_flag[p][k][m][o] == 0) {
reax->fbp[p][k][m][o].prm[0].V1 = atof(tmp[4]);
reax->fbp[p][k][m][o].prm[0].V2 = atof(tmp[5]);
reax->fbp[p][k][m][o].prm[0].V3 = atof(tmp[6]);
reax->fbp[p][k][m][o].prm[0].p_tor1 = atof(tmp[7]);
reax->fbp[p][k][m][o].prm[0].p_cot1 = atof(tmp[8]);
}
if (tor_flag[o][m][k][p] == 0) {
reax->fbp[o][m][k][p].prm[0].V1 = atof(tmp[4]);
reax->fbp[o][m][k][p].prm[0].V2 = atof(tmp[5]);
reax->fbp[o][m][k][p].prm[0].V3 = atof(tmp[6]);
reax->fbp[o][m][k][p].prm[0].p_tor1 = atof(tmp[7]);
reax->fbp[o][m][k][p].prm[0].p_cot1 = atof(tmp[8]);
}
}
}
}
/* next line is number of hydrogen bond params and some comments */
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
l = atoi( tmp[0] );
for( i = 0; i < reax->num_atom_types; ++i )
for( j = 0; j < reax->num_atom_types; ++j )
for( k = 0; k < reax->num_atom_types; ++k )
reax->hbp[i][j][k].r0_hb = -1.0;
for( i = 0; i < l; i++ ) {
fgets( s, MAX_LINE, fp );
c = Tokenize( s, &tmp );
j = atoi(tmp[0]) - 1;
k = atoi(tmp[1]) - 1;
m = atoi(tmp[2]) - 1;
if( j < reax->num_atom_types && m < reax->num_atom_types ) {
val = atof(tmp[3]);
reax->hbp[j][k][m].r0_hb = val;
val = atof(tmp[4]);
reax->hbp[j][k][m].p_hb1 = val;
val = atof(tmp[5]);
reax->hbp[j][k][m].p_hb2 = val;
val = atof(tmp[6]);
reax->hbp[j][k][m].p_hb3 = val;
}
}
/* deallocate helper storage */
for( i = 0; i < MAX_TOKENS; i++ )
free( tmp[i] );
free( tmp );
free( s );
/* deallocate tor_flag */
for( i = 0; i < reax->num_atom_types; i++ ) {
for( j = 0; j < reax->num_atom_types; j++ ) {
for( k = 0; k < reax->num_atom_types; k++ ) {
free( tor_flag[i][j][k] );
}
free( tor_flag[i][j] );
}
free( tor_flag[i] );
}
free( tor_flag );
// close file
fclose(fp);
return SUCCESS;
}
diff --git a/src/USER-REAXC/reaxc_forces.cpp b/src/USER-REAXC/reaxc_forces.cpp
index 7f11f5565..215ded6e5 100644
--- a/src/USER-REAXC/reaxc_forces.cpp
+++ b/src/USER-REAXC/reaxc_forces.cpp
@@ -1,459 +1,459 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_forces.h"
#include "reaxc_bond_orders.h"
#include "reaxc_bonds.h"
#include "reaxc_hydrogen_bonds.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_lookup.h"
#include "reaxc_multi_body.h"
#include "reaxc_nonbonded.h"
#include "reaxc_tool_box.h"
#include "reaxc_torsion_angles.h"
#include "reaxc_valence_angles.h"
#include "reaxc_vector.h"
interaction_function Interaction_Functions[NUM_INTRS];
void Dummy_Interaction( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
}
void Init_Force_Functions( control_params *control )
{
Interaction_Functions[0] = BO;
Interaction_Functions[1] = Bonds; //Dummy_Interaction;
Interaction_Functions[2] = Atom_Energy; //Dummy_Interaction;
Interaction_Functions[3] = Valence_Angles; //Dummy_Interaction;
Interaction_Functions[4] = Torsion_Angles; //Dummy_Interaction;
if( control->hbond_cut > 0 )
Interaction_Functions[5] = Hydrogen_Bonds;
else Interaction_Functions[5] = Dummy_Interaction;
Interaction_Functions[6] = Dummy_Interaction; //empty
Interaction_Functions[7] = Dummy_Interaction; //empty
Interaction_Functions[8] = Dummy_Interaction; //empty
Interaction_Functions[9] = Dummy_Interaction; //empty
}
void Compute_Bonded_Forces( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
MPI_Comm comm )
{
int i;
/* Implement all force calls as function pointers */
for( i = 0; i < NUM_INTRS; i++ ) {
(Interaction_Functions[i])( system, control, data, workspace,
lists, out_control );
}
}
void Compute_NonBonded_Forces( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
MPI_Comm comm )
{
/* van der Waals and Coulomb interactions */
if( control->tabulate == 0 )
vdW_Coulomb_Energy( system, control, data, workspace,
lists, out_control );
else
Tabulated_vdW_Coulomb_Energy( system, control, data, workspace,
lists, out_control );
}
void Compute_Total_Force( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, mpi_datatypes *mpi_data )
{
int i, pj;
reax_list *bonds = (*lists) + BONDS;
for( i = 0; i < system->N; ++i )
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj )
if( i < bonds->select.bond_list[pj].nbr ) {
if( control->virial == 0 )
Add_dBond_to_Forces( system, i, pj, workspace, lists );
else
Add_dBond_to_Forces_NPT( i, pj, data, workspace, lists );
}
}
void Validate_Lists( reax_system *system, storage *workspace, reax_list **lists,
int step, int n, int N, int numH, MPI_Comm comm )
{
int i, comp, Hindex;
reax_list *bonds, *hbonds;
double saferzone = system->saferzone;
/* bond list */
if( N > 0 ) {
bonds = *lists + BONDS;
for( i = 0; i < N; ++i ) {
system->my_atoms[i].num_bonds = MAX(Num_Entries(i,bonds)*2, MIN_BONDS);
if( i < N-1 )
comp = Start_Index(i+1, bonds);
else comp = bonds->num_intrs;
if( End_Index(i, bonds) > comp ) {
fprintf( stderr, "step%d-bondchk failed: i=%d end(i)=%d str(i+1)=%d\n",
step, i, End_Index(i,bonds), comp );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
/* hbonds list */
if( numH > 0 ) {
hbonds = *lists + HBONDS;
for( i = 0; i < N; ++i ) {
Hindex = system->my_atoms[i].Hindex;
if( Hindex > -1 ) {
system->my_atoms[i].num_hbonds =
(int)(MAX( Num_Entries(Hindex, hbonds)*saferzone, MIN_HBONDS ));
//if( Num_Entries(i, hbonds) >=
//(Start_Index(i+1,hbonds)-Start_Index(i,hbonds))*0.90/*DANGER_ZONE*/){
// workspace->realloc.hbonds = 1;
if( Hindex < numH-1 )
comp = Start_Index(Hindex+1, hbonds);
else comp = hbonds->num_intrs;
if( End_Index(Hindex, hbonds) > comp ) {
fprintf(stderr,"step%d-hbondchk failed: H=%d end(H)=%d str(H+1)=%d\n",
step, Hindex, End_Index(Hindex,hbonds), comp );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
}
}
void Init_Forces_noQEq( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
MPI_Comm comm ) {
int i, j, pj;
int start_i, end_i;
int type_i, type_j;
int btop_i, btop_j, num_bonds, num_hbonds;
int ihb, jhb, ihb_top, jhb_top;
int local, flag, renbr;
double cutoff;
reax_list *far_nbrs, *bonds, *hbonds;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
far_neighbor_data *nbr_pj;
reax_atom *atom_i, *atom_j;
far_nbrs = *lists + FAR_NBRS;
bonds = *lists + BONDS;
hbonds = *lists + HBONDS;
for( i = 0; i < system->n; ++i )
workspace->bond_mark[i] = 0;
for( i = system->n; i < system->N; ++i ) {
    workspace->bond_mark[i] = 1000; // put ghost atoms at an infinite distance
}
num_bonds = 0;
num_hbonds = 0;
btop_i = btop_j = 0;
renbr = (data->step-data->prev_steps) % control->reneighbor == 0;
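  /* renbr is true on re-neighboring steps, where the stored far-neighbor
     distances are already current; otherwise distances are recomputed below */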
for( i = 0; i < system->N; ++i ) {
atom_i = &(system->my_atoms[i]);
type_i = atom_i->type;
if (type_i < 0) continue;
start_i = Start_Index(i, far_nbrs);
end_i = End_Index(i, far_nbrs);
btop_i = End_Index( i, bonds );
sbp_i = &(system->reax_param.sbp[type_i]);
if( i < system->n ) {
local = 1;
cutoff = MAX( control->hbond_cut, control->bond_cut );
}
else {
local = 0;
cutoff = control->bond_cut;
}
ihb = -1;
ihb_top = -1;
if( local && control->hbond_cut > 0 ) {
ihb = sbp_i->p_hbond;
if( ihb == 1 )
ihb_top = End_Index( atom_i->Hindex, hbonds );
else ihb_top = -1;
}
/* update i-j distance - check if j is within cutoff */
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &( far_nbrs->select.far_nbr_list[pj] );
j = nbr_pj->nbr;
atom_j = &(system->my_atoms[j]);
if( renbr ) {
if( nbr_pj->d <= cutoff )
flag = 1;
else flag = 0;
}
else{
nbr_pj->dvec[0] = atom_j->x[0] - atom_i->x[0];
nbr_pj->dvec[1] = atom_j->x[1] - atom_i->x[1];
nbr_pj->dvec[2] = atom_j->x[2] - atom_i->x[2];
nbr_pj->d = rvec_Norm_Sqr( nbr_pj->dvec );
if( nbr_pj->d <= SQR(cutoff) ) {
nbr_pj->d = sqrt(nbr_pj->d);
flag = 1;
}
else {
flag = 0;
}
}
if( flag ) {
type_j = atom_j->type;
if (type_j < 0) continue;
sbp_j = &(system->reax_param.sbp[type_j]);
twbp = &(system->reax_param.tbp[type_i][type_j]);
if( local ) {
/* hydrogen bond lists */
if( control->hbond_cut > 0 && (ihb==1 || ihb==2) &&
nbr_pj->d <= control->hbond_cut ) {
// fprintf( stderr, "%d %d\n", atom1, atom2 );
jhb = sbp_j->p_hbond;
if( ihb == 1 && jhb == 2 ) {
hbonds->select.hbond_list[ihb_top].nbr = j;
hbonds->select.hbond_list[ihb_top].scl = 1;
hbonds->select.hbond_list[ihb_top].ptr = nbr_pj;
++ihb_top;
++num_hbonds;
}
else if( j < system->n && ihb == 2 && jhb == 1 ) {
jhb_top = End_Index( atom_j->Hindex, hbonds );
hbonds->select.hbond_list[jhb_top].nbr = i;
hbonds->select.hbond_list[jhb_top].scl = -1;
hbonds->select.hbond_list[jhb_top].ptr = nbr_pj;
Set_End_Index( atom_j->Hindex, jhb_top+1, hbonds );
++num_hbonds;
}
}
}
if( //(workspace->bond_mark[i] < 3 || workspace->bond_mark[j] < 3) &&
nbr_pj->d <= control->bond_cut &&
BOp( workspace, bonds, control->bo_cut,
i , btop_i, nbr_pj, sbp_i, sbp_j, twbp ) ) {
num_bonds += 2;
++btop_i;
if( workspace->bond_mark[j] > workspace->bond_mark[i] + 1 )
workspace->bond_mark[j] = workspace->bond_mark[i] + 1;
else if( workspace->bond_mark[i] > workspace->bond_mark[j] + 1 ) {
workspace->bond_mark[i] = workspace->bond_mark[j] + 1;
}
}
}
}
Set_End_Index( i, btop_i, bonds );
if( local && ihb == 1 )
Set_End_Index( atom_i->Hindex, ihb_top, hbonds );
}
workspace->realloc.num_bonds = num_bonds;
workspace->realloc.num_hbonds = num_hbonds;
Validate_Lists( system, workspace, lists, data->step,
system->n, system->N, system->numH, comm );
}
void Estimate_Storages( reax_system *system, control_params *control,
reax_list **lists, int *Htop, int *hb_top,
int *bond_top, int *num_3body, MPI_Comm comm )
{
int i, j, pj;
int start_i, end_i;
int type_i, type_j;
int ihb, jhb;
int local;
double cutoff;
double r_ij;
double C12, C34, C56;
double BO, BO_s, BO_pi, BO_pi2;
reax_list *far_nbrs;
single_body_parameters *sbp_i, *sbp_j;
two_body_parameters *twbp;
far_neighbor_data *nbr_pj;
reax_atom *atom_i, *atom_j;
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
far_nbrs = *lists + FAR_NBRS;
*Htop = 0;
memset( hb_top, 0, sizeof(int) * system->local_cap );
memset( bond_top, 0, sizeof(int) * system->total_cap );
*num_3body = 0;
for( i = 0; i < system->N; ++i ) {
atom_i = &(system->my_atoms[i]);
type_i = atom_i->type;
if (type_i < 0) continue;
start_i = Start_Index(i, far_nbrs);
end_i = End_Index(i, far_nbrs);
sbp_i = &(system->reax_param.sbp[type_i]);
if( i < system->n ) {
local = 1;
cutoff = control->nonb_cut;
++(*Htop);
ihb = sbp_i->p_hbond;
}
else {
local = 0;
cutoff = control->bond_cut;
ihb = -1;
}
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &( far_nbrs->select.far_nbr_list[pj] );
j = nbr_pj->nbr;
atom_j = &(system->my_atoms[j]);
if(nbr_pj->d <= cutoff) {
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
r_ij = nbr_pj->d;
sbp_j = &(system->reax_param.sbp[type_j]);
twbp = &(system->reax_param.tbp[type_i][type_j]);
if( local ) {
if( j < system->n || atom_i->orig_id < atom_j->orig_id ) //tryQEq ||1
++(*Htop);
/* hydrogen bond lists */
if( control->hbond_cut > 0.1 && (ihb==1 || ihb==2) &&
nbr_pj->d <= control->hbond_cut ) {
jhb = sbp_j->p_hbond;
if( ihb == 1 && jhb == 2 )
++hb_top[i];
else if( j < system->n && ihb == 2 && jhb == 1 )
++hb_top[j];
}
}
/* uncorrected bond orders */
if( nbr_pj->d <= control->bond_cut ) {
if( sbp_i->r_s > 0.0 && sbp_j->r_s > 0.0) {
C12 = twbp->p_bo1 * pow( r_ij / twbp->r_s, twbp->p_bo2 );
BO_s = (1.0 + control->bo_cut) * exp( C12 );
}
else BO_s = C12 = 0.0;
if( sbp_i->r_pi > 0.0 && sbp_j->r_pi > 0.0) {
C34 = twbp->p_bo3 * pow( r_ij / twbp->r_p, twbp->p_bo4 );
BO_pi = exp( C34 );
}
else BO_pi = C34 = 0.0;
if( sbp_i->r_pi_pi > 0.0 && sbp_j->r_pi_pi > 0.0) {
C56 = twbp->p_bo5 * pow( r_ij / twbp->r_pp, twbp->p_bo6 );
BO_pi2= exp( C56 );
}
else BO_pi2 = C56 = 0.0;
/* Initially BO values are the uncorrected ones, page 1 */
BO = BO_s + BO_pi + BO_pi2;
if( BO >= control->bo_cut ) {
++bond_top[i];
++bond_top[j];
}
}
}
}
}
*Htop = (int)(MAX( *Htop * safezone, mincap * MIN_HENTRIES ));
for( i = 0; i < system->n; ++i )
hb_top[i] = (int)(MAX( hb_top[i] * saferzone, MIN_HBONDS ));
for( i = 0; i < system->N; ++i ) {
*num_3body += SQR(bond_top[i]);
bond_top[i] = MAX( bond_top[i] * 2, MIN_BONDS );
}
}
void Compute_Forces( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
mpi_datatypes *mpi_data )
{
MPI_Comm comm = mpi_data->world;
Init_Forces_noQEq( system, control, data, workspace,
lists, out_control, comm );
/********* bonded interactions ************/
Compute_Bonded_Forces( system, control, data, workspace,
lists, out_control, mpi_data->world );
/********* nonbonded interactions ************/
Compute_NonBonded_Forces( system, control, data, workspace,
lists, out_control, mpi_data->world );
/*********** total force ***************/
Compute_Total_Force( system, control, data, workspace, lists, mpi_data );
}
diff --git a/src/USER-REAXC/reaxc_hydrogen_bonds.cpp b/src/USER-REAXC/reaxc_hydrogen_bonds.cpp
index 8d7b3b381..ff771ad65 100644
--- a/src/USER-REAXC/reaxc_hydrogen_bonds.cpp
+++ b/src/USER-REAXC/reaxc_hydrogen_bonds.cpp
@@ -1,184 +1,184 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_hydrogen_bonds.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_valence_angles.h"
#include "reaxc_vector.h"
void Hydrogen_Bonds( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, k, pi, pk;
int type_i, type_j, type_k;
int start_j, end_j, hb_start_j, hb_end_j;
int hblist[MAX_BONDS];
int itr, top;
int num_hb_intrs = 0;
ivec rel_jk;
double r_jk, theta, cos_theta, sin_xhz4, cos_xhz1, sin_theta2;
double e_hb, exp_hb2, exp_hb3, CEhb1, CEhb2, CEhb3;
rvec dcos_theta_di, dcos_theta_dj, dcos_theta_dk;
rvec dvec_jk, force, ext_press;
hbond_parameters *hbp;
bond_order_data *bo_ij;
bond_data *pbond_ij;
far_neighbor_data *nbr_jk;
reax_list *bonds, *hbonds;
bond_data *bond_list;
hbond_data *hbond_list;
// tally variables
double fi_tmp[3], fk_tmp[3], delij[3], delkj[3];
bonds = (*lists) + BONDS;
bond_list = bonds->select.bond_list;
hbonds = (*lists) + HBONDS;
hbond_list = hbonds->select.hbond_list;
for( j = 0; j < system->n; ++j )
if( system->reax_param.sbp[system->my_atoms[j].type].p_hbond == 1 ) {
type_j = system->my_atoms[j].type;
start_j = Start_Index(j, bonds);
end_j = End_Index(j, bonds);
hb_start_j = Start_Index( system->my_atoms[j].Hindex, hbonds );
hb_end_j = End_Index( system->my_atoms[j].Hindex, hbonds );
if (type_j < 0) continue;
top = 0;
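      /* collect covalent bonds from hydrogen atom j to neighbors flagged for
         hydrogen bonding (p_hbond == 2) with bond order above HB_THRESHOLD */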
for( pi = start_j; pi < end_j; ++pi ) {
pbond_ij = &( bond_list[pi] );
i = pbond_ij->nbr;
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
bo_ij = &(pbond_ij->bo_data);
if( system->reax_param.sbp[type_i].p_hbond == 2 &&
bo_ij->BO >= HB_THRESHOLD )
hblist[top++] = pi;
}
for( pk = hb_start_j; pk < hb_end_j; ++pk ) {
        /* set k's variables */
k = hbond_list[pk].nbr;
type_k = system->my_atoms[k].type;
if (type_k < 0) continue;
nbr_jk = hbond_list[pk].ptr;
r_jk = nbr_jk->d;
rvec_Scale( dvec_jk, hbond_list[pk].scl, nbr_jk->dvec );
for( itr = 0; itr < top; ++itr ) {
pi = hblist[itr];
pbond_ij = &( bonds->select.bond_list[pi] );
i = pbond_ij->nbr;
if( system->my_atoms[i].orig_id != system->my_atoms[k].orig_id ) {
bo_ij = &(pbond_ij->bo_data);
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
hbp = &(system->reax_param.hbp[ type_i ][ type_j ][ type_k ]);
if (hbp->r0_hb <= 0.0) continue;
++num_hb_intrs;
Calculate_Theta( pbond_ij->dvec, pbond_ij->d, dvec_jk, r_jk,
&theta, &cos_theta );
/* the derivative of cos(theta) */
Calculate_dCos_Theta( pbond_ij->dvec, pbond_ij->d, dvec_jk, r_jk,
&dcos_theta_di, &dcos_theta_dj,
&dcos_theta_dk );
            /* hydrogen bond energy */
sin_theta2 = sin( theta/2.0 );
sin_xhz4 = SQR(sin_theta2);
sin_xhz4 *= sin_xhz4;
cos_xhz1 = ( 1.0 - cos_theta );
exp_hb2 = exp( -hbp->p_hb2 * bo_ij->BO );
exp_hb3 = exp( -hbp->p_hb3 * ( hbp->r0_hb / r_jk +
r_jk / hbp->r0_hb - 2.0 ) );
data->my_en.e_hb += e_hb =
hbp->p_hb1 * (1.0 - exp_hb2) * exp_hb3 * sin_xhz4;
CEhb1 = hbp->p_hb1 * hbp->p_hb2 * exp_hb2 * exp_hb3 * sin_xhz4;
CEhb2 = -hbp->p_hb1/2.0 * (1.0 - exp_hb2) * exp_hb3 * cos_xhz1;
CEhb3 = -hbp->p_hb3 *
(-hbp->r0_hb / SQR(r_jk) + 1.0 / hbp->r0_hb) * e_hb;
/* hydrogen bond forces */
bo_ij->Cdbo += CEhb1; // dbo term
if( control->virial == 0 ) {
// dcos terms
rvec_ScaledAdd( workspace->f[i], +CEhb2, dcos_theta_di );
rvec_ScaledAdd( workspace->f[j], +CEhb2, dcos_theta_dj );
rvec_ScaledAdd( workspace->f[k], +CEhb2, dcos_theta_dk );
// dr terms
rvec_ScaledAdd( workspace->f[j], -CEhb3/r_jk, dvec_jk );
rvec_ScaledAdd( workspace->f[k], +CEhb3/r_jk, dvec_jk );
}
else {
rvec_Scale( force, +CEhb2, dcos_theta_di ); // dcos terms
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_ScaledAdd( data->my_ext_press, 1.0, ext_press );
rvec_ScaledAdd( workspace->f[j], +CEhb2, dcos_theta_dj );
ivec_Scale( rel_jk, hbond_list[pk].scl, nbr_jk->rel_box );
rvec_Scale( force, +CEhb2, dcos_theta_dk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, rel_jk, force );
rvec_ScaledAdd( data->my_ext_press, 1.0, ext_press );
// dr terms
rvec_ScaledAdd( workspace->f[j], -CEhb3/r_jk, dvec_jk );
rvec_Scale( force, CEhb3/r_jk, dvec_jk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, rel_jk, force );
rvec_ScaledAdd( data->my_ext_press, 1.0, ext_press );
}
/* tally into per-atom virials */
if (system->pair_ptr->vflag_atom || system->pair_ptr->evflag) {
rvec_ScaledSum( delij, 1., system->my_atoms[j].x,
-1., system->my_atoms[i].x );
rvec_ScaledSum( delkj, 1., system->my_atoms[j].x,
-1., system->my_atoms[k].x );
rvec_Scale(fi_tmp, CEhb2, dcos_theta_di);
rvec_Scale(fk_tmp, CEhb2, dcos_theta_dk);
rvec_ScaledAdd(fk_tmp, CEhb3/r_jk, dvec_jk);
system->pair_ptr->ev_tally3(i,j,k,e_hb,0.0,fi_tmp,fk_tmp,delij,delkj);
}
}
}
}
}
}
diff --git a/src/USER-REAXC/reaxc_init_md.cpp b/src/USER-REAXC/reaxc_init_md.cpp
index f912c95ea..b11cdd2fb 100644
--- a/src/USER-REAXC/reaxc_init_md.cpp
+++ b/src/USER-REAXC/reaxc_init_md.cpp
@@ -1,279 +1,279 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_init_md.h"
#include "reaxc_allocate.h"
#include "reaxc_forces.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_lookup.h"
#include "reaxc_reset_tools.h"
#include "reaxc_system_props.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
int Init_System( reax_system *system, control_params *control, char *msg )
{
int i;
reax_atom *atom;
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
// determine the local and total capacity
system->local_cap = MAX( (int)(system->n * safezone), mincap);
system->total_cap = MAX( (int)(system->N * safezone), mincap);
/* estimate numH and Hcap */
system->numH = 0;
if( control->hbond_cut > 0 )
for( i = 0; i < system->n; ++i ) {
atom = &(system->my_atoms[i]);
if (system->reax_param.sbp[ atom->type ].p_hbond == 1 && atom->type >= 0)
atom->Hindex = system->numH++;
else atom->Hindex = -1;
}
system->Hcap = (int)(MAX( system->numH * saferzone, mincap ));
return SUCCESS;
}
int Init_Simulation_Data( reax_system *system, control_params *control,
simulation_data *data, char *msg )
{
Reset_Simulation_Data( data, control->virial );
/* initialize the timer(s) */
if( system->my_rank == MASTER_NODE ) {
data->timing.start = Get_Time( );
}
data->step = data->prev_steps = 0;
return SUCCESS;
}
void Init_Taper( control_params *control, storage *workspace, MPI_Comm comm )
{
double d1, d7;
double swa, swa2, swa3;
double swb, swb2, swb3;
swa = control->nonb_low;
swb = control->nonb_cut;
if( fabs( swa ) > 0.01 )
fprintf( stderr, "Warning: non-zero lower Taper-radius cutoff\n" );
if( swb < 0 ) {
fprintf( stderr, "Negative upper Taper-radius cutoff\n" );
MPI_Abort( comm, INVALID_INPUT );
}
else if( swb < 5 )
fprintf( stderr, "Warning: very low Taper-radius cutoff: %f\n", swb );
d1 = swb - swa;
d7 = pow( d1, 7.0 );
swa2 = SQR( swa );
swa3 = CUBE( swa );
swb2 = SQR( swb );
swb3 = CUBE( swb );
workspace->Tap[7] = 20.0 / d7;
workspace->Tap[6] = -70.0 * (swa + swb) / d7;
workspace->Tap[5] = 84.0 * (swa2 + 3.0*swa*swb + swb2) / d7;
workspace->Tap[4] = -35.0 * (swa3 + 9.0*swa2*swb + 9.0*swa*swb2 + swb3 ) / d7;
workspace->Tap[3] = 140.0 * (swa3*swb + 3.0*swa2*swb2 + swa*swb3 ) / d7;
workspace->Tap[2] =-210.0 * (swa3*swb2 + swa2*swb3) / d7;
workspace->Tap[1] = 140.0 * swa3 * swb3 / d7;
workspace->Tap[0] = (-35.0*swa3*swb2*swb2 + 21.0*swa2*swb3*swb2 +
7.0*swa*swb3*swb3 + swb3*swb3*swb ) / d7;
}
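/* Illustrative sketch (not part of the LAMMPS sources): the eight Tap[0..7]
coefficients computed above form a 7th-degree taper polynomial that switches
the non-bonded interactions smoothly off between swa and swb. The non-bonded
routines later in this diff evaluate it with Horner's rule; a self-contained
equivalent, assuming only a plain array of the eight coefficients: */
static double taper_eval( const double Tap[8], double r )
{
double t = Tap[7];
for( int k = 6; k >= 0; --k )  /* ((Tap[7]*r + Tap[6])*r + ... )*r + Tap[0] */
t = t * r + Tap[k];
return t;  /* equals 1 at r = swa and 0 at r = swb */
}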
int Init_Workspace( reax_system *system, control_params *control,
storage *workspace, MPI_Comm comm, char *msg )
{
int ret;
ret = Allocate_Workspace( system, control, workspace,
system->local_cap, system->total_cap, comm, msg );
if( ret != SUCCESS )
return ret;
memset( &(workspace->realloc), 0, sizeof(reallocate_data) );
Reset_Workspace( system, workspace );
/* Initialize the Taper function */
Init_Taper( control, workspace, comm );
return SUCCESS;
}
/************** setup communication data structures **************/
int Init_MPI_Datatypes( reax_system *system, storage *workspace,
mpi_datatypes *mpi_data, MPI_Comm comm, char *msg )
{
/* setup the world */
mpi_data->world = comm;
MPI_Comm_size( comm, &(system->wsize) );
return SUCCESS;
}
int Init_Lists( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
mpi_datatypes *mpi_data, char *msg )
{
int i, total_hbonds, total_bonds, bond_cap, num_3body, cap_3body, Htop;
int *hb_top, *bond_top;
MPI_Comm comm;
int mincap = system->mincap;
double safezone = system->safezone;
double saferzone = system->saferzone;
comm = mpi_data->world;
bond_top = (int*) calloc( system->total_cap, sizeof(int) );
hb_top = (int*) calloc( system->local_cap, sizeof(int) );
Estimate_Storages( system, control, lists,
&Htop, hb_top, bond_top, &num_3body, comm );
if( control->hbond_cut > 0 ) {
/* init H indexes */
total_hbonds = 0;
for( i = 0; i < system->n; ++i ) {
system->my_atoms[i].num_hbonds = hb_top[i];
total_hbonds += hb_top[i];
}
total_hbonds = (int)(MAX( total_hbonds*saferzone, mincap*MIN_HBONDS ));
if( !Make_List( system->Hcap, total_hbonds, TYP_HBOND,
*lists+HBONDS, comm ) ) {
fprintf( stderr, "not enough space for hbonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
total_bonds = 0;
for( i = 0; i < system->N; ++i ) {
system->my_atoms[i].num_bonds = bond_top[i];
total_bonds += bond_top[i];
}
bond_cap = (int)(MAX( total_bonds*safezone, mincap*MIN_BONDS ));
if( !Make_List( system->total_cap, bond_cap, TYP_BOND,
*lists+BONDS, comm ) ) {
fprintf( stderr, "not enough space for bonds list. terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
/* 3bodies list */
cap_3body = (int)(MAX( num_3body*safezone, MIN_3BODIES ));
if( !Make_List( bond_cap, cap_3body, TYP_THREE_BODY,
*lists+THREE_BODIES, comm ) ){
fprintf( stderr, "Problem in initializing angles list. Terminating!\n" );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
free( hb_top );
free( bond_top );
return SUCCESS;
}
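/* Note (not part of the LAMMPS sources): all three lists are over-allocated
by the safezone/saferzone factors and floored at mincap*MIN_* entries, so that
modest growth between reneighborings does not immediately force a
reallocation; the DANGER_ZONE checks in reaxc_reset_tools.cpp further down in
this diff set the realloc flags when a list gets close to full. */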
void Initialize( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control,
mpi_datatypes *mpi_data, MPI_Comm comm )
{
char msg[MAX_STR];
if( Init_MPI_Datatypes(system, workspace, mpi_data, comm, msg) == FAILURE ) {
fprintf( stderr, "p%d: init_mpi_datatypes: could not create datatypes\n",
system->my_rank );
fprintf( stderr, "p%d: mpi_data couldn't be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_System(system, control, msg) == FAILURE ){
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: system could not be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Simulation_Data( system, control, data, msg ) == FAILURE ) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: sim_data couldn't be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Workspace( system, control, workspace, mpi_data->world, msg ) ==
FAILURE ) {
fprintf( stderr, "p%d:init_workspace: not enough memory\n",
system->my_rank );
fprintf( stderr, "p%d:workspace couldn't be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Lists( system, control, data, workspace, lists, mpi_data, msg ) ==
FAILURE ) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: lists could not be initialized! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( Init_Output_Files(system,control,out_control,mpi_data,msg)== FAILURE) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: could not open output files! terminating...\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
if( control->tabulate ) {
if( Init_Lookup_Tables( system, control, workspace, mpi_data, msg ) == FAILURE ) {
fprintf( stderr, "p%d: %s\n", system->my_rank, msg );
fprintf( stderr, "p%d: couldn't create lookup table! terminating.\n",
system->my_rank );
MPI_Abort( mpi_data->world, CANNOT_INITIALIZE );
}
}
Init_Force_Functions( control );
}
diff --git a/src/USER-REAXC/reaxc_io_tools.cpp b/src/USER-REAXC/reaxc_io_tools.cpp
index 0c14dad5d..4d58f7514 100644
--- a/src/USER-REAXC/reaxc_io_tools.cpp
+++ b/src/USER-REAXC/reaxc_io_tools.cpp
@@ -1,153 +1,153 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "update.h"
#include "reaxc_io_tools.h"
#include "reaxc_list.h"
#include "reaxc_reset_tools.h"
#include "reaxc_system_props.h"
#include "reaxc_tool_box.h"
#include "reaxc_traj.h"
#include "reaxc_vector.h"
int Init_Output_Files( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data,
char *msg )
{
char temp[MAX_STR];
int ret;
if( out_control->write_steps > 0 ){
ret = Init_Traj( system, control, out_control, mpi_data, msg );
if( ret == FAILURE )
return ret;
}
if( system->my_rank == MASTER_NODE ) {
/* These files are written only by the master node */
if( out_control->energy_update_freq > 0 ) {
/* init potentials file */
sprintf( temp, "%s.pot", control->sim_name );
if( (out_control->pot = fopen( temp, "w" )) != NULL ) {
fflush( out_control->pot );
}
else {
strcpy( msg, "init_out_controls: .pot file could not be opened\n" );
return FAILURE;
}
/* init log file */
}
/* init pressure file */
if( control->ensemble == NPT ||
control->ensemble == iNPT ||
control->ensemble == sNPT ) {
sprintf( temp, "%s.prs", control->sim_name );
if( (out_control->prs = fopen( temp, "w" )) != NULL ) {
fprintf(out_control->prs,"%8s%13s%13s%13s%13s%13s%13s%13s\n",
"step", "Pint/norm[x]", "Pint/norm[y]", "Pint/norm[z]",
"Pext/Ptot[x]", "Pext/Ptot[y]", "Pext/Ptot[z]", "Pkin/V" );
fflush( out_control->prs );
}
else {
strcpy(msg,"init_out_controls: .prs file couldn't be opened\n");
return FAILURE;
}
}
}
return SUCCESS;
}
/************************ close output files ************************/
int Close_Output_Files( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
if( out_control->write_steps > 0 )
End_Traj( system->my_rank, out_control );
if( system->my_rank == MASTER_NODE ) {
if( out_control->energy_update_freq > 0 ) {
fclose( out_control->pot );
}
if( control->ensemble == NPT || control->ensemble == iNPT ||
control->ensemble == sNPT )
fclose( out_control->prs );
}
return SUCCESS;
}
void Output_Results( reax_system *system, control_params *control,
simulation_data *data, reax_list **lists,
output_controls *out_control, mpi_datatypes *mpi_data )
{
if((out_control->energy_update_freq > 0 &&
data->step%out_control->energy_update_freq == 0) ||
(out_control->write_steps > 0 &&
data->step%out_control->write_steps == 0)){
/* update system-wide energies */
Compute_System_Energy( system, data, mpi_data->world );
/* output energies */
if( system->my_rank == MASTER_NODE &&
out_control->energy_update_freq > 0 &&
data->step % out_control->energy_update_freq == 0 ) {
if( control->virial ){
fprintf( out_control->prs,
"%8d%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f\n",
data->step,
data->int_press[0], data->int_press[1], data->int_press[2],
data->ext_press[0], data->ext_press[1], data->ext_press[2],
data->kin_press );
fprintf( out_control->prs,
"%8s%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f%13.6f\n",
"",system->big_box.box_norms[0], system->big_box.box_norms[1],
system->big_box.box_norms[2],
data->tot_press[0], data->tot_press[1], data->tot_press[2],
system->big_box.V );
fflush( out_control->prs);
}
}
/* write current frame */
if( out_control->write_steps > 0 &&
(data->step-data->prev_steps) % out_control->write_steps == 0 ) {
Append_Frame( system, control, data, lists, out_control, mpi_data );
}
}
}
diff --git a/src/USER-REAXC/reaxc_list.cpp b/src/USER-REAXC/reaxc_list.cpp
index d22ac4ca7..2755d5506 100644
--- a/src/USER-REAXC/reaxc_list.cpp
+++ b/src/USER-REAXC/reaxc_list.cpp
@@ -1,126 +1,126 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
/************* allocate list space ******************/
int Make_List(int n, int num_intrs, int type, reax_list *l, MPI_Comm comm)
{
l->allocated = 1;
l->n = n;
l->num_intrs = num_intrs;
l->index = (int*) smalloc( n * sizeof(int), "list:index", comm );
l->end_index = (int*) smalloc( n * sizeof(int), "list:end_index", comm );
l->type = type;
switch(l->type) {
case TYP_VOID:
l->select.v = (void*) smalloc(l->num_intrs * sizeof(void*), "list:v", comm);
break;
case TYP_THREE_BODY:
l->select.three_body_list = (three_body_interaction_data*)
smalloc( l->num_intrs * sizeof(three_body_interaction_data),
"list:three_bodies", comm );
break;
case TYP_BOND:
l->select.bond_list = (bond_data*)
smalloc( l->num_intrs * sizeof(bond_data), "list:bonds", comm );
break;
case TYP_DBO:
l->select.dbo_list = (dbond_data*)
smalloc( l->num_intrs * sizeof(dbond_data), "list:dbonds", comm );
break;
case TYP_DDELTA:
l->select.dDelta_list = (dDelta_data*)
smalloc( l->num_intrs * sizeof(dDelta_data), "list:dDeltas", comm );
break;
case TYP_FAR_NEIGHBOR:
l->select.far_nbr_list = (far_neighbor_data*)
smalloc(l->num_intrs * sizeof(far_neighbor_data), "list:far_nbrs", comm);
break;
case TYP_HBOND:
l->select.hbond_list = (hbond_data*)
smalloc( l->num_intrs * sizeof(hbond_data), "list:hbonds", comm );
break;
default:
fprintf( stderr, "ERROR: unknown list type %d!\n", l->type );
MPI_Abort( comm, INVALID_INPUT );
}
return SUCCESS;
}
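/* Note (not part of the LAMMPS sources): Make_List only allocates the
per-owner index/end_index arrays plus num_intrs entries of the union member
selected by 'type'; the indices themselves are filled in by the neighboring
and force routines. The bonded kernels in this diff then walk a list as, for
example,
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
bond_data *b = &( bonds->select.bond_list[pj] );
...
}
with one contiguous slice of entries per atom i. */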
void Delete_List( reax_list *l, MPI_Comm comm )
{
if( l->allocated == 0 )
return;
l->allocated = 0;
sfree( l->index, "list:index" );
sfree( l->end_index, "list:end_index" );
switch(l->type) {
case TYP_VOID:
sfree( l->select.v, "list:v" );
break;
case TYP_HBOND:
sfree( l->select.hbond_list, "list:hbonds" );
break;
case TYP_FAR_NEIGHBOR:
sfree( l->select.far_nbr_list, "list:far_nbrs" );
break;
case TYP_BOND:
sfree( l->select.bond_list, "list:bonds" );
break;
case TYP_DBO:
sfree( l->select.dbo_list, "list:dbos" );
break;
case TYP_DDELTA:
sfree( l->select.dDelta_list, "list:dDeltas" );
break;
case TYP_THREE_BODY:
sfree( l->select.three_body_list, "list:three_bodies" );
break;
default:
fprintf( stderr, "ERROR: unknown list type %d!\n", l->type );
MPI_Abort( comm, INVALID_INPUT );
}
}
diff --git a/src/USER-REAXC/reaxc_lookup.cpp b/src/USER-REAXC/reaxc_lookup.cpp
index 903e54962..9db8b7b9f 100644
--- a/src/USER-REAXC/reaxc_lookup.cpp
+++ b/src/USER-REAXC/reaxc_lookup.cpp
@@ -1,304 +1,304 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_lookup.h"
#include "reaxc_nonbonded.h"
#include "reaxc_tool_box.h"
LR_lookup_table **LR;
void Tridiagonal_Solve( const double *a, const double *b,
double *c, double *d, double *x, unsigned int n){
int i;
double id;
c[0] /= b[0]; /* Division by zero risk. */
d[0] /= b[0]; /* Division by zero would imply a singular matrix. */
for(i = 1; i < n; i++){
id = (b[i] - c[i-1] * a[i]); /* Division by zero risk. */
c[i] /= id; /* Last value calculated is redundant. */
d[i] = (d[i] - d[i-1] * a[i])/id;
}
x[n - 1] = d[n - 1];
for(i = n - 2; i >= 0; i--)
x[i] = d[i] - c[i] * x[i + 1];
}
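/* Illustrative sketch (not part of the LAMMPS sources): the routine above is
the Thomas algorithm; a is the sub-diagonal, b the main diagonal, c the
super-diagonal and d the right-hand side (c and d are overwritten). A toy
call for a 3x3 system could look like this: */
// double a[3] = { 0.0, 1.0, 1.0 };  /* a[0] is ignored    */
// double b[3] = { 2.0, 2.0, 2.0 };  /* main diagonal      */
// double c[3] = { 1.0, 1.0, 0.0 };  /* c[2] is ignored    */
// double d[3] = { 1.0, 2.0, 3.0 };  /* right-hand side    */
// double x[3];
// Tridiagonal_Solve( a, b, c, d, x, 3 );  /* x now holds the solution */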
void Natural_Cubic_Spline( const double *h, const double *f,
cubic_spline_coef *coef, unsigned int n,
MPI_Comm comm )
{
int i;
double *a, *b, *c, *d, *v;
/* allocate space for the linear system */
a = (double*) smalloc( n * sizeof(double), "cubic_spline:a", comm );
b = (double*) smalloc( n * sizeof(double), "cubic_spline:b", comm );
c = (double*) smalloc( n * sizeof(double), "cubic_spline:c", comm );
d = (double*) smalloc( n * sizeof(double), "cubic_spline:d", comm );
v = (double*) smalloc( n * sizeof(double), "cubic_spline:v", comm );
/* build the linear system */
a[0] = a[1] = a[n-1] = 0;
for( i = 2; i < n-1; ++i )
a[i] = h[i-1];
b[0] = b[n-1] = 0;
for( i = 1; i < n-1; ++i )
b[i] = 2 * (h[i-1] + h[i]);
c[0] = c[n-2] = c[n-1] = 0;
for( i = 1; i < n-2; ++i )
c[i] = h[i];
d[0] = d[n-1] = 0;
for( i = 1; i < n-1; ++i )
d[i] = 6 * ((f[i+1]-f[i])/h[i] - (f[i]-f[i-1])/h[i-1]);
v[0] = 0;
v[n-1] = 0;
Tridiagonal_Solve( &(a[1]), &(b[1]), &(c[1]), &(d[1]), &(v[1]), n-2 );
for( i = 1; i < n; ++i ){
coef[i-1].d = (v[i] - v[i-1]) / (6*h[i-1]);
coef[i-1].c = v[i]/2;
coef[i-1].b = (f[i]-f[i-1])/h[i-1] + h[i-1]*(2*v[i] + v[i-1])/6;
coef[i-1].a = f[i];
}
sfree( a, "cubic_spline:a" );
sfree( b, "cubic_spline:b" );
sfree( c, "cubic_spline:c" );
sfree( d, "cubic_spline:d" );
sfree( v, "cubic_spline:v" );
}
void Complete_Cubic_Spline( const double *h, const double *f, double v0, double vlast,
cubic_spline_coef *coef, unsigned int n,
MPI_Comm comm )
{
int i;
double *a, *b, *c, *d, *v;
/* allocate space for the linear system */
a = (double*) smalloc( n * sizeof(double), "cubic_spline:a", comm );
b = (double*) smalloc( n * sizeof(double), "cubic_spline:b", comm );
c = (double*) smalloc( n * sizeof(double), "cubic_spline:c", comm );
d = (double*) smalloc( n * sizeof(double), "cubic_spline:d", comm );
v = (double*) smalloc( n * sizeof(double), "cubic_spline:v", comm );
/* build the linear system */
a[0] = 0;
for( i = 1; i < n; ++i )
a[i] = h[i-1];
b[0] = 2*h[0];
for( i = 1; i < n; ++i )
b[i] = 2 * (h[i-1] + h[i]);
c[n-1] = 0;
for( i = 0; i < n-1; ++i )
c[i] = h[i];
d[0] = 6 * (f[1]-f[0])/h[0] - 6 * v0;
d[n-1] = 6 * vlast - 6 * (f[n-1]-f[n-2])/h[n-2];
for( i = 1; i < n-1; ++i )
d[i] = 6 * ((f[i+1]-f[i])/h[i] - (f[i]-f[i-1])/h[i-1]);
Tridiagonal_Solve( &(a[0]), &(b[0]), &(c[0]), &(d[0]), &(v[0]), n );
for( i = 1; i < n; ++i ){
coef[i-1].d = (v[i] - v[i-1]) / (6*h[i-1]);
coef[i-1].c = v[i]/2;
coef[i-1].b = (f[i]-f[i-1])/h[i-1] + h[i-1]*(2*v[i] + v[i-1])/6;
coef[i-1].a = f[i];
}
sfree( a, "cubic_spline:a" );
sfree( b, "cubic_spline:b" );
sfree( c, "cubic_spline:c" );
sfree( d, "cubic_spline:d" );
sfree( v, "cubic_spline:v" );
}
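/* Illustrative sketch (not part of the LAMMPS sources): each cubic_spline_coef
(a,b,c,d) built above is later evaluated in Tabulated_vdW_Coulomb_Energy as a
cubic in dif = r_ij - base, i.e. y(dif) = ((d*dif + c)*dif + b)*dif + a.
A self-contained helper with the same layout, assuming only a local struct of
four doubles: */
struct spline_coef_sketch { double a, b, c, d; };
static double spline_eval( const spline_coef_sketch &s, double dif )
{
return ((s.d * dif + s.c) * dif + s.b) * dif + s.a;  /* Horner form of the cubic */
}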
int Init_Lookup_Tables( reax_system *system, control_params *control,
storage *workspace, mpi_datatypes *mpi_data, char *msg )
{
int i, j, r;
int num_atom_types;
int existing_types[MAX_ATOM_TYPES], aggregated[MAX_ATOM_TYPES];
double dr;
double *h, *fh, *fvdw, *fele, *fCEvd, *fCEclmb;
double v0_vdw, v0_ele, vlast_vdw, vlast_ele;
MPI_Comm comm;
/* initializations */
v0_vdw = 0;
v0_ele = 0;
vlast_vdw = 0;
vlast_ele = 0;
comm = mpi_data->world;
num_atom_types = system->reax_param.num_atom_types;
dr = control->nonb_cut / control->tabulate;
h = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:h", comm );
fh = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fh", comm );
fvdw = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fvdw", comm );
fCEvd = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEvd", comm );
fele = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fele", comm );
fCEclmb = (double*)
smalloc( (control->tabulate+2) * sizeof(double), "lookup:fCEclmb", comm );
LR = (LR_lookup_table**)
scalloc( num_atom_types, sizeof(LR_lookup_table*), "lookup:LR", comm );
for( i = 0; i < num_atom_types; ++i )
LR[i] = (LR_lookup_table*)
scalloc( num_atom_types, sizeof(LR_lookup_table), "lookup:LR[i]", comm );
for( i = 0; i < MAX_ATOM_TYPES; ++i )
existing_types[i] = 0;
for( i = 0; i < system->n; ++i )
existing_types[ system->my_atoms[i].type ] = 1;
MPI_Allreduce( existing_types, aggregated, MAX_ATOM_TYPES,
MPI_INT, MPI_SUM, mpi_data->world );
for( i = 0; i < num_atom_types; ++i ) {
if( aggregated[i] ) {
for( j = i; j < num_atom_types; ++j ) {
if( aggregated[j] ) {
LR[i][j].xmin = 0;
LR[i][j].xmax = control->nonb_cut;
LR[i][j].n = control->tabulate + 2;
LR[i][j].dx = dr;
LR[i][j].inv_dx = control->tabulate / control->nonb_cut;
LR[i][j].y = (LR_data*)
smalloc( LR[i][j].n * sizeof(LR_data), "lookup:LR[i,j].y", comm );
LR[i][j].H = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].H" ,
comm );
LR[i][j].vdW = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].vdW",
comm);
LR[i][j].CEvd = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].CEvd",
comm);
LR[i][j].ele = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),"lookup:LR[i,j].ele",
comm );
LR[i][j].CEclmb = (cubic_spline_coef*)
smalloc( LR[i][j].n*sizeof(cubic_spline_coef),
"lookup:LR[i,j].CEclmb", comm );
for( r = 1; r <= control->tabulate; ++r ) {
LR_vdW_Coulomb( system, workspace, control, i, j, r * dr, &(LR[i][j].y[r]) );
h[r] = LR[i][j].dx;
fh[r] = LR[i][j].y[r].H;
fvdw[r] = LR[i][j].y[r].e_vdW;
fCEvd[r] = LR[i][j].y[r].CEvd;
fele[r] = LR[i][j].y[r].e_ele;
fCEclmb[r] = LR[i][j].y[r].CEclmb;
}
// init the start-end points
h[r] = LR[i][j].dx;
v0_vdw = LR[i][j].y[1].CEvd;
v0_ele = LR[i][j].y[1].CEclmb;
fh[r] = fh[r-1];
fvdw[r] = fvdw[r-1];
fCEvd[r] = fCEvd[r-1];
fele[r] = fele[r-1];
fCEclmb[r] = fCEclmb[r-1];
vlast_vdw = fCEvd[r-1];
vlast_ele = fele[r-1];
Natural_Cubic_Spline( &h[1], &fh[1],
&(LR[i][j].H[1]), control->tabulate+1, comm );
Complete_Cubic_Spline( &h[1], &fvdw[1], v0_vdw, vlast_vdw,
&(LR[i][j].vdW[1]), control->tabulate+1,
comm );
Natural_Cubic_Spline( &h[1], &fCEvd[1],
&(LR[i][j].CEvd[1]), control->tabulate+1,
comm );
Complete_Cubic_Spline( &h[1], &fele[1], v0_ele, vlast_ele,
&(LR[i][j].ele[1]), control->tabulate+1,
comm );
Natural_Cubic_Spline( &h[1], &fCEclmb[1],
&(LR[i][j].CEclmb[1]), control->tabulate+1,
comm );
} else{
LR[i][j].n = 0;
}
}
}
}
free(h);
free(fh);
free(fvdw);
free(fCEvd);
free(fele);
free(fCEclmb);
return 1;
}
void Deallocate_Lookup_Tables( reax_system *system )
{
int i, j;
int ntypes;
ntypes = system->reax_param.num_atom_types;
for( i = 0; i < ntypes; ++i ) {
for( j = i; j < ntypes; ++j )
if( LR[i][j].n ) {
sfree( LR[i][j].y, "LR[i,j].y" );
sfree( LR[i][j].H, "LR[i,j].H" );
sfree( LR[i][j].vdW, "LR[i,j].vdW" );
sfree( LR[i][j].CEvd, "LR[i,j].CEvd" );
sfree( LR[i][j].ele, "LR[i,j].ele" );
sfree( LR[i][j].CEclmb, "LR[i,j].CEclmb" );
}
sfree( LR[i], "LR[i]" );
}
sfree( LR, "LR" );
}
diff --git a/src/USER-REAXC/reaxc_multi_body.cpp b/src/USER-REAXC/reaxc_multi_body.cpp
index 1923668e8..ecfd3ad04 100644
--- a/src/USER-REAXC/reaxc_multi_body.cpp
+++ b/src/USER-REAXC/reaxc_multi_body.cpp
@@ -1,240 +1,243 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_multi_body.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
void Atom_Energy( reax_system *system, control_params *control,
simulation_data *data, storage *workspace, reax_list **lists,
output_controls *out_control )
{
int i, j, pj, type_i, type_j;
double Delta_lpcorr, dfvl;
double e_lp, expvd2, inv_expvd2, dElp, CElp, DlpVi;
double e_lph, Di, vov3, deahu2dbo, deahu2dsbo;
double e_ov, CEover1, CEover2, CEover3, CEover4;
double exp_ovun1, exp_ovun2, sum_ovun1, sum_ovun2;
double exp_ovun2n, exp_ovun6, exp_ovun8;
double inv_exp_ovun1, inv_exp_ovun2, inv_exp_ovun2n, inv_exp_ovun8;
double e_un, CEunder1, CEunder2, CEunder3, CEunder4;
double p_lp2, p_lp3;
double p_ovun2, p_ovun3, p_ovun4, p_ovun5, p_ovun6, p_ovun7, p_ovun8;
double eng_tmp;
int numbonds;
single_body_parameters *sbp_i;
two_body_parameters *twbp;
bond_data *pbond;
bond_order_data *bo_ij;
reax_list *bonds = (*lists) + BONDS;
/* Initialize parameters */
p_lp3 = system->reax_param.gp.l[5];
p_ovun3 = system->reax_param.gp.l[32];
p_ovun4 = system->reax_param.gp.l[31];
p_ovun6 = system->reax_param.gp.l[6];
p_ovun7 = system->reax_param.gp.l[8];
p_ovun8 = system->reax_param.gp.l[9];
for( i = 0; i < system->n; ++i ) {
/* set the parameter pointer */
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[ type_i ]);
/* lone-pair Energy */
p_lp2 = sbp_i->p_lp2;
expvd2 = exp( -75 * workspace->Delta_lp[i] );
inv_expvd2 = 1. / (1. + expvd2 );
numbonds = 0;
e_lp = 0.0;
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj )
numbonds ++;
/* calculate the energy */
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
data->my_en.e_lp += e_lp =
p_lp2 * workspace->Delta_lp[i] * inv_expvd2;
dElp = p_lp2 * inv_expvd2 +
75 * p_lp2 * workspace->Delta_lp[i] * expvd2 * SQR(inv_expvd2);
CElp = dElp * workspace->dDelta_lp[i];
- if (numbonds > 0) workspace->CdDelta[i] += CElp; // lp - 1st term
+ if (numbonds > 0 || control->enobondsflag)
+ workspace->CdDelta[i] += CElp; // lp - 1st term
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,i,system->n,1,e_lp,0.0,0.0,0.0,0.0,0.0);
/* correction for C2 */
if( p_lp3 > 0.001 && !strcmp(system->reax_param.sbp[type_i].name, "C") )
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
j = bonds->select.bond_list[pj].nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
if( !strcmp( system->reax_param.sbp[type_j].name, "C" ) ) {
twbp = &( system->reax_param.tbp[type_i][type_j]);
bo_ij = &( bonds->select.bond_list[pj].bo_data );
Di = workspace->Delta[i];
vov3 = bo_ij->BO - Di - 0.040*pow(Di, 4.);
if( vov3 > 3. ) {
data->my_en.e_lp += e_lph = p_lp3 * SQR(vov3-3.0);
deahu2dbo = 2.*p_lp3*(vov3 - 3.);
deahu2dsbo = 2.*p_lp3*(vov3 - 3.)*(-1. - 0.16*pow(Di, 3.));
bo_ij->Cdbo += deahu2dbo;
workspace->CdDelta[i] += deahu2dsbo;
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,j,system->n,1,e_lph,0.0,0.0,0.0,0.0,0.0);
}
}
}
}
for( i = 0; i < system->n; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
sbp_i = &(system->reax_param.sbp[ type_i ]);
/* over-coordination energy */
if( sbp_i->mass > 21.0 )
dfvl = 0.0;
else dfvl = 1.0; // only for 1st-row elements
p_ovun2 = sbp_i->p_ovun2;
sum_ovun1 = sum_ovun2 = 0;
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
j = bonds->select.bond_list[pj].nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
bo_ij = &(bonds->select.bond_list[pj].bo_data);
twbp = &(system->reax_param.tbp[ type_i ][ type_j ]);
sum_ovun1 += twbp->p_ovun1 * twbp->De_s * bo_ij->BO;
sum_ovun2 += (workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j])*
( bo_ij->BO_pi + bo_ij->BO_pi2 );
}
exp_ovun1 = p_ovun3 * exp( p_ovun4 * sum_ovun2 );
inv_exp_ovun1 = 1.0 / (1 + exp_ovun1);
Delta_lpcorr = workspace->Delta[i] -
(dfvl * workspace->Delta_lp_temp[i]) * inv_exp_ovun1;
exp_ovun2 = exp( p_ovun2 * Delta_lpcorr );
inv_exp_ovun2 = 1.0 / (1.0 + exp_ovun2);
DlpVi = 1.0 / (Delta_lpcorr + sbp_i->valency + 1e-8);
CEover1 = Delta_lpcorr * DlpVi * inv_exp_ovun2;
data->my_en.e_ov += e_ov = sum_ovun1 * CEover1;
CEover2 = sum_ovun1 * DlpVi * inv_exp_ovun2 *
(1.0 - Delta_lpcorr * ( DlpVi + p_ovun2 * exp_ovun2 * inv_exp_ovun2 ));
CEover3 = CEover2 * (1.0 - dfvl * workspace->dDelta_lp[i] * inv_exp_ovun1 );
CEover4 = CEover2 * (dfvl * workspace->Delta_lp_temp[i]) *
p_ovun4 * exp_ovun1 * SQR(inv_exp_ovun1);
/* under-coordination potential */
p_ovun2 = sbp_i->p_ovun2;
p_ovun5 = sbp_i->p_ovun5;
exp_ovun2n = 1.0 / exp_ovun2;
exp_ovun6 = exp( p_ovun6 * Delta_lpcorr );
exp_ovun8 = p_ovun7 * exp(p_ovun8 * sum_ovun2);
inv_exp_ovun2n = 1.0 / (1.0 + exp_ovun2n);
inv_exp_ovun8 = 1.0 / (1.0 + exp_ovun8);
numbonds = 0;
e_un = 0.0;
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj )
numbonds ++;
- if (numbonds > 0)
+ if (numbonds > 0 || control->enobondsflag)
data->my_en.e_un += e_un =
-p_ovun5 * (1.0 - exp_ovun6) * inv_exp_ovun2n * inv_exp_ovun8;
CEunder1 = inv_exp_ovun2n *
( p_ovun5 * p_ovun6 * exp_ovun6 * inv_exp_ovun8 +
p_ovun2 * e_un * exp_ovun2n );
CEunder2 = -e_un * p_ovun8 * exp_ovun8 * inv_exp_ovun8;
CEunder3 = CEunder1 * (1.0 - dfvl*workspace->dDelta_lp[i]*inv_exp_ovun1);
CEunder4 = CEunder1 * (dfvl*workspace->Delta_lp_temp[i]) *
p_ovun4 * exp_ovun1 * SQR(inv_exp_ovun1) + CEunder2;
/* tally into per-atom energy */
if( system->pair_ptr->evflag) {
eng_tmp = e_ov;
- if (numbonds > 0) eng_tmp += e_un;
+ if (numbonds > 0 || control->enobondsflag)
+ eng_tmp += e_un;
system->pair_ptr->ev_tally(i,i,system->n,1,eng_tmp,0.0,0.0,0.0,0.0,0.0);
}
/* forces */
workspace->CdDelta[i] += CEover3; // OvCoor - 2nd term
- if (numbonds > 0) workspace->CdDelta[i] += CEunder3; // UnCoor - 1st term
+ if (numbonds > 0 || control->enobondsflag)
+ workspace->CdDelta[i] += CEunder3; // UnCoor - 1st term
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
pbond = &(bonds->select.bond_list[pj]);
j = pbond->nbr;
bo_ij = &(pbond->bo_data);
twbp = &(system->reax_param.tbp[ system->my_atoms[i].type ]
[system->my_atoms[pbond->nbr].type]);
bo_ij->Cdbo += CEover1 * twbp->p_ovun1 * twbp->De_s;// OvCoor-1st
workspace->CdDelta[j] += CEover4 * (1.0 - dfvl*workspace->dDelta_lp[j]) *
(bo_ij->BO_pi + bo_ij->BO_pi2); // OvCoor-3a
bo_ij->Cdbopi += CEover4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // OvCoor-3b
bo_ij->Cdbopi2 += CEover4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // OvCoor-3b
workspace->CdDelta[j] += CEunder4 * (1.0 - dfvl*workspace->dDelta_lp[j]) *
(bo_ij->BO_pi + bo_ij->BO_pi2); // UnCoor - 2a
bo_ij->Cdbopi += CEunder4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // UnCoor-2b
bo_ij->Cdbopi2 += CEunder4 *
(workspace->Delta[j] - dfvl*workspace->Delta_lp_temp[j]); // UnCoor-2b
}
}
}
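/* Note (not part of the LAMMPS sources): the added enobondsflag checks above
make the lone-pair, over-coordination and under-coordination contributions of
an atom with zero bonds be tallied whenever control->enobondsflag is set,
instead of always being skipped when numbonds == 0. */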
diff --git a/src/USER-REAXC/reaxc_nonbonded.cpp b/src/USER-REAXC/reaxc_nonbonded.cpp
index cb24e2dc3..9c223428a 100644
--- a/src/USER-REAXC/reaxc_nonbonded.cpp
+++ b/src/USER-REAXC/reaxc_nonbonded.cpp
@@ -1,432 +1,432 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_types.h"
#include "reaxc_nonbonded.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
void vdW_Coulomb_Energy( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, pj, natoms;
int start_i, end_i, flag;
rc_tagint orig_i, orig_j;
double p_vdW1, p_vdW1i;
double powr_vdW1, powgi_vdW1;
double tmp, r_ij, fn13, exp1, exp2;
double Tap, dTap, dfn13, CEvd, CEclmb, de_core;
double dr3gamij_1, dr3gamij_3;
double e_ele, e_vdW, e_core, SMALL = 0.0001;
double e_lg, de_lg, r_ij5, r_ij6, re6;
rvec temp, ext_press;
two_body_parameters *twbp;
far_neighbor_data *nbr_pj;
reax_list *far_nbrs;
// Tallying variables:
double pe_vdw, f_tmp, delij[3];
natoms = system->n;
far_nbrs = (*lists) + FAR_NBRS;
p_vdW1 = system->reax_param.gp.l[28];
p_vdW1i = 1.0 / p_vdW1;
e_core = 0;
e_vdW = 0;
e_lg = de_lg = 0.0;
for( i = 0; i < natoms; ++i ) {
if (system->my_atoms[i].type < 0) continue;
start_i = Start_Index(i, far_nbrs);
end_i = End_Index(i, far_nbrs);
orig_i = system->my_atoms[i].orig_id;
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &(far_nbrs->select.far_nbr_list[pj]);
j = nbr_pj->nbr;
if (system->my_atoms[j].type < 0) continue;
orig_j = system->my_atoms[j].orig_id;
flag = 0;
if(nbr_pj->d <= control->nonb_cut) {
if (j < natoms) flag = 1;
else if (orig_i < orig_j) flag = 1;
else if (orig_i == orig_j) {
if (nbr_pj->dvec[2] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[2]) < SMALL) {
if (nbr_pj->dvec[1] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[1]) < SMALL && nbr_pj->dvec[0] > SMALL)
flag = 1;
}
}
}
if (flag) {
r_ij = nbr_pj->d;
twbp = &(system->reax_param.tbp[ system->my_atoms[i].type ]
[ system->my_atoms[j].type ]);
Tap = workspace->Tap[7] * r_ij + workspace->Tap[6];
Tap = Tap * r_ij + workspace->Tap[5];
Tap = Tap * r_ij + workspace->Tap[4];
Tap = Tap * r_ij + workspace->Tap[3];
Tap = Tap * r_ij + workspace->Tap[2];
Tap = Tap * r_ij + workspace->Tap[1];
Tap = Tap * r_ij + workspace->Tap[0];
dTap = 7*workspace->Tap[7] * r_ij + 6*workspace->Tap[6];
dTap = dTap * r_ij + 5*workspace->Tap[5];
dTap = dTap * r_ij + 4*workspace->Tap[4];
dTap = dTap * r_ij + 3*workspace->Tap[3];
dTap = dTap * r_ij + 2*workspace->Tap[2];
dTap += workspace->Tap[1]/r_ij;
/*vdWaals Calculations*/
if(system->reax_param.gp.vdw_type==1 || system->reax_param.gp.vdw_type==3)
{ // shielding
powr_vdW1 = pow(r_ij, p_vdW1);
powgi_vdW1 = pow( 1.0 / twbp->gamma_w, p_vdW1);
fn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i );
exp1 = exp( twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
e_vdW = twbp->D * (exp1 - 2.0 * exp2);
data->my_en.e_vdW += Tap * e_vdW;
dfn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i - 1.0) *
pow(r_ij, p_vdW1 - 2.0);
CEvd = dTap * e_vdW -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) * dfn13;
}
else{ // no shielding
exp1 = exp( twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
e_vdW = twbp->D * (exp1 - 2.0 * exp2);
data->my_en.e_vdW += Tap * e_vdW;
CEvd = dTap * e_vdW -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) / r_ij;
}
if(system->reax_param.gp.vdw_type==2 || system->reax_param.gp.vdw_type==3)
{ // inner wall
e_core = twbp->ecore * exp(twbp->acore * (1.0-(r_ij/twbp->rcore)));
data->my_en.e_vdW += Tap * e_core;
de_core = -(twbp->acore/twbp->rcore) * e_core;
CEvd += dTap * e_core + Tap * de_core / r_ij;
// lg correction, only if lgvdw is yes
if (control->lgflag) {
r_ij5 = pow( r_ij, 5.0 );
r_ij6 = pow( r_ij, 6.0 );
re6 = pow( twbp->lgre, 6.0 );
e_lg = -(twbp->lgcij/( r_ij6 + re6 ));
data->my_en.e_vdW += Tap * e_lg;
de_lg = -6.0 * e_lg * r_ij5 / ( r_ij6 + re6 ) ;
CEvd += dTap * e_lg + Tap * de_lg / r_ij;
}
}
/*Coulomb Calculations*/
dr3gamij_1 = ( r_ij * r_ij * r_ij + twbp->gamma );
dr3gamij_3 = pow( dr3gamij_1 , 0.33333333333333 );
tmp = Tap / dr3gamij_3;
data->my_en.e_ele += e_ele =
C_ele * system->my_atoms[i].q * system->my_atoms[j].q * tmp;
CEclmb = C_ele * system->my_atoms[i].q * system->my_atoms[j].q *
( dTap - Tap * r_ij / dr3gamij_1 ) / dr3gamij_3;
/* tally into per-atom energy */
if( system->pair_ptr->evflag || system->pair_ptr->vflag_atom) {
pe_vdw = Tap * (e_vdW + e_core + e_lg);
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,
-1., system->my_atoms[j].x );
f_tmp = -(CEvd + CEclmb);
system->pair_ptr->ev_tally(i,j,natoms,1,pe_vdw,e_ele,
f_tmp,delij[0],delij[1],delij[2]);
}
if( control->virial == 0 ) {
rvec_ScaledAdd( workspace->f[i], -(CEvd + CEclmb), nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[j], +(CEvd + CEclmb), nbr_pj->dvec );
}
else { /* NPT, iNPT or sNPT */
rvec_Scale( temp, CEvd + CEclmb, nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[i], -1., temp );
rvec_Add( workspace->f[j], temp );
rvec_iMultiply( ext_press, nbr_pj->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
}
}
}
Compute_Polarization_Energy( system, data );
}
void Tabulated_vdW_Coulomb_Energy( reax_system *system,control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists,
output_controls *out_control )
{
int i, j, pj, r, natoms;
int type_i, type_j, tmin, tmax;
int start_i, end_i, flag;
rc_tagint orig_i, orig_j;
double r_ij, base, dif;
double e_vdW, e_ele;
double CEvd, CEclmb, SMALL = 0.0001;
double f_tmp, delij[3];
rvec temp, ext_press;
far_neighbor_data *nbr_pj;
reax_list *far_nbrs;
LR_lookup_table *t;
natoms = system->n;
far_nbrs = (*lists) + FAR_NBRS;
e_ele = e_vdW = 0;
for( i = 0; i < natoms; ++i ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
start_i = Start_Index(i,far_nbrs);
end_i = End_Index(i,far_nbrs);
orig_i = system->my_atoms[i].orig_id;
for( pj = start_i; pj < end_i; ++pj ) {
nbr_pj = &(far_nbrs->select.far_nbr_list[pj]);
j = nbr_pj->nbr;
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
orig_j = system->my_atoms[j].orig_id;
flag = 0;
if(nbr_pj->d <= control->nonb_cut) {
if (j < natoms) flag = 1;
else if (orig_i < orig_j) flag = 1;
else if (orig_i == orig_j) {
if (nbr_pj->dvec[2] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[2]) < SMALL) {
if (nbr_pj->dvec[1] > SMALL) flag = 1;
else if (fabs(nbr_pj->dvec[1]) < SMALL && nbr_pj->dvec[0] > SMALL)
flag = 1;
}
}
}
if (flag) {
r_ij = nbr_pj->d;
tmin = MIN( type_i, type_j );
tmax = MAX( type_i, type_j );
t = &( LR[tmin][tmax] );
/* Cubic Spline Interpolation */
r = (int)(r_ij * t->inv_dx);
if( r == 0 ) ++r;
base = (double)(r+1) * t->dx;
dif = r_ij - base;
e_vdW = ((t->vdW[r].d*dif + t->vdW[r].c)*dif + t->vdW[r].b)*dif +
t->vdW[r].a;
e_ele = ((t->ele[r].d*dif + t->ele[r].c)*dif + t->ele[r].b)*dif +
t->ele[r].a;
e_ele *= system->my_atoms[i].q * system->my_atoms[j].q;
data->my_en.e_vdW += e_vdW;
data->my_en.e_ele += e_ele;
CEvd = ((t->CEvd[r].d*dif + t->CEvd[r].c)*dif + t->CEvd[r].b)*dif +
t->CEvd[r].a;
CEclmb = ((t->CEclmb[r].d*dif+t->CEclmb[r].c)*dif+t->CEclmb[r].b)*dif +
t->CEclmb[r].a;
CEclmb *= system->my_atoms[i].q * system->my_atoms[j].q;
/* tally into per-atom energy */
if( system->pair_ptr->evflag || system->pair_ptr->vflag_atom) {
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,
-1., system->my_atoms[j].x );
f_tmp = -(CEvd + CEclmb);
system->pair_ptr->ev_tally(i,j,natoms,1,e_vdW,e_ele,
f_tmp,delij[0],delij[1],delij[2]);
}
if( control->virial == 0 ) {
rvec_ScaledAdd( workspace->f[i], -(CEvd + CEclmb), nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[j], +(CEvd + CEclmb), nbr_pj->dvec );
}
else { // NPT, iNPT or sNPT
rvec_Scale( temp, CEvd + CEclmb, nbr_pj->dvec );
rvec_ScaledAdd( workspace->f[i], -1., temp );
rvec_Add( workspace->f[j], temp );
rvec_iMultiply( ext_press, nbr_pj->rel_box, temp );
rvec_Add( data->my_ext_press, ext_press );
}
}
}
}
Compute_Polarization_Energy( system, data );
}
void Compute_Polarization_Energy( reax_system *system, simulation_data *data )
{
int i, type_i;
double q, en_tmp;
data->my_en.e_pol = 0.0;
for( i = 0; i < system->n; i++ ) {
type_i = system->my_atoms[i].type;
if (type_i < 0) continue;
q = system->my_atoms[i].q;
en_tmp = KCALpMOL_to_EV * (system->reax_param.sbp[type_i].chi * q +
(system->reax_param.sbp[type_i].eta / 2.) * SQR(q));
data->my_en.e_pol += en_tmp;
/* tally into per-atom energy */
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(i,i,system->n,1,0.0,en_tmp,0.0,0.0,0.0,0.0);
}
}
void LR_vdW_Coulomb( reax_system *system, storage *workspace,
control_params *control, int i, int j, double r_ij, LR_data *lr )
{
double p_vdW1 = system->reax_param.gp.l[28];
double p_vdW1i = 1.0 / p_vdW1;
double powr_vdW1, powgi_vdW1;
double tmp, fn13, exp1, exp2;
double Tap, dTap, dfn13;
double dr3gamij_1, dr3gamij_3;
double e_core, de_core;
double e_lg, de_lg, r_ij5, r_ij6, re6;
two_body_parameters *twbp;
twbp = &(system->reax_param.tbp[i][j]);
e_core = 0;
de_core = 0;
e_lg = de_lg = 0.0;
/* calculate taper and its derivative */
Tap = workspace->Tap[7] * r_ij + workspace->Tap[6];
Tap = Tap * r_ij + workspace->Tap[5];
Tap = Tap * r_ij + workspace->Tap[4];
Tap = Tap * r_ij + workspace->Tap[3];
Tap = Tap * r_ij + workspace->Tap[2];
Tap = Tap * r_ij + workspace->Tap[1];
Tap = Tap * r_ij + workspace->Tap[0];
dTap = 7*workspace->Tap[7] * r_ij + 6*workspace->Tap[6];
dTap = dTap * r_ij + 5*workspace->Tap[5];
dTap = dTap * r_ij + 4*workspace->Tap[4];
dTap = dTap * r_ij + 3*workspace->Tap[3];
dTap = dTap * r_ij + 2*workspace->Tap[2];
dTap += workspace->Tap[1]/r_ij;
/*vdWaals Calculations*/
if(system->reax_param.gp.vdw_type==1 || system->reax_param.gp.vdw_type==3)
{ // shielding
powr_vdW1 = pow(r_ij, p_vdW1);
powgi_vdW1 = pow( 1.0 / twbp->gamma_w, p_vdW1);
fn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i );
exp1 = exp( twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - fn13 / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
dfn13 = pow( powr_vdW1 + powgi_vdW1, p_vdW1i-1.0) * pow(r_ij, p_vdW1-2.0);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) * dfn13;
}
else{ // no shielding
exp1 = exp( twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
exp2 = exp( 0.5 * twbp->alpha * (1.0 - r_ij / twbp->r_vdW) );
lr->e_vdW = Tap * twbp->D * (exp1 - 2.0 * exp2);
lr->CEvd = dTap * twbp->D * (exp1 - 2.0 * exp2) -
Tap * twbp->D * (twbp->alpha / twbp->r_vdW) * (exp1 - exp2) / r_ij;
}
if(system->reax_param.gp.vdw_type==2 || system->reax_param.gp.vdw_type==3)
{ // inner wall
e_core = twbp->ecore * exp(twbp->acore * (1.0-(r_ij/twbp->rcore)));
lr->e_vdW += Tap * e_core;
de_core = -(twbp->acore/twbp->rcore) * e_core;
lr->CEvd += dTap * e_core + Tap * de_core / r_ij;
// lg correction, only if lgvdw is yes
if (control->lgflag) {
r_ij5 = pow( r_ij, 5.0 );
r_ij6 = pow( r_ij, 6.0 );
re6 = pow( twbp->lgre, 6.0 );
e_lg = -(twbp->lgcij/( r_ij6 + re6 ));
lr->e_vdW += Tap * e_lg;
de_lg = -6.0 * e_lg * r_ij5 / ( r_ij6 + re6 ) ;
lr->CEvd += dTap * e_lg + Tap * de_lg/r_ij;
}
}
/* Coulomb calculations */
dr3gamij_1 = ( r_ij * r_ij * r_ij + twbp->gamma );
dr3gamij_3 = pow( dr3gamij_1 , 0.33333333333333 );
tmp = Tap / dr3gamij_3;
lr->H = EV_to_KCALpMOL * tmp;
lr->e_ele = C_ele * tmp;
lr->CEclmb = C_ele * ( dTap - Tap * r_ij / dr3gamij_1 ) / dr3gamij_3;
}
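/* Note (not part of the LAMMPS sources): with shielding enabled (vdw_type 1
or 3) the routines above replace r_ij by the shielded distance
fn13 = ( r_ij^p_vdW1 + (1/gamma_w)^p_vdW1 )^(1/p_vdW1)
inside the Morse-like term D*(exp1 - 2.0*exp2), and the Coulomb part screens
1/r as 1/( r_ij^3 + gamma )^(1/3), which is what dr3gamij_1 and dr3gamij_3
implement. */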
diff --git a/src/USER-REAXC/reaxc_reset_tools.cpp b/src/USER-REAXC/reaxc_reset_tools.cpp
index 1e6aeab47..4ec744e7b 100644
--- a/src/USER-REAXC/reaxc_reset_tools.cpp
+++ b/src/USER-REAXC/reaxc_reset_tools.cpp
@@ -1,192 +1,192 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_reset_tools.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
void Reset_Atoms( reax_system* system, control_params *control )
{
int i;
reax_atom *atom;
system->numH = 0;
if( control->hbond_cut > 0 )
for( i = 0; i < system->n; ++i ) {
atom = &(system->my_atoms[i]);
if (atom->type < 0) continue;
if( system->reax_param.sbp[ atom->type ].p_hbond == 1 )
atom->Hindex = system->numH++;
else atom->Hindex = -1;
}
}
void Reset_Energies( energy_data *en )
{
en->e_bond = 0;
en->e_ov = 0;
en->e_un = 0;
en->e_lp = 0;
en->e_ang = 0;
en->e_pen = 0;
en->e_coa = 0;
en->e_hb = 0;
en->e_tor = 0;
en->e_con = 0;
en->e_vdW = 0;
en->e_ele = 0;
en->e_pol = 0;
en->e_pot = 0;
en->e_kin = 0;
en->e_tot = 0;
}
void Reset_Temperatures( simulation_data *data )
{
data->therm.T = 0;
}
void Reset_Pressures( simulation_data *data )
{
data->flex_bar.P_scalar = 0;
rtensor_MakeZero( data->flex_bar.P );
data->iso_bar.P = 0;
rvec_MakeZero( data->int_press );
rvec_MakeZero( data->my_ext_press );
rvec_MakeZero( data->ext_press );
}
void Reset_Simulation_Data( simulation_data* data, int virial )
{
Reset_Energies( &data->my_en );
Reset_Energies( &data->sys_en );
Reset_Temperatures( data );
Reset_Pressures( data );
}
void Reset_Timing( reax_timing *rt )
{
rt->total = Get_Time();
rt->comm = 0;
rt->nbrs = 0;
rt->init_forces = 0;
rt->bonded = 0;
rt->nonb = 0;
rt->qEq = 0;
rt->s_matvecs = 0;
rt->t_matvecs = 0;
}
void Reset_Workspace( reax_system *system, storage *workspace )
{
memset( workspace->total_bond_order, 0, system->total_cap * sizeof( double ) );
memset( workspace->dDeltap_self, 0, system->total_cap * sizeof( rvec ) );
memset( workspace->CdDelta, 0, system->total_cap * sizeof( double ) );
memset( workspace->f, 0, system->total_cap * sizeof( rvec ) );
}
void Reset_Neighbor_Lists( reax_system *system, control_params *control,
storage *workspace, reax_list **lists,
MPI_Comm comm )
{
int i, total_bonds, Hindex, total_hbonds;
reax_list *bonds, *hbonds;
/* bonds list */
if( system->N > 0 ){
bonds = (*lists) + BONDS;
total_bonds = 0;
/* reset start-end indexes */
for( i = 0; i < system->N; ++i ) {
Set_Start_Index( i, total_bonds, bonds );
Set_End_Index( i, total_bonds, bonds );
total_bonds += system->my_atoms[i].num_bonds;
}
/* is reallocation needed? */
if( total_bonds >= bonds->num_intrs * DANGER_ZONE ) {
workspace->realloc.bonds = 1;
if( total_bonds >= bonds->num_intrs ) {
fprintf(stderr,
"p%d: not enough space for bonds! total=%d allocated=%d\n",
system->my_rank, total_bonds, bonds->num_intrs );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
if( control->hbond_cut > 0 && system->numH > 0 ) {
hbonds = (*lists) + HBONDS;
total_hbonds = 0;
/* reset start-end indexes */
for( i = 0; i < system->n; ++i ) {
Hindex = system->my_atoms[i].Hindex;
if( Hindex > -1 ) {
Set_Start_Index( Hindex, total_hbonds, hbonds );
Set_End_Index( Hindex, total_hbonds, hbonds );
total_hbonds += system->my_atoms[i].num_hbonds;
}
}
/* is reallocation needed? */
if( total_hbonds >= hbonds->num_intrs * 0.90/*DANGER_ZONE*/ ) {
workspace->realloc.hbonds = 1;
if( total_hbonds >= hbonds->num_intrs ) {
fprintf(stderr,
"p%d: not enough space for hbonds! total=%d allocated=%d\n",
system->my_rank, total_hbonds, hbonds->num_intrs );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
}
}
}
void Reset( reax_system *system, control_params *control, simulation_data *data,
storage *workspace, reax_list **lists, MPI_Comm comm )
{
Reset_Atoms( system, control );
Reset_Simulation_Data( data, control->virial );
Reset_Workspace( system, workspace );
Reset_Neighbor_Lists( system, control, workspace, lists, comm );
}
diff --git a/src/USER-REAXC/reaxc_system_props.cpp b/src/USER-REAXC/reaxc_system_props.cpp
index 6b4551a03..54eeb6da1 100644
--- a/src/USER-REAXC/reaxc_system_props.cpp
+++ b/src/USER-REAXC/reaxc_system_props.cpp
@@ -1,88 +1,88 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_system_props.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
void Compute_System_Energy( reax_system *system, simulation_data *data,
MPI_Comm comm )
{
double my_en[15], sys_en[15];
my_en[0] = data->my_en.e_bond;
my_en[1] = data->my_en.e_ov;
my_en[2] = data->my_en.e_un;
my_en[3] = data->my_en.e_lp;
my_en[4] = data->my_en.e_ang;
my_en[5] = data->my_en.e_pen;
my_en[6] = data->my_en.e_coa;
my_en[7] = data->my_en.e_hb;
my_en[8] = data->my_en.e_tor;
my_en[9] = data->my_en.e_con;
my_en[10] = data->my_en.e_vdW;
my_en[11] = data->my_en.e_ele;
my_en[12] = data->my_en.e_pol;
my_en[13] = data->my_en.e_kin;
MPI_Reduce( my_en, sys_en, 14, MPI_DOUBLE, MPI_SUM, MASTER_NODE, comm );
data->my_en.e_pot = data->my_en.e_bond +
data->my_en.e_ov + data->my_en.e_un + data->my_en.e_lp +
data->my_en.e_ang + data->my_en.e_pen + data->my_en.e_coa +
data->my_en.e_hb +
data->my_en.e_tor + data->my_en.e_con +
data->my_en.e_vdW + data->my_en.e_ele + data->my_en.e_pol;
data->my_en.e_tot = data->my_en.e_pot + E_CONV * data->my_en.e_kin;
if( system->my_rank == MASTER_NODE ) {
data->sys_en.e_bond = sys_en[0];
data->sys_en.e_ov = sys_en[1];
data->sys_en.e_un = sys_en[2];
data->sys_en.e_lp = sys_en[3];
data->sys_en.e_ang = sys_en[4];
data->sys_en.e_pen = sys_en[5];
data->sys_en.e_coa = sys_en[6];
data->sys_en.e_hb = sys_en[7];
data->sys_en.e_tor = sys_en[8];
data->sys_en.e_con = sys_en[9];
data->sys_en.e_vdW = sys_en[10];
data->sys_en.e_ele = sys_en[11];
data->sys_en.e_pol = sys_en[12];
data->sys_en.e_kin = sys_en[13];
data->sys_en.e_pot = data->sys_en.e_bond +
data->sys_en.e_ov + data->sys_en.e_un + data->sys_en.e_lp +
data->sys_en.e_ang + data->sys_en.e_pen + data->sys_en.e_coa +
data->sys_en.e_hb +
data->sys_en.e_tor + data->sys_en.e_con +
data->sys_en.e_vdW + data->sys_en.e_ele + data->sys_en.e_pol;
data->sys_en.e_tot = data->sys_en.e_pot + E_CONV * data->sys_en.e_kin;
}
}
diff --git a/src/USER-REAXC/reaxc_tool_box.cpp b/src/USER-REAXC/reaxc_tool_box.cpp
index 22576e9f3..4fc6796ef 100644
--- a/src/USER-REAXC/reaxc_tool_box.cpp
+++ b/src/USER-REAXC/reaxc_tool_box.cpp
@@ -1,121 +1,121 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_tool_box.h"
struct timeval tim;
double t_end;
double Get_Time( )
{
gettimeofday(&tim, NULL );
return( tim.tv_sec + (tim.tv_usec / 1000000.0) );
}
int Tokenize( char* s, char*** tok )
{
char test[MAX_LINE];
const char *sep = (const char *)"\t \n\r\f!=";
char *word;
int count=0;
strncpy( test, s, MAX_LINE );
for( word = strtok(test, sep); word; word = strtok(NULL, sep) ) {
strncpy( (*tok)[count], word, MAX_LINE );
count++;
}
return count;
}
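/* Illustrative sketch (not part of the LAMMPS sources): Tokenize expects *tok
to point at pre-allocated buffers of MAX_LINE characters each and returns the
number of words separated by whitespace or '!'/'='. A hypothetical caller
(the buffer count of 32 is chosen here purely for illustration): */
// char **tok = (char**) malloc( 32 * sizeof(char*) );
// for( int t = 0; t < 32; ++t )
//   tok[t] = (char*) malloc( MAX_LINE * sizeof(char) );
// int nwords = Tokenize( line, &tok );  /* 'line' is the string to split */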
/* safe malloc */
void *smalloc( rc_bigint n, const char *name, MPI_Comm comm )
{
void *ptr;
if( n <= 0 ) {
fprintf( stderr, "WARNING: trying to allocate %ld bytes for array %s. ",
n, name );
fprintf( stderr, "returning NULL.\n" );
return NULL;
}
ptr = malloc( n );
if( ptr == NULL ) {
fprintf( stderr, "ERROR: failed to allocate %ld bytes for array %s",
n, name );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return ptr;
}
/* safe calloc */
void *scalloc( rc_bigint n, rc_bigint size, const char *name, MPI_Comm comm )
{
void *ptr;
if( n <= 0 ) {
fprintf( stderr, "WARNING: trying to allocate %ld elements for array %s. ",
n, name );
fprintf( stderr, "returning NULL.\n" );
return NULL;
}
if( size <= 0 ) {
fprintf( stderr, "WARNING: elements size for array %s is %ld. ",
name, size );
fprintf( stderr, "returning NULL.\n" );
return NULL;
}
ptr = calloc( n, size );
if( ptr == NULL ) {
fprintf( stderr, "ERROR: failed to allocate %ld bytes for array %s",
n*size, name );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return ptr;
}
/* safe free */
void sfree( void *ptr, const char *name )
{
if( ptr == NULL ) {
fprintf( stderr, "WARNING: trying to free the already NULL pointer %s!\n",
name );
return;
}
free( ptr );
ptr = NULL;
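/* note: this only clears the local copy of the pointer, not the caller's variable */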
}
diff --git a/src/USER-REAXC/reaxc_torsion_angles.cpp b/src/USER-REAXC/reaxc_torsion_angles.cpp
index 2cfe32976..74d5b04f2 100644
--- a/src/USER-REAXC/reaxc_torsion_angles.cpp
+++ b/src/USER-REAXC/reaxc_torsion_angles.cpp
@@ -1,479 +1,479 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_torsion_angles.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
#include "reaxc_vector.h"
#define MIN_SINE 1e-10
double Calculate_Omega( rvec dvec_ij, double r_ij,
rvec dvec_jk, double r_jk,
rvec dvec_kl, double r_kl,
rvec dvec_li, double r_li,
three_body_interaction_data *p_ijk,
three_body_interaction_data *p_jkl,
rvec dcos_omega_di, rvec dcos_omega_dj,
rvec dcos_omega_dk, rvec dcos_omega_dl,
output_controls *out_control )
{
double unnorm_cos_omega, unnorm_sin_omega, omega;
double sin_ijk, cos_ijk, sin_jkl, cos_jkl;
double htra, htrb, htrc, hthd, hthe, hnra, hnrc, hnhd, hnhe;
double arg, poem, tel;
rvec cross_jk_kl;
sin_ijk = sin( p_ijk->theta );
cos_ijk = cos( p_ijk->theta );
sin_jkl = sin( p_jkl->theta );
cos_jkl = cos( p_jkl->theta );
/* omega */
unnorm_cos_omega = -rvec_Dot(dvec_ij, dvec_jk) * rvec_Dot(dvec_jk, dvec_kl) +
SQR( r_jk ) * rvec_Dot( dvec_ij, dvec_kl );
rvec_Cross( cross_jk_kl, dvec_jk, dvec_kl );
unnorm_sin_omega = -r_jk * rvec_Dot( dvec_ij, cross_jk_kl );
omega = atan2( unnorm_sin_omega, unnorm_cos_omega );
htra = r_ij + cos_ijk * ( r_kl * cos_jkl - r_jk );
htrb = r_jk - r_ij * cos_ijk - r_kl * cos_jkl;
htrc = r_kl + cos_jkl * ( r_ij * cos_ijk - r_jk );
hthd = r_ij * sin_ijk * ( r_jk - r_kl * cos_jkl );
hthe = r_kl * sin_jkl * ( r_jk - r_ij * cos_ijk );
hnra = r_kl * sin_ijk * sin_jkl;
hnrc = r_ij * sin_ijk * sin_jkl;
hnhd = r_ij * r_kl * cos_ijk * sin_jkl;
hnhe = r_ij * r_kl * sin_ijk * cos_jkl;
poem = 2.0 * r_ij * r_kl * sin_ijk * sin_jkl;
if( poem < 1e-20 ) poem = 1e-20;
tel = SQR( r_ij ) + SQR( r_jk ) + SQR( r_kl ) - SQR( r_li ) -
2.0 * ( r_ij * r_jk * cos_ijk - r_ij * r_kl * cos_ijk * cos_jkl +
r_jk * r_kl * cos_jkl );
arg = tel / poem;
if( arg > 1.0 ) arg = 1.0;
if( arg < -1.0 ) arg = -1.0;
if( sin_ijk >= 0 && sin_ijk <= MIN_SINE ) sin_ijk = MIN_SINE;
else if( sin_ijk <= 0 && sin_ijk >= -MIN_SINE ) sin_ijk = -MIN_SINE;
if( sin_jkl >= 0 && sin_jkl <= MIN_SINE ) sin_jkl = MIN_SINE;
else if( sin_jkl <= 0 && sin_jkl >= -MIN_SINE ) sin_jkl = -MIN_SINE;
// dcos_omega_di
rvec_ScaledSum( dcos_omega_di, (htra-arg*hnra)/r_ij, dvec_ij, -1., dvec_li );
rvec_ScaledAdd( dcos_omega_di,-(hthd-arg*hnhd)/sin_ijk, p_ijk->dcos_dk );
rvec_Scale( dcos_omega_di, 2.0 / poem, dcos_omega_di );
// dcos_omega_dj
rvec_ScaledSum( dcos_omega_dj,-(htra-arg*hnra)/r_ij, dvec_ij,
-htrb / r_jk, dvec_jk );
rvec_ScaledAdd( dcos_omega_dj,-(hthd-arg*hnhd)/sin_ijk, p_ijk->dcos_dj );
rvec_ScaledAdd( dcos_omega_dj,-(hthe-arg*hnhe)/sin_jkl, p_jkl->dcos_di );
rvec_Scale( dcos_omega_dj, 2.0 / poem, dcos_omega_dj );
// dcos_omega_dk
rvec_ScaledSum( dcos_omega_dk,-(htrc-arg*hnrc)/r_kl, dvec_kl,
htrb / r_jk, dvec_jk );
rvec_ScaledAdd( dcos_omega_dk,-(hthd-arg*hnhd)/sin_ijk, p_ijk->dcos_di );
rvec_ScaledAdd( dcos_omega_dk,-(hthe-arg*hnhe)/sin_jkl, p_jkl->dcos_dj );
rvec_Scale( dcos_omega_dk, 2.0 / poem, dcos_omega_dk );
// dcos_omega_dl
rvec_ScaledSum( dcos_omega_dl, (htrc-arg*hnrc)/r_kl, dvec_kl, 1., dvec_li );
rvec_ScaledAdd( dcos_omega_dl,-(hthe-arg*hnhe)/sin_jkl, p_jkl->dcos_dk );
rvec_Scale( dcos_omega_dl, 2.0 / poem, dcos_omega_dl );
return omega;
}
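Restated compactly, the dihedral angle returned above is (this is simply the code in formula form, using the unnormalized sine/cosine pair passed to atan2):

\omega_{ijkl} = \operatorname{atan2}\!\Big( -\,r_{jk}\,\vec d_{ij}\cdot(\vec d_{jk}\times\vec d_{kl}),\;
  -(\vec d_{ij}\cdot\vec d_{jk})(\vec d_{jk}\cdot\vec d_{kl}) + r_{jk}^{2}\,(\vec d_{ij}\cdot\vec d_{kl}) \Big)

The clamping of sin(theta_ijk) and sin(theta_jkl) to +/- MIN_SINE only protects the derivative terms from division by zero; it does not change omega itself.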
void Torsion_Angles( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, k, l, pi, pj, pk, pl, pij, plk, natoms;
int type_i, type_j, type_k, type_l;
int start_j, end_j;
int start_pj, end_pj, start_pk, end_pk;
int num_frb_intrs = 0;
double Delta_j, Delta_k;
double r_ij, r_jk, r_kl, r_li;
double BOA_ij, BOA_jk, BOA_kl;
double exp_tor2_ij, exp_tor2_jk, exp_tor2_kl;
double exp_tor1, exp_tor3_DjDk, exp_tor4_DjDk, exp_tor34_inv;
double exp_cot2_jk, exp_cot2_ij, exp_cot2_kl;
double fn10, f11_DjDk, dfn11, fn12;
double theta_ijk, theta_jkl;
double sin_ijk, sin_jkl;
double cos_ijk, cos_jkl;
double tan_ijk_i, tan_jkl_i;
double omega, cos_omega, cos2omega, cos3omega;
rvec dcos_omega_di, dcos_omega_dj, dcos_omega_dk, dcos_omega_dl;
double CV, cmn, CEtors1, CEtors2, CEtors3, CEtors4;
double CEtors5, CEtors6, CEtors7, CEtors8, CEtors9;
double Cconj, CEconj1, CEconj2, CEconj3;
double CEconj4, CEconj5, CEconj6;
double e_tor, e_con;
rvec dvec_li;
rvec force, ext_press;
ivec rel_box_jl;
// rtensor total_rtensor, temp_rtensor;
four_body_header *fbh;
four_body_parameters *fbp;
bond_data *pbond_ij, *pbond_jk, *pbond_kl;
bond_order_data *bo_ij, *bo_jk, *bo_kl;
three_body_interaction_data *p_ijk, *p_jkl;
double p_tor2 = system->reax_param.gp.l[23];
double p_tor3 = system->reax_param.gp.l[24];
double p_tor4 = system->reax_param.gp.l[25];
double p_cot2 = system->reax_param.gp.l[27];
reax_list *bonds = (*lists) + BONDS;
reax_list *thb_intrs = (*lists) + THREE_BODIES;
// Virial tallying variables
double delil[3], deljl[3], delkl[3];
double eng_tmp, fi_tmp[3], fj_tmp[3], fk_tmp[3];
natoms = system->n;
for( j = 0; j < natoms; ++j ) {
type_j = system->my_atoms[j].type;
Delta_j = workspace->Delta_boc[j];
start_j = Start_Index(j, bonds);
end_j = End_Index(j, bonds);
for( pk = start_j; pk < end_j; ++pk ) {
pbond_jk = &( bonds->select.bond_list[pk] );
k = pbond_jk->nbr;
bo_jk = &( pbond_jk->bo_data );
BOA_jk = bo_jk->BO - control->thb_cut;
if( system->my_atoms[j].orig_id > system->my_atoms[k].orig_id )
continue;
if( system->my_atoms[j].orig_id == system->my_atoms[k].orig_id ) {
if (system->my_atoms[k].x[2] < system->my_atoms[j].x[2]) continue;
if (system->my_atoms[k].x[2] == system->my_atoms[j].x[2] &&
system->my_atoms[k].x[1] < system->my_atoms[j].x[1]) continue;
if (system->my_atoms[k].x[2] == system->my_atoms[j].x[2] &&
system->my_atoms[k].x[1] == system->my_atoms[j].x[1] &&
system->my_atoms[k].x[0] < system->my_atoms[j].x[0]) continue;
}
if( bo_jk->BO > control->thb_cut/*0*/ && Num_Entries(pk, thb_intrs) ) {
pj = pbond_jk->sym_index; // pj points to j on k's list
if( Num_Entries(pj, thb_intrs) ) {
type_k = system->my_atoms[k].type;
Delta_k = workspace->Delta_boc[k];
r_jk = pbond_jk->d;
start_pk = Start_Index(pk, thb_intrs );
end_pk = End_Index(pk, thb_intrs );
start_pj = Start_Index(pj, thb_intrs );
end_pj = End_Index(pj, thb_intrs );
exp_tor2_jk = exp( -p_tor2 * BOA_jk );
exp_cot2_jk = exp( -p_cot2 * SQR(BOA_jk - 1.5) );
exp_tor3_DjDk = exp( -p_tor3 * (Delta_j + Delta_k) );
exp_tor4_DjDk = exp( p_tor4 * (Delta_j + Delta_k) );
exp_tor34_inv = 1.0 / (1.0 + exp_tor3_DjDk + exp_tor4_DjDk);
f11_DjDk = (2.0 + exp_tor3_DjDk) * exp_tor34_inv;
for( pi = start_pk; pi < end_pk; ++pi ) {
p_ijk = &( thb_intrs->select.three_body_list[pi] );
pij = p_ijk->pthb; // pij is pointer to i on j's bond_list
pbond_ij = &( bonds->select.bond_list[pij] );
bo_ij = &( pbond_ij->bo_data );
if( bo_ij->BO > control->thb_cut/*0*/ ) {
i = p_ijk->thb;
type_i = system->my_atoms[i].type;
r_ij = pbond_ij->d;
BOA_ij = bo_ij->BO - control->thb_cut;
theta_ijk = p_ijk->theta;
sin_ijk = sin( theta_ijk );
cos_ijk = cos( theta_ijk );
//tan_ijk_i = 1. / tan( theta_ijk );
if( sin_ijk >= 0 && sin_ijk <= MIN_SINE )
tan_ijk_i = cos_ijk / MIN_SINE;
else if( sin_ijk <= 0 && sin_ijk >= -MIN_SINE )
tan_ijk_i = cos_ijk / -MIN_SINE;
else tan_ijk_i = cos_ijk / sin_ijk;
exp_tor2_ij = exp( -p_tor2 * BOA_ij );
exp_cot2_ij = exp( -p_cot2 * SQR(BOA_ij -1.5) );
for( pl = start_pj; pl < end_pj; ++pl ) {
p_jkl = &( thb_intrs->select.three_body_list[pl] );
l = p_jkl->thb;
plk = p_jkl->pthb; //pointer to l on k's bond_list!
pbond_kl = &( bonds->select.bond_list[plk] );
bo_kl = &( pbond_kl->bo_data );
type_l = system->my_atoms[l].type;
fbh = &(system->reax_param.fbp[type_i][type_j]
[type_k][type_l]);
fbp = &(system->reax_param.fbp[type_i][type_j]
[type_k][type_l].prm[0]);
if( i != l && fbh->cnt &&
bo_kl->BO > control->thb_cut/*0*/ &&
bo_ij->BO * bo_jk->BO * bo_kl->BO > control->thb_cut/*0*/ ){
++num_frb_intrs;
r_kl = pbond_kl->d;
BOA_kl = bo_kl->BO - control->thb_cut;
theta_jkl = p_jkl->theta;
sin_jkl = sin( theta_jkl );
cos_jkl = cos( theta_jkl );
//tan_jkl_i = 1. / tan( theta_jkl );
if( sin_jkl >= 0 && sin_jkl <= MIN_SINE )
tan_jkl_i = cos_jkl / MIN_SINE;
else if( sin_jkl <= 0 && sin_jkl >= -MIN_SINE )
tan_jkl_i = cos_jkl / -MIN_SINE;
else tan_jkl_i = cos_jkl /sin_jkl;
rvec_ScaledSum( dvec_li, 1., system->my_atoms[i].x,
-1., system->my_atoms[l].x );
r_li = rvec_Norm( dvec_li );
/* omega and its derivative */
omega = Calculate_Omega( pbond_ij->dvec, r_ij,
pbond_jk->dvec, r_jk,
pbond_kl->dvec, r_kl,
dvec_li, r_li,
p_ijk, p_jkl,
dcos_omega_di, dcos_omega_dj,
dcos_omega_dk, dcos_omega_dl,
out_control );
cos_omega = cos( omega );
cos2omega = cos( 2. * omega );
cos3omega = cos( 3. * omega );
/* end omega calculations */
/* torsion energy */
exp_tor1 = exp( fbp->p_tor1 *
SQR(2.0 - bo_jk->BO_pi - f11_DjDk) );
exp_tor2_kl = exp( -p_tor2 * BOA_kl );
exp_cot2_kl = exp( -p_cot2 * SQR(BOA_kl - 1.5) );
fn10 = (1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jk) *
(1.0 - exp_tor2_kl);
CV = 0.5 * ( fbp->V1 * (1.0 + cos_omega) +
fbp->V2 * exp_tor1 * (1.0 - cos2omega) +
fbp->V3 * (1.0 + cos3omega) );
data->my_en.e_tor += e_tor = fn10 * sin_ijk * sin_jkl * CV;
dfn11 = (-p_tor3 * exp_tor3_DjDk +
(p_tor3 * exp_tor3_DjDk - p_tor4 * exp_tor4_DjDk) *
(2.0 + exp_tor3_DjDk) * exp_tor34_inv) *
exp_tor34_inv;
CEtors1 = sin_ijk * sin_jkl * CV;
CEtors2 = -fn10 * 2.0 * fbp->p_tor1 * fbp->V2 * exp_tor1 *
(2.0 - bo_jk->BO_pi - f11_DjDk) * (1.0 - SQR(cos_omega)) *
sin_ijk * sin_jkl;
CEtors3 = CEtors2 * dfn11;
CEtors4 = CEtors1 * p_tor2 * exp_tor2_ij *
(1.0 - exp_tor2_jk) * (1.0 - exp_tor2_kl);
CEtors5 = CEtors1 * p_tor2 *
(1.0 - exp_tor2_ij) * exp_tor2_jk * (1.0 - exp_tor2_kl);
CEtors6 = CEtors1 * p_tor2 *
(1.0 - exp_tor2_ij) * (1.0 - exp_tor2_jk) * exp_tor2_kl;
cmn = -fn10 * CV;
CEtors7 = cmn * sin_jkl * tan_ijk_i;
CEtors8 = cmn * sin_ijk * tan_jkl_i;
CEtors9 = fn10 * sin_ijk * sin_jkl *
(0.5 * fbp->V1 - 2.0 * fbp->V2 * exp_tor1 * cos_omega +
1.5 * fbp->V3 * (cos2omega + 2.0 * SQR(cos_omega)));
/* end of torsion energy */
/* 4-body conjugation energy */
fn12 = exp_cot2_ij * exp_cot2_jk * exp_cot2_kl;
data->my_en.e_con += e_con =
fbp->p_cot1 * fn12 *
(1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jkl);
Cconj = -2.0 * fn12 * fbp->p_cot1 * p_cot2 *
(1.0 + (SQR(cos_omega) - 1.0) * sin_ijk * sin_jkl);
CEconj1 = Cconj * (BOA_ij - 1.5e0);
CEconj2 = Cconj * (BOA_jk - 1.5e0);
CEconj3 = Cconj * (BOA_kl - 1.5e0);
CEconj4 = -fbp->p_cot1 * fn12 *
(SQR(cos_omega) - 1.0) * sin_jkl * tan_ijk_i;
CEconj5 = -fbp->p_cot1 * fn12 *
(SQR(cos_omega) - 1.0) * sin_ijk * tan_jkl_i;
CEconj6 = 2.0 * fbp->p_cot1 * fn12 *
cos_omega * sin_ijk * sin_jkl;
/* end 4-body conjugation energy */
/* forces */
bo_jk->Cdbopi += CEtors2;
workspace->CdDelta[j] += CEtors3;
workspace->CdDelta[k] += CEtors3;
bo_ij->Cdbo += (CEtors4 + CEconj1);
bo_jk->Cdbo += (CEtors5 + CEconj2);
bo_kl->Cdbo += (CEtors6 + CEconj3);
if( control->virial == 0 ) {
/* dcos_theta_ijk */
rvec_ScaledAdd( workspace->f[i],
CEtors7 + CEconj4, p_ijk->dcos_dk );
rvec_ScaledAdd( workspace->f[j],
CEtors7 + CEconj4, p_ijk->dcos_dj );
rvec_ScaledAdd( workspace->f[k],
CEtors7 + CEconj4, p_ijk->dcos_di );
/* dcos_theta_jkl */
rvec_ScaledAdd( workspace->f[j],
CEtors8 + CEconj5, p_jkl->dcos_di );
rvec_ScaledAdd( workspace->f[k],
CEtors8 + CEconj5, p_jkl->dcos_dj );
rvec_ScaledAdd( workspace->f[l],
CEtors8 + CEconj5, p_jkl->dcos_dk );
/* dcos_omega */
rvec_ScaledAdd( workspace->f[i],
CEtors9 + CEconj6, dcos_omega_di );
rvec_ScaledAdd( workspace->f[j],
CEtors9 + CEconj6, dcos_omega_dj );
rvec_ScaledAdd( workspace->f[k],
CEtors9 + CEconj6, dcos_omega_dk );
rvec_ScaledAdd( workspace->f[l],
CEtors9 + CEconj6, dcos_omega_dl );
}
else {
ivec_Sum(rel_box_jl, pbond_jk->rel_box, pbond_kl->rel_box);
/* dcos_theta_ijk */
rvec_Scale( force, CEtors7 + CEconj4, p_ijk->dcos_dk );
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_ScaledAdd( workspace->f[j],
CEtors7 + CEconj4, p_ijk->dcos_dj );
rvec_Scale( force, CEtors7 + CEconj4, p_ijk->dcos_di );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
/* dcos_theta_jkl */
rvec_ScaledAdd( workspace->f[j],
CEtors8 + CEconj5, p_jkl->dcos_di );
rvec_Scale( force, CEtors8 + CEconj5, p_jkl->dcos_dj );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_Scale( force, CEtors8 + CEconj5, p_jkl->dcos_dk );
rvec_Add( workspace->f[l], force );
rvec_iMultiply( ext_press, rel_box_jl, force );
rvec_Add( data->my_ext_press, ext_press );
/* dcos_omega */
rvec_Scale( force, CEtors9 + CEconj6, dcos_omega_di );
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_ScaledAdd( workspace->f[j],
CEtors9 + CEconj6, dcos_omega_dj );
rvec_Scale( force, CEtors9 + CEconj6, dcos_omega_dk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_Scale( force, CEtors9 + CEconj6, dcos_omega_dl );
rvec_Add( workspace->f[l], force );
rvec_iMultiply( ext_press, rel_box_jl, force );
rvec_Add( data->my_ext_press, ext_press );
}
/* tally into per-atom virials */
if( system->pair_ptr->vflag_atom || system->pair_ptr->evflag) {
// acquire vectors
rvec_ScaledSum( delil, 1., system->my_atoms[l].x,
-1., system->my_atoms[i].x );
rvec_ScaledSum( deljl, 1., system->my_atoms[l].x,
-1., system->my_atoms[j].x );
rvec_ScaledSum( delkl, 1., system->my_atoms[l].x,
-1., system->my_atoms[k].x );
// dcos_theta_ijk
rvec_Scale( fi_tmp, CEtors7 + CEconj4, p_ijk->dcos_dk );
rvec_Scale( fj_tmp, CEtors7 + CEconj4, p_ijk->dcos_dj );
rvec_Scale( fk_tmp, CEtors7 + CEconj4, p_ijk->dcos_di );
// dcos_theta_jkl
rvec_ScaledAdd( fj_tmp, CEtors8 + CEconj5, p_jkl->dcos_di );
rvec_ScaledAdd( fk_tmp, CEtors8 + CEconj5, p_jkl->dcos_dj );
// dcos_omega
rvec_ScaledAdd( fi_tmp, CEtors9 + CEconj6, dcos_omega_di );
rvec_ScaledAdd( fj_tmp, CEtors9 + CEconj6, dcos_omega_dj );
rvec_ScaledAdd( fk_tmp, CEtors9 + CEconj6, dcos_omega_dk );
// tally
eng_tmp = e_tor + e_con;
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(j,k,natoms,1,eng_tmp,0.0,0.0,0.0,0.0,0.0);
if( system->pair_ptr->vflag_atom)
system->pair_ptr->v_tally4(i,j,k,l,fi_tmp,fj_tmp,fk_tmp,delil,deljl,delkl);
}
} // pl check ends
} // pl loop ends
} // pi check ends
} // pi loop ends
} // k-j neighbor check ends
} // j-k neighbor check ends
} // pk loop ends
} // j loop
}
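For reference, the two energy contributions accumulated per i-j-k-l quadruplet in the loop above are, in formula form (symbols follow the local variable names; this restates the code rather than deriving it):

f_{10} = (1-e^{-p_{tor2}\,BOA_{ij}})(1-e^{-p_{tor2}\,BOA_{jk}})(1-e^{-p_{tor2}\,BOA_{kl}})

E_{tor} = f_{10}\,\sin\theta_{ijk}\,\sin\theta_{jkl}\;\tfrac12\Big[ V_1(1+\cos\omega)
        + V_2\, e^{\,p_{tor1}\,(2-BO^{\pi}_{jk}-f_{11})^{2}}\,(1-\cos 2\omega)
        + V_3(1+\cos 3\omega) \Big]

E_{con} = p_{cot1}\, f_{12}\,\big[ 1 + (\cos^{2}\omega - 1)\,\sin\theta_{ijk}\,\sin\theta_{jkl} \big],
\qquad f_{12} = e^{-p_{cot2}(BOA_{ij}-1.5)^{2}}\, e^{-p_{cot2}(BOA_{jk}-1.5)^{2}}\, e^{-p_{cot2}(BOA_{kl}-1.5)^{2}}

The CEtors*/CEconj* coefficients are the partial derivatives of these two terms with respect to the bond orders, the two valence angles, and omega, and are what gets tallied into the forces and virial in the code above.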
diff --git a/src/USER-REAXC/reaxc_traj.cpp b/src/USER-REAXC/reaxc_traj.cpp
index 9d4fa7352..ae2bba215 100644
--- a/src/USER-REAXC/reaxc_traj.cpp
+++ b/src/USER-REAXC/reaxc_traj.cpp
@@ -1,777 +1,777 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_traj.h"
#include "reaxc_list.h"
#include "reaxc_tool_box.h"
int Reallocate_Output_Buffer( output_controls *out_control, int req_space,
MPI_Comm comm )
{
if( out_control->buffer_len > 0 )
free( out_control->buffer );
out_control->buffer_len = (int)(req_space*SAFE_ZONE);
out_control->buffer = (char*) malloc(out_control->buffer_len*sizeof(char));
if( out_control->buffer == NULL ) {
fprintf( stderr,
"insufficient memory for required buffer size %d. terminating!\n",
(int) (req_space*SAFE_ZONE) );
MPI_Abort( comm, INSUFFICIENT_MEMORY );
}
return SUCCESS;
}
void Write_Skip_Line( output_controls *out_control, mpi_datatypes *mpi_data,
int my_rank, int skip, int num_section )
{
if( my_rank == MASTER_NODE )
fprintf( out_control->strj, INT2_LINE,
"chars_to_skip_section:", skip, num_section );
}
int Write_Header( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int num_hdr_lines, my_hdr_lines, buffer_req;
char ensembles[ens_N][25] = { "NVE", "NVT", "fully flexible NPT",
"semi isotropic NPT", "isotropic NPT" };
char reposition[3][25] = { "fit to periodic box", "CoM to center of box",
"CoM to origin" };
char t_regime[3][25] = { "T-coupling only", "step-wise", "constant slope" };
char traj_methods[TF_N][10] = { "custom", "xyz" };
char atom_formats[8][40] = { "none", "invalid", "invalid", "invalid",
"xyz_q",
"xyz_q_fxfyfz",
"xyz_q_vxvyvz",
"detailed_atom_info" };
char bond_formats[3][30] = { "none",
"basic_bond_info",
"detailed_bond_info" };
char angle_formats[2][30] = { "none", "basic_angle_info" };
/* set header lengths */
num_hdr_lines = NUM_HEADER_LINES;
my_hdr_lines = num_hdr_lines * ( system->my_rank == MASTER_NODE );
buffer_req = my_hdr_lines * HEADER_LINE_LEN;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* only the master node writes into trajectory header */
if( system->my_rank == MASTER_NODE ) {
/* clear the contents of line & buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
/* to skip the header */
sprintf( out_control->line, INT_LINE, "chars_to_skip_header:",
(num_hdr_lines-1) * HEADER_LINE_LEN );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* general simulation info */
sprintf( out_control->line, STR_LINE, "simulation_name:",
out_control->traj_title );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, BIGINT_LINE, "number_of_atoms:", system->bigN );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "ensemble_type:",
ensembles[ control->ensemble ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "number_of_steps:",
control->nsteps );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "timestep_length_(in_fs):",
control->dt * 1000 );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* restart info */
sprintf( out_control->line, STR_LINE, "is_this_a_restart?:",
(control->restart ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "write_restart_files?:",
((out_control->restart_freq > 0) ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "frequency_to_write_restarts:",
out_control->restart_freq );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* preferences */
sprintf( out_control->line, STR_LINE, "tabulate_long_range_intrs?:",
(control->tabulate ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "table_size:", control->tabulate );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "restrict_bonds?:",
(control->restrict_bonds ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "bond_restriction_length:",
control->restrict_bonds );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "reposition_atoms?:",
reposition[control->reposition_atoms] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "remove_CoM_velocity?:",
(control->ensemble==NVE) ? 0 : control->remove_CoM_vel);
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* cut-off values */
sprintf( out_control->line, REAL_LINE, "bonded_intr_dist_cutoff:",
control->bond_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "nonbonded_intr_dist_cutoff:",
control->nonb_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "hbond_dist_cutoff:",
control->hbond_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "reax_bond_threshold:",
control->bo_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "physical_bond_threshold:",
control->bg_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "valence_angle_threshold:",
control->thb_cut );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, SCI_LINE, "QEq_tolerance:", control->q_err );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* temperature controls */
sprintf( out_control->line, REAL_LINE, "initial_temperature:",
control->T_init );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "target_temperature:",
control->T_final );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "thermal_inertia:",
control->Tau_T );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "temperature_regime:",
t_regime[ control->T_mode ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "temperature_change_rate_(K/ps):",
control->T_rate / control->T_freq );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* pressure controls */
sprintf( out_control->line, REAL3_LINE, "target_pressure_(GPa):",
control->P[0], control->P[1], control->P[2] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL3_LINE, "virial_inertia:",
control->Tau_P[0], control->Tau_P[1], control->Tau_P[2] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* trajectory */
sprintf( out_control->line, INT_LINE, "energy_dumping_freq:",
out_control->energy_update_freq );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "trajectory_dumping_freq:",
out_control->write_steps );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "compress_trajectory_output?:",
(out_control->traj_compress ? "yes" : "no") );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "trajectory_format:",
traj_methods[ out_control->traj_method ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "atom_info:",
atom_formats[ out_control->atom_info ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "bond_info:",
bond_formats[ out_control->bond_info ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, STR_LINE, "angle_info:",
angle_formats[ out_control->angle_info ] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* analysis */
//sprintf( out_control->line, STR_LINE, "molecular_analysis:",
// (control->molec_anal ? "yes" : "no") );
//strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, INT_LINE, "molecular_analysis_frequency:",
control->molecular_analysis );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
}
/* dump out the buffer */
if( system->my_rank == MASTER_NODE )
fprintf( out_control->strj, "%s", out_control->buffer );
return SUCCESS;
}
int Write_Init_Desc( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int i, me, np, cnt, buffer_len, buffer_req;
reax_atom *p_atom;
MPI_Status status;
me = system->my_rank;
np = system->wsize;
/* skip info */
Write_Skip_Line( out_control, mpi_data, me,
system->bigN * INIT_DESC_LEN, system->bigN );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = system->bigN * INIT_DESC_LEN + 1;
else buffer_req = system->n * INIT_DESC_LEN + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
out_control->line[0] = 0;
out_control->buffer[0] = 0;
for( i = 0; i < system->n; ++i ) {
p_atom = &( system->my_atoms[i] );
sprintf( out_control->line, INIT_DESC,
p_atom->orig_id, p_atom->type, p_atom->name,
system->reax_param.sbp[ p_atom->type ].mass );
strncpy( out_control->buffer + i*INIT_DESC_LEN,
out_control->line, INIT_DESC_LEN+1 );
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np * INIT_DESCS + me, mpi_data->world );
else{
buffer_len = system->n * INIT_DESC_LEN;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*INIT_DESCS+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Init_Traj( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data,
char *msg )
{
char fname[MAX_STR];
int atom_line_len[ NR_OPT_ATOM ] = { 0, 0, 0, 0,
ATOM_BASIC_LEN, ATOM_wV_LEN,
ATOM_wF_LEN, ATOM_FULL_LEN };
int bond_line_len[ NR_OPT_BOND ] = { 0, BOND_BASIC_LEN, BOND_FULL_LEN };
int angle_line_len[ NR_OPT_ANGLE ] = { 0, ANGLE_BASIC_LEN };
/* generate trajectory name */
sprintf( fname, "%s.trj", control->sim_name );
/* how should I write atoms? */
out_control->atom_line_len = atom_line_len[ out_control->atom_info ];
out_control->write_atoms = ( out_control->atom_line_len ? 1 : 0 );
/* bonds? */
out_control->bond_line_len = bond_line_len[ out_control->bond_info ];
out_control->write_bonds = ( out_control->bond_line_len ? 1 : 0 );
/* angles? */
out_control->angle_line_len = angle_line_len[ out_control->angle_info ];
out_control->write_angles = ( out_control->angle_line_len ? 1 : 0 );
/* allocate line & buffer space */
out_control->line = (char*) calloc( MAX_TRJ_LINE_LEN + 1, sizeof(char) );
out_control->buffer_len = 0;
out_control->buffer = NULL;
/* write trajectory header and atom info, if applicable */
if( out_control->traj_method == REG_TRAJ) {
if( system->my_rank == MASTER_NODE )
out_control->strj = fopen( fname, "w" );
}
else {
strcpy( msg, "init_traj: unknown trajectory option" );
return FAILURE;
}
Write_Header( system, control, out_control, mpi_data );
Write_Init_Desc( system, control, out_control, mpi_data );
return SUCCESS;
}
int Write_Frame_Header( reax_system *system, control_params *control,
simulation_data *data, output_controls *out_control,
mpi_datatypes *mpi_data )
{
int me, num_frm_hdr_lines, my_frm_hdr_lines, buffer_req;
me = system->my_rank;
/* frame header lengths */
num_frm_hdr_lines = 22;
my_frm_hdr_lines = num_frm_hdr_lines * ( me == MASTER_NODE );
buffer_req = my_frm_hdr_lines * HEADER_LINE_LEN;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* only the master node writes into trajectory header */
if( me == MASTER_NODE ) {
/* clear the contents of line & buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
/* skip info */
sprintf( out_control->line, INT_LINE, "chars_to_skip_frame_header:",
(num_frm_hdr_lines - 1) * HEADER_LINE_LEN );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* step & time */
sprintf( out_control->line, INT_LINE, "step:", data->step );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "time_in_ps:",
data->step * control->dt );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* box info */
sprintf( out_control->line, REAL_LINE, "volume:", system->big_box.V );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL3_LINE, "box_dimensions:",
system->big_box.box_norms[0],
system->big_box.box_norms[1],
system->big_box.box_norms[2] );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL3_LINE,
"coordinate_angles:", 90.0, 90.0, 90.0 );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* system T and P */
sprintf( out_control->line, REAL_LINE, "temperature:", data->therm.T );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "pressure:",
(control->ensemble==iNPT) ?
data->iso_bar.P : data->flex_bar.P_scalar );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
/* energies */
sprintf( out_control->line, REAL_LINE, "total_energy:",
data->sys_en.e_tot );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "total_kinetic:",
data->sys_en.e_kin );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "total_potential:",
data->sys_en.e_pot );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "bond_energy:",
data->sys_en.e_bond );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "atom_energy:",
data->sys_en.e_ov + data->sys_en.e_un );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "lone_pair_energy:",
data->sys_en.e_lp );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "valence_angle_energy:",
data->sys_en.e_ang + data->sys_en.e_pen );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "3-body_conjugation:",
data->sys_en.e_coa );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "hydrogen_bond_energy:",
data->sys_en.e_hb );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "torsion_angle_energy:",
data->sys_en.e_tor );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "4-body_conjugation:",
data->sys_en.e_con );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "vdWaals_energy:",
data->sys_en.e_vdW );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "electrostatics_energy:",
data->sys_en.e_ele );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
sprintf( out_control->line, REAL_LINE, "polarization_energy:",
data->sys_en.e_pol );
strncat( out_control->buffer, out_control->line, HEADER_LINE_LEN+1 );
}
/* dump out the buffer */
if( system->my_rank == MASTER_NODE )
fprintf( out_control->strj, "%s", out_control->buffer );
return SUCCESS;
}
int Write_Atoms( reax_system *system, control_params *control,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int i, me, np, line_len, buffer_len, buffer_req, cnt;
MPI_Status status;
reax_atom *p_atom;
me = system->my_rank;
np = system->wsize;
line_len = out_control->atom_line_len;
Write_Skip_Line( out_control, mpi_data, me,
system->bigN*line_len, system->bigN );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = system->bigN * line_len + 1;
else buffer_req = system->n * line_len + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* fill in buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
for( i = 0; i < system->n; ++i ) {
p_atom = &( system->my_atoms[i] );
switch( out_control->atom_info ) {
case OPT_ATOM_BASIC:
sprintf( out_control->line, ATOM_BASIC,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->q );
break;
case OPT_ATOM_wF:
sprintf( out_control->line, ATOM_wF,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->f[0], p_atom->f[1], p_atom->f[2], p_atom->q );
break;
case OPT_ATOM_wV:
sprintf( out_control->line, ATOM_wV,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->v[0], p_atom->v[1], p_atom->v[2], p_atom->q );
break;
case OPT_ATOM_FULL:
sprintf( out_control->line, ATOM_FULL,
p_atom->orig_id, p_atom->x[0], p_atom->x[1], p_atom->x[2],
p_atom->v[0], p_atom->v[1], p_atom->v[2],
p_atom->f[0], p_atom->f[1], p_atom->f[2], p_atom->q );
break;
default:
fprintf( stderr,
"write_traj_atoms: unknown atom trajectroy format!\n");
MPI_Abort( mpi_data->world, UNKNOWN_OPTION );
}
strncpy( out_control->buffer + i*line_len, out_control->line, line_len+1 );
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np*ATOM_LINES+me, mpi_data->world );
else{
buffer_len = system->n * line_len;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*ATOM_LINES+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Write_Bonds(reax_system *system, control_params *control, reax_list *bonds,
output_controls *out_control, mpi_datatypes *mpi_data)
{
int i, j, pj, me, np;
int my_bonds, num_bonds;
int line_len, buffer_len, buffer_req, cnt;
MPI_Status status;
bond_data *bo_ij;
me = system->my_rank;
np = system->wsize;
line_len = out_control->bond_line_len;
/* count the number of bonds I will write */
my_bonds = 0;
for( i=0; i < system->n; ++i )
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
j = bonds->select.bond_list[pj].nbr;
if( system->my_atoms[i].orig_id <= system->my_atoms[j].orig_id &&
bonds->select.bond_list[pj].bo_data.BO >= control->bg_cut )
++my_bonds;
}
/* allreduce - total number of bonds */
MPI_Allreduce( &my_bonds, &num_bonds, 1, MPI_INT, MPI_SUM, mpi_data->world );
Write_Skip_Line( out_control, mpi_data, me, num_bonds*line_len, num_bonds );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = num_bonds * line_len + 1;
else buffer_req = my_bonds * line_len + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* fill in the buffer */
out_control->line[0] = 0;
out_control->buffer[0] = 0;
my_bonds = 0;
for( i=0; i < system->n; ++i ) {
for( pj = Start_Index(i, bonds); pj < End_Index(i, bonds); ++pj ) {
bo_ij = &( bonds->select.bond_list[pj] );
j = bo_ij->nbr;
if( system->my_atoms[i].orig_id <= system->my_atoms[j].orig_id &&
bo_ij->bo_data.BO >= control->bg_cut ) {
switch( out_control->bond_info ) {
case OPT_BOND_BASIC:
sprintf( out_control->line, BOND_BASIC,
system->my_atoms[i].orig_id, system->my_atoms[j].orig_id,
bo_ij->d, bo_ij->bo_data.BO );
break;
case OPT_BOND_FULL:
sprintf( out_control->line, BOND_FULL,
system->my_atoms[i].orig_id, system->my_atoms[j].orig_id,
bo_ij->d, bo_ij->bo_data.BO, bo_ij->bo_data.BO_s,
bo_ij->bo_data.BO_pi, bo_ij->bo_data.BO_pi2 );
break;
default:
fprintf(stderr, "write_traj_bonds: FATAL! invalid bond_info option");
MPI_Abort( mpi_data->world, UNKNOWN_OPTION );
}
strncpy( out_control->buffer + my_bonds*line_len,
out_control->line, line_len+1 );
++my_bonds;
}
}
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np*BOND_LINES+me, mpi_data->world );
else{
buffer_len = my_bonds * line_len;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*BOND_LINES+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Write_Angles( reax_system *system, control_params *control,
reax_list *bonds, reax_list *thb_intrs,
output_controls *out_control, mpi_datatypes *mpi_data )
{
int i, j, k, pi, pk, me, np;
int my_angles, num_angles;
int line_len, buffer_len, buffer_req, cnt;
bond_data *bo_ij, *bo_jk;
three_body_interaction_data *angle_ijk;
MPI_Status status;
me = system->my_rank;
np = system->wsize;
line_len = out_control->angle_line_len;
/* count the number of valence angles I will output */
my_angles = 0;
for( j = 0; j < system->n; ++j )
for( pi = Start_Index(j, bonds); pi < End_Index(j, bonds); ++pi ) {
bo_ij = &(bonds->select.bond_list[pi]);
i = bo_ij->nbr;
if( bo_ij->bo_data.BO >= control->bg_cut ) // physical j&i bond
for( pk = Start_Index( pi, thb_intrs );
pk < End_Index( pi, thb_intrs ); ++pk ) {
angle_ijk = &(thb_intrs->select.three_body_list[pk]);
k = angle_ijk->thb;
bo_jk = &(bonds->select.bond_list[ angle_ijk->pthb ]);
if( system->my_atoms[i].orig_id < system->my_atoms[k].orig_id &&
bo_jk->bo_data.BO >= control->bg_cut ) // physical j&k bond
++my_angles;
}
}
/* total number of valences */
MPI_Allreduce(&my_angles, &num_angles, 1, MPI_INT, MPI_SUM, mpi_data->world);
Write_Skip_Line( out_control, mpi_data, me, num_angles*line_len, num_angles );
if( out_control->traj_method == REG_TRAJ && me == MASTER_NODE )
buffer_req = num_angles * line_len + 1;
else buffer_req = my_angles * line_len + 1;
if( buffer_req > out_control->buffer_len * DANGER_ZONE )
Reallocate_Output_Buffer( out_control, buffer_req, mpi_data->world );
/* fill in the buffer */
my_angles = 0;
out_control->line[0] = 0;
out_control->buffer[0] = 0;
for( j = 0; j < system->n; ++j )
for( pi = Start_Index(j, bonds); pi < End_Index(j, bonds); ++pi ) {
bo_ij = &(bonds->select.bond_list[pi]);
i = bo_ij->nbr;
if( bo_ij->bo_data.BO >= control->bg_cut ) // physical j&i bond
for( pk = Start_Index( pi, thb_intrs );
pk < End_Index( pi, thb_intrs ); ++pk ) {
angle_ijk = &(thb_intrs->select.three_body_list[pk]);
k = angle_ijk->thb;
bo_jk = &(bonds->select.bond_list[ angle_ijk->pthb ]);
if( system->my_atoms[i].orig_id < system->my_atoms[k].orig_id &&
bo_jk->bo_data.BO >= control->bg_cut ) { // physical j&k bond
sprintf( out_control->line, ANGLE_BASIC,
system->my_atoms[i].orig_id, system->my_atoms[j].orig_id,
system->my_atoms[k].orig_id, RAD2DEG( angle_ijk->theta ) );
strncpy( out_control->buffer + my_angles*line_len,
out_control->line, line_len+1 );
++my_angles;
}
}
}
if( me != MASTER_NODE )
MPI_Send( out_control->buffer, buffer_req-1, MPI_CHAR, MASTER_NODE,
np*ANGLE_LINES+me, mpi_data->world );
else{
buffer_len = my_angles * line_len;
for( i = 0; i < np; ++i )
if( i != MASTER_NODE ) {
MPI_Recv( out_control->buffer + buffer_len, buffer_req - buffer_len,
MPI_CHAR, i, np*ANGLE_LINES+i, mpi_data->world, &status );
MPI_Get_count( &status, MPI_CHAR, &cnt );
buffer_len += cnt;
}
out_control->buffer[buffer_len] = 0;
fprintf( out_control->strj, "%s", out_control->buffer );
}
return SUCCESS;
}
int Append_Frame( reax_system *system, control_params *control,
simulation_data *data, reax_list **lists,
output_controls *out_control, mpi_datatypes *mpi_data )
{
Write_Frame_Header( system, control, data, out_control, mpi_data );
if( out_control->write_atoms )
Write_Atoms( system, control, out_control, mpi_data );
if( out_control->write_bonds )
Write_Bonds( system, control, (*lists + BONDS), out_control, mpi_data );
if( out_control->write_angles )
Write_Angles( system, control, (*lists + BONDS), (*lists + THREE_BODIES),
out_control, mpi_data );
return SUCCESS;
}
int End_Traj( int my_rank, output_controls *out_control )
{
if( my_rank == MASTER_NODE )
fclose( out_control->strj );
free( out_control->buffer );
free( out_control->line );
return SUCCESS;
}
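The trajectory produced by these routines is plain text, and every header or section begins with a "chars_to_skip_..." line so a post-processing tool can seek past parts it does not need. Below is a minimal, hypothetical reader sketch; the exact field widths come from the format macros in reaxc_traj.h (not shown in this diff), so the parsing assumes only that the key and the value are whitespace-separated:

/* hypothetical sketch: skip the global header of a .trj file */
FILE *fp = fopen( "simulation.trj", "r" );       /* file name is an example */
char  line[256];
long  skip;
if( fp && fgets( line, sizeof(line), fp ) &&
    sscanf( line, "%*s %ld", &skip ) == 1 )      /* "chars_to_skip_header: N" */
  fseek( fp, skip, SEEK_CUR );                   /* now positioned at the first frame */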
diff --git a/src/USER-REAXC/reaxc_types.h b/src/USER-REAXC/reaxc_types.h
index db4cf0417..b3e2f40f0 100644
--- a/src/USER-REAXC/reaxc_types.h
+++ b/src/USER-REAXC/reaxc_types.h
@@ -1,882 +1,862 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
#ifndef __REAX_TYPES_H_
#define __REAX_TYPES_H_
#include "lmptype.h"
#include <ctype.h>
#include <math.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sys/time.h"
#include <time.h>
/************* SOME DEFS - crucial for reax_types.h *********/
#define LAMMPS_REAX
//#define DEBUG
//#define DEBUG_FOCUS
//#define TEST_ENERGY
//#define TEST_FORCES
//#define CG_PERFORMANCE
//#define LOG_PERFORMANCE
//#define STANDARD_BOUNDARIES
//#define OLD_BOUNDARIES
//#define MIDPOINT_BOUNDARIES
#define REAX_MAX_STR 1024
#define REAX_MAX_NBRS 6
#define REAX_MAX_3BODY_PARAM 5
#define REAX_MAX_4BODY_PARAM 5
#define REAX_MAX_ATOM_TYPES 25
#define REAX_MAX_MOLECULE_SIZE 20
#define MAX_BOND 20 // same as reaxc_defs.h
/********************** TYPE DEFINITIONS ********************/
-typedef int ivec[3];
+typedef int ivec[3];
typedef double rvec[3];
typedef double rtensor[3][3];
typedef double rvec2[2];
typedef double rvec4[4];
-
// import LAMMPS' definition of tagint and bigint
typedef LAMMPS_NS::tagint rc_tagint;
typedef LAMMPS_NS::bigint rc_bigint;
typedef struct
{
int cnt;
int *index;
void *out_atoms;
} mpi_out_data;
-
typedef struct
{
MPI_Comm world;
MPI_Comm comm_mesh3D;
MPI_Datatype sys_info;
MPI_Datatype mpi_atom_type;
MPI_Datatype boundary_atom_type;
MPI_Datatype mpi_rvec, mpi_rvec2;
MPI_Datatype restart_atom_type;
MPI_Datatype header_line;
MPI_Datatype header_view;
MPI_Datatype init_desc_line;
MPI_Datatype init_desc_view;
MPI_Datatype atom_line;
MPI_Datatype atom_view;
MPI_Datatype bond_line;
MPI_Datatype bond_view;
MPI_Datatype angle_line;
MPI_Datatype angle_view;
mpi_out_data out_buffers[REAX_MAX_NBRS];
void *in1_buffer;
void *in2_buffer;
} mpi_datatypes;
-
typedef struct
{
int n_global;
double* l;
int vdw_type;
} global_parameters;
-
-
typedef struct
{
/* Line one in field file */
char name[15]; // Two character atom name
double r_s;
double valency; // Valency of the atom
double mass; // Mass of atom
double r_vdw;
double epsilon;
double gamma;
double r_pi;
double valency_e;
double nlp_opt;
/* Line two in field file */
double alpha;
double gamma_w;
double valency_boc;
double p_ovun5;
double chi;
double eta;
int p_hbond; // 1 for H, 2 for hbonding atoms (O,S,P,N), 0 for others
/* Line three in field file */
double r_pi_pi;
double p_lp2;
double b_o_131;
double b_o_132;
double b_o_133;
/* Line four in the field file */
double p_ovun2;
double p_val3;
double valency_val;
double p_val5;
double rcore2;
double ecore2;
double acore2;
/* Line five in the ffield file, only for lgvdw yes */
double lgcij;
double lgre;
} single_body_parameters;
-
-
/* Two Body Parameters */
typedef struct {
/* Bond Order parameters */
double p_bo1,p_bo2,p_bo3,p_bo4,p_bo5,p_bo6;
double r_s, r_p, r_pp; // r_o distances in BO formula
double p_boc3, p_boc4, p_boc5;
/* Bond Energy parameters */
double p_be1, p_be2;
double De_s, De_p, De_pp;
/* Over/Under coordination parameters */
double p_ovun1;
/* Van der Waal interaction parameters */
double D;
double alpha;
double r_vdW;
double gamma_w;
double rcore, ecore, acore;
double lgcij, lgre;
/* electrostatic parameters */
double gamma; // note: this parameter is gamma^-3 and not gamma.
double v13cor, ovc;
} two_body_parameters;
-
-
/* 3-body parameters */
typedef struct {
/* valence angle */
double theta_00;
double p_val1, p_val2, p_val4, p_val7;
/* penalty */
double p_pen1;
/* 3-body conjugation */
double p_coa1;
} three_body_parameters;
typedef struct{
int cnt;
three_body_parameters prm[REAX_MAX_3BODY_PARAM];
} three_body_header;
-
-
/* hydrogen-bond parameters */
typedef struct{
double r0_hb, p_hb1, p_hb2, p_hb3;
} hbond_parameters;
-
-
/* 4-body parameters */
typedef struct {
double V1, V2, V3;
/* torsion angle */
double p_tor1;
/* 4-body conjugation */
double p_cot1;
} four_body_parameters;
-
typedef struct
{
int cnt;
four_body_parameters prm[REAX_MAX_4BODY_PARAM];
} four_body_header;
-
typedef struct
{
int num_atom_types;
global_parameters gp;
single_body_parameters *sbp;
two_body_parameters **tbp;
three_body_header ***thbp;
hbond_parameters ***hbp;
four_body_header ****fbp;
} reax_interaction;
-
-
struct _reax_atom
{
rc_tagint orig_id;
int imprt_id;
int type;
char name[8];
rvec x; // position
rvec v; // velocity
rvec f; // force
rvec f_old;
double q; // charge
rvec4 s; // they take part in
rvec4 t; // computing q
int Hindex;
int num_bonds;
int num_hbonds;
int renumber;
int numbonds; // true number of bonds around atoms
int nbr_id[MAX_BOND]; // ids of neighbors around atoms
double nbr_bo[MAX_BOND]; // BO values of bond between i and nbr
double sum_bo, no_lp; // sum of BO values and no. of lone pairs
};
typedef _reax_atom reax_atom;
-
-
typedef struct
{
double V;
rvec min, max, box_norms;
rtensor box, box_inv;
rtensor trans, trans_inv;
rtensor g;
} simulation_box;
-
-
struct grid_cell
{
double cutoff;
rvec min, max;
ivec rel_box;
int mark;
int type;
int str;
int end;
int top;
int* atoms;
struct grid_cell** nbrs;
ivec* nbrs_x;
rvec* nbrs_cp;
};
typedef struct grid_cell grid_cell;
typedef struct
{
int total, max_atoms, max_nbrs;
ivec ncells;
rvec cell_len;
rvec inv_len;
ivec bond_span;
ivec nonb_span;
ivec vlist_span;
ivec native_cells;
ivec native_str;
ivec native_end;
double ghost_cut;
ivec ghost_span;
ivec ghost_nonb_span;
ivec ghost_hbond_span;
ivec ghost_bond_span;
grid_cell*** cells;
ivec *order;
} grid;
typedef struct
{
int rank;
int est_send, est_recv;
int atoms_str, atoms_cnt;
ivec rltv, prdc;
rvec bndry_min, bndry_max;
int send_type;
int recv_type;
ivec str_send;
ivec end_send;
ivec str_recv;
ivec end_recv;
} neighbor_proc;
typedef struct
{
int N;
int exc_gcells;
int exc_atoms;
} bound_estimate;
typedef struct
{
double ghost_nonb;
double ghost_hbond;
double ghost_bond;
double ghost_cutoff;
} boundary_cutoff;
using LAMMPS_NS::Pair;
struct _reax_system
{
reax_interaction reax_param;
rc_bigint bigN;
int n, N, numH;
int local_cap, total_cap, gcell_cap, Hcap;
int est_recv, est_trans, max_recved;
int wsize, my_rank, num_nbrs;
ivec my_coords;
neighbor_proc my_nbrs[REAX_MAX_NBRS];
int *global_offset;
simulation_box big_box, my_box, my_ext_box;
grid my_grid;
boundary_cutoff bndry_cuts;
reax_atom *my_atoms;
class Pair *pair_ptr;
int my_bonds;
int mincap;
double safezone, saferzone;
};
typedef _reax_system reax_system;
/* system control parameters */
typedef struct
{
char sim_name[REAX_MAX_STR];
int nprocs;
ivec procs_by_dim;
/* ensemble values:
0 : NVE
1 : bNVT (Berendsen)
2 : nhNVT (Nose-Hoover)
3 : sNPT (Parrinello-Rahman-Nose-Hoover) semi-isotropic
4 : iNPT (Parrinello-Rahman-Nose-Hoover) isotropic
5 : NPT (Parrinello-Rahman-Nose-Hoover) anisotropic*/
int ensemble;
int nsteps;
double dt;
int geo_format;
int restart;
int restrict_bonds;
int remove_CoM_vel;
int random_vel;
int reposition_atoms;
int reneighbor;
double vlist_cut;
double bond_cut;
double nonb_cut, nonb_low;
double hbond_cut;
double user_ghost_cut;
double bg_cut;
double bo_cut;
double thb_cut;
double thb_cutsq;
int tabulate;
int qeq_freq;
double q_err;
int refactor;
double droptol;
double T_init, T_final, T;
double Tau_T;
int T_mode;
double T_rate, T_freq;
int virial;
rvec P, Tau_P, Tau_PT;
int press_mode;
double compressibility;
int molecular_analysis;
int num_ignored;
int ignore[REAX_MAX_ATOM_TYPES];
int dipole_anal;
int freq_dipole_anal;
int diffusion_coef;
int freq_diffusion_coef;
int restrict_type;
int lgflag;
-
+ int enobondsflag;
+
} control_params;
typedef struct
{
double T;
double xi;
double v_xi;
double v_xi_old;
double G_xi;
} thermostat;
typedef struct
{
double P;
double eps;
double v_eps;
double v_eps_old;
double a_eps;
} isotropic_barostat;
typedef struct
{
rtensor P;
double P_scalar;
double eps;
double v_eps;
double v_eps_old;
double a_eps;
rtensor h0;
rtensor v_g0;
rtensor v_g0_old;
rtensor a_g0;
} flexible_barostat;
typedef struct
{
double start;
double end;
double elapsed;
double total;
double comm;
double nbrs;
double init_forces;
double bonded;
double nonb;
double qEq;
int s_matvecs;
int t_matvecs;
} reax_timing;
typedef struct
{
double e_tot;
double e_kin; // Total kinetic energy
double e_pot;
double e_bond; // Total bond energy
double e_ov; // Total over coordination
double e_un; // Total under coordination energy
double e_lp; // Total lone pair energy
double e_ang; // Total valence angle energy
double e_pen; // Total penalty energy
double e_coa; // Total three body conjugation energy
double e_hb; // Total Hydrogen bond energy
double e_tor; // Total torsional energy
double e_con; // Total four body conjugation energy
double e_vdW; // Total van der Waals energy
double e_ele; // Total electrostatics energy
double e_pol; // Polarization energy
} energy_data;
typedef struct
{
int step;
int prev_steps;
double time;
double M; // Total Mass
double inv_M; // 1 / Total Mass
rvec xcm; // Center of mass
rvec vcm; // Center of mass velocity
rvec fcm; // Center of mass force
rvec amcm; // Angular momentum of CoM
rvec avcm; // Angular velocity of CoM
double etran_cm; // Translational kinetic energy of CoM
double erot_cm; // Rotational kinetic energy of CoM
rtensor kinetic; // Kinetic energy tensor
rtensor virial; // Hydrodynamic virial
energy_data my_en;
energy_data sys_en;
double N_f; //Number of degrees of freedom
rvec t_scale;
rtensor p_scale;
thermostat therm; // Used in Nose_Hoover method
isotropic_barostat iso_bar;
flexible_barostat flex_bar;
double inv_W;
double kin_press;
rvec int_press;
rvec my_ext_press;
rvec ext_press;
rvec tot_press;
reax_timing timing;
} simulation_data;
typedef struct{
int thb;
int pthb; // pointer to the third body on the central atom's nbrlist
double theta, cos_theta;
rvec dcos_di, dcos_dj, dcos_dk;
} three_body_interaction_data;
typedef struct {
int nbr;
ivec rel_box;
double d;
rvec dvec;
} far_neighbor_data;
typedef struct {
int nbr;
int scl;
far_neighbor_data *ptr;
} hbond_data;
typedef struct{
int wrt;
rvec dVal;
} dDelta_data;
typedef struct{
int wrt;
rvec dBO, dBOpi, dBOpi2;
} dbond_data;
typedef struct{
double BO, BO_s, BO_pi, BO_pi2;
double Cdbo, Cdbopi, Cdbopi2;
double C1dbo, C2dbo, C3dbo;
double C1dbopi, C2dbopi, C3dbopi, C4dbopi;
double C1dbopi2, C2dbopi2, C3dbopi2, C4dbopi2;
rvec dBOp, dln_BOp_s, dln_BOp_pi, dln_BOp_pi2;
} bond_order_data;
typedef struct {
int nbr;
int sym_index;
int dbond_index;
ivec rel_box;
// rvec ext_factor;
double d;
rvec dvec;
bond_order_data bo_data;
} bond_data;
typedef struct {
int j;
double val;
} sparse_matrix_entry;
typedef struct {
int cap, n, m;
int *start, *end;
sparse_matrix_entry *entries;
} sparse_matrix;
typedef struct {
int num_far;
int H, Htop;
int hbonds, num_hbonds;
int bonds, num_bonds;
int num_3body;
int gcell_atoms;
} reallocate_data;
typedef struct
{
int allocated;
/* communication storage */
double *tmp_dbl[REAX_MAX_NBRS];
rvec *tmp_rvec[REAX_MAX_NBRS];
rvec2 *tmp_rvec2[REAX_MAX_NBRS];
int *within_bond_box;
/* bond order related storage */
double *total_bond_order;
double *Deltap, *Deltap_boc;
double *Delta, *Delta_lp, *Delta_lp_temp, *Delta_e, *Delta_boc, *Delta_val;
double *dDelta_lp, *dDelta_lp_temp;
double *nlp, *nlp_temp, *Clp, *vlpex;
rvec *dDeltap_self;
int *bond_mark, *done_after;
/* QEq storage */
sparse_matrix *H, *L, *U;
double *Hdia_inv, *b_s, *b_t, *b_prc, *b_prm, *s, *t;
double *droptol;
rvec2 *b, *x;
/* GMRES storage */
double *y, *z, *g;
double *hc, *hs;
double **h, **v;
/* CG storage */
double *r, *d, *q, *p;
rvec2 *r2, *d2, *q2, *p2;
/* Taper */
double Tap[8]; //Tap7, Tap6, Tap5, Tap4, Tap3, Tap2, Tap1, Tap0;
/* storage for analysis */
int *mark, *old_mark;
rvec *x_old;
/* storage space for bond restrictions */
int *restricted;
int **restricted_list;
/* integrator */
rvec *v_const;
/* force calculations */
double *CdDelta; // coefficient of dDelta
rvec *f;
reallocate_data realloc;
} storage;
typedef union
{
void *v;
three_body_interaction_data *three_body_list;
bond_data *bond_list;
dbond_data *dbo_list;
dDelta_data *dDelta_list;
far_neighbor_data *far_nbr_list;
hbond_data *hbond_list;
} list_type;
struct _reax_list
{
int allocated;
int n;
int num_intrs;
int *index;
int *end_index;
int type;
list_type select;
};
typedef _reax_list reax_list;
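All interaction lists share this layout: the per-atom (or per-bond) ranges are obtained with Start_Index()/End_Index() and the payload is reached through the select union. A small sketch in the same style as the loops elsewhere in this diff (the function name and the cutoff argument are illustrative):

/* sketch: count bonds of atom j whose bond order exceeds a cutoff */
static int Count_Strong_Bonds( int j, reax_list *bonds, double bo_cut )
{
  int n = 0;
  for( int pj = Start_Index( j, bonds ); pj < End_Index( j, bonds ); ++pj ) {
    bond_data *bij = &( bonds->select.bond_list[pj] );
    if( bij->bo_data.BO >= bo_cut )
      ++n;
  }
  return n;
}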
typedef struct
{
FILE *strj;
int trj_offset;
int atom_line_len;
int bond_line_len;
int angle_line_len;
int write_atoms;
int write_bonds;
int write_angles;
char *line;
int buffer_len;
char *buffer;
FILE *out;
FILE *pot;
FILE *log;
FILE *mol, *ign;
FILE *dpl;
FILE *drft;
FILE *pdb;
FILE *prs;
int write_steps;
int traj_compress;
int traj_method;
char traj_title[81];
int atom_info;
int bond_info;
int angle_info;
int restart_format;
int restart_freq;
int debug_level;
int energy_update_freq;
} output_controls;
typedef struct
{
int atom_count;
int atom_list[REAX_MAX_MOLECULE_SIZE];
int mtypes[REAX_MAX_ATOM_TYPES];
} molecule;
struct LR_data
{
double H;
double e_vdW, CEvd;
double e_ele, CEclmb;
void operator = (const LR_data& rhs) {
H = rhs.H;
e_vdW = rhs.e_vdW;
CEvd = rhs.CEvd;
e_ele = rhs.e_ele;
CEclmb = rhs.CEclmb;
}
void operator = (const LR_data& rhs) volatile {
H = rhs.H;
e_vdW = rhs.e_vdW;
CEvd = rhs.CEvd;
e_ele = rhs.e_ele;
CEclmb = rhs.CEclmb;
}
};
struct cubic_spline_coef
{
double a, b, c, d;
void operator = (const cubic_spline_coef& rhs) {
a = rhs.a;
b = rhs.b;
c = rhs.c;
d = rhs.d;
}
void operator = (const cubic_spline_coef& rhs) volatile {
a = rhs.a;
b = rhs.b;
c = rhs.c;
d = rhs.d;
}
};
typedef struct
{
double xmin, xmax;
int n;
double dx, inv_dx;
double a;
double m;
double c;
LR_data *y;
cubic_spline_coef *H;
cubic_spline_coef *vdW, *CEvd;
cubic_spline_coef *ele, *CEclmb;
} LR_lookup_table;
extern LR_lookup_table **LR;
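The table stores per-interval cubic-spline coefficients (a, b, c, d) on a uniform grid of spacing dx. The evaluation routine is not part of this diff, so the sketch below is only an illustrative guess at how such a table is typically read, assuming Horner evaluation on the offset from the left grid point:

/* illustrative guess, not the package's own evaluation code */
static double Spline_Eval( const LR_lookup_table *t,
                           const cubic_spline_coef *coef, double r )
{
  int    i  = (int)( r * t->inv_dx );   /* grid interval containing r  */
  double dr = r - i * t->dx;            /* offset from left grid point */
  return ((coef[i].d * dr + coef[i].c) * dr + coef[i].b) * dr + coef[i].a;
}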
/* function pointer defs */
typedef void (*evolve_function)(reax_system*, control_params*,
simulation_data*, storage*, reax_list**,
output_controls*, mpi_datatypes* );
typedef void (*interaction_function) (reax_system*, control_params*,
simulation_data*, storage*,
reax_list**, output_controls*);
typedef void (*print_interaction)(reax_system*, control_params*,
simulation_data*, storage*,
reax_list**, output_controls*);
typedef double (*lookup_function)(double);
typedef void (*message_sorter) (reax_system*, int, int, int, mpi_out_data*);
typedef void (*unpacker) ( reax_system*, int, void*, int, neighbor_proc*, int );
typedef void (*dist_packer) (void*, mpi_out_data*);
typedef void (*coll_unpacker) (void*, void*, mpi_out_data*);
#endif
diff --git a/src/USER-REAXC/reaxc_valence_angles.cpp b/src/USER-REAXC/reaxc_valence_angles.cpp
index c2b3287be..c92996e56 100644
--- a/src/USER-REAXC/reaxc_valence_angles.cpp
+++ b/src/USER-REAXC/reaxc_valence_angles.cpp
@@ -1,416 +1,416 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_valence_angles.h"
#include "reaxc_bond_orders.h"
#include "reaxc_list.h"
#include "reaxc_vector.h"
static double Dot( double* v1, double* v2, int k )
{
double ret = 0.0;
for( int i=0; i < k; ++i )
ret += v1[i] * v2[i];
return ret;
}
void Calculate_Theta( rvec dvec_ji, double d_ji, rvec dvec_jk, double d_jk,
double *theta, double *cos_theta )
{
(*cos_theta) = Dot( dvec_ji, dvec_jk, 3 ) / ( d_ji * d_jk );
if( *cos_theta > 1. ) *cos_theta = 1.0;
if( *cos_theta < -1. ) *cos_theta = -1.0;
(*theta) = acos( *cos_theta );
}
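/* Calculate_dCos_Theta: analytic gradient of cos(theta) with respect to the
   three atom positions; the contribution for the central atom j is minus the
   sum of the i and k contributions, since both bond vectors originate at j */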
void Calculate_dCos_Theta( rvec dvec_ji, double d_ji, rvec dvec_jk, double d_jk,
rvec* dcos_theta_di,
rvec* dcos_theta_dj,
rvec* dcos_theta_dk )
{
int t;
double sqr_d_ji = SQR(d_ji);
double sqr_d_jk = SQR(d_jk);
double inv_dists = 1.0 / (d_ji * d_jk);
double inv_dists3 = pow( inv_dists, 3.0 );
double dot_dvecs = Dot( dvec_ji, dvec_jk, 3 );
double Cdot_inv3 = dot_dvecs * inv_dists3;
for( t = 0; t < 3; ++t ) {
(*dcos_theta_di)[t] = dvec_jk[t] * inv_dists -
Cdot_inv3 * sqr_d_jk * dvec_ji[t];
(*dcos_theta_dj)[t] = -(dvec_jk[t] + dvec_ji[t]) * inv_dists +
Cdot_inv3 * ( sqr_d_jk * dvec_ji[t] + sqr_d_ji * dvec_jk[t] );
(*dcos_theta_dk)[t] = dvec_ji[t] * inv_dists -
Cdot_inv3 * sqr_d_ji * dvec_jk[t];
}
}
void Valence_Angles( reax_system *system, control_params *control,
simulation_data *data, storage *workspace,
reax_list **lists, output_controls *out_control )
{
int i, j, pi, k, pk, t;
int type_i, type_j, type_k;
int start_j, end_j, start_pk, end_pk;
int cnt, num_thb_intrs;
double temp, temp_bo_jt, pBOjt7;
double p_val1, p_val2, p_val3, p_val4, p_val5;
double p_val6, p_val7, p_val8, p_val9, p_val10;
double p_pen1, p_pen2, p_pen3, p_pen4;
double p_coa1, p_coa2, p_coa3, p_coa4;
double trm8, expval6, expval7, expval2theta, expval12theta, exp3ij, exp3jk;
double exp_pen2ij, exp_pen2jk, exp_pen3, exp_pen4, trm_pen34, exp_coa2;
double dSBO1, dSBO2, SBO, SBO2, CSBO2, SBOp, prod_SBO, vlpadj;
double CEval1, CEval2, CEval3, CEval4, CEval5, CEval6, CEval7, CEval8;
double CEpen1, CEpen2, CEpen3;
double e_ang, e_coa, e_pen;
double CEcoa1, CEcoa2, CEcoa3, CEcoa4, CEcoa5;
double Cf7ij, Cf7jk, Cf8j, Cf9j;
double f7_ij, f7_jk, f8_Dj, f9_Dj;
double Ctheta_0, theta_0, theta_00, theta, cos_theta, sin_theta;
double BOA_ij, BOA_jk;
rvec force, ext_press;
// Tallying variables
double eng_tmp, fi_tmp[3], fj_tmp[3], fk_tmp[3];
double delij[3], delkj[3];
three_body_header *thbh;
three_body_parameters *thbp;
three_body_interaction_data *p_ijk, *p_kji;
bond_data *pbond_ij, *pbond_jk, *pbond_jt;
bond_order_data *bo_ij, *bo_jk, *bo_jt;
reax_list *bonds = (*lists) + BONDS;
reax_list *thb_intrs = (*lists) + THREE_BODIES;
/* global parameters used in these calculations */
p_val6 = system->reax_param.gp.l[14];
p_val8 = system->reax_param.gp.l[33];
p_val9 = system->reax_param.gp.l[16];
p_val10 = system->reax_param.gp.l[17];
num_thb_intrs = 0;
for( j = 0; j < system->N; ++j ) { // Ray: the first one with system->N
type_j = system->my_atoms[j].type;
if (type_j < 0) continue;
start_j = Start_Index(j, bonds);
end_j = End_Index(j, bonds);
p_val3 = system->reax_param.sbp[ type_j ].p_val3;
p_val5 = system->reax_param.sbp[ type_j ].p_val5;
SBOp = 0, prod_SBO = 1;
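/* SBOp: sum of the pi and pi-pi bond orders around atom j;
   prod_SBO: product over j's bonds of exp(-BO^8), with BO^8 built below
   by squaring BO^2 twice */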
for( t = start_j; t < end_j; ++t ) {
bo_jt = &(bonds->select.bond_list[t].bo_data);
SBOp += (bo_jt->BO_pi + bo_jt->BO_pi2);
temp = SQR( bo_jt->BO );
temp *= temp;
temp *= temp;
prod_SBO *= exp( -temp );
}
if( workspace->vlpex[j] >= 0 ){
vlpadj = 0;
dSBO2 = prod_SBO - 1;
}
else{
vlpadj = workspace->nlp[j];
dSBO2 = (prod_SBO - 1) * (1 - p_val8 * workspace->dDelta_lp[j]);
}
SBO = SBOp + (1 - prod_SBO) * (-workspace->Delta_boc[j] - p_val8 * vlpadj);
dSBO1 = -8 * prod_SBO * ( workspace->Delta_boc[j] + p_val8 * vlpadj );
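/* SBO2 maps SBO smoothly onto [0,2]; CSBO2 is the derivative d(SBO2)/d(SBO)
   used below in the force terms */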
if( SBO <= 0 )
SBO2 = 0, CSBO2 = 0;
else if( SBO > 0 && SBO <= 1 ) {
SBO2 = pow( SBO, p_val9 );
CSBO2 = p_val9 * pow( SBO, p_val9 - 1 );
}
else if( SBO > 1 && SBO < 2 ) {
SBO2 = 2 - pow( 2-SBO, p_val9 );
CSBO2 = p_val9 * pow( 2 - SBO, p_val9 - 1 );
}
else
SBO2 = 2, CSBO2 = 0;
expval6 = exp( p_val6 * workspace->Delta_boc[j] );
for( pi = start_j; pi < end_j; ++pi ) {
Set_Start_Index( pi, num_thb_intrs, thb_intrs );
pbond_ij = &(bonds->select.bond_list[pi]);
bo_ij = &(pbond_ij->bo_data);
BOA_ij = bo_ij->BO - control->thb_cut;
if( BOA_ij/*bo_ij->BO*/ > 0.0 &&
( j < system->n || pbond_ij->nbr < system->n ) ) {
i = pbond_ij->nbr;
type_i = system->my_atoms[i].type;
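/* for bonds preceding pi the permuted (k,j,i) angle has already been stored;
   reuse it: copy theta and swap the i/k derivative vectors instead of
   recomputing them */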
for( pk = start_j; pk < pi; ++pk ) {
start_pk = Start_Index( pk, thb_intrs );
end_pk = End_Index( pk, thb_intrs );
for( t = start_pk; t < end_pk; ++t )
if( thb_intrs->select.three_body_list[t].thb == i ) {
p_ijk = &(thb_intrs->select.three_body_list[num_thb_intrs] );
p_kji = &(thb_intrs->select.three_body_list[t]);
p_ijk->thb = bonds->select.bond_list[pk].nbr;
p_ijk->pthb = pk;
p_ijk->theta = p_kji->theta;
rvec_Copy( p_ijk->dcos_di, p_kji->dcos_dk );
rvec_Copy( p_ijk->dcos_dj, p_kji->dcos_dj );
rvec_Copy( p_ijk->dcos_dk, p_kji->dcos_di );
++num_thb_intrs;
break;
}
}
for( pk = pi+1; pk < end_j; ++pk ) {
pbond_jk = &(bonds->select.bond_list[pk]);
bo_jk = &(pbond_jk->bo_data);
BOA_jk = bo_jk->BO - control->thb_cut;
k = pbond_jk->nbr;
type_k = system->my_atoms[k].type;
p_ijk = &( thb_intrs->select.three_body_list[num_thb_intrs] );
Calculate_Theta( pbond_ij->dvec, pbond_ij->d,
pbond_jk->dvec, pbond_jk->d,
&theta, &cos_theta );
Calculate_dCos_Theta( pbond_ij->dvec, pbond_ij->d,
pbond_jk->dvec, pbond_jk->d,
&(p_ijk->dcos_di), &(p_ijk->dcos_dj),
&(p_ijk->dcos_dk) );
p_ijk->thb = k;
p_ijk->pthb = pk;
p_ijk->theta = theta;
sin_theta = sin( theta );
if( sin_theta < 1.0e-5 )
sin_theta = 1.0e-5;
++num_thb_intrs;
if( (j < system->n) && (BOA_jk > 0.0) &&
(bo_ij->BO > control->thb_cut) &&
(bo_jk->BO > control->thb_cut) &&
(bo_ij->BO * bo_jk->BO > control->thb_cutsq) ) {
thbh = &( system->reax_param.thbp[ type_i ][ type_j ][ type_k ] );
for( cnt = 0; cnt < thbh->cnt; ++cnt ) {
if( fabs(thbh->prm[cnt].p_val1) > 0.001 ) {
thbp = &( thbh->prm[cnt] );
/* ANGLE ENERGY */
p_val1 = thbp->p_val1;
p_val2 = thbp->p_val2;
p_val4 = thbp->p_val4;
p_val7 = thbp->p_val7;
theta_00 = thbp->theta_00;
exp3ij = exp( -p_val3 * pow( BOA_ij, p_val4 ) );
f7_ij = 1.0 - exp3ij;
Cf7ij = p_val3 * p_val4 * pow( BOA_ij, p_val4 - 1.0 ) * exp3ij;
exp3jk = exp( -p_val3 * pow( BOA_jk, p_val4 ) );
f7_jk = 1.0 - exp3jk;
Cf7jk = p_val3 * p_val4 * pow( BOA_jk, p_val4 - 1.0 ) * exp3jk;
expval7 = exp( -p_val7 * workspace->Delta_boc[j] );
trm8 = 1.0 + expval6 + expval7;
f8_Dj = p_val5 - ( (p_val5 - 1.0) * (2.0 + expval6) / trm8 );
Cf8j = ( (1.0 - p_val5) / SQR(trm8) ) *
( p_val6 * expval6 * trm8 -
(2.0 + expval6) * ( p_val6*expval6 - p_val7*expval7 ) );
theta_0 = 180.0 - theta_00 * (1.0 -
exp(-p_val10 * (2.0 - SBO2)));
theta_0 = DEG2RAD( theta_0 );
expval2theta = exp( -p_val2 * SQR(theta_0 - theta) );
if( p_val1 >= 0 )
expval12theta = p_val1 * (1.0 - expval2theta);
else // To avoid linear Me-H-Me angles (6/6/06)
expval12theta = p_val1 * -expval2theta;
CEval1 = Cf7ij * f7_jk * f8_Dj * expval12theta;
CEval2 = Cf7jk * f7_ij * f8_Dj * expval12theta;
CEval3 = Cf8j * f7_ij * f7_jk * expval12theta;
CEval4 = -2.0 * p_val1 * p_val2 * f7_ij * f7_jk * f8_Dj *
expval2theta * (theta_0 - theta);
Ctheta_0 = p_val10 * DEG2RAD(theta_00) *
exp( -p_val10 * (2.0 - SBO2) );
CEval5 = -CEval4 * Ctheta_0 * CSBO2;
CEval6 = CEval5 * dSBO1;
CEval7 = CEval5 * dSBO2;
CEval8 = -CEval4 / sin_theta;
data->my_en.e_ang += e_ang =
f7_ij * f7_jk * f8_Dj * expval12theta;
/* END ANGLE ENERGY*/
/* PENALTY ENERGY */
p_pen1 = thbp->p_pen1;
p_pen2 = system->reax_param.gp.l[19];
p_pen3 = system->reax_param.gp.l[20];
p_pen4 = system->reax_param.gp.l[21];
exp_pen2ij = exp( -p_pen2 * SQR( BOA_ij - 2.0 ) );
exp_pen2jk = exp( -p_pen2 * SQR( BOA_jk - 2.0 ) );
exp_pen3 = exp( -p_pen3 * workspace->Delta[j] );
exp_pen4 = exp( p_pen4 * workspace->Delta[j] );
trm_pen34 = 1.0 + exp_pen3 + exp_pen4;
f9_Dj = ( 2.0 + exp_pen3 ) / trm_pen34;
Cf9j = ( -p_pen3 * exp_pen3 * trm_pen34 -
(2.0 + exp_pen3) * ( -p_pen3 * exp_pen3 +
p_pen4 * exp_pen4 ) ) /
SQR( trm_pen34 );
data->my_en.e_pen += e_pen =
p_pen1 * f9_Dj * exp_pen2ij * exp_pen2jk;
CEpen1 = e_pen * Cf9j / f9_Dj;
temp = -2.0 * p_pen2 * e_pen;
CEpen2 = temp * (BOA_ij - 2.0);
CEpen3 = temp * (BOA_jk - 2.0);
/* END PENALTY ENERGY */
/* COALITION ENERGY */
p_coa1 = thbp->p_coa1;
p_coa2 = system->reax_param.gp.l[2];
p_coa3 = system->reax_param.gp.l[38];
p_coa4 = system->reax_param.gp.l[30];
exp_coa2 = exp( p_coa2 * workspace->Delta_val[j] );
data->my_en.e_coa += e_coa =
p_coa1 / (1. + exp_coa2) *
exp( -p_coa3 * SQR(workspace->total_bond_order[i]-BOA_ij) ) *
exp( -p_coa3 * SQR(workspace->total_bond_order[k]-BOA_jk) ) *
exp( -p_coa4 * SQR(BOA_ij - 1.5) ) *
exp( -p_coa4 * SQR(BOA_jk - 1.5) );
CEcoa1 = -2 * p_coa4 * (BOA_ij - 1.5) * e_coa;
CEcoa2 = -2 * p_coa4 * (BOA_jk - 1.5) * e_coa;
CEcoa3 = -p_coa2 * exp_coa2 * e_coa / (1 + exp_coa2);
CEcoa4 = -2 * p_coa3 *
(workspace->total_bond_order[i]-BOA_ij) * e_coa;
CEcoa5 = -2 * p_coa3 *
(workspace->total_bond_order[k]-BOA_jk) * e_coa;
/* END COALITION ENERGY */
/* FORCES */
bo_ij->Cdbo += (CEval1 + CEpen2 + (CEcoa1 - CEcoa4));
bo_jk->Cdbo += (CEval2 + CEpen3 + (CEcoa2 - CEcoa5));
workspace->CdDelta[j] += ((CEval3 + CEval7) + CEpen1 + CEcoa3);
workspace->CdDelta[i] += CEcoa4;
workspace->CdDelta[k] += CEcoa5;
for( t = start_j; t < end_j; ++t ) {
pbond_jt = &( bonds->select.bond_list[t] );
bo_jt = &(pbond_jt->bo_data);
temp_bo_jt = bo_jt->BO;
temp = CUBE( temp_bo_jt );
pBOjt7 = temp * temp * temp_bo_jt;
bo_jt->Cdbo += (CEval6 * pBOjt7);
bo_jt->Cdbopi += CEval5;
bo_jt->Cdbopi2 += CEval5;
}
if( control->virial == 0 ) {
rvec_ScaledAdd( workspace->f[i], CEval8, p_ijk->dcos_di );
rvec_ScaledAdd( workspace->f[j], CEval8, p_ijk->dcos_dj );
rvec_ScaledAdd( workspace->f[k], CEval8, p_ijk->dcos_dk );
}
else {
rvec_Scale( force, CEval8, p_ijk->dcos_di );
rvec_Add( workspace->f[i], force );
rvec_iMultiply( ext_press, pbond_ij->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
rvec_ScaledAdd( workspace->f[j], CEval8, p_ijk->dcos_dj );
rvec_Scale( force, CEval8, p_ijk->dcos_dk );
rvec_Add( workspace->f[k], force );
rvec_iMultiply( ext_press, pbond_jk->rel_box, force );
rvec_Add( data->my_ext_press, ext_press );
}
/* tally into per-atom virials */
if( system->pair_ptr->vflag_atom || system->pair_ptr->evflag) {
/* Acquire vectors */
rvec_ScaledSum( delij, 1., system->my_atoms[i].x,
-1., system->my_atoms[j].x );
rvec_ScaledSum( delkj, 1., system->my_atoms[k].x,
-1., system->my_atoms[j].x );
rvec_Scale( fi_tmp, -CEval8, p_ijk->dcos_di );
rvec_Scale( fj_tmp, -CEval8, p_ijk->dcos_dj );
rvec_Scale( fk_tmp, -CEval8, p_ijk->dcos_dk );
eng_tmp = e_ang + e_pen + e_coa;
if( system->pair_ptr->evflag)
system->pair_ptr->ev_tally(j,j,system->N,1,eng_tmp,0.0,0.0,0.0,0.0,0.0);
if( system->pair_ptr->vflag_atom)
system->pair_ptr->v_tally3(i,j,k,fi_tmp,fk_tmp,delij,delkj);
}
}
}
}
}
}
Set_End_Index(pi, num_thb_intrs, thb_intrs );
}
}
if( num_thb_intrs >= thb_intrs->num_intrs * DANGER_ZONE ) {
workspace->realloc.num_3body = num_thb_intrs;
if( num_thb_intrs > thb_intrs->num_intrs ) {
fprintf( stderr, "step%d: ran out of space on angle_list: top=%d, max=%d\n",
data->step, num_thb_intrs, thb_intrs->num_intrs );
MPI_Abort( MPI_COMM_WORLD, INSUFFICIENT_MEMORY );
}
}
}
diff --git a/src/USER-REAXC/reaxc_vector.cpp b/src/USER-REAXC/reaxc_vector.cpp
index ee63e9428..977b17a6d 100644
--- a/src/USER-REAXC/reaxc_vector.cpp
+++ b/src/USER-REAXC/reaxc_vector.cpp
@@ -1,159 +1,159 @@
/*----------------------------------------------------------------------
PuReMD - Purdue ReaxFF Molecular Dynamics Program
Copyright (2010) Purdue University
Hasan Metin Aktulga, hmaktulga@lbl.gov
Joseph Fogarty, jcfogart@mail.usf.edu
Sagar Pandit, pandit@usf.edu
Ananth Y Grama, ayg@cs.purdue.edu
Please cite the related publication:
H. M. Aktulga, J. C. Fogarty, S. A. Pandit, A. Y. Grama,
"Parallel Reactive Molecular Dynamics: Numerical Methods and
Algorithmic Techniques", Parallel Computing, in press.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details:
<http://www.gnu.org/licenses/>.
----------------------------------------------------------------------*/
-#include "pair_reax_c.h"
+#include "pair_reaxc.h"
#include "reaxc_vector.h"
void rvec_Copy( rvec dest, rvec src )
{
dest[0] = src[0], dest[1] = src[1], dest[2] = src[2];
}
void rvec_Scale( rvec ret, double c, rvec v )
{
ret[0] = c * v[0], ret[1] = c * v[1], ret[2] = c * v[2];
}
void rvec_Add( rvec ret, rvec v )
{
ret[0] += v[0], ret[1] += v[1], ret[2] += v[2];
}
void rvec_ScaledAdd( rvec ret, double c, rvec v )
{
ret[0] += c * v[0], ret[1] += c * v[1], ret[2] += c * v[2];
}
void rvec_ScaledSum( rvec ret, double c1, rvec v1 ,double c2, rvec v2 )
{
ret[0] = c1 * v1[0] + c2 * v2[0];
ret[1] = c1 * v1[1] + c2 * v2[1];
ret[2] = c1 * v1[2] + c2 * v2[2];
}
double rvec_Dot( rvec v1, rvec v2 )
{
return v1[0]*v2[0] + v1[1]*v2[1] + v1[2]*v2[2];
}
void rvec_iMultiply( rvec r, ivec v1, rvec v2 )
{
r[0] = v1[0] * v2[0];
r[1] = v1[1] * v2[1];
r[2] = v1[2] * v2[2];
}
void rvec_Cross( rvec ret, rvec v1, rvec v2 )
{
ret[0] = v1[1] * v2[2] - v1[2] * v2[1];
ret[1] = v1[2] * v2[0] - v1[0] * v2[2];
ret[2] = v1[0] * v2[1] - v1[1] * v2[0];
}
double rvec_Norm_Sqr( rvec v )
{
return SQR(v[0]) + SQR(v[1]) + SQR(v[2]);
}
double rvec_Norm( rvec v )
{
return sqrt( SQR(v[0]) + SQR(v[1]) + SQR(v[2]) );
}
void rvec_MakeZero( rvec v )
{
v[0] = v[1] = v[2] = 0.000000000000000e+00;
}
void rtensor_MatVec( rvec ret, rtensor m, rvec v )
{
int i;
rvec temp;
if( ret == v )
{
for( i = 0; i < 3; ++i )
temp[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
for( i = 0; i < 3; ++i )
ret[i] = temp[i];
}
else
{
for( i = 0; i < 3; ++i )
ret[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
}
}
void rtensor_MakeZero( rtensor t )
{
t[0][0] = t[0][1] = t[0][2] = 0;
t[1][0] = t[1][1] = t[1][2] = 0;
t[2][0] = t[2][1] = t[2][2] = 0;
}
void ivec_MakeZero( ivec v )
{
v[0] = v[1] = v[2] = 0;
}
void ivec_Copy( ivec dest, ivec src )
{
dest[0] = src[0], dest[1] = src[1], dest[2] = src[2];
}
void ivec_Scale( ivec dest, double C, ivec src )
{
dest[0] = (int)(C * src[0]);
dest[1] = (int)(C * src[1]);
dest[2] = (int)(C * src[2]);
}
void ivec_Sum( ivec dest, ivec v1, ivec v2 )
{
dest[0] = v1[0] + v2[0];
dest[1] = v1[1] + v2[1];
dest[2] = v1[2] + v2[2];
}
diff --git a/src/USER-TALLY/compute_force_tally.cpp b/src/USER-TALLY/compute_force_tally.cpp
index e9ecedd5a..e97a1c751 100644
--- a/src/USER-TALLY/compute_force_tally.cpp
+++ b/src/USER-TALLY/compute_force_tally.cpp
@@ -1,224 +1,224 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_force_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputeForceTally::ComputeForceTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute force/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute force/tally second group ID");
groupbit2 = group->bitmask[igroup2];
scalar_flag = 1;
vector_flag = 0;
peratom_flag = 1;
timeflag = 1;
comm_reverse = size_peratom_cols = 3;
extscalar = 1;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = 0;
invoked_peratom = invoked_scalar = -1;
nmax = -1;
fatom = NULL;
vector = new double[size_peratom_cols];
}
/* ---------------------------------------------------------------------- */
ComputeForceTally::~ComputeForceTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(fatom);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute force/tally with no pair style");
+ error->all(FLERR,"Trying to use compute force/tally without pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute force/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute force/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute force/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double, double, double fpair,
double dx, double dy, double dz)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local force array if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(fatom);
nmax = atom->nmax;
memory->create(fatom,nmax,size_peratom_cols,"force/tally:fatom");
array_atom = fatom;
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
fatom[i][j] = 0.0;
} else {
for (int i=0; i < atom->nlocal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
fatom[i][j] = 0.0;
}
for (int i=0; i < size_peratom_cols; ++i)
vector[i] = ftotal[i] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ) {
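// the pair force on atom i is +fpair*(dx,dy,dz) and atom j receives the
// opposite sign (Newton's third law); ftotal only accumulates the force
// acting on atoms of the first group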
if (newton || i < nlocal) {
if (mask[i] & groupbit) {
ftotal[0] += fpair*dx;
ftotal[1] += fpair*dy;
ftotal[2] += fpair*dz;
}
fatom[i][0] += fpair*dx;
fatom[i][1] += fpair*dy;
fatom[i][2] += fpair*dz;
}
if (newton || j < nlocal) {
if (mask[j] & groupbit) {
ftotal[0] -= fpair*dx;
ftotal[1] -= fpair*dy;
ftotal[2] -= fpair*dz;
}
fatom[j][0] -= fpair*dx;
fatom[j][1] -= fpair*dy;
fatom[j][2] -= fpair*dz;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputeForceTally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = fatom[i][0];
buf[m++] = fatom[i][1];
buf[m++] = fatom[i][2];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
fatom[j][0] += buf[m++];
fatom[j][1] += buf[m++];
fatom[j][2] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
double ComputeForceTally::compute_scalar()
{
invoked_scalar = update->ntimestep;
if ((did_compute != invoked_scalar) || (update->eflag_global != invoked_scalar))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated forces across procs
MPI_Allreduce(ftotal,vector,size_peratom_cols,MPI_DOUBLE,MPI_SUM,world);
scalar = sqrt(vector[0]*vector[0]+vector[1]*vector[1]+vector[2]*vector[2]);
return scalar;
}
/* ---------------------------------------------------------------------- */
void ComputeForceTally::compute_peratom()
{
invoked_peratom = update->ntimestep;
if ((did_compute != invoked_peratom) || (update->eflag_global != invoked_peratom))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i)
for (int j = 0; j < size_peratom_cols; ++j)
fatom[i][j] = 0.0;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeForceTally::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
return bytes;
}
diff --git a/src/USER-TALLY/compute_heat_flux_tally.cpp b/src/USER-TALLY/compute_heat_flux_tally.cpp
index 214311cb3..48cad538d 100644
--- a/src/USER-TALLY/compute_heat_flux_tally.cpp
+++ b/src/USER-TALLY/compute_heat_flux_tally.cpp
@@ -1,286 +1,286 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_heat_flux_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputeHeatFluxTally::ComputeHeatFluxTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute heat/flux/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute heat/flux/tally second group ID");
groupbit2 = group->bitmask[igroup2];
vector_flag = 1;
timeflag = 1;
comm_reverse = 7;
extvector = 1;
size_vector = 6;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = 0;
invoked_peratom = invoked_scalar = -1;
nmax = -1;
stress = NULL;
eatom = NULL;
vector = new double[size_vector];
}
/* ---------------------------------------------------------------------- */
ComputeHeatFluxTally::~ComputeHeatFluxTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(stress);
memory->destroy(eatom);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute heat/flux/tally with no pair style");
+ error->all(FLERR,"Trying to use compute heat/flux/tally without pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute heat/flux/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute heat/flux/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute heat/flux/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double evdwl, double ecoul, double fpair,
double dx, double dy, double dz)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local stress and eatom arrays if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(stress);
nmax = atom->nmax;
memory->create(stress,nmax,6,"heat/flux/tally:stress");
memory->destroy(eatom);
nmax = atom->nmax;
memory->create(eatom,nmax,"heat/flux/tally:eatom");
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i) {
eatom[i] = 0.0;
stress[i][0] = 0.0;
stress[i][1] = 0.0;
stress[i][2] = 0.0;
stress[i][3] = 0.0;
stress[i][4] = 0.0;
stress[i][5] = 0.0;
}
} else {
for (int i=0; i < atom->nlocal; ++i) {
eatom[i] = 0.0;
stress[i][0] = 0.0;
stress[i][1] = 0.0;
stress[i][2] = 0.0;
stress[i][3] = 0.0;
stress[i][4] = 0.0;
stress[i][5] = 0.0;
}
}
for (int i=0; i < size_vector; ++i)
vector[i] = heatj[i] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ) {
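// split the pair energy and the pair virial evenly between the two atoms;
// v0..v5 below hold the xx,yy,zz,xy,xz,yz components of r_ij (x) F_ij / 2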
const double epairhalf = 0.5 * (evdwl + ecoul);
fpair *= 0.5;
const double v0 = dx*dx*fpair; // dx*fpair = Fij_x
const double v1 = dy*dy*fpair;
const double v2 = dz*dz*fpair;
const double v3 = dx*dy*fpair;
const double v4 = dx*dz*fpair;
const double v5 = dy*dz*fpair;
if (newton || i < nlocal) {
eatom[i] += epairhalf;
stress[i][0] += v0;
stress[i][1] += v1;
stress[i][2] += v2;
stress[i][3] += v3;
stress[i][4] += v4;
stress[i][5] += v5;
}
if (newton || j < nlocal) {
eatom[j] += epairhalf;
stress[j][0] += v0;
stress[j][1] += v1;
stress[j][2] += v2;
stress[j][3] += v3;
stress[j][4] += v4;
stress[j][5] += v5;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputeHeatFluxTally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = eatom[i];
buf[m++] = stress[i][0];
buf[m++] = stress[i][1];
buf[m++] = stress[i][2];
buf[m++] = stress[i][3];
buf[m++] = stress[i][4];
buf[m++] = stress[i][5];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
eatom[j] += buf[m++];
stress[j][0] += buf[m++];
stress[j][1] += buf[m++];
stress[j][2] += buf[m++];
stress[j][3] += buf[m++];
stress[j][4] += buf[m++];
stress[j][5] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
void ComputeHeatFluxTally::compute_vector()
{
invoked_vector = update->ntimestep;
if ((did_compute != invoked_vector) || (update->eflag_global != invoked_vector))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i) {
eatom[i] = 0.0;
stress[i][0] = 0.0;
stress[i][1] = 0.0;
stress[i][2] = 0.0;
stress[i][3] = 0.0;
stress[i][4] = 0.0;
stress[i][5] = 0.0;
}
}
// compute heat currents
// heat flux vector = jc[3] + jv[3]
// jc[3] = convective portion of heat flux = sum_i (ke_i + pe_i) v_i[3]
// jv[3] = virial portion of heat flux = sum_i (stress_tensor_i . v_i[3])
// normalization by volume is not included
// J = sum_i( (0.5*m*v_i^2 + 0.5*(evdwl_i+ecoul_i))*v_i +
// + (F_ij . v_i)*dR_ij/2 )
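// stress[i][0..5] stores the per-atom virial in xx,yy,zz,xy,xz,yz order, so
// jv below is the symmetric per-atom stress tensor applied to v_i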
int nlocal = atom->nlocal;
int *mask = atom->mask;
const double pfactor = 0.5 * force->mvv2e;
double **v = atom->v;
double *mass = atom->mass;
int *type = atom->type;
double jc[3] = {0.0,0.0,0.0};
double jv[3] = {0.0,0.0,0.0};
for (int i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
double ke_i = pfactor * mass[type[i]] *
(v[i][0]*v[i][0] + v[i][1]*v[i][1] + v[i][2]*v[i][2]);
jc[0] += (ke_i + eatom[i]) * v[i][0];
jc[1] += (ke_i + eatom[i]) * v[i][1];
jc[2] += (ke_i + eatom[i]) * v[i][2];
jv[0] += stress[i][0]*v[i][0] + stress[i][3]*v[i][1] +
stress[i][4]*v[i][2];
jv[1] += stress[i][3]*v[i][0] + stress[i][1]*v[i][1] +
stress[i][5]*v[i][2];
jv[2] += stress[i][4]*v[i][0] + stress[i][5]*v[i][1] +
stress[i][2]*v[i][2];
}
}
// sum accumulated heatj across procs
heatj[0] = jc[0] + jv[0];
heatj[1] = jc[1] + jv[1];
heatj[2] = jc[2] + jv[2];
heatj[3] = jc[0];
heatj[4] = jc[1];
heatj[5] = jc[2];
MPI_Allreduce(heatj,vector,size_vector,MPI_DOUBLE,MPI_SUM,world);
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeHeatFluxTally::memory_usage()
{
double bytes = nmax*comm_reverse * sizeof(double);
return bytes;
}
diff --git a/src/USER-TALLY/compute_pe_mol_tally.cpp b/src/USER-TALLY/compute_pe_mol_tally.cpp
index 09ee04d57..a30f2d6b9 100644
--- a/src/USER-TALLY/compute_pe_mol_tally.cpp
+++ b/src/USER-TALLY/compute_pe_mol_tally.cpp
@@ -1,129 +1,129 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_pe_mol_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputePEMolTally::ComputePEMolTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute pe/mol/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute pe/mol/tally second group ID");
groupbit2 = group->bitmask[igroup2];
vector_flag = 1;
size_vector = 4;
timeflag = 1;
extvector = 1;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = invoked_vector = -1;
vector = new double[size_vector];
}
/* ---------------------------------------------------------------------- */
ComputePEMolTally::~ComputePEMolTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputePEMolTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute pe/mol/tally with no pair style");
+ error->all(FLERR,"Trying to use compute pe/mol/tally without pair style");
else
force->pair->add_tally_callback(this);
if (atom->molecule_flag == 0)
- error->all(FLERR,"Compute pe/mol/tally requires molecule IDs.");
+ error->all(FLERR,"Compute pe/mol/tally requires molecule IDs");
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute pe/mol/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute pe/mol/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute pe/mol/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputePEMolTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double evdwl, double ecoul, double,
double, double, double)
{
const int * const mask = atom->mask;
const tagint * const molid = atom->molecule;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
etotal[0] = etotal[1] = etotal[2] = etotal[3] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ){
evdwl *= 0.5; ecoul *= 0.5;
if (newton || i < nlocal) {
if (molid[i] == molid[j]) {
etotal[0] += evdwl; etotal[1] += ecoul;
} else {
etotal[2] += evdwl; etotal[3] += ecoul;
}
}
if (newton || j < nlocal) {
if (molid[i] == molid[j]) {
etotal[0] += evdwl; etotal[1] += ecoul;
} else {
etotal[2] += evdwl; etotal[3] += ecoul;
}
}
}
}
/* ---------------------------------------------------------------------- */
void ComputePEMolTally::compute_vector()
{
invoked_vector = update->ntimestep;
if ((did_compute != invoked_vector) || (update->eflag_global != invoked_vector))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated energies across procs
MPI_Allreduce(etotal,vector,size_vector,MPI_DOUBLE,MPI_SUM,world);
}
diff --git a/src/USER-TALLY/compute_pe_tally.cpp b/src/USER-TALLY/compute_pe_tally.cpp
index 68c00b6d2..2117f2cb1 100644
--- a/src/USER-TALLY/compute_pe_tally.cpp
+++ b/src/USER-TALLY/compute_pe_tally.cpp
@@ -1,205 +1,205 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_pe_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputePETally::ComputePETally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute pe/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute pe/tally second group ID");
groupbit2 = group->bitmask[igroup2];
scalar_flag = 1;
vector_flag = 0;
peratom_flag = 1;
timeflag = 1;
comm_reverse = size_peratom_cols = 2;
extscalar = 1;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = invoked_peratom = invoked_scalar = -1;
nmax = -1;
eatom = NULL;
vector = new double[size_peratom_cols];
}
/* ---------------------------------------------------------------------- */
ComputePETally::~ComputePETally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(eatom);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute pe/tally with no pair style");
+ error->all(FLERR,"Trying to use compute pe/tally without a pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute pe/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute pe/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute pe/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::pair_tally_callback(int i, int j, int nlocal, int newton,
double evdwl, double ecoul, double,
double, double, double)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local eatom array if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(eatom);
nmax = atom->nmax;
memory->create(eatom,nmax,size_peratom_cols,"pe/tally:eatom");
array_atom = eatom;
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i)
eatom[i][0] = eatom[i][1] = 0.0;
} else {
for (int i=0; i < atom->nlocal; ++i)
eatom[i][0] = eatom[i][1] = 0.0;
}
vector[0] = etotal[0] = vector[1] = etotal[1] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ){
evdwl *= 0.5; ecoul *= 0.5;
if (newton || i < nlocal) {
etotal[0] += evdwl; eatom[i][0] += evdwl;
etotal[1] += ecoul; eatom[i][1] += ecoul;
}
if (newton || j < nlocal) {
etotal[0] += evdwl; eatom[j][0] += evdwl;
etotal[1] += ecoul; eatom[j][1] += ecoul;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputePETally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = eatom[i][0];
buf[m++] = eatom[i][1];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
eatom[j][0] += buf[m++];
eatom[j][1] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
double ComputePETally::compute_scalar()
{
invoked_scalar = update->ntimestep;
if ((did_compute != invoked_scalar) || (update->eflag_global != invoked_scalar))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated energies across procs
MPI_Allreduce(etotal,vector,size_peratom_cols,MPI_DOUBLE,MPI_SUM,world);
scalar = vector[0]+vector[1];
return scalar;
}
/* ---------------------------------------------------------------------- */
void ComputePETally::compute_peratom()
{
invoked_peratom = update->ntimestep;
if ((did_compute != invoked_peratom) || (update->eflag_global != invoked_peratom))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i)
eatom[i][0] = eatom[i][1] = 0.0;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputePETally::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
return bytes;
}
diff --git a/src/USER-TALLY/compute_stress_tally.cpp b/src/USER-TALLY/compute_stress_tally.cpp
index 2575bd372..66df9f6e4 100644
--- a/src/USER-TALLY/compute_stress_tally.cpp
+++ b/src/USER-TALLY/compute_stress_tally.cpp
@@ -1,250 +1,250 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_stress_tally.h"
#include "atom.h"
#include "group.h"
#include "pair.h"
#include "update.h"
#include "memory.h"
#include "error.h"
#include "force.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
ComputeStressTally::ComputeStressTally(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg)
{
if (narg < 4) error->all(FLERR,"Illegal compute stress/tally command");
igroup2 = group->find(arg[3]);
if (igroup2 == -1)
error->all(FLERR,"Could not find compute stress/tally second group ID");
groupbit2 = group->bitmask[igroup2];
scalar_flag = 1;
vector_flag = 0;
peratom_flag = 1;
timeflag = 1;
comm_reverse = size_peratom_cols = 6;
extscalar = 0;
peflag = 1; // we need Pair::ev_tally() to be run
did_compute = 0;
invoked_peratom = invoked_scalar = -1;
nmax = -1;
stress = NULL;
vector = new double[size_peratom_cols];
}
/* ---------------------------------------------------------------------- */
ComputeStressTally::~ComputeStressTally()
{
if (force && force->pair) force->pair->del_tally_callback(this);
memory->destroy(stress);
delete[] vector;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::init()
{
if (force->pair == NULL)
- error->all(FLERR,"Trying to use compute stress/tally with no pair style");
+ error->all(FLERR,"Trying to use compute stress/tally without pair style");
else
force->pair->add_tally_callback(this);
if (force->pair->single_enable == 0 || force->pair->manybody_flag)
- error->all(FLERR,"Compute stress/tally used with incompatible pair style.");
+ error->warning(FLERR,"Compute stress/tally used with incompatible pair style");
if ((comm->me == 0) && (force->bond || force->angle || force->dihedral
|| force->improper || force->kspace))
error->warning(FLERR,"Compute stress/tally only called from pair style");
did_compute = -1;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::pair_tally_callback(int i, int j, int nlocal, int newton,
double, double, double fpair,
double dx, double dy, double dz)
{
const int ntotal = atom->nlocal + atom->nghost;
const int * const mask = atom->mask;
// do setup work that needs to be done only once per timestep
if (did_compute != update->ntimestep) {
did_compute = update->ntimestep;
// grow local stress array if necessary
// needs to be atom->nmax in length
if (atom->nmax > nmax) {
memory->destroy(stress);
nmax = atom->nmax;
memory->create(stress,nmax,size_peratom_cols,"stress/tally:stress");
array_atom = stress;
}
// clear storage as needed
if (newton) {
for (int i=0; i < ntotal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
stress[i][j] = 0.0;
} else {
for (int i=0; i < atom->nlocal; ++i)
for (int j=0; j < size_peratom_cols; ++j)
stress[i][j] = 0.0;
}
for (int i=0; i < size_peratom_cols; ++i)
vector[i] = virial[i] = 0.0;
}
if ( ((mask[i] & groupbit) && (mask[j] & groupbit2))
|| ((mask[i] & groupbit2) && (mask[j] & groupbit)) ) {
fpair *= 0.5;
const double v0 = dx*dx*fpair;
const double v1 = dy*dy*fpair;
const double v2 = dz*dz*fpair;
const double v3 = dx*dy*fpair;
const double v4 = dx*dz*fpair;
const double v5 = dy*dz*fpair;
if (newton || i < nlocal) {
virial[0] += v0; stress[i][0] += v0;
virial[1] += v1; stress[i][1] += v1;
virial[2] += v2; stress[i][2] += v2;
virial[3] += v3; stress[i][3] += v3;
virial[4] += v4; stress[i][4] += v4;
virial[5] += v5; stress[i][5] += v5;
}
if (newton || j < nlocal) {
virial[0] += v0; stress[j][0] += v0;
virial[1] += v1; stress[j][1] += v1;
virial[2] += v2; stress[j][2] += v2;
virial[3] += v3; stress[j][3] += v3;
virial[4] += v4; stress[j][4] += v4;
virial[5] += v5; stress[j][5] += v5;
}
}
}
/* ---------------------------------------------------------------------- */
int ComputeStressTally::pack_reverse_comm(int n, int first, double *buf)
{
int i,m,last;
m = 0;
last = first + n;
for (i = first; i < last; i++) {
buf[m++] = stress[i][0];
buf[m++] = stress[i][1];
buf[m++] = stress[i][2];
buf[m++] = stress[i][3];
buf[m++] = stress[i][4];
buf[m++] = stress[i][5];
}
return m;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::unpack_reverse_comm(int n, int *list, double *buf)
{
int i,j,m;
m = 0;
for (i = 0; i < n; i++) {
j = list[i];
stress[j][0] += buf[m++];
stress[j][1] += buf[m++];
stress[j][2] += buf[m++];
stress[j][3] += buf[m++];
stress[j][4] += buf[m++];
stress[j][5] += buf[m++];
}
}
/* ---------------------------------------------------------------------- */
double ComputeStressTally::compute_scalar()
{
invoked_scalar = update->ntimestep;
if ((did_compute != invoked_scalar) || (update->eflag_global != invoked_scalar))
error->all(FLERR,"Energy was not tallied on needed timestep");
// sum accumulated forces across procs
MPI_Allreduce(virial,vector,size_peratom_cols,MPI_DOUBLE,MPI_SUM,world);
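// the scalar is the average of the normal (diagonal) components of the
// accumulated virial, i.e. its trace divided by the dimension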
if (domain->dimension == 3)
scalar = (vector[0]+vector[1]+vector[2])/3.0;
else
scalar = (vector[0]+vector[1])/2.0;
return scalar;
}
/* ---------------------------------------------------------------------- */
void ComputeStressTally::compute_peratom()
{
invoked_peratom = update->ntimestep;
if ((did_compute != invoked_peratom) || (update->eflag_global != invoked_peratom))
error->all(FLERR,"Energy was not tallied on needed timestep");
// collect contributions from ghost atoms
if (force->newton_pair) {
comm->reverse_comm_compute(this);
const int nall = atom->nlocal + atom->nghost;
for (int i = atom->nlocal; i < nall; ++i)
for (int j = 0; j < size_peratom_cols; ++j)
stress[i][j] = 0.0;
}
// convert to stress*volume units = -pressure*volume
const double nktv2p = -force->nktv2p;
for (int i = 0; i < atom->nlocal; i++) {
stress[i][0] *= nktv2p;
stress[i][1] *= nktv2p;
stress[i][2] *= nktv2p;
stress[i][3] *= nktv2p;
stress[i][4] *= nktv2p;
stress[i][5] *= nktv2p;
}
}
/* ----------------------------------------------------------------------
memory usage of local atom-based array
------------------------------------------------------------------------- */
double ComputeStressTally::memory_usage()
{
double bytes = nmax*size_peratom_cols * sizeof(double);
return bytes;
}
diff --git a/src/USER-VTK/README b/src/USER-VTK/README
index 86ef56a74..3429c96b7 100644
--- a/src/USER-VTK/README
+++ b/src/USER-VTK/README
@@ -1,17 +1,17 @@
-This package implements the "dump custom/vtk" command which can be used in a
+This package implements the "dump vtk" command which can be used in a
LAMMPS input script.
-This dump allows to output atom data similar to dump custom, but directly into
-VTK files.
+This dump allows output of atom data similar to the dump custom
+command, but in VTK format.
-This package uses the VTK library (www.vtk.org) which must be installed on your
-system. See the lib/vtk/README file and the LAMMPS manual for information on
-building LAMMPS with external libraries. The settings in the Makefile.lammps
-file in that directory must be correct for LAMMPS to build correctly with this
-package installed.
+This package uses the VTK library (www.vtk.org) which must be
+installed on your system. See the lib/vtk/README file and the LAMMPS
+manual for information on building LAMMPS with external libraries.
+The settings in the Makefile.lammps file in that directory must be
+correct for LAMMPS to build correctly with this package installed.
-This code was initially developed for LIGGGHTS by Daniel Queteschiner at DCS
-Computing. This is an effort to integrate it back to LAMMPS.
+This code was initially developed for LIGGGHTS by Daniel Queteschiner
+at DCS Computing. This is an effort to integrate it back to LAMMPS.
The person who created this package is Richard Berger at JKU
(richard.berger@jku.at). Contact him directly if you have questions.
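As an illustration (not part of the original README), a dump vtk command
in an input script follows the usual dump syntax, e.g. something like
"dump dmpvtk all vtk 100 dump_*.vtu id type vx vy vz", where the
attribute list is handled like dump custom; see the dump vtk doc page
for the exact set of supported keywords and file suffixes.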
diff --git a/src/USER-VTK/dump_custom_vtk.cpp b/src/USER-VTK/dump_vtk.cpp
similarity index 91%
rename from src/USER-VTK/dump_custom_vtk.cpp
rename to src/USER-VTK/dump_vtk.cpp
index 0e4bc4597..0aa749e73 100644
--- a/src/USER-VTK/dump_custom_vtk.cpp
+++ b/src/USER-VTK/dump_vtk.cpp
@@ -1,2398 +1,2400 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
This file initially came from LIGGGHTS (www.liggghts.com)
Copyright (2014) DCS Computing GmbH, Linz
Copyright (2015) Johannes Kepler University Linz
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing authors:
Daniel Queteschiner (DCS, JKU)
Christoph Kloss (DCS)
Richard Berger (JKU)
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
-#include "dump_custom_vtk.h"
+#include "dump_vtk.h"
#include "atom.h"
#include "force.h"
#include "domain.h"
#include "region.h"
#include "group.h"
#include "input.h"
#include "variable.h"
#include "update.h"
#include "modify.h"
#include "compute.h"
#include "fix.h"
#include "memory.h"
#include "error.h"
+
#include <vector>
#include <sstream>
#include <vtkVersion.h>
+
#ifndef VTK_MAJOR_VERSION
#include <vtkConfigure.h>
#endif
+
#include <vtkPointData.h>
#include <vtkCellData.h>
#include <vtkDoubleArray.h>
#include <vtkIntArray.h>
#include <vtkStringArray.h>
#include <vtkPolyData.h>
#include <vtkPolyDataWriter.h>
#include <vtkXMLPolyDataWriter.h>
#include <vtkXMLPPolyDataWriter.h>
#include <vtkRectilinearGrid.h>
#include <vtkRectilinearGridWriter.h>
#include <vtkXMLRectilinearGridWriter.h>
#include <vtkHexahedron.h>
#include <vtkUnstructuredGrid.h>
#include <vtkUnstructuredGridWriter.h>
#include <vtkXMLUnstructuredGridWriter.h>
#include <vtkXMLPUnstructuredGridWriter.h>
using namespace LAMMPS_NS;
// customize by
// * adding an enum constant (add vector components in consecutive order)
// * adding a pack_*(int) function for the value
// * adjusting parse_fields function to add the pack_* function to pack_choice
// (in case of vectors, adjust identify_vectors as well)
// * adjusting thresh part in modify_param and count functions
enum{X,Y,Z, // required for vtk, must come first
ID,MOL,PROC,PROCP1,TYPE,ELEMENT,MASS,
XS,YS,ZS,XSTRI,YSTRI,ZSTRI,XU,YU,ZU,XUTRI,YUTRI,ZUTRI,
XSU,YSU,ZSU,XSUTRI,YSUTRI,ZSUTRI,
IX,IY,IZ,
VX,VY,VZ,FX,FY,FZ,
Q,MUX,MUY,MUZ,MU,RADIUS,DIAMETER,
OMEGAX,OMEGAY,OMEGAZ,ANGMOMX,ANGMOMY,ANGMOMZ,
TQX,TQY,TQZ,
VARIABLE,COMPUTE,FIX,INAME,DNAME,
ATTRIBUTES}; // must come last
enum{LT,LE,GT,GE,EQ,NEQ};
enum{INT,DOUBLE,STRING,BIGINT}; // same as in DumpCFG
enum{VTK,VTP,VTU,PVTP,PVTU}; // file formats
#define INVOKED_PERATOM 8
#define ONEFIELD 32
#define DELTA 1048576
/* ---------------------------------------------------------------------- */
-DumpCustomVTK::DumpCustomVTK(LAMMPS *lmp, int narg, char **arg) :
+DumpVTK::DumpVTK(LAMMPS *lmp, int narg, char **arg) :
DumpCustom(lmp, narg, arg)
{
- if (narg == 5) error->all(FLERR,"No dump custom/vtk arguments specified");
+ if (narg == 5) error->all(FLERR,"No dump vtk arguments specified");
pack_choice.clear();
vtype.clear();
name.clear();
myarrays.clear();
n_calls_ = 0;
// process attributes
// ioptional = start of additional optional args
// only dump image and dump movie styles process optional args
ioptional = parse_fields(narg,arg);
if (ioptional < narg &&
strcmp(style,"image") != 0 && strcmp(style,"movie") != 0)
- error->all(FLERR,"Invalid attribute in dump custom command");
+ error->all(FLERR,"Invalid attribute in dump vtk command");
size_one = pack_choice.size();
current_pack_choice_key = -1;
if (filewriter) reset_vtk_data_containers();
label = NULL;
{
// parallel vtp/vtu requires proc number to be preceded by underscore '_'
multiname_ex = NULL;
char *ptr = strchr(filename,'%');
if (ptr) {
multiname_ex = new char[strlen(filename) + 16];
*ptr = '\0';
sprintf(multiname_ex,"%s_%d%s",filename,me,ptr+1);
*ptr = '%';
}
}
vtk_file_format = VTK;
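// pick the output backend from the filename suffix: ".vtp" selects the
// (P)PolyData XML writers, ".vtu" the (P)UnstructuredGrid XML writers,
// anything else falls back to the legacy serial .vtk format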
char *suffix = filename + strlen(filename) - strlen(".vtp");
if (suffix > filename && strcmp(suffix,".vtp") == 0) {
if (multiproc) vtk_file_format = PVTP;
else vtk_file_format = VTP;
} else if (suffix > filename && strcmp(suffix,".vtu") == 0) {
if (multiproc) vtk_file_format = PVTU;
else vtk_file_format = VTU;
}
if (vtk_file_format == VTK) { // no multiproc support for legacy vtk format
if (me != 0) filewriter = 0;
fileproc = 0;
multiproc = 0;
nclusterprocs = nprocs;
}
filecurrent = NULL;
domainfilecurrent = NULL;
parallelfilecurrent = NULL;
header_choice = NULL;
write_choice = NULL;
boxcorners = NULL;
}
/* ---------------------------------------------------------------------- */
-DumpCustomVTK::~DumpCustomVTK()
+DumpVTK::~DumpVTK()
{
delete [] filecurrent;
delete [] domainfilecurrent;
delete [] parallelfilecurrent;
delete [] multiname_ex;
delete [] label;
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::init_style()
+void DumpVTK::init_style()
{
// default for element names = C
if (typenames == NULL) {
typenames = new char*[ntypes+1];
for (int itype = 1; itype <= ntypes; itype++) {
typenames[itype] = new char[2];
strcpy(typenames[itype],"C");
}
}
// setup boundary string
domain->boundary_string(boundstr);
// setup function ptrs
- header_choice = &DumpCustomVTK::header_vtk;
+ header_choice = &DumpVTK::header_vtk;
if (vtk_file_format == VTP || vtk_file_format == PVTP)
- write_choice = &DumpCustomVTK::write_vtp;
+ write_choice = &DumpVTK::write_vtp;
else if (vtk_file_format == VTU || vtk_file_format == PVTU)
- write_choice = &DumpCustomVTK::write_vtu;
+ write_choice = &DumpVTK::write_vtu;
else
- write_choice = &DumpCustomVTK::write_vtk;
+ write_choice = &DumpVTK::write_vtk;
// find current ptr for each compute,fix,variable
// check that fix frequency is acceptable
int icompute;
for (int i = 0; i < ncompute; i++) {
icompute = modify->find_compute(id_compute[i]);
- if (icompute < 0) error->all(FLERR,"Could not find dump custom/vtk compute ID");
+ if (icompute < 0) error->all(FLERR,"Could not find dump vtk compute ID");
compute[i] = modify->compute[icompute];
}
int ifix;
for (int i = 0; i < nfix; i++) {
ifix = modify->find_fix(id_fix[i]);
- if (ifix < 0) error->all(FLERR,"Could not find dump custom/vtk fix ID");
+ if (ifix < 0) error->all(FLERR,"Could not find dump vtk fix ID");
fix[i] = modify->fix[ifix];
if (nevery % modify->fix[ifix]->peratom_freq)
- error->all(FLERR,"Dump custom/vtk and fix not computed at compatible times");
+ error->all(FLERR,"Dump vtk and fix not computed at compatible times");
}
int ivariable;
for (int i = 0; i < nvariable; i++) {
ivariable = input->variable->find(id_variable[i]);
if (ivariable < 0)
- error->all(FLERR,"Could not find dump custom/vtk variable name");
+ error->all(FLERR,"Could not find dump vtk variable name");
variable[i] = ivariable;
}
int icustom;
for (int i = 0; i < ncustom; i++) {
icustom = atom->find_custom(id_custom[i],flag_custom[i]);
if (icustom < 0)
error->all(FLERR,"Could not find custom per-atom property ID");
}
// set index and check validity of region
if (iregion >= 0) {
iregion = domain->find_region(idregion);
if (iregion == -1)
- error->all(FLERR,"Region ID for dump custom/vtk does not exist");
+ error->all(FLERR,"Region ID for dump vtk does not exist");
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_header(bigint)
+void DumpVTK::write_header(bigint)
{
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::header_vtk(bigint)
+void DumpVTK::header_vtk(bigint)
{
}
/* ---------------------------------------------------------------------- */
-int DumpCustomVTK::count()
+int DumpVTK::count()
{
n_calls_ = 0;
int i;
// grow choose and variable vbuf arrays if needed
int nlocal = atom->nlocal;
if (atom->nmax > maxlocal) {
maxlocal = atom->nmax;
memory->destroy(choose);
memory->destroy(dchoose);
memory->destroy(clist);
memory->create(choose,maxlocal,"dump:choose");
memory->create(dchoose,maxlocal,"dump:dchoose");
memory->create(clist,maxlocal,"dump:clist");
for (i = 0; i < nvariable; i++) {
memory->destroy(vbuf[i]);
memory->create(vbuf[i],maxlocal,"dump:vbuf");
}
}
// invoke Computes for per-atom quantities
// only if within a run or minimize
// else require that computes are current
// this prevents a compute from being invoked by the WriteDump class
if (ncompute) {
if (update->whichflag == 0) {
for (i = 0; i < ncompute; i++)
if (compute[i]->invoked_peratom != update->ntimestep)
error->all(FLERR,"Compute used in dump between runs is not current");
} else {
for (i = 0; i < ncompute; i++) {
if (!(compute[i]->invoked_flag & INVOKED_PERATOM)) {
compute[i]->compute_peratom();
compute[i]->invoked_flag |= INVOKED_PERATOM;
}
}
}
}
// evaluate atom-style Variables for per-atom quantities
if (nvariable)
for (i = 0; i < nvariable; i++)
input->variable->compute_atom(variable[i],igroup,vbuf[i],1,0);
// choose all local atoms for output
for (i = 0; i < nlocal; i++) choose[i] = 1;
// un-choose if not in group
if (igroup) {
int *mask = atom->mask;
for (i = 0; i < nlocal; i++)
if (!(mask[i] & groupbit))
choose[i] = 0;
}
// un-choose if not in region
if (iregion >= 0) {
Region *region = domain->regions[iregion];
region->prematch();
double **x = atom->x;
for (i = 0; i < nlocal; i++)
if (choose[i] && region->match(x[i][0],x[i][1],x[i][2]) == 0)
choose[i] = 0;
}
// un-choose if any threshold criterion isn't met
if (nthresh) {
double *ptr;
double value;
int nstride;
int nlocal = atom->nlocal;
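// each branch below points ptr at the per-atom quantity to test and sets
// nstride = spacing between consecutive atoms' values
// (3 for packed x/v/f style arrays, 1 for scalars or values copied into dchoose)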
for (int ithresh = 0; ithresh < nthresh; ithresh++) {
// customize by adding to if statement
if (thresh_array[ithresh] == ID) {
tagint *tag = atom->tag;
for (i = 0; i < nlocal; i++) dchoose[i] = tag[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == MOL) {
if (!atom->molecule_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
tagint *molecule = atom->molecule;
for (i = 0; i < nlocal; i++) dchoose[i] = molecule[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == PROC) {
for (i = 0; i < nlocal; i++) dchoose[i] = me;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == PROCP1) {
for (i = 0; i < nlocal; i++) dchoose[i] = me;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == TYPE) {
int *type = atom->type;
for (i = 0; i < nlocal; i++) dchoose[i] = type[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ELEMENT) {
int *type = atom->type;
for (i = 0; i < nlocal; i++) dchoose[i] = type[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == MASS) {
if (atom->rmass) {
ptr = atom->rmass;
nstride = 1;
} else {
double *mass = atom->mass;
int *type = atom->type;
for (i = 0; i < nlocal; i++) dchoose[i] = mass[type[i]];
ptr = dchoose;
nstride = 1;
}
} else if (thresh_array[ithresh] == X) {
ptr = &atom->x[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == Y) {
ptr = &atom->x[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == Z) {
ptr = &atom->x[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == XS) {
double **x = atom->x;
double boxxlo = domain->boxlo[0];
double invxprd = 1.0/domain->xprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][0] - boxxlo) * invxprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YS) {
double **x = atom->x;
double boxylo = domain->boxlo[1];
double invyprd = 1.0/domain->yprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][1] - boxylo) * invyprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZS) {
double **x = atom->x;
double boxzlo = domain->boxlo[2];
double invzprd = 1.0/domain->zprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][2] - boxzlo) * invzprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XSTRI) {
double **x = atom->x;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[0]*(x[i][0]-boxlo[0]) +
h_inv[5]*(x[i][1]-boxlo[1]) + h_inv[4]*(x[i][2]-boxlo[2]);
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YSTRI) {
double **x = atom->x;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[1]*(x[i][1]-boxlo[1]) +
h_inv[3]*(x[i][2]-boxlo[2]);
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZSTRI) {
double **x = atom->x;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[2]*(x[i][2]-boxlo[2]);
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XU) {
double **x = atom->x;
imageint *image = atom->image;
double xprd = domain->xprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = x[i][0] + ((image[i] & IMGMASK) - IMGMAX) * xprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YU) {
double **x = atom->x;
imageint *image = atom->image;
double yprd = domain->yprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = x[i][1] +
((image[i] >> IMGBITS & IMGMASK) - IMGMAX) * yprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZU) {
double **x = atom->x;
imageint *image = atom->image;
double zprd = domain->zprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = x[i][2] + ((image[i] >> IMG2BITS) - IMGMAX) * zprd;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *h = domain->h;
int xbox,ybox,zbox;
for (i = 0; i < nlocal; i++) {
xbox = (image[i] & IMGMASK) - IMGMAX;
ybox = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
zbox = (image[i] >> IMG2BITS) - IMGMAX;
dchoose[i] = x[i][0] + h[0]*xbox + h[5]*ybox + h[4]*zbox;
}
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *h = domain->h;
int ybox,zbox;
for (i = 0; i < nlocal; i++) {
ybox = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
zbox = (image[i] >> IMG2BITS) - IMGMAX;
dchoose[i] = x[i][1] + h[1]*ybox + h[3]*zbox;
}
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *h = domain->h;
int zbox;
for (i = 0; i < nlocal; i++) {
zbox = (image[i] >> IMG2BITS) - IMGMAX;
dchoose[i] = x[i][2] + h[2]*zbox;
}
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XSU) {
double **x = atom->x;
imageint *image = atom->image;
double boxxlo = domain->boxlo[0];
double invxprd = 1.0/domain->xprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][0] - boxxlo) * invxprd +
(image[i] & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YSU) {
double **x = atom->x;
imageint *image = atom->image;
double boxylo = domain->boxlo[1];
double invyprd = 1.0/domain->yprd;
for (i = 0; i < nlocal; i++)
dchoose[i] =
(x[i][1] - boxylo) * invyprd +
(image[i] >> IMGBITS & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZSU) {
double **x = atom->x;
imageint *image = atom->image;
double boxzlo = domain->boxlo[2];
double invzprd = 1.0/domain->zprd;
for (i = 0; i < nlocal; i++)
dchoose[i] = (x[i][2] - boxzlo) * invzprd +
(image[i] >> IMG2BITS) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == XSUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[0]*(x[i][0]-boxlo[0]) +
h_inv[5]*(x[i][1]-boxlo[1]) +
h_inv[4]*(x[i][2]-boxlo[2]) +
(image[i] & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == YSUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[1]*(x[i][1]-boxlo[1]) +
h_inv[3]*(x[i][2]-boxlo[2]) +
(image[i] >> IMGBITS & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == ZSUTRI) {
double **x = atom->x;
imageint *image = atom->image;
double *boxlo = domain->boxlo;
double *h_inv = domain->h_inv;
for (i = 0; i < nlocal; i++)
dchoose[i] = h_inv[2]*(x[i][2]-boxlo[2]) +
(image[i] >> IMG2BITS) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == IX) {
imageint *image = atom->image;
for (i = 0; i < nlocal; i++)
dchoose[i] = (image[i] & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == IY) {
imageint *image = atom->image;
for (i = 0; i < nlocal; i++)
dchoose[i] = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == IZ) {
imageint *image = atom->image;
for (i = 0; i < nlocal; i++)
dchoose[i] = (image[i] >> IMG2BITS) - IMGMAX;
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == VX) {
ptr = &atom->v[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == VY) {
ptr = &atom->v[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == VZ) {
ptr = &atom->v[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == FX) {
ptr = &atom->f[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == FY) {
ptr = &atom->f[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == FZ) {
ptr = &atom->f[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == Q) {
if (!atom->q_flag)
error->all(FLERR,"Threshold for an atom property that isn't allocated");
ptr = atom->q;
nstride = 1;
} else if (thresh_array[ithresh] == MUX) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][0];
nstride = 4;
} else if (thresh_array[ithresh] == MUY) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][1];
nstride = 4;
} else if (thresh_array[ithresh] == MUZ) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][2];
nstride = 4;
} else if (thresh_array[ithresh] == MU) {
if (!atom->mu_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->mu[0][3];
nstride = 4;
} else if (thresh_array[ithresh] == RADIUS) {
if (!atom->radius_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = atom->radius;
nstride = 1;
} else if (thresh_array[ithresh] == DIAMETER) {
if (!atom->radius_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
double *radius = atom->radius;
for (i = 0; i < nlocal; i++) dchoose[i] = 2.0*radius[i];
ptr = dchoose;
nstride = 1;
} else if (thresh_array[ithresh] == OMEGAX) {
if (!atom->omega_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->omega[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == OMEGAY) {
if (!atom->omega_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->omega[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == OMEGAZ) {
if (!atom->omega_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->omega[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == ANGMOMX) {
if (!atom->angmom_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->angmom[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == ANGMOMY) {
if (!atom->angmom_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->angmom[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == ANGMOMZ) {
if (!atom->angmom_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->angmom[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == TQX) {
if (!atom->torque_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->torque[0][0];
nstride = 3;
} else if (thresh_array[ithresh] == TQY) {
if (!atom->torque_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->torque[0][1];
nstride = 3;
} else if (thresh_array[ithresh] == TQZ) {
if (!atom->torque_flag)
error->all(FLERR,
"Threshold for an atom property that isn't allocated");
ptr = &atom->torque[0][2];
nstride = 3;
} else if (thresh_array[ithresh] == COMPUTE) {
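// compute/fix/variable/custom threshold references are stored after the
// regular output fields, hence the ATTRIBUTES + nfield offset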
i = ATTRIBUTES + nfield + ithresh;
if (argindex[i] == 0) {
ptr = compute[field2index[i]]->vector_atom;
nstride = 1;
} else {
ptr = &compute[field2index[i]]->array_atom[0][argindex[i]-1];
nstride = compute[field2index[i]]->size_peratom_cols;
}
} else if (thresh_array[ithresh] == FIX) {
i = ATTRIBUTES + nfield + ithresh;
if (argindex[i] == 0) {
ptr = fix[field2index[i]]->vector_atom;
nstride = 1;
} else {
ptr = &fix[field2index[i]]->array_atom[0][argindex[i]-1];
nstride = fix[field2index[i]]->size_peratom_cols;
}
} else if (thresh_array[ithresh] == VARIABLE) {
i = ATTRIBUTES + nfield + ithresh;
ptr = vbuf[field2index[i]];
nstride = 1;
} else if (thresh_array[ithresh] == DNAME) {
int iwhich,tmp;
i = ATTRIBUTES + nfield + ithresh;
iwhich = atom->find_custom(id_custom[field2index[i]],tmp);
ptr = atom->dvector[iwhich];
nstride = 1;
} else if (thresh_array[ithresh] == INAME) {
int iwhich,tmp;
i = ATTRIBUTES + nfield + ithresh;
iwhich = atom->find_custom(id_custom[field2index[i]],tmp);
int *ivector = atom->ivector[iwhich];
for (i = 0; i < nlocal; i++)
dchoose[i] = ivector[i];
ptr = dchoose;
nstride = 1;
}
// unselect atoms that don't meet threshold criterion
value = thresh_value[ithresh];
switch (thresh_op[ithresh]) {
case LT:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr >= value) choose[i] = 0;
break;
case LE:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr > value) choose[i] = 0;
break;
case GT:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr <= value) choose[i] = 0;
break;
case GE:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr < value) choose[i] = 0;
break;
case EQ:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr != value) choose[i] = 0;
break;
case NEQ:
for (i = 0; i < nlocal; i++, ptr += nstride)
if (choose[i] && *ptr == value) choose[i] = 0;
break;
}
}
}
// compress choose flags into clist
// nchoose = # of selected atoms
// clist[i] = local index of each selected atom
nchoose = 0;
for (i = 0; i < nlocal; i++)
if (choose[i]) clist[nchoose++] = i;
return nchoose;
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write()
+void DumpVTK::write()
{
// simulation box bounds
if (domain->triclinic == 0) {
boxxlo = domain->boxlo[0];
boxxhi = domain->boxhi[0];
boxylo = domain->boxlo[1];
boxyhi = domain->boxhi[1];
boxzlo = domain->boxlo[2];
boxzhi = domain->boxhi[2];
} else {
domain->box_corners();
boxcorners = domain->corners;
}
// nme = # of dump lines this proc contributes to dump
nme = count();
// ntotal = total # of dump lines in snapshot
// nmax = max # of dump lines on any proc
bigint bnme = nme;
MPI_Allreduce(&bnme,&ntotal,1,MPI_LMP_BIGINT,MPI_SUM,world);
int nmax;
if (multiproc != nprocs) MPI_Allreduce(&nme,&nmax,1,MPI_INT,MPI_MAX,world);
else nmax = nme;
// write timestep header
// for multiproc,
// nheader = # of lines in this file via Allreduce on clustercomm
bigint nheader = ntotal;
if (multiproc)
MPI_Allreduce(&bnme,&nheader,1,MPI_LMP_BIGINT,MPI_SUM,clustercomm);
if (filewriter) write_header(nheader);
// ensure buf is sized for packing and communicating
// use nmax to ensure filewriter proc can receive info from others
// limit nmax*size_one to int since used as arg in MPI calls
if (nmax > maxbuf) {
if ((bigint) nmax * size_one > MAXSMALLINT)
error->all(FLERR,"Too much per-proc info for dump");
maxbuf = nmax;
memory->destroy(buf);
memory->create(buf,maxbuf*size_one,"dump:buf");
}
// ensure ids buffer is sized for sorting
if (sort_flag && sortcol == 0 && nmax > maxids) {
maxids = nmax;
memory->destroy(ids);
memory->create(ids,maxids,"dump:ids");
}
// pack my data into buf
// if sorting on IDs also request ID list from pack()
// sort buf as needed
if (sort_flag && sortcol == 0) pack(ids);
else pack(NULL);
if (sort_flag) sort();
// filewriter = 1 = this proc writes to file
// ping each proc in my cluster, receive its data, write data to file
// else wait for ping from fileproc, send my data to fileproc
int tmp,nlines;
MPI_Status status;
MPI_Request request;
// comm and output buf of doubles
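// the filewriter posts MPI_Irecv before pinging each sender, so the sender's
// MPI_Rsend (ready send) is guaranteed to find a matching posted receive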
if (filewriter) {
for (int iproc = 0; iproc < nclusterprocs; iproc++) {
if (iproc) {
MPI_Irecv(buf,maxbuf*size_one,MPI_DOUBLE,me+iproc,0,world,&request);
MPI_Send(&tmp,0,MPI_INT,me+iproc,0,world);
MPI_Wait(&request,&status);
MPI_Get_count(&status,MPI_DOUBLE,&nlines);
nlines /= size_one;
} else nlines = nme;
write_data(nlines,buf);
}
} else {
MPI_Recv(&tmp,0,MPI_INT,fileproc,0,world,&status);
MPI_Rsend(buf,nme*size_one,MPI_DOUBLE,fileproc,0,world);
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack(tagint *ids)
+void DumpVTK::pack(tagint *ids)
{
int n = 0;
for (std::map<int,FnPtrPack>::iterator it=pack_choice.begin(); it!=pack_choice.end(); ++it, ++n) {
current_pack_choice_key = it->first; // work-around for pack_compute, pack_fix, pack_variable
(this->*(it->second))(n);
}
if (ids) {
tagint *tag = atom->tag;
for (int i = 0; i < nchoose; i++)
ids[i] = tag[clist[i]];
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_data(int n, double *mybuf)
+void DumpVTK::write_data(int n, double *mybuf)
{
(this->*write_choice)(n,mybuf);
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::setFileCurrent() {
+void DumpVTK::setFileCurrent() {
delete [] filecurrent;
filecurrent = NULL;
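// rebuild the output file name for this snapshot:
// a '%' in the file name becomes a per-cluster id (multiproc output),
// a '*' becomes the current timestep, zero-padded if padflag is set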
char *filestar = filename;
if (multiproc) {
if (multiproc > 1) { // if dump_modify fileper or nfile was used
delete [] multiname_ex;
multiname_ex = NULL;
char *ptr = strchr(filename,'%');
if (ptr) {
int id;
if (me + nclusterprocs == nprocs) // last filewriter
id = multiproc -1;
else
id = me/nclusterprocs;
multiname_ex = new char[strlen(filename) + 16];
*ptr = '\0';
sprintf(multiname_ex,"%s_%d%s",filename,id,ptr+1);
*ptr = '%';
}
} // else multiname_ex built in constructor is OK
filestar = multiname_ex;
}
if (multifile == 0) {
filecurrent = new char[strlen(filestar) + 1];
strcpy(filecurrent, filestar);
} else {
filecurrent = new char[strlen(filestar) + 16];
char *ptr = strchr(filestar,'*');
*ptr = '\0';
if (padflag == 0) {
sprintf(filecurrent,"%s" BIGINT_FORMAT "%s",
filestar,update->ntimestep,ptr+1);
} else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(filecurrent,pad,filestar,update->ntimestep,ptr+1);
}
*ptr = '*';
}
// filename of domain box data file
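// derived from the dump file name by inserting "_boundingBox" before the
// extension, e.g. dump.vtk -> dump_boundingBox.vtk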
delete [] domainfilecurrent;
domainfilecurrent = NULL;
if (multiproc) {
// remove '%' character
char *ptr = strchr(filename,'%');
domainfilecurrent = new char[strlen(filename)];
*ptr = '\0';
sprintf(domainfilecurrent,"%s%s",filename,ptr+1);
*ptr = '%';
// insert "_boundingBox" string
ptr = strrchr(domainfilecurrent,'.');
filestar = new char[strlen(domainfilecurrent)+16];
*ptr = '\0';
sprintf(filestar,"%s_boundingBox.%s",domainfilecurrent,ptr+1);
delete [] domainfilecurrent;
domainfilecurrent = NULL;
if (multifile == 0) {
domainfilecurrent = new char[strlen(filestar) + 1];
strcpy(domainfilecurrent, filestar);
} else {
domainfilecurrent = new char[strlen(filestar) + 16];
char *ptr = strchr(filestar,'*');
*ptr = '\0';
if (padflag == 0) {
sprintf(domainfilecurrent,"%s" BIGINT_FORMAT "%s",
filestar,update->ntimestep,ptr+1);
} else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(domainfilecurrent,pad,filestar,update->ntimestep,ptr+1);
}
*ptr = '*';
}
delete [] filestar;
filestar = NULL;
} else {
domainfilecurrent = new char[strlen(filecurrent) + 16];
char *ptr = strrchr(filecurrent,'.');
*ptr = '\0';
sprintf(domainfilecurrent,"%s_boundingBox.%s",filecurrent,ptr+1);
*ptr = '.';
}
// filename of parallel file
if (multiproc && me == 0) {
delete [] parallelfilecurrent;
parallelfilecurrent = NULL;
// remove '%' character and add 'p' to file extension
// -> string length stays the same
char *ptr = strchr(filename,'%');
filestar = new char[strlen(filename) + 1];
*ptr = '\0';
sprintf(filestar,"%s%s",filename,ptr+1);
*ptr = '%';
ptr = strrchr(filestar,'.');
ptr++;
*ptr++='p';
*ptr++='v';
*ptr++='t';
*ptr++= (vtk_file_format == PVTP)?'p':'u';
*ptr++= 0;
if (multifile == 0) {
parallelfilecurrent = new char[strlen(filestar) + 1];
strcpy(parallelfilecurrent, filestar);
} else {
parallelfilecurrent = new char[strlen(filestar) + 16];
char *ptr = strchr(filestar,'*');
*ptr = '\0';
if (padflag == 0) {
sprintf(parallelfilecurrent,"%s" BIGINT_FORMAT "%s",
filestar,update->ntimestep,ptr+1);
} else {
char bif[8],pad[16];
strcpy(bif,BIGINT_FORMAT);
sprintf(pad,"%%s%%0%d%s%%s",padflag,&bif[1]);
sprintf(parallelfilecurrent,pad,filestar,update->ntimestep,ptr+1);
}
*ptr = '*';
}
delete [] filestar;
filestar = NULL;
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::buf2arrays(int n, double *mybuf)
+void DumpVTK::buf2arrays(int n, double *mybuf)
{
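// mybuf holds size_one doubles per atom: entries 0,1,2 are the x,y,z
// coordinates (inserted as vtkPoints below), the remaining entries are copied
// in order into the arrays in myarrays; a 3-component array consumes 3 entries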
for (int iatom=0; iatom < n; ++iatom) {
vtkIdType pid[1];
pid[0] = points->InsertNextPoint(mybuf[iatom*size_one],mybuf[iatom*size_one+1],mybuf[iatom*size_one+2]);
int j=3; // 0,1,2 = x,y,z handled just above
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
vtkAbstractArray *paa = it->second;
if (it->second->GetNumberOfComponents() == 3) {
switch (vtype[it->first]) {
case INT:
{
int iv3[3] = { static_cast<int>(mybuf[iatom*size_one+j ]),
static_cast<int>(mybuf[iatom*size_one+j+1]),
static_cast<int>(mybuf[iatom*size_one+j+2]) };
vtkIntArray *pia = static_cast<vtkIntArray*>(paa);
pia->InsertNextTupleValue(iv3);
break;
}
case DOUBLE:
{
vtkDoubleArray *pda = static_cast<vtkDoubleArray*>(paa);
pda->InsertNextTupleValue(&mybuf[iatom*size_one+j]);
break;
}
}
j+=3;
} else {
switch (vtype[it->first]) {
case INT:
{
vtkIntArray *pia = static_cast<vtkIntArray*>(paa);
pia->InsertNextValue(mybuf[iatom*size_one+j]);
break;
}
case DOUBLE:
{
vtkDoubleArray *pda = static_cast<vtkDoubleArray*>(paa);
pda->InsertNextValue(mybuf[iatom*size_one+j]);
break;
}
case STRING:
{
vtkStringArray *psa = static_cast<vtkStringArray*>(paa);
psa->InsertNextValue(typenames[static_cast<int>(mybuf[iatom*size_one+j])]);
break;
}
}
++j;
}
}
pointsCells->InsertNextCell(1,pid);
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::prepare_domain_data(vtkRectilinearGrid *rgrid)
+void DumpVTK::prepare_domain_data(vtkRectilinearGrid *rgrid)
{
vtkSmartPointer<vtkDoubleArray> xCoords = vtkSmartPointer<vtkDoubleArray>::New();
xCoords->InsertNextValue(boxxlo);
xCoords->InsertNextValue(boxxhi);
vtkSmartPointer<vtkDoubleArray> yCoords = vtkSmartPointer<vtkDoubleArray>::New();
yCoords->InsertNextValue(boxylo);
yCoords->InsertNextValue(boxyhi);
vtkSmartPointer<vtkDoubleArray> zCoords = vtkSmartPointer<vtkDoubleArray>::New();
zCoords->InsertNextValue(boxzlo);
zCoords->InsertNextValue(boxzhi);
rgrid->SetDimensions(2,2,2);
rgrid->SetXCoordinates(xCoords);
rgrid->SetYCoordinates(yCoords);
rgrid->SetZCoordinates(zCoords);
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::prepare_domain_data_triclinic(vtkUnstructuredGrid *hexahedronGrid)
+void DumpVTK::prepare_domain_data_triclinic(vtkUnstructuredGrid *hexahedronGrid)
{
vtkSmartPointer<vtkPoints> hexahedronPoints = vtkSmartPointer<vtkPoints>::New();
hexahedronPoints->SetNumberOfPoints(8);
hexahedronPoints->InsertPoint(0, boxcorners[0][0], boxcorners[0][1], boxcorners[0][2]);
hexahedronPoints->InsertPoint(1, boxcorners[1][0], boxcorners[1][1], boxcorners[1][2]);
hexahedronPoints->InsertPoint(2, boxcorners[3][0], boxcorners[3][1], boxcorners[3][2]);
hexahedronPoints->InsertPoint(3, boxcorners[2][0], boxcorners[2][1], boxcorners[2][2]);
hexahedronPoints->InsertPoint(4, boxcorners[4][0], boxcorners[4][1], boxcorners[4][2]);
hexahedronPoints->InsertPoint(5, boxcorners[5][0], boxcorners[5][1], boxcorners[5][2]);
hexahedronPoints->InsertPoint(6, boxcorners[7][0], boxcorners[7][1], boxcorners[7][2]);
hexahedronPoints->InsertPoint(7, boxcorners[6][0], boxcorners[6][1], boxcorners[6][2]);
vtkSmartPointer<vtkHexahedron> hexahedron = vtkSmartPointer<vtkHexahedron>::New();
hexahedron->GetPointIds()->SetId(0, 0);
hexahedron->GetPointIds()->SetId(1, 1);
hexahedron->GetPointIds()->SetId(2, 2);
hexahedron->GetPointIds()->SetId(3, 3);
hexahedron->GetPointIds()->SetId(4, 4);
hexahedron->GetPointIds()->SetId(5, 5);
hexahedron->GetPointIds()->SetId(6, 6);
hexahedron->GetPointIds()->SetId(7, 7);
hexahedronGrid->Allocate(1, 1);
hexahedronGrid->InsertNextCell(hexahedron->GetCellType(),
hexahedron->GetPointIds());
hexahedronGrid->SetPoints(hexahedronPoints);
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtk()
+void DumpVTK::write_domain_vtk()
{
vtkSmartPointer<vtkRectilinearGrid> rgrid = vtkSmartPointer<vtkRectilinearGrid>::New();
prepare_domain_data(rgrid.GetPointer());
vtkSmartPointer<vtkRectilinearGridWriter> gwriter = vtkSmartPointer<vtkRectilinearGridWriter>::New();
if(label) gwriter->SetHeader(label);
else gwriter->SetHeader("Generated by LAMMPS");
if (binary) gwriter->SetFileTypeToBinary();
else gwriter->SetFileTypeToASCII();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(rgrid);
#else
gwriter->SetInputData(rgrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtk_triclinic()
+void DumpVTK::write_domain_vtk_triclinic()
{
vtkSmartPointer<vtkUnstructuredGrid> hexahedronGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
prepare_domain_data_triclinic(hexahedronGrid.GetPointer());
vtkSmartPointer<vtkUnstructuredGridWriter> gwriter = vtkSmartPointer<vtkUnstructuredGridWriter>::New();
if(label) gwriter->SetHeader(label);
else gwriter->SetHeader("Generated by LAMMPS");
if (binary) gwriter->SetFileTypeToBinary();
else gwriter->SetFileTypeToASCII();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(hexahedronGrid);
#else
gwriter->SetInputData(hexahedronGrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtr()
+void DumpVTK::write_domain_vtr()
{
vtkSmartPointer<vtkRectilinearGrid> rgrid = vtkSmartPointer<vtkRectilinearGrid>::New();
prepare_domain_data(rgrid.GetPointer());
vtkSmartPointer<vtkXMLRectilinearGridWriter> gwriter = vtkSmartPointer<vtkXMLRectilinearGridWriter>::New();
if (binary) gwriter->SetDataModeToBinary();
else gwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(rgrid);
#else
gwriter->SetInputData(rgrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_domain_vtu_triclinic()
+void DumpVTK::write_domain_vtu_triclinic()
{
vtkSmartPointer<vtkUnstructuredGrid> hexahedronGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
prepare_domain_data_triclinic(hexahedronGrid.GetPointer());
vtkSmartPointer<vtkXMLUnstructuredGridWriter> gwriter = vtkSmartPointer<vtkXMLUnstructuredGridWriter>::New();
if (binary) gwriter->SetDataModeToBinary();
else gwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
gwriter->SetInput(hexahedronGrid);
#else
gwriter->SetInputData(hexahedronGrid);
#endif
gwriter->SetFileName(domainfilecurrent);
gwriter->Write();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_vtk(int n, double *mybuf)
+void DumpVTK::write_vtk(int n, double *mybuf)
{
++n_calls_;
buf2arrays(n, mybuf);
if (n_calls_ < nclusterprocs)
return; // multiple processors but not all are filewriters (-> nclusterprocs procs contribute to the filewriter's output data)
setFileCurrent();
{
#ifdef UNSTRUCTURED_GRID_VTK
vtkSmartPointer<vtkUnstructuredGrid> unstructuredGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
unstructuredGrid->SetPoints(points);
unstructuredGrid->SetCells(VTK_VERTEX, pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
unstructuredGrid->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkUnstructuredGridWriter> writer = vtkSmartPointer<vtkUnstructuredGridWriter>::New();
#else
vtkSmartPointer<vtkPolyData> polyData = vtkSmartPointer<vtkPolyData>::New();
polyData->SetPoints(points);
polyData->SetVerts(pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
polyData->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkPolyDataWriter> writer = vtkSmartPointer<vtkPolyDataWriter>::New();
#endif
if(label) writer->SetHeader(label);
else writer->SetHeader("Generated by LAMMPS");
if (binary) writer->SetFileTypeToBinary();
else writer->SetFileTypeToASCII();
#ifdef UNSTRUCTURED_GRID_VTK
#if VTK_MAJOR_VERSION < 6
writer->SetInput(unstructuredGrid);
#else
writer->SetInputData(unstructuredGrid);
#endif
#else
#if VTK_MAJOR_VERSION < 6
writer->SetInput(polyData);
#else
writer->SetInputData(polyData);
#endif
#endif
writer->SetFileName(filecurrent);
writer->Write();
if (domain->triclinic == 0)
write_domain_vtk();
else
write_domain_vtk_triclinic();
}
reset_vtk_data_containers();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_vtp(int n, double *mybuf)
+void DumpVTK::write_vtp(int n, double *mybuf)
{
++n_calls_;
buf2arrays(n, mybuf);
if (n_calls_ < nclusterprocs)
return; // multiple processors but not all are filewriters (-> nclusterprocs procs contribute to the filewriter's output data)
setFileCurrent();
{
vtkSmartPointer<vtkPolyData> polyData = vtkSmartPointer<vtkPolyData>::New();
polyData->SetPoints(points);
polyData->SetVerts(pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
polyData->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkXMLPolyDataWriter> writer = vtkSmartPointer<vtkXMLPolyDataWriter>::New();
if (binary) writer->SetDataModeToBinary();
else writer->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
writer->SetInput(polyData);
#else
writer->SetInputData(polyData);
#endif
writer->SetFileName(filecurrent);
writer->Write();
if (me == 0) {
if (multiproc) {
vtkSmartPointer<vtkXMLPPolyDataWriter> pwriter = vtkSmartPointer<vtkXMLPPolyDataWriter>::New();
pwriter->SetFileName(parallelfilecurrent);
pwriter->SetNumberOfPieces((multiproc > 1)?multiproc:nprocs);
if (binary) pwriter->SetDataModeToBinary();
else pwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
pwriter->SetInput(polyData);
#else
pwriter->SetInputData(polyData);
#endif
pwriter->Write();
}
if (domain->triclinic == 0) {
domainfilecurrent[strlen(domainfilecurrent)-1] = 'r'; // adjust filename extension
write_domain_vtr();
} else {
domainfilecurrent[strlen(domainfilecurrent)-1] = 'u'; // adjust filename extension
write_domain_vtu_triclinic();
}
}
}
reset_vtk_data_containers();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::write_vtu(int n, double *mybuf)
+void DumpVTK::write_vtu(int n, double *mybuf)
{
++n_calls_;
buf2arrays(n, mybuf);
if (n_calls_ < nclusterprocs)
return; // multiple processors but not all are filewriters (-> nclusterprocs procs contribute to the filewriter's output data)
setFileCurrent();
{
vtkSmartPointer<vtkUnstructuredGrid> unstructuredGrid = vtkSmartPointer<vtkUnstructuredGrid>::New();
unstructuredGrid->SetPoints(points);
unstructuredGrid->SetCells(VTK_VERTEX, pointsCells);
for (std::map<int, vtkSmartPointer<vtkAbstractArray> >::iterator it=myarrays.begin(); it!=myarrays.end(); ++it) {
unstructuredGrid->GetPointData()->AddArray(it->second);
}
vtkSmartPointer<vtkXMLUnstructuredGridWriter> writer = vtkSmartPointer<vtkXMLUnstructuredGridWriter>::New();
if (binary) writer->SetDataModeToBinary();
else writer->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
writer->SetInput(unstructuredGrid);
#else
writer->SetInputData(unstructuredGrid);
#endif
writer->SetFileName(filecurrent);
writer->Write();
if (me == 0) {
if (multiproc) {
vtkSmartPointer<vtkXMLPUnstructuredGridWriter> pwriter = vtkSmartPointer<vtkXMLPUnstructuredGridWriter>::New();
pwriter->SetFileName(parallelfilecurrent);
pwriter->SetNumberOfPieces((multiproc > 1)?multiproc:nprocs);
if (binary) pwriter->SetDataModeToBinary();
else pwriter->SetDataModeToAscii();
#if VTK_MAJOR_VERSION < 6
pwriter->SetInput(unstructuredGrid);
#else
pwriter->SetInputData(unstructuredGrid);
#endif
pwriter->Write();
}
if (domain->triclinic == 0) {
domainfilecurrent[strlen(domainfilecurrent)-1] = 'r'; // adjust filename extension
write_domain_vtr();
} else {
write_domain_vtu_triclinic();
}
}
}
reset_vtk_data_containers();
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::reset_vtk_data_containers()
+void DumpVTK::reset_vtk_data_containers()
{
points = vtkSmartPointer<vtkPoints>::New();
pointsCells = vtkSmartPointer<vtkCellArray>::New();
std::map<int,int>::iterator it=vtype.begin();
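// skip the first three entries (X,Y,Z): coordinates are stored in the
// vtkPoints object, not as point-data arrays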
++it; ++it; ++it;
for (; it!=vtype.end(); ++it) {
switch(vtype[it->first]) {
case INT:
myarrays[it->first] = vtkSmartPointer<vtkIntArray>::New();
break;
case DOUBLE:
myarrays[it->first] = vtkSmartPointer<vtkDoubleArray>::New();
break;
case STRING:
myarrays[it->first] = vtkSmartPointer<vtkStringArray>::New();
break;
}
if (vector_set.find(it->first) != vector_set.end()) {
myarrays[it->first]->SetNumberOfComponents(3);
myarrays[it->first]->SetName(name[it->first].c_str());
++it; ++it;
} else {
myarrays[it->first]->SetName(name[it->first].c_str());
}
}
}
/* ---------------------------------------------------------------------- */
-int DumpCustomVTK::parse_fields(int narg, char **arg)
+int DumpVTK::parse_fields(int narg, char **arg)
{
- pack_choice[X] = &DumpCustomVTK::pack_x;
+ pack_choice[X] = &DumpVTK::pack_x;
vtype[X] = DOUBLE;
name[X] = "x";
- pack_choice[Y] = &DumpCustomVTK::pack_y;
+ pack_choice[Y] = &DumpVTK::pack_y;
vtype[Y] = DOUBLE;
name[Y] = "y";
- pack_choice[Z] = &DumpCustomVTK::pack_z;
+ pack_choice[Z] = &DumpVTK::pack_z;
vtype[Z] = DOUBLE;
name[Z] = "z";
// customize by adding to if statement
int i;
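// the first 5 arguments of the dump command (ID, group-ID, style, N, file)
// are handled by the Dump base class; per-atom attributes start at arg 5,
// hence the iarg-5 offset for the field index i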
for (int iarg = 5; iarg < narg; iarg++) {
i = iarg-5;
if (strcmp(arg[iarg],"id") == 0) {
- pack_choice[ID] = &DumpCustomVTK::pack_id;
+ pack_choice[ID] = &DumpVTK::pack_id;
vtype[ID] = INT;
name[ID] = arg[iarg];
} else if (strcmp(arg[iarg],"mol") == 0) {
if (!atom->molecule_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MOL] = &DumpCustomVTK::pack_molecule;
+ pack_choice[MOL] = &DumpVTK::pack_molecule;
vtype[MOL] = INT;
name[MOL] = arg[iarg];
} else if (strcmp(arg[iarg],"proc") == 0) {
- pack_choice[PROC] = &DumpCustomVTK::pack_proc;
+ pack_choice[PROC] = &DumpVTK::pack_proc;
vtype[PROC] = INT;
name[PROC] = arg[iarg];
} else if (strcmp(arg[iarg],"procp1") == 0) {
- pack_choice[PROCP1] = &DumpCustomVTK::pack_procp1;
+ pack_choice[PROCP1] = &DumpVTK::pack_procp1;
vtype[PROCP1] = INT;
name[PROCP1] = arg[iarg];
} else if (strcmp(arg[iarg],"type") == 0) {
- pack_choice[TYPE] = &DumpCustomVTK::pack_type;
+ pack_choice[TYPE] = &DumpVTK::pack_type;
vtype[TYPE] = INT;
name[TYPE] =arg[iarg];
} else if (strcmp(arg[iarg],"element") == 0) {
- pack_choice[ELEMENT] = &DumpCustomVTK::pack_type;
+ pack_choice[ELEMENT] = &DumpVTK::pack_type;
vtype[ELEMENT] = STRING;
name[ELEMENT] = arg[iarg];
} else if (strcmp(arg[iarg],"mass") == 0) {
- pack_choice[MASS] = &DumpCustomVTK::pack_mass;
+ pack_choice[MASS] = &DumpVTK::pack_mass;
vtype[MASS] = DOUBLE;
name[MASS] = arg[iarg];
} else if (strcmp(arg[iarg],"x") == 0) {
// required property
} else if (strcmp(arg[iarg],"y") == 0) {
// required property
} else if (strcmp(arg[iarg],"z") == 0) {
// required property
} else if (strcmp(arg[iarg],"xs") == 0) {
- if (domain->triclinic) pack_choice[XS] = &DumpCustomVTK::pack_xs_triclinic;
- else pack_choice[XS] = &DumpCustomVTK::pack_xs;
+ if (domain->triclinic) pack_choice[XS] = &DumpVTK::pack_xs_triclinic;
+ else pack_choice[XS] = &DumpVTK::pack_xs;
vtype[XS] = DOUBLE;
name[XS] = arg[iarg];
} else if (strcmp(arg[iarg],"ys") == 0) {
- if (domain->triclinic) pack_choice[YS] = &DumpCustomVTK::pack_ys_triclinic;
- else pack_choice[YS] = &DumpCustomVTK::pack_ys;
+ if (domain->triclinic) pack_choice[YS] = &DumpVTK::pack_ys_triclinic;
+ else pack_choice[YS] = &DumpVTK::pack_ys;
vtype[YS] = DOUBLE;
name[YS] = arg[iarg];
} else if (strcmp(arg[iarg],"zs") == 0) {
- if (domain->triclinic) pack_choice[ZS] = &DumpCustomVTK::pack_zs_triclinic;
- else pack_choice[ZS] = &DumpCustomVTK::pack_zs;
+ if (domain->triclinic) pack_choice[ZS] = &DumpVTK::pack_zs_triclinic;
+ else pack_choice[ZS] = &DumpVTK::pack_zs;
vtype[ZS] = DOUBLE;
name[ZS] = arg[iarg];
} else if (strcmp(arg[iarg],"xu") == 0) {
- if (domain->triclinic) pack_choice[XU] = &DumpCustomVTK::pack_xu_triclinic;
- else pack_choice[XU] = &DumpCustomVTK::pack_xu;
+ if (domain->triclinic) pack_choice[XU] = &DumpVTK::pack_xu_triclinic;
+ else pack_choice[XU] = &DumpVTK::pack_xu;
vtype[XU] = DOUBLE;
name[XU] = arg[iarg];
} else if (strcmp(arg[iarg],"yu") == 0) {
- if (domain->triclinic) pack_choice[YU] = &DumpCustomVTK::pack_yu_triclinic;
- else pack_choice[YU] = &DumpCustomVTK::pack_yu;
+ if (domain->triclinic) pack_choice[YU] = &DumpVTK::pack_yu_triclinic;
+ else pack_choice[YU] = &DumpVTK::pack_yu;
vtype[YU] = DOUBLE;
name[YU] = arg[iarg];
} else if (strcmp(arg[iarg],"zu") == 0) {
- if (domain->triclinic) pack_choice[ZU] = &DumpCustomVTK::pack_zu_triclinic;
- else pack_choice[ZU] = &DumpCustomVTK::pack_zu;
+ if (domain->triclinic) pack_choice[ZU] = &DumpVTK::pack_zu_triclinic;
+ else pack_choice[ZU] = &DumpVTK::pack_zu;
vtype[ZU] = DOUBLE;
name[ZU] = arg[iarg];
} else if (strcmp(arg[iarg],"xsu") == 0) {
- if (domain->triclinic) pack_choice[XSU] = &DumpCustomVTK::pack_xsu_triclinic;
- else pack_choice[XSU] = &DumpCustomVTK::pack_xsu;
+ if (domain->triclinic) pack_choice[XSU] = &DumpVTK::pack_xsu_triclinic;
+ else pack_choice[XSU] = &DumpVTK::pack_xsu;
vtype[XSU] = DOUBLE;
name[XSU] = arg[iarg];
} else if (strcmp(arg[iarg],"ysu") == 0) {
- if (domain->triclinic) pack_choice[YSU] = &DumpCustomVTK::pack_ysu_triclinic;
- else pack_choice[YSU] = &DumpCustomVTK::pack_ysu;
+ if (domain->triclinic) pack_choice[YSU] = &DumpVTK::pack_ysu_triclinic;
+ else pack_choice[YSU] = &DumpVTK::pack_ysu;
vtype[YSU] = DOUBLE;
name[YSU] = arg[iarg];
} else if (strcmp(arg[iarg],"zsu") == 0) {
- if (domain->triclinic) pack_choice[ZSU] = &DumpCustomVTK::pack_zsu_triclinic;
- else pack_choice[ZSU] = &DumpCustomVTK::pack_zsu;
+ if (domain->triclinic) pack_choice[ZSU] = &DumpVTK::pack_zsu_triclinic;
+ else pack_choice[ZSU] = &DumpVTK::pack_zsu;
vtype[ZSU] = DOUBLE;
name[ZSU] = arg[iarg];
} else if (strcmp(arg[iarg],"ix") == 0) {
- pack_choice[IX] = &DumpCustomVTK::pack_ix;
+ pack_choice[IX] = &DumpVTK::pack_ix;
vtype[IX] = INT;
name[IX] = arg[iarg];
} else if (strcmp(arg[iarg],"iy") == 0) {
- pack_choice[IY] = &DumpCustomVTK::pack_iy;
+ pack_choice[IY] = &DumpVTK::pack_iy;
vtype[IY] = INT;
name[IY] = arg[iarg];
} else if (strcmp(arg[iarg],"iz") == 0) {
- pack_choice[IZ] = &DumpCustomVTK::pack_iz;
+ pack_choice[IZ] = &DumpVTK::pack_iz;
vtype[IZ] = INT;
name[IZ] = arg[iarg];
} else if (strcmp(arg[iarg],"vx") == 0) {
- pack_choice[VX] = &DumpCustomVTK::pack_vx;
+ pack_choice[VX] = &DumpVTK::pack_vx;
vtype[VX] = DOUBLE;
name[VX] = arg[iarg];
} else if (strcmp(arg[iarg],"vy") == 0) {
- pack_choice[VY] = &DumpCustomVTK::pack_vy;
+ pack_choice[VY] = &DumpVTK::pack_vy;
vtype[VY] = DOUBLE;
name[VY] = arg[iarg];
} else if (strcmp(arg[iarg],"vz") == 0) {
- pack_choice[VZ] = &DumpCustomVTK::pack_vz;
+ pack_choice[VZ] = &DumpVTK::pack_vz;
vtype[VZ] = DOUBLE;
name[VZ] = arg[iarg];
} else if (strcmp(arg[iarg],"fx") == 0) {
- pack_choice[FX] = &DumpCustomVTK::pack_fx;
+ pack_choice[FX] = &DumpVTK::pack_fx;
vtype[FX] = DOUBLE;
name[FX] = arg[iarg];
} else if (strcmp(arg[iarg],"fy") == 0) {
- pack_choice[FY] = &DumpCustomVTK::pack_fy;
+ pack_choice[FY] = &DumpVTK::pack_fy;
vtype[FY] = DOUBLE;
name[FY] = arg[iarg];
} else if (strcmp(arg[iarg],"fz") == 0) {
- pack_choice[FZ] = &DumpCustomVTK::pack_fz;
+ pack_choice[FZ] = &DumpVTK::pack_fz;
vtype[FZ] = DOUBLE;
name[FZ] = arg[iarg];
} else if (strcmp(arg[iarg],"q") == 0) {
if (!atom->q_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[Q] = &DumpCustomVTK::pack_q;
+ pack_choice[Q] = &DumpVTK::pack_q;
vtype[Q] = DOUBLE;
name[Q] = arg[iarg];
} else if (strcmp(arg[iarg],"mux") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MUX] = &DumpCustomVTK::pack_mux;
+ pack_choice[MUX] = &DumpVTK::pack_mux;
vtype[MUX] = DOUBLE;
name[MUX] = arg[iarg];
} else if (strcmp(arg[iarg],"muy") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MUY] = &DumpCustomVTK::pack_muy;
+ pack_choice[MUY] = &DumpVTK::pack_muy;
vtype[MUY] = DOUBLE;
name[MUY] = arg[iarg];
} else if (strcmp(arg[iarg],"muz") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MUZ] = &DumpCustomVTK::pack_muz;
+ pack_choice[MUZ] = &DumpVTK::pack_muz;
vtype[MUZ] = DOUBLE;
name[MUZ] = arg[iarg];
} else if (strcmp(arg[iarg],"mu") == 0) {
if (!atom->mu_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[MU] = &DumpCustomVTK::pack_mu;
+ pack_choice[MU] = &DumpVTK::pack_mu;
vtype[MU] = DOUBLE;
name[MU] = arg[iarg];
} else if (strcmp(arg[iarg],"radius") == 0) {
if (!atom->radius_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[RADIUS] = &DumpCustomVTK::pack_radius;
+ pack_choice[RADIUS] = &DumpVTK::pack_radius;
vtype[RADIUS] = DOUBLE;
name[RADIUS] = arg[iarg];
} else if (strcmp(arg[iarg],"diameter") == 0) {
if (!atom->radius_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[DIAMETER] = &DumpCustomVTK::pack_diameter;
+ pack_choice[DIAMETER] = &DumpVTK::pack_diameter;
vtype[DIAMETER] = DOUBLE;
name[DIAMETER] = arg[iarg];
} else if (strcmp(arg[iarg],"omegax") == 0) {
if (!atom->omega_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[OMEGAX] = &DumpCustomVTK::pack_omegax;
+ pack_choice[OMEGAX] = &DumpVTK::pack_omegax;
vtype[OMEGAX] = DOUBLE;
name[OMEGAX] = arg[iarg];
} else if (strcmp(arg[iarg],"omegay") == 0) {
if (!atom->omega_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[OMEGAY] = &DumpCustomVTK::pack_omegay;
+ pack_choice[OMEGAY] = &DumpVTK::pack_omegay;
vtype[OMEGAY] = DOUBLE;
name[OMEGAY] = arg[iarg];
} else if (strcmp(arg[iarg],"omegaz") == 0) {
if (!atom->omega_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[OMEGAZ] = &DumpCustomVTK::pack_omegaz;
+ pack_choice[OMEGAZ] = &DumpVTK::pack_omegaz;
vtype[OMEGAZ] = DOUBLE;
name[OMEGAZ] = arg[iarg];
} else if (strcmp(arg[iarg],"angmomx") == 0) {
if (!atom->angmom_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[ANGMOMX] = &DumpCustomVTK::pack_angmomx;
+ pack_choice[ANGMOMX] = &DumpVTK::pack_angmomx;
vtype[ANGMOMX] = DOUBLE;
name[ANGMOMX] = arg[iarg];
} else if (strcmp(arg[iarg],"angmomy") == 0) {
if (!atom->angmom_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[ANGMOMY] = &DumpCustomVTK::pack_angmomy;
+ pack_choice[ANGMOMY] = &DumpVTK::pack_angmomy;
vtype[ANGMOMY] = DOUBLE;
name[ANGMOMY] = arg[iarg];
} else if (strcmp(arg[iarg],"angmomz") == 0) {
if (!atom->angmom_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[ANGMOMZ] = &DumpCustomVTK::pack_angmomz;
+ pack_choice[ANGMOMZ] = &DumpVTK::pack_angmomz;
vtype[ANGMOMZ] = DOUBLE;
name[ANGMOMZ] = arg[iarg];
} else if (strcmp(arg[iarg],"tqx") == 0) {
if (!atom->torque_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[TQX] = &DumpCustomVTK::pack_tqx;
+ pack_choice[TQX] = &DumpVTK::pack_tqx;
vtype[TQX] = DOUBLE;
name[TQX] = arg[iarg];
} else if (strcmp(arg[iarg],"tqy") == 0) {
if (!atom->torque_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[TQY] = &DumpCustomVTK::pack_tqy;
+ pack_choice[TQY] = &DumpVTK::pack_tqy;
vtype[TQY] = DOUBLE;
name[TQY] = arg[iarg];
} else if (strcmp(arg[iarg],"tqz") == 0) {
if (!atom->torque_flag)
error->all(FLERR,"Dumping an atom property that isn't allocated");
- pack_choice[TQZ] = &DumpCustomVTK::pack_tqz;
+ pack_choice[TQZ] = &DumpVTK::pack_tqz;
vtype[TQZ] = DOUBLE;
name[TQZ] = arg[iarg];
// compute value = c_ID
// if no trailing [], then arg is set to 0, else arg is int between []
} else if (strncmp(arg[iarg],"c_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_compute;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_compute;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
- error->all(FLERR,"Invalid attribute in dump custom/vtk command");
+ error->all(FLERR,"Invalid attribute in dump vtk command");
argindex[ATTRIBUTES+i] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+i] = 0;
n = modify->find_compute(suffix);
- if (n < 0) error->all(FLERR,"Could not find dump custom/vtk compute ID");
+ if (n < 0) error->all(FLERR,"Could not find dump vtk compute ID");
if (modify->compute[n]->peratom_flag == 0)
- error->all(FLERR,"Dump custom/vtk compute does not compute per-atom info");
+ error->all(FLERR,"Dump vtk compute does not compute per-atom info");
if (argindex[ATTRIBUTES+i] == 0 && modify->compute[n]->size_peratom_cols > 0)
error->all(FLERR,
- "Dump custom/vtk compute does not calculate per-atom vector");
+ "Dump vtk compute does not calculate per-atom vector");
if (argindex[ATTRIBUTES+i] > 0 && modify->compute[n]->size_peratom_cols == 0)
error->all(FLERR,\
- "Dump custom/vtk compute does not calculate per-atom array");
+ "Dump vtk compute does not calculate per-atom array");
if (argindex[ATTRIBUTES+i] > 0 &&
argindex[ATTRIBUTES+i] > modify->compute[n]->size_peratom_cols)
- error->all(FLERR,"Dump custom/vtk compute vector is accessed out-of-range");
+ error->all(FLERR,"Dump vtk compute vector is accessed out-of-range");
field2index[ATTRIBUTES+i] = add_compute(suffix);
name[ATTRIBUTES+i] = arg[iarg];
delete [] suffix;
// fix value = f_ID
// if no trailing [], then arg is set to 0, else arg is between []
} else if (strncmp(arg[iarg],"f_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_fix;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_fix;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
- error->all(FLERR,"Invalid attribute in dump custom/vtk command");
+ error->all(FLERR,"Invalid attribute in dump vtk command");
argindex[ATTRIBUTES+i] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+i] = 0;
n = modify->find_fix(suffix);
- if (n < 0) error->all(FLERR,"Could not find dump custom/vtk fix ID");
+ if (n < 0) error->all(FLERR,"Could not find dump vtk fix ID");
if (modify->fix[n]->peratom_flag == 0)
- error->all(FLERR,"Dump custom/vtk fix does not compute per-atom info");
+ error->all(FLERR,"Dump vtk fix does not compute per-atom info");
if (argindex[ATTRIBUTES+i] == 0 && modify->fix[n]->size_peratom_cols > 0)
- error->all(FLERR,"Dump custom/vtk fix does not compute per-atom vector");
+ error->all(FLERR,"Dump vtk fix does not compute per-atom vector");
if (argindex[ATTRIBUTES+i] > 0 && modify->fix[n]->size_peratom_cols == 0)
- error->all(FLERR,"Dump custom/vtk fix does not compute per-atom array");
+ error->all(FLERR,"Dump vtk fix does not compute per-atom array");
if (argindex[ATTRIBUTES+i] > 0 &&
argindex[ATTRIBUTES+i] > modify->fix[n]->size_peratom_cols)
- error->all(FLERR,"Dump custom/vtk fix vector is accessed out-of-range");
+ error->all(FLERR,"Dump vtk fix vector is accessed out-of-range");
field2index[ATTRIBUTES+i] = add_fix(suffix);
name[ATTRIBUTES+i] = arg[iarg];
delete [] suffix;
// variable value = v_name
} else if (strncmp(arg[iarg],"v_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_variable;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_variable;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
argindex[ATTRIBUTES+i] = 0;
n = input->variable->find(suffix);
- if (n < 0) error->all(FLERR,"Could not find dump custom/vtk variable name");
+ if (n < 0) error->all(FLERR,"Could not find dump vtk variable name");
if (input->variable->atomstyle(n) == 0)
- error->all(FLERR,"Dump custom/vtk variable is not atom-style variable");
+ error->all(FLERR,"Dump vtk variable is not atom-style variable");
field2index[ATTRIBUTES+i] = add_variable(suffix);
name[ATTRIBUTES+i] = suffix;
delete [] suffix;
// custom per-atom floating point value = d_ID
} else if (strncmp(arg[iarg],"d_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_custom;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_custom;
vtype[ATTRIBUTES+i] = DOUBLE;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
argindex[ATTRIBUTES+i] = 0;
int tmp = -1;
n = atom->find_custom(suffix,tmp);
if (n < 0)
error->all(FLERR,"Could not find custom per-atom property ID");
if (tmp != 1)
error->all(FLERR,"Custom per-atom property ID is not floating point");
field2index[ATTRIBUTES+i] = add_custom(suffix,1);
name[ATTRIBUTES+i] = suffix;
delete [] suffix;
// custom per-atom integer value = i_ID
} else if (strncmp(arg[iarg],"i_",2) == 0) {
- pack_choice[ATTRIBUTES+i] = &DumpCustomVTK::pack_custom;
+ pack_choice[ATTRIBUTES+i] = &DumpVTK::pack_custom;
vtype[ATTRIBUTES+i] = INT;
int n = strlen(arg[iarg]);
char *suffix = new char[n];
strcpy(suffix,&arg[iarg][2]);
argindex[ATTRIBUTES+i] = 0;
int tmp = -1;
n = atom->find_custom(suffix,tmp);
if (n < 0)
error->all(FLERR,"Could not find custom per-atom property ID");
if (tmp != 0)
error->all(FLERR,"Custom per-atom property ID is not integer");
field2index[ATTRIBUTES+i] = add_custom(suffix,0);
name[ATTRIBUTES+i] = suffix;
delete [] suffix;
} else return iarg;
}
identify_vectors();
return narg;
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::identify_vectors()
+void DumpVTK::identify_vectors()
{
// detect vectors
vector_set.insert(X); // required
int vector3_starts[] = {XS, XU, XSU, IX, VX, FX, MUX, OMEGAX, ANGMOMX, TQX};
int num_vector3_starts = sizeof(vector3_starts) / sizeof(int);
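// if all three components of a triple (e.g. vx,vy,vz) were requested,
// mark the first one as a 3-vector and truncate its name at the 'x'
// (vx -> v), so a single 3-component vtk array is written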
for (int v3s = 0; v3s < num_vector3_starts; v3s++) {
if(name.count(vector3_starts[v3s] ) &&
name.count(vector3_starts[v3s]+1) &&
name.count(vector3_starts[v3s]+2) )
{
std::string vectorName = name[vector3_starts[v3s]];
vectorName.erase(vectorName.find_first_of('x'));
name[vector3_starts[v3s]] = vectorName;
vector_set.insert(vector3_starts[v3s]);
}
}
// compute and fix vectors
for (std::map<int,std::string>::iterator it=name.begin(); it!=name.end(); ++it) {
if (it->first < ATTRIBUTES) // neither fix nor compute
continue;
if(argindex[it->first] == 0) // single value
continue;
// assume components are grouped together and in correct order
if(name.count(it->first + 1) && name.count(it->first + 2) ) { // more attributes?
if(it->second.compare(0,it->second.length()-3,name[it->first + 1],0,it->second.length()-3) == 0 && // same attributes?
it->second.compare(0,it->second.length()-3,name[it->first + 2],0,it->second.length()-3) == 0 )
{
it->second.erase(it->second.length()-1);
std::ostringstream oss;
oss << "-" << argindex[it->first+2] << "]";
it->second += oss.str();
vector_set.insert(it->first);
++it; ++it;
}
}
}
}
/* ----------------------------------------------------------------------
add Compute to list of Compute objects used by dump
return index of where this Compute is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_compute(char *id)
+int DumpVTK::add_compute(char *id)
{
int icompute;
for (icompute = 0; icompute < ncompute; icompute++)
if (strcmp(id,id_compute[icompute]) == 0) break;
if (icompute < ncompute) return icompute;
id_compute = (char **)
memory->srealloc(id_compute,(ncompute+1)*sizeof(char *),"dump:id_compute");
delete [] compute;
compute = new Compute*[ncompute+1];
int n = strlen(id) + 1;
id_compute[ncompute] = new char[n];
strcpy(id_compute[ncompute],id);
ncompute++;
return ncompute-1;
}
/* ----------------------------------------------------------------------
add Fix to list of Fix objects used by dump
return index of where this Fix is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_fix(char *id)
+int DumpVTK::add_fix(char *id)
{
int ifix;
for (ifix = 0; ifix < nfix; ifix++)
if (strcmp(id,id_fix[ifix]) == 0) break;
if (ifix < nfix) return ifix;
id_fix = (char **)
memory->srealloc(id_fix,(nfix+1)*sizeof(char *),"dump:id_fix");
delete [] fix;
fix = new Fix*[nfix+1];
int n = strlen(id) + 1;
id_fix[nfix] = new char[n];
strcpy(id_fix[nfix],id);
nfix++;
return nfix-1;
}
/* ----------------------------------------------------------------------
add Variable to list of Variables used by dump
return index of where this Variable is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_variable(char *id)
+int DumpVTK::add_variable(char *id)
{
int ivariable;
for (ivariable = 0; ivariable < nvariable; ivariable++)
if (strcmp(id,id_variable[ivariable]) == 0) break;
if (ivariable < nvariable) return ivariable;
id_variable = (char **)
memory->srealloc(id_variable,(nvariable+1)*sizeof(char *),
"dump:id_variable");
delete [] variable;
variable = new int[nvariable+1];
delete [] vbuf;
vbuf = new double*[nvariable+1];
for (int i = 0; i <= nvariable; i++) vbuf[i] = NULL;
int n = strlen(id) + 1;
id_variable[nvariable] = new char[n];
strcpy(id_variable[nvariable],id);
nvariable++;
return nvariable-1;
}
/* ----------------------------------------------------------------------
add custom atom property to list used by dump
return index of where this property is in list
if already in list, do not add, just return index, else add to list
------------------------------------------------------------------------- */
-int DumpCustomVTK::add_custom(char *id, int flag)
+int DumpVTK::add_custom(char *id, int flag)
{
int icustom;
for (icustom = 0; icustom < ncustom; icustom++)
if ((strcmp(id,id_custom[icustom]) == 0)
&& (flag == flag_custom[icustom])) break;
if (icustom < ncustom) return icustom;
id_custom = (char **)
memory->srealloc(id_custom,(ncustom+1)*sizeof(char *),"dump:id_custom");
flag_custom = (int *)
memory->srealloc(flag_custom,(ncustom+1)*sizeof(int),"dump:flag_custom");
int n = strlen(id) + 1;
id_custom[ncustom] = new char[n];
strcpy(id_custom[ncustom],id);
flag_custom[ncustom] = flag;
ncustom++;
return ncustom-1;
}
/* ---------------------------------------------------------------------- */
-int DumpCustomVTK::modify_param(int narg, char **arg)
+int DumpVTK::modify_param(int narg, char **arg)
{
if (strcmp(arg[0],"region") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command");
if (strcmp(arg[1],"none") == 0) iregion = -1;
else {
iregion = domain->find_region(arg[1]);
if (iregion == -1)
error->all(FLERR,"Dump_modify region ID does not exist");
delete [] idregion;
int n = strlen(arg[1]) + 1;
idregion = new char[n];
strcpy(idregion,arg[1]);
}
return 2;
}
if (strcmp(arg[0],"label") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command [label]");
delete [] label;
int n = strlen(arg[1]) + 1;
label = new char[n];
strcpy(label,arg[1]);
return 2;
}
if (strcmp(arg[0],"binary") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command [binary]");
if (strcmp(arg[1],"yes") == 0) binary = 1;
else if (strcmp(arg[1],"no") == 0) binary = 0;
else error->all(FLERR,"Illegal dump_modify command [binary]");
return 2;
}
if (strcmp(arg[0],"element") == 0) {
if (narg < ntypes+1)
error->all(FLERR,"Dump modify: number of element names do not match atom types");
if (typenames) {
for (int i = 1; i <= ntypes; i++) delete [] typenames[i];
delete [] typenames;
typenames = NULL;
}
typenames = new char*[ntypes+1];
for (int itype = 1; itype <= ntypes; itype++) {
int n = strlen(arg[itype]) + 1;
typenames[itype] = new char[n];
strcpy(typenames[itype],arg[itype]);
}
return ntypes+1;
}
if (strcmp(arg[0],"thresh") == 0) {
if (narg < 2) error->all(FLERR,"Illegal dump_modify command");
if (strcmp(arg[1],"none") == 0) {
if (nthresh) {
memory->destroy(thresh_array);
memory->destroy(thresh_op);
memory->destroy(thresh_value);
thresh_array = NULL;
thresh_op = NULL;
thresh_value = NULL;
}
nthresh = 0;
return 2;
}
if (narg < 4) error->all(FLERR,"Illegal dump_modify command");
// grow threshold arrays
memory->grow(thresh_array,nthresh+1,"dump:thresh_array");
memory->grow(thresh_op,(nthresh+1),"dump:thresh_op");
memory->grow(thresh_value,(nthresh+1),"dump:thresh_value");
// set attribute type of threshold
// customize by adding to if statement
if (strcmp(arg[1],"id") == 0) thresh_array[nthresh] = ID;
else if (strcmp(arg[1],"mol") == 0) thresh_array[nthresh] = MOL;
else if (strcmp(arg[1],"proc") == 0) thresh_array[nthresh] = PROC;
else if (strcmp(arg[1],"procp1") == 0) thresh_array[nthresh] = PROCP1;
else if (strcmp(arg[1],"type") == 0) thresh_array[nthresh] = TYPE;
else if (strcmp(arg[1],"mass") == 0) thresh_array[nthresh] = MASS;
else if (strcmp(arg[1],"x") == 0) thresh_array[nthresh] = X;
else if (strcmp(arg[1],"y") == 0) thresh_array[nthresh] = Y;
else if (strcmp(arg[1],"z") == 0) thresh_array[nthresh] = Z;
else if (strcmp(arg[1],"xs") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = XS;
else if (strcmp(arg[1],"xs") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = XSTRI;
else if (strcmp(arg[1],"ys") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = YS;
else if (strcmp(arg[1],"ys") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = YSTRI;
else if (strcmp(arg[1],"zs") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = ZS;
else if (strcmp(arg[1],"zs") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = ZSTRI;
else if (strcmp(arg[1],"xu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = XU;
else if (strcmp(arg[1],"xu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = XUTRI;
else if (strcmp(arg[1],"yu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = YU;
else if (strcmp(arg[1],"yu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = YUTRI;
else if (strcmp(arg[1],"zu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = ZU;
else if (strcmp(arg[1],"zu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = ZUTRI;
else if (strcmp(arg[1],"xsu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = XSU;
else if (strcmp(arg[1],"xsu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = XSUTRI;
else if (strcmp(arg[1],"ysu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = YSU;
else if (strcmp(arg[1],"ysu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = YSUTRI;
else if (strcmp(arg[1],"zsu") == 0 && domain->triclinic == 0)
thresh_array[nthresh] = ZSU;
else if (strcmp(arg[1],"zsu") == 0 && domain->triclinic == 1)
thresh_array[nthresh] = ZSUTRI;
else if (strcmp(arg[1],"ix") == 0) thresh_array[nthresh] = IX;
else if (strcmp(arg[1],"iy") == 0) thresh_array[nthresh] = IY;
else if (strcmp(arg[1],"iz") == 0) thresh_array[nthresh] = IZ;
else if (strcmp(arg[1],"vx") == 0) thresh_array[nthresh] = VX;
else if (strcmp(arg[1],"vy") == 0) thresh_array[nthresh] = VY;
else if (strcmp(arg[1],"vz") == 0) thresh_array[nthresh] = VZ;
else if (strcmp(arg[1],"fx") == 0) thresh_array[nthresh] = FX;
else if (strcmp(arg[1],"fy") == 0) thresh_array[nthresh] = FY;
else if (strcmp(arg[1],"fz") == 0) thresh_array[nthresh] = FZ;
else if (strcmp(arg[1],"q") == 0) thresh_array[nthresh] = Q;
else if (strcmp(arg[1],"mux") == 0) thresh_array[nthresh] = MUX;
else if (strcmp(arg[1],"muy") == 0) thresh_array[nthresh] = MUY;
else if (strcmp(arg[1],"muz") == 0) thresh_array[nthresh] = MUZ;
else if (strcmp(arg[1],"mu") == 0) thresh_array[nthresh] = MU;
else if (strcmp(arg[1],"radius") == 0) thresh_array[nthresh] = RADIUS;
else if (strcmp(arg[1],"diameter") == 0) thresh_array[nthresh] = DIAMETER;
else if (strcmp(arg[1],"omegax") == 0) thresh_array[nthresh] = OMEGAX;
else if (strcmp(arg[1],"omegay") == 0) thresh_array[nthresh] = OMEGAY;
else if (strcmp(arg[1],"omegaz") == 0) thresh_array[nthresh] = OMEGAZ;
else if (strcmp(arg[1],"angmomx") == 0) thresh_array[nthresh] = ANGMOMX;
else if (strcmp(arg[1],"angmomy") == 0) thresh_array[nthresh] = ANGMOMY;
else if (strcmp(arg[1],"angmomz") == 0) thresh_array[nthresh] = ANGMOMZ;
else if (strcmp(arg[1],"tqx") == 0) thresh_array[nthresh] = TQX;
else if (strcmp(arg[1],"tqy") == 0) thresh_array[nthresh] = TQY;
else if (strcmp(arg[1],"tqz") == 0) thresh_array[nthresh] = TQZ;
// compute value = c_ID
// if no trailing [], then arg is set to 0, else arg is between []
else if (strncmp(arg[1],"c_",2) == 0) {
thresh_array[nthresh] = COMPUTE;
int n = strlen(arg[1]);
char *suffix = new char[n];
strcpy(suffix,&arg[1][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Invalid attribute in dump modify command");
argindex[ATTRIBUTES+nfield+nthresh] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+nfield+nthresh] = 0;
n = modify->find_compute(suffix);
if (n < 0) error->all(FLERR,"Could not find dump modify compute ID");
if (modify->compute[n]->peratom_flag == 0)
error->all(FLERR,
"Dump modify compute ID does not compute per-atom info");
if (argindex[ATTRIBUTES+nfield+nthresh] == 0 &&
modify->compute[n]->size_peratom_cols > 0)
error->all(FLERR,
"Dump modify compute ID does not compute per-atom vector");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
modify->compute[n]->size_peratom_cols == 0)
error->all(FLERR,
"Dump modify compute ID does not compute per-atom array");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
argindex[ATTRIBUTES+nfield+nthresh] > modify->compute[n]->size_peratom_cols)
error->all(FLERR,"Dump modify compute ID vector is not large enough");
field2index[ATTRIBUTES+nfield+nthresh] = add_compute(suffix);
delete [] suffix;
// fix value = f_ID
// if no trailing [], then arg is set to 0, else arg is between []
} else if (strncmp(arg[1],"f_",2) == 0) {
thresh_array[nthresh] = FIX;
int n = strlen(arg[1]);
char *suffix = new char[n];
strcpy(suffix,&arg[1][2]);
char *ptr = strchr(suffix,'[');
if (ptr) {
if (suffix[strlen(suffix)-1] != ']')
error->all(FLERR,"Invalid attribute in dump modify command");
argindex[ATTRIBUTES+nfield+nthresh] = atoi(ptr+1);
*ptr = '\0';
} else argindex[ATTRIBUTES+nfield+nthresh] = 0;
n = modify->find_fix(suffix);
if (n < 0) error->all(FLERR,"Could not find dump modify fix ID");
if (modify->fix[n]->peratom_flag == 0)
error->all(FLERR,"Dump modify fix ID does not compute per-atom info");
if (argindex[ATTRIBUTES+nfield+nthresh] == 0 &&
modify->fix[n]->size_peratom_cols > 0)
error->all(FLERR,"Dump modify fix ID does not compute per-atom vector");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
modify->fix[n]->size_peratom_cols == 0)
error->all(FLERR,"Dump modify fix ID does not compute per-atom array");
if (argindex[ATTRIBUTES+nfield+nthresh] > 0 &&
argindex[ATTRIBUTES+nfield+nthresh] > modify->fix[n]->size_peratom_cols)
error->all(FLERR,"Dump modify fix ID vector is not large enough");
field2index[ATTRIBUTES+nfield+nthresh] = add_fix(suffix);
delete [] suffix;
// variable value = v_ID
} else if (strncmp(arg[1],"v_",2) == 0) {
thresh_array[nthresh] = VARIABLE;
int n = strlen(arg[1]);
char *suffix = new char[n];
strcpy(suffix,&arg[1][2]);
argindex[ATTRIBUTES+nfield+nthresh] = 0;
n = input->variable->find(suffix);
if (n < 0) error->all(FLERR,"Could not find dump modify variable name");
if (input->variable->atomstyle(n) == 0)
error->all(FLERR,"Dump modify variable is not atom-style variable");
field2index[ATTRIBUTES+nfield+nthresh] = add_variable(suffix);
delete [] suffix;
} else error->all(FLERR,"Invalid dump_modify threshold operator");
// set operation type of threshold
if (strcmp(arg[2],"<") == 0) thresh_op[nthresh] = LT;
else if (strcmp(arg[2],"<=") == 0) thresh_op[nthresh] = LE;
else if (strcmp(arg[2],">") == 0) thresh_op[nthresh] = GT;
else if (strcmp(arg[2],">=") == 0) thresh_op[nthresh] = GE;
else if (strcmp(arg[2],"==") == 0) thresh_op[nthresh] = EQ;
else if (strcmp(arg[2],"!=") == 0) thresh_op[nthresh] = NEQ;
else error->all(FLERR,"Invalid dump_modify threshold operator");
// set threshold value
thresh_value[nthresh] = force->numeric(FLERR,arg[3]);
nthresh++;
return 4;
}
return 0;
}
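/* ----------------------------------------------------------------------
   illustrative input-script usage of the keywords parsed above (a minimal
   sketch; the dump ID "dvtk" and the region/value choices are assumptions):
     dump_modify dvtk region myregion      # only dump atoms inside region "myregion"
     dump_modify dvtk binary yes           # write binary VTK files
     dump_modify dvtk thresh z > 10.0      # only dump atoms with z above 10.0
     dump_modify dvtk thresh none          # clear all thresholds
------------------------------------------------------------------------- */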
/* ----------------------------------------------------------------------
return # of bytes of allocated memory in buf, choose, variable arrays
------------------------------------------------------------------------- */
-bigint DumpCustomVTK::memory_usage()
+bigint DumpVTK::memory_usage()
{
bigint bytes = Dump::memory_usage();
bytes += memory->usage(choose,maxlocal);
bytes += memory->usage(dchoose,maxlocal);
bytes += memory->usage(clist,maxlocal);
bytes += memory->usage(vbuf,nvariable,maxlocal);
return bytes;
}
/* ----------------------------------------------------------------------
extraction of Compute, Fix, Variable results
------------------------------------------------------------------------- */
-void DumpCustomVTK::pack_compute(int n)
+void DumpVTK::pack_compute(int n)
{
double *vector = compute[field2index[current_pack_choice_key]]->vector_atom;
double **array = compute[field2index[current_pack_choice_key]]->array_atom;
int index = argindex[current_pack_choice_key];
if (index == 0) {
for (int i = 0; i < nchoose; i++) {
buf[n] = vector[clist[i]];
n += size_one;
}
} else {
index--;
for (int i = 0; i < nchoose; i++) {
buf[n] = array[clist[i]][index];
n += size_one;
}
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack_fix(int n)
+void DumpVTK::pack_fix(int n)
{
double *vector = fix[field2index[current_pack_choice_key]]->vector_atom;
double **array = fix[field2index[current_pack_choice_key]]->array_atom;
int index = argindex[current_pack_choice_key];
if (index == 0) {
for (int i = 0; i < nchoose; i++) {
buf[n] = vector[clist[i]];
n += size_one;
}
} else {
index--;
for (int i = 0; i < nchoose; i++) {
buf[n] = array[clist[i]][index];
n += size_one;
}
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack_variable(int n)
+void DumpVTK::pack_variable(int n)
{
double *vector = vbuf[field2index[current_pack_choice_key]];
for (int i = 0; i < nchoose; i++) {
buf[n] = vector[clist[i]];
n += size_one;
}
}
/* ---------------------------------------------------------------------- */
-void DumpCustomVTK::pack_custom(int n)
+void DumpVTK::pack_custom(int n)
{
-
int index = field2index[n];
if (flag_custom[index] == 0) { // integer
int iwhich,tmp;
iwhich = atom->find_custom(id_custom[index],tmp);
int *ivector = atom->ivector[iwhich];
for (int i = 0; i < nchoose; i++) {
buf[n] = ivector[clist[i]];
n += size_one;
}
} else if (flag_custom[index] == 1) { // double
int iwhich,tmp;
iwhich = atom->find_custom(id_custom[index],tmp);
double *dvector = atom->dvector[iwhich];
for (int i = 0; i < nchoose; i++) {
buf[n] = dvector[clist[i]];
n += size_one;
}
}
}
diff --git a/src/USER-VTK/dump_custom_vtk.h b/src/USER-VTK/dump_vtk.h
similarity index 95%
rename from src/USER-VTK/dump_custom_vtk.h
rename to src/USER-VTK/dump_vtk.h
index f3b4a8b63..603ca114b 100644
--- a/src/USER-VTK/dump_custom_vtk.h
+++ b/src/USER-VTK/dump_vtk.h
@@ -1,320 +1,321 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
This file initially came from LIGGGHTS (www.liggghts.com)
Copyright (2014) DCS Computing GmbH, Linz
Copyright (2015) Johannes Kepler University Linz
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef DUMP_CLASS
-DumpStyle(custom/vtk,DumpCustomVTK)
+DumpStyle(vtk,DumpVTK)
#else
-#ifndef LMP_DUMP_CUSTOM_VTK_H
-#define LMP_DUMP_CUSTOM_VTK_H
+#ifndef LMP_DUMP_VTK_H
+#define LMP_DUMP_VTK_H
#include "dump_custom.h"
#include <map>
#include <set>
#include <string>
#include <vtkSmartPointer.h>
#include <vtkPoints.h>
#include <vtkCellArray.h>
class vtkAbstractArray;
class vtkRectilinearGrid;
class vtkUnstructuredGrid;
namespace LAMMPS_NS {
/**
- * @brief DumpCustomVTK class
+ * @brief DumpVTK class
* write atom data to vtk files.
*
 * Similar to the DumpCustom class, but uses the vtk library to write data in the vtk simple
 * legacy or xml format, depending on the filename extension specified. (Since this
 * conflicts with the way binary output is specified, dump_modify allows setting the
 * binary flag for this dump command explicitly.)
 * In contrast to the DumpCustom class, the attributes to be packed are stored in a std::map
 * to avoid duplicate entries and to enforce correct ordering of vector components (except
 * for computes and fixes - these have to be given in the right order in the input script).
 * (Note: std::map elements are sorted by their keys.)
 * This dump command does not support compressed files, buffering, or custom format strings;
 * multiproc is only supported by the xml formats, and the multifile option has to be used.
*/
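/* illustrative input-script usage (a minimal sketch; the dump ID, group, output
   interval and attribute list are assumptions). The file format is chosen from
   the filename extension, and a "*" wildcard is needed because the multifile
   option is required:
     dump dvtk all vtk 100 dump_*.vtu id type x y z vx vy vz
*/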
-class DumpCustomVTK : public DumpCustom {
+
+class DumpVTK : public DumpCustom {
public:
- DumpCustomVTK(class LAMMPS *, int, char **);
- virtual ~DumpCustomVTK();
+ DumpVTK(class LAMMPS *, int, char **);
+ virtual ~DumpVTK();
virtual void write();
protected:
char *label; // string for dump file header
int vtk_file_format; // which vtk file format to write (vtk, vtp, vtu ...)
std::map<int, int> field2index; // which compute,fix,variable calcs this field
std::map<int, int> argindex; // index into compute,fix scalar_atom,vector_atom
// 0 for scalar_atom, 1-N for vector_atom values
// private methods
virtual void init_style();
virtual void write_header(bigint);
int count();
void pack(tagint *);
virtual void write_data(int, double *);
bigint memory_usage();
int parse_fields(int, char **);
void identify_vectors();
int add_compute(char *);
int add_fix(char *);
int add_variable(char *);
int add_custom(char *, int);
virtual int modify_param(int, char **);
- typedef void (DumpCustomVTK::*FnPtrHeader)(bigint);
+ typedef void (DumpVTK::*FnPtrHeader)(bigint);
FnPtrHeader header_choice; // ptr to write header functions
void header_vtk(bigint);
- typedef void (DumpCustomVTK::*FnPtrWrite)(int, double *);
+ typedef void (DumpVTK::*FnPtrWrite)(int, double *);
FnPtrWrite write_choice; // ptr to write data functions
void write_vtk(int, double *);
void write_vtp(int, double *);
void write_vtu(int, double *);
void prepare_domain_data(vtkRectilinearGrid *);
void prepare_domain_data_triclinic(vtkUnstructuredGrid *);
void write_domain_vtk();
void write_domain_vtk_triclinic();
void write_domain_vtr();
void write_domain_vtu_triclinic();
- typedef void (DumpCustomVTK::*FnPtrPack)(int);
+ typedef void (DumpVTK::*FnPtrPack)(int);
std::map<int, FnPtrPack> pack_choice; // ptrs to pack functions
std::map<int, int> vtype; // data type
std::map<int, std::string> name; // attribute labels
std::set<int> vector_set; // set of vector attributes
int current_pack_choice_key;
// vtk data containers
vtkSmartPointer<vtkPoints> points;
vtkSmartPointer<vtkCellArray> pointsCells;
std::map<int, vtkSmartPointer<vtkAbstractArray> > myarrays;
int n_calls_;
double (*boxcorners)[3]; // corners of triclinic domain box
char *filecurrent;
char *domainfilecurrent;
char *parallelfilecurrent;
char *multiname_ex;
void setFileCurrent();
void buf2arrays(int, double *); // transfer data from buf array to vtk arrays
void reset_vtk_data_containers();
// customize by adding a method prototype
void pack_compute(int);
void pack_fix(int);
void pack_variable(int);
void pack_custom(int);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: No dump custom arguments specified
The dump custom command requires that atom quantities be specified to
output to dump file.
E: Invalid attribute in dump custom command
Self-explanatory.
E: Dump_modify format string is too short
There are more fields to be dumped in a line of output than your
format string specifies.
E: Could not find dump custom compute ID
Self-explanatory.
E: Could not find dump custom fix ID
Self-explanatory.
E: Dump custom and fix not computed at compatible times
The fix must produce per-atom quantities on timesteps that dump custom
needs them.
E: Could not find dump custom variable name
Self-explanatory.
E: Could not find custom per-atom property ID
Self-explanatory.
E: Region ID for dump custom does not exist
Self-explanatory.
E: Compute used in dump between runs is not current
The compute was not invoked on the current timestep, therefore it
cannot be used in a dump between runs.
E: Threshold for an atom property that isn't allocated
A dump threshold has been requested on a quantity that is
not defined by the atom style used in this simulation.
E: Dumping an atom property that isn't allocated
The chosen atom style does not define the per-atom quantity being
dumped.
E: Dump custom compute does not compute per-atom info
Self-explanatory.
E: Dump custom compute does not calculate per-atom vector
Self-explanatory.
E: Dump custom compute does not calculate per-atom array
Self-explanatory.
E: Dump custom compute vector is accessed out-of-range
Self-explanatory.
E: Dump custom fix does not compute per-atom info
Self-explanatory.
E: Dump custom fix does not compute per-atom vector
Self-explanatory.
E: Dump custom fix does not compute per-atom array
Self-explanatory.
E: Dump custom fix vector is accessed out-of-range
Self-explanatory.
E: Dump custom variable is not atom-style variable
Only atom-style variables generate per-atom quantities, needed for
dump output.
E: Custom per-atom property ID is not floating point
Self-explanatory.
E: Custom per-atom property ID is not integer
Self-explanatory.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Dump_modify region ID does not exist
Self-explanatory.
E: Dump modify element names do not match atom types
Number of element names must equal number of atom types.
E: Invalid attribute in dump modify command
Self-explanatory.
E: Could not find dump modify compute ID
Self-explanatory.
E: Dump modify compute ID does not compute per-atom info
Self-explanatory.
E: Dump modify compute ID does not compute per-atom vector
Self-explanatory.
E: Dump modify compute ID does not compute per-atom array
Self-explanatory.
E: Dump modify compute ID vector is not large enough
Self-explanatory.
E: Could not find dump modify fix ID
Self-explanatory.
E: Dump modify fix ID does not compute per-atom info
Self-explanatory.
E: Dump modify fix ID does not compute per-atom vector
Self-explanatory.
E: Dump modify fix ID does not compute per-atom array
Self-explanatory.
E: Dump modify fix ID vector is not large enough
Self-explanatory.
E: Could not find dump modify variable name
Self-explanatory.
E: Dump modify variable is not atom-style variable
Self-explanatory.
E: Could not find dump modify custom atom floating point property ID
Self-explanatory.
E: Could not find dump modify custom atom integer property ID
Self-explanatory.
E: Invalid dump_modify threshold operator
Operator keyword used for threshold specification is not recognized.
*/
diff --git a/src/compute_dipole_chunk.cpp b/src/compute_dipole_chunk.cpp
index 74d66e7c1..45389ee61 100644
--- a/src/compute_dipole_chunk.cpp
+++ b/src/compute_dipole_chunk.cpp
@@ -1,294 +1,296 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <string.h>
#include "compute_dipole_chunk.h"
#include "atom.h"
#include "update.h"
#include "modify.h"
#include "compute_chunk_atom.h"
#include "domain.h"
#include "memory.h"
#include "error.h"
#include "math_special.h"
using namespace LAMMPS_NS;
using namespace MathSpecial;
enum { MASSCENTER, GEOMCENTER };
/* ---------------------------------------------------------------------- */
ComputeDipoleChunk::ComputeDipoleChunk(LAMMPS *lmp, int narg, char **arg) :
Compute(lmp, narg, arg),
- idchunk(NULL), massproc(NULL), masstotal(NULL), chrgproc(NULL), chrgtotal(NULL), com(NULL),
+ idchunk(NULL), massproc(NULL), masstotal(NULL), chrgproc(NULL),
+ chrgtotal(NULL), com(NULL),
comall(NULL), dipole(NULL), dipoleall(NULL)
{
- if ((narg != 4) && (narg != 5)) error->all(FLERR,"Illegal compute dipole/chunk command");
+ if ((narg != 4) && (narg != 5))
+ error->all(FLERR,"Illegal compute dipole/chunk command");
array_flag = 1;
size_array_cols = 4;
size_array_rows = 0;
size_array_rows_variable = 1;
extarray = 0;
// ID of compute chunk/atom
int n = strlen(arg[3]) + 1;
idchunk = new char[n];
strcpy(idchunk,arg[3]);
usecenter = MASSCENTER;
if (narg == 5) {
if (strncmp(arg[4],"geom",4) == 0) usecenter = GEOMCENTER;
else if (strcmp(arg[4],"mass") == 0) usecenter = MASSCENTER;
else error->all(FLERR,"Illegal compute dipole/chunk command");
}
init();
// chunk-based data
nchunk = 1;
maxchunk = 0;
allocate();
}
/* ---------------------------------------------------------------------- */
ComputeDipoleChunk::~ComputeDipoleChunk()
{
delete [] idchunk;
memory->destroy(massproc);
memory->destroy(masstotal);
memory->destroy(chrgproc);
memory->destroy(chrgtotal);
memory->destroy(com);
memory->destroy(comall);
memory->destroy(dipole);
memory->destroy(dipoleall);
}
/* ---------------------------------------------------------------------- */
void ComputeDipoleChunk::init()
{
int icompute = modify->find_compute(idchunk);
if (icompute < 0)
error->all(FLERR,"Chunk/atom compute does not exist for "
"compute dipole/chunk");
cchunk = (ComputeChunkAtom *) modify->compute[icompute];
if (strcmp(cchunk->style,"chunk/atom") != 0)
error->all(FLERR,"Compute dipole/chunk does not use chunk/atom compute");
}
/* ---------------------------------------------------------------------- */
void ComputeDipoleChunk::compute_array()
{
int i,index;
double massone;
double unwrap[3];
invoked_array = update->ntimestep;
// compute chunk/atom assigns atoms to chunk IDs
// extract ichunk index vector from compute
// ichunk = 1 to Nchunk for included atoms, 0 for excluded atoms
nchunk = cchunk->setup_chunks();
cchunk->compute_ichunk();
int *ichunk = cchunk->ichunk;
if (nchunk > maxchunk) allocate();
size_array_rows = nchunk;
// zero local per-chunk values
for (int i = 0; i < nchunk; i++) {
massproc[i] = chrgproc[i] = 0.0;
com[i][0] = com[i][1] = com[i][2] = 0.0;
dipole[i][0] = dipole[i][1] = dipole[i][2] = dipole[i][3] = 0.0;
}
// compute COM for each chunk
double **x = atom->x;
int *mask = atom->mask;
int *type = atom->type;
imageint *image = atom->image;
double *mass = atom->mass;
double *rmass = atom->rmass;
double *q = atom->q;
double **mu = atom->mu;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++)
if (mask[i] & groupbit) {
index = ichunk[i]-1;
if (index < 0) continue;
if (usecenter == MASSCENTER) {
if (rmass) massone = rmass[i];
else massone = mass[type[i]];
} else massone = 1.0; // usecenter == GEOMCENTER
domain->unmap(x[i],image[i],unwrap);
massproc[index] += massone;
if (atom->q_flag) chrgproc[index] += atom->q[i];
com[index][0] += unwrap[0] * massone;
com[index][1] += unwrap[1] * massone;
com[index][2] += unwrap[2] * massone;
}
MPI_Allreduce(massproc,masstotal,nchunk,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(chrgproc,chrgtotal,nchunk,MPI_DOUBLE,MPI_SUM,world);
MPI_Allreduce(&com[0][0],&comall[0][0],3*nchunk,MPI_DOUBLE,MPI_SUM,world);
for (int i = 0; i < nchunk; i++) {
if (masstotal[i] > 0.0) {
comall[i][0] /= masstotal[i];
comall[i][1] /= masstotal[i];
comall[i][2] /= masstotal[i];
}
}
// compute dipole for each chunk
for (i = 0; i < nlocal; i++) {
if (mask[i] & groupbit) {
index = ichunk[i]-1;
if (index < 0) continue;
domain->unmap(x[i],image[i],unwrap);
if (atom->q_flag) {
dipole[index][0] += q[i]*unwrap[0];
dipole[index][1] += q[i]*unwrap[1];
dipole[index][2] += q[i]*unwrap[2];
}
if (atom->mu_flag) {
dipole[index][0] += mu[i][0];
dipole[index][1] += mu[i][1];
dipole[index][2] += mu[i][2];
}
}
}
MPI_Allreduce(&dipole[0][0],&dipoleall[0][0],4*nchunk,
MPI_DOUBLE,MPI_SUM,world);
for (i = 0; i < nchunk; i++) {
// correct for position dependence with charged chunks
dipoleall[i][0] -= chrgtotal[i]*comall[i][0];
dipoleall[i][1] -= chrgtotal[i]*comall[i][1];
dipoleall[i][2] -= chrgtotal[i]*comall[i][2];
// compute total dipole moment
dipoleall[i][3] = sqrt(square(dipoleall[i][0])
+ square(dipoleall[i][1])
+ square(dipoleall[i][2]));
}
}
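/* ----------------------------------------------------------------------
   note on the correction above: per chunk the charge part of the dipole is
   accumulated as sum_i q_i * r_i in unwrapped coords; subtracting
   chrgtotal * comall re-references it to the chunk center, i.e.
     mu = sum_i q_i * (r_i - R_center) + sum_i mu_i
   so the result is origin-independent for neutral chunks and is measured
   relative to the center of mass (or geometric center) otherwise
------------------------------------------------------------------------- */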
/* ----------------------------------------------------------------------
lock methods: called by fix ave/time
these methods ensure the vector/array size is locked for the Nfreq epoch
by passing lock info along to compute chunk/atom
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
increment lock counter
------------------------------------------------------------------------- */
void ComputeDipoleChunk::lock_enable()
{
cchunk->lockcount++;
}
/* ----------------------------------------------------------------------
decrement lock counter in compute chunk/atom, if it still exists
------------------------------------------------------------------------- */
void ComputeDipoleChunk::lock_disable()
{
int icompute = modify->find_compute(idchunk);
if (icompute >= 0) {
cchunk = (ComputeChunkAtom *) modify->compute[icompute];
cchunk->lockcount--;
}
}
/* ----------------------------------------------------------------------
calculate and return # of chunks = length of vector/array
------------------------------------------------------------------------- */
int ComputeDipoleChunk::lock_length()
{
nchunk = cchunk->setup_chunks();
return nchunk;
}
/* ----------------------------------------------------------------------
set the lock from startstep to stopstep
------------------------------------------------------------------------- */
void ComputeDipoleChunk::lock(Fix *fixptr, bigint startstep, bigint stopstep)
{
cchunk->lock(fixptr,startstep,stopstep);
}
/* ----------------------------------------------------------------------
unset the lock
------------------------------------------------------------------------- */
void ComputeDipoleChunk::unlock(Fix *fixptr)
{
cchunk->unlock(fixptr);
}
/* ----------------------------------------------------------------------
free and reallocate per-chunk arrays
------------------------------------------------------------------------- */
void ComputeDipoleChunk::allocate()
{
memory->destroy(massproc);
memory->destroy(masstotal);
memory->destroy(chrgproc);
memory->destroy(chrgtotal);
memory->destroy(com);
memory->destroy(comall);
memory->destroy(dipole);
memory->destroy(dipoleall);
maxchunk = nchunk;
memory->create(massproc,maxchunk,"dipole/chunk:massproc");
memory->create(masstotal,maxchunk,"dipole/chunk:masstotal");
memory->create(chrgproc,maxchunk,"dipole/chunk:chrgproc");
memory->create(chrgtotal,maxchunk,"dipole/chunk:chrgtotal");
memory->create(com,maxchunk,3,"dipole/chunk:com");
memory->create(comall,maxchunk,3,"dipole/chunk:comall");
memory->create(dipole,maxchunk,4,"dipole/chunk:dipole");
memory->create(dipoleall,maxchunk,4,"dipole/chunk:dipoleall");
array = dipoleall;
}
/* ----------------------------------------------------------------------
memory usage of local data
------------------------------------------------------------------------- */
double ComputeDipoleChunk::memory_usage()
{
double bytes = (bigint) maxchunk * 2 * sizeof(double);
bytes += (bigint) maxchunk * 2*3 * sizeof(double);
bytes += (bigint) maxchunk * 2*4 * sizeof(double);
return bytes;
}
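/* ----------------------------------------------------------------------
   illustrative input-script usage (a minimal sketch; the compute/fix IDs,
   the chunk definition and the output file name are assumptions):
     compute cc1 all chunk/atom molecule
     compute dip all dipole/chunk cc1 geom
     fix out all ave/time 100 1 100 c_dip[1] c_dip[2] c_dip[3] c_dip[4] file dipole.out mode vector
------------------------------------------------------------------------- */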
diff --git a/src/domain.cpp b/src/domain.cpp
index 31fb3b855..8ead12cd4 100644
--- a/src/domain.cpp
+++ b/src/domain.cpp
@@ -1,2053 +1,2125 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author (triclinic) : Pieter in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <math.h>
#include "domain.h"
#include "style_region.h"
#include "atom.h"
#include "atom_vec.h"
#include "molecule.h"
#include "force.h"
#include "kspace.h"
#include "update.h"
#include "modify.h"
#include "fix.h"
#include "fix_deform.h"
#include "region.h"
#include "lattice.h"
#include "comm.h"
#include "output.h"
#include "thermo.h"
#include "universe.h"
#include "math_const.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
using namespace MathConst;
enum{NO_REMAP,X_REMAP,V_REMAP}; // same as fix_deform.cpp
enum{IGNORE,WARN,ERROR}; // same as thermo.cpp
enum{LAYOUT_UNIFORM,LAYOUT_NONUNIFORM,LAYOUT_TILED}; // several files
#define BIG 1.0e20
#define SMALL 1.0e-4
#define DELTAREGION 4
#define BONDSTRETCH 1.1
/* ----------------------------------------------------------------------
default is periodic
------------------------------------------------------------------------- */
Domain::Domain(LAMMPS *lmp) : Pointers(lmp)
{
box_exist = 0;
dimension = 3;
nonperiodic = 0;
xperiodic = yperiodic = zperiodic = 1;
periodicity[0] = xperiodic;
periodicity[1] = yperiodic;
periodicity[2] = zperiodic;
boundary[0][0] = boundary[0][1] = 0;
boundary[1][0] = boundary[1][1] = 0;
boundary[2][0] = boundary[2][1] = 0;
minxlo = minxhi = 0.0;
minylo = minyhi = 0.0;
minzlo = minzhi = 0.0;
triclinic = 0;
tiltsmall = 1;
boxlo[0] = boxlo[1] = boxlo[2] = -0.5;
boxhi[0] = boxhi[1] = boxhi[2] = 0.5;
xy = xz = yz = 0.0;
h[3] = h[4] = h[5] = 0.0;
h_inv[3] = h_inv[4] = h_inv[5] = 0.0;
h_rate[0] = h_rate[1] = h_rate[2] =
h_rate[3] = h_rate[4] = h_rate[5] = 0.0;
h_ratelo[0] = h_ratelo[1] = h_ratelo[2] = 0.0;
prd_lamda[0] = prd_lamda[1] = prd_lamda[2] = 1.0;
prd_half_lamda[0] = prd_half_lamda[1] = prd_half_lamda[2] = 0.5;
boxlo_lamda[0] = boxlo_lamda[1] = boxlo_lamda[2] = 0.0;
boxhi_lamda[0] = boxhi_lamda[1] = boxhi_lamda[2] = 1.0;
lattice = NULL;
char **args = new char*[2];
args[0] = (char *) "none";
args[1] = (char *) "1.0";
set_lattice(2,args);
delete [] args;
nregion = maxregion = 0;
regions = NULL;
copymode = 0;
region_map = new RegionCreatorMap();
#define REGION_CLASS
#define RegionStyle(key,Class) \
(*region_map)[#key] = &region_creator<Class>;
#include "style_region.h"
#undef RegionStyle
#undef REGION_CLASS
}
/* ---------------------------------------------------------------------- */
Domain::~Domain()
{
if (copymode) return;
delete lattice;
for (int i = 0; i < nregion; i++) delete regions[i];
memory->sfree(regions);
delete region_map;
}
/* ---------------------------------------------------------------------- */
void Domain::init()
{
// set box_change flags if box size/shape/sub-domains ever change
// due to shrink-wrapping or fixes that change box size/shape/sub-domains
box_change_size = box_change_shape = box_change_domain = 0;
if (nonperiodic == 2) box_change_size = 1;
for (int i = 0; i < modify->nfix; i++) {
if (modify->fix[i]->box_change_size) box_change_size = 1;
if (modify->fix[i]->box_change_shape) box_change_shape = 1;
if (modify->fix[i]->box_change_domain) box_change_domain = 1;
}
box_change = 0;
if (box_change_size || box_change_shape || box_change_domain) box_change = 1;
// check for fix deform
deform_flag = deform_vremap = deform_groupbit = 0;
for (int i = 0; i < modify->nfix; i++)
if (strcmp(modify->fix[i]->style,"deform") == 0) {
deform_flag = 1;
if (((FixDeform *) modify->fix[i])->remapflag == V_REMAP) {
deform_vremap = 1;
deform_groupbit = modify->fix[i]->groupbit;
}
}
// region inits
for (int i = 0; i < nregion; i++) regions[i]->init();
}
/* ----------------------------------------------------------------------
set initial global box
assumes boxlo/hi and triclinic tilts are already set
expandflag = 1 if need to expand box in shrink-wrapped dims
not invoked by read_restart since box is already expanded
if don't prevent further expansion, restarted triclinic box
with unchanged tilt factors can become a box with atoms outside the box
------------------------------------------------------------------------- */
void Domain::set_initial_box(int expandflag)
{
// error checks for orthogonal and triclinic domains
if (boxlo[0] >= boxhi[0] || boxlo[1] >= boxhi[1] || boxlo[2] >= boxhi[2])
error->one(FLERR,"Box bounds are invalid or missing");
if (domain->dimension == 2 && (xz != 0.0 || yz != 0.0))
error->all(FLERR,"Cannot skew triclinic box in z for 2d simulation");
// error check or warning on triclinic tilt factors
if (triclinic) {
if ((fabs(xy/(boxhi[0]-boxlo[0])) > 0.5 && xperiodic) ||
(fabs(xz/(boxhi[0]-boxlo[0])) > 0.5 && xperiodic) ||
(fabs(yz/(boxhi[1]-boxlo[1])) > 0.5 && yperiodic)) {
if (tiltsmall)
error->all(FLERR,"Triclinic box skew is too large");
else if (comm->me == 0)
error->warning(FLERR,"Triclinic box skew is large");
}
}
// set small based on box size and SMALL
// this works for any unit system
small[0] = SMALL * (boxhi[0] - boxlo[0]);
small[1] = SMALL * (boxhi[1] - boxlo[1]);
small[2] = SMALL * (boxhi[2] - boxlo[2]);
// if expandflag, adjust box lo/hi for shrink-wrapped dims
if (!expandflag) return;
if (boundary[0][0] == 2) boxlo[0] -= small[0];
else if (boundary[0][0] == 3) minxlo = boxlo[0];
if (boundary[0][1] == 2) boxhi[0] += small[0];
else if (boundary[0][1] == 3) minxhi = boxhi[0];
if (boundary[1][0] == 2) boxlo[1] -= small[1];
else if (boundary[1][0] == 3) minylo = boxlo[1];
if (boundary[1][1] == 2) boxhi[1] += small[1];
else if (boundary[1][1] == 3) minyhi = boxhi[1];
if (boundary[2][0] == 2) boxlo[2] -= small[2];
else if (boundary[2][0] == 3) minzlo = boxlo[2];
if (boundary[2][1] == 2) boxhi[2] += small[2];
else if (boundary[2][1] == 3) minzhi = boxhi[2];
}
/* ----------------------------------------------------------------------
set global box params
assumes boxlo/hi and triclinic tilts are already set
------------------------------------------------------------------------- */
void Domain::set_global_box()
{
prd[0] = xprd = boxhi[0] - boxlo[0];
prd[1] = yprd = boxhi[1] - boxlo[1];
prd[2] = zprd = boxhi[2] - boxlo[2];
h[0] = xprd;
h[1] = yprd;
h[2] = zprd;
h_inv[0] = 1.0/h[0];
h_inv[1] = 1.0/h[1];
h_inv[2] = 1.0/h[2];
prd_half[0] = xprd_half = 0.5*xprd;
prd_half[1] = yprd_half = 0.5*yprd;
prd_half[2] = zprd_half = 0.5*zprd;
if (triclinic) {
h[3] = yz;
h[4] = xz;
h[5] = xy;
h_inv[3] = -h[3] / (h[1]*h[2]);
h_inv[4] = (h[3]*h[5] - h[1]*h[4]) / (h[0]*h[1]*h[2]);
h_inv[5] = -h[5] / (h[0]*h[1]);
boxlo_bound[0] = MIN(boxlo[0],boxlo[0]+xy);
boxlo_bound[0] = MIN(boxlo_bound[0],boxlo_bound[0]+xz);
boxlo_bound[1] = MIN(boxlo[1],boxlo[1]+yz);
boxlo_bound[2] = boxlo[2];
boxhi_bound[0] = MAX(boxhi[0],boxhi[0]+xy);
boxhi_bound[0] = MAX(boxhi_bound[0],boxhi_bound[0]+xz);
boxhi_bound[1] = MAX(boxhi[1],boxhi[1]+yz);
boxhi_bound[2] = boxhi[2];
}
}
/* ----------------------------------------------------------------------
set lamda box params
assumes global box is defined and proc assignment has been made
uses comm->xyz_split or comm->mysplit
to define subbox boundaries in consistent manner
------------------------------------------------------------------------- */
void Domain::set_lamda_box()
{
if (comm->layout != LAYOUT_TILED) {
int *myloc = comm->myloc;
double *xsplit = comm->xsplit;
double *ysplit = comm->ysplit;
double *zsplit = comm->zsplit;
sublo_lamda[0] = xsplit[myloc[0]];
subhi_lamda[0] = xsplit[myloc[0]+1];
sublo_lamda[1] = ysplit[myloc[1]];
subhi_lamda[1] = ysplit[myloc[1]+1];
sublo_lamda[2] = zsplit[myloc[2]];
subhi_lamda[2] = zsplit[myloc[2]+1];
} else {
double (*mysplit)[2] = comm->mysplit;
sublo_lamda[0] = mysplit[0][0];
subhi_lamda[0] = mysplit[0][1];
sublo_lamda[1] = mysplit[1][0];
subhi_lamda[1] = mysplit[1][1];
sublo_lamda[2] = mysplit[2][0];
subhi_lamda[2] = mysplit[2][1];
}
}
/* ----------------------------------------------------------------------
set local subbox params for orthogonal boxes
assumes global box is defined and proc assignment has been made
uses comm->xyz_split or comm->mysplit
to define subbox boundaries in consistent manner
ensure subhi[max] = boxhi
------------------------------------------------------------------------- */
void Domain::set_local_box()
{
if (triclinic) return;
if (comm->layout != LAYOUT_TILED) {
int *myloc = comm->myloc;
int *procgrid = comm->procgrid;
double *xsplit = comm->xsplit;
double *ysplit = comm->ysplit;
double *zsplit = comm->zsplit;
sublo[0] = boxlo[0] + xprd*xsplit[myloc[0]];
if (myloc[0] < procgrid[0]-1) subhi[0] = boxlo[0] + xprd*xsplit[myloc[0]+1];
else subhi[0] = boxhi[0];
sublo[1] = boxlo[1] + yprd*ysplit[myloc[1]];
if (myloc[1] < procgrid[1]-1) subhi[1] = boxlo[1] + yprd*ysplit[myloc[1]+1];
else subhi[1] = boxhi[1];
sublo[2] = boxlo[2] + zprd*zsplit[myloc[2]];
if (myloc[2] < procgrid[2]-1) subhi[2] = boxlo[2] + zprd*zsplit[myloc[2]+1];
else subhi[2] = boxhi[2];
} else {
double (*mysplit)[2] = comm->mysplit;
sublo[0] = boxlo[0] + xprd*mysplit[0][0];
if (mysplit[0][1] < 1.0) subhi[0] = boxlo[0] + xprd*mysplit[0][1];
else subhi[0] = boxhi[0];
sublo[1] = boxlo[1] + yprd*mysplit[1][0];
if (mysplit[1][1] < 1.0) subhi[1] = boxlo[1] + yprd*mysplit[1][1];
else subhi[1] = boxhi[1];
sublo[2] = boxlo[2] + zprd*mysplit[2][0];
if (mysplit[2][1] < 1.0) subhi[2] = boxlo[2] + zprd*mysplit[2][1];
else subhi[2] = boxhi[2];
}
}
/* ----------------------------------------------------------------------
reset global & local boxes due to global box boundary changes
if shrink-wrapped, determine atom extent and reset boxlo/hi
for triclinic, atoms must be in lamda coords (0-1) before reset_box is called
------------------------------------------------------------------------- */
void Domain::reset_box()
{
// perform shrink-wrapping
// compute extent of atoms on this proc
// for triclinic, this is done in lamda space
if (nonperiodic == 2) {
double extent[3][2],all[3][2];
extent[2][0] = extent[1][0] = extent[0][0] = BIG;
extent[2][1] = extent[1][1] = extent[0][1] = -BIG;
double **x = atom->x;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
extent[0][0] = MIN(extent[0][0],x[i][0]);
extent[0][1] = MAX(extent[0][1],x[i][0]);
extent[1][0] = MIN(extent[1][0],x[i][1]);
extent[1][1] = MAX(extent[1][1],x[i][1]);
extent[2][0] = MIN(extent[2][0],x[i][2]);
extent[2][1] = MAX(extent[2][1],x[i][2]);
}
// compute extent across all procs
// flip sign of MIN to do it in one Allreduce MAX
extent[0][0] = -extent[0][0];
extent[1][0] = -extent[1][0];
extent[2][0] = -extent[2][0];
MPI_Allreduce(extent,all,6,MPI_DOUBLE,MPI_MAX,world);
// for triclinic, convert back to box coords before changing box
if (triclinic) lamda2x(atom->nlocal);
// in shrink-wrapped dims, set box by atom extent
// if minimum set, enforce min box size settings
// for triclinic, convert lamda extent to box coords, then set box lo/hi
// decided NOT to do the next comment - don't want to sneakily change tilt
// for triclinic, adjust tilt factors if 2nd dim is shrink-wrapped,
// so that displacement in 1st dim stays the same
if (triclinic == 0) {
if (xperiodic == 0) {
if (boundary[0][0] == 2) boxlo[0] = -all[0][0] - small[0];
else if (boundary[0][0] == 3)
boxlo[0] = MIN(-all[0][0]-small[0],minxlo);
if (boundary[0][1] == 2) boxhi[0] = all[0][1] + small[0];
else if (boundary[0][1] == 3) boxhi[0] = MAX(all[0][1]+small[0],minxhi);
if (boxlo[0] > boxhi[0]) error->all(FLERR,"Illegal simulation box");
}
if (yperiodic == 0) {
if (boundary[1][0] == 2) boxlo[1] = -all[1][0] - small[1];
else if (boundary[1][0] == 3)
boxlo[1] = MIN(-all[1][0]-small[1],minylo);
if (boundary[1][1] == 2) boxhi[1] = all[1][1] + small[1];
else if (boundary[1][1] == 3) boxhi[1] = MAX(all[1][1]+small[1],minyhi);
if (boxlo[1] > boxhi[1]) error->all(FLERR,"Illegal simulation box");
}
if (zperiodic == 0) {
if (boundary[2][0] == 2) boxlo[2] = -all[2][0] - small[2];
else if (boundary[2][0] == 3)
boxlo[2] = MIN(-all[2][0]-small[2],minzlo);
if (boundary[2][1] == 2) boxhi[2] = all[2][1] + small[2];
else if (boundary[2][1] == 3) boxhi[2] = MAX(all[2][1]+small[2],minzhi);
if (boxlo[2] > boxhi[2]) error->all(FLERR,"Illegal simulation box");
}
} else {
double lo[3],hi[3];
if (xperiodic == 0) {
lo[0] = -all[0][0]; lo[1] = 0.0; lo[2] = 0.0;
lamda2x(lo,lo);
hi[0] = all[0][1]; hi[1] = 0.0; hi[2] = 0.0;
lamda2x(hi,hi);
if (boundary[0][0] == 2) boxlo[0] = lo[0] - small[0];
else if (boundary[0][0] == 3) boxlo[0] = MIN(lo[0]-small[0],minxlo);
if (boundary[0][1] == 2) boxhi[0] = hi[0] + small[0];
else if (boundary[0][1] == 3) boxhi[0] = MAX(hi[0]+small[0],minxhi);
if (boxlo[0] > boxhi[0]) error->all(FLERR,"Illegal simulation box");
}
if (yperiodic == 0) {
lo[0] = 0.0; lo[1] = -all[1][0]; lo[2] = 0.0;
lamda2x(lo,lo);
hi[0] = 0.0; hi[1] = all[1][1]; hi[2] = 0.0;
lamda2x(hi,hi);
if (boundary[1][0] == 2) boxlo[1] = lo[1] - small[1];
else if (boundary[1][0] == 3) boxlo[1] = MIN(lo[1]-small[1],minylo);
if (boundary[1][1] == 2) boxhi[1] = hi[1] + small[1];
else if (boundary[1][1] == 3) boxhi[1] = MAX(hi[1]+small[1],minyhi);
if (boxlo[1] > boxhi[1]) error->all(FLERR,"Illegal simulation box");
//xy *= (boxhi[1]-boxlo[1]) / yprd;
}
if (zperiodic == 0) {
lo[0] = 0.0; lo[1] = 0.0; lo[2] = -all[2][0];
lamda2x(lo,lo);
hi[0] = 0.0; hi[1] = 0.0; hi[2] = all[2][1];
lamda2x(hi,hi);
if (boundary[2][0] == 2) boxlo[2] = lo[2] - small[2];
else if (boundary[2][0] == 3) boxlo[2] = MIN(lo[2]-small[2],minzlo);
if (boundary[2][1] == 2) boxhi[2] = hi[2] + small[2];
else if (boundary[2][1] == 3) boxhi[2] = MAX(hi[2]+small[2],minzhi);
if (boxlo[2] > boxhi[2]) error->all(FLERR,"Illegal simulation box");
//xz *= (boxhi[2]-boxlo[2]) / xprd;
//yz *= (boxhi[2]-boxlo[2]) / yprd;
}
}
}
// reset box whether shrink-wrapping or not
set_global_box();
set_local_box();
// if shrink-wrapped & kspace is defined (i.e. using MSM), call setup()
// also call init() (to test for compatibility) ?
if (nonperiodic == 2 && force->kspace) {
//force->kspace->init();
force->kspace->setup();
}
// if shrink-wrapped & triclinic, re-convert to lamda coords for new box
// re-invoke pbc() b/c x2lamda result can be outside [0,1] due to roundoff
if (nonperiodic == 2 && triclinic) {
x2lamda(atom->nlocal);
pbc();
}
}
/* ----------------------------------------------------------------------
enforce PBC and modify box image flags for each atom
called every reneighboring and by other commands that change atoms
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
if fix deform, remap velocity of fix group atoms by box edge velocities
for triclinic, atoms must be in lamda coords (0-1) before pbc is called
image = 10 or 20 bits for each dimension depending on sizeof(imageint)
increment/decrement in wrap-around fashion
------------------------------------------------------------------------- */
void Domain::pbc()
{
int i;
imageint idim,otherdims;
double *lo,*hi,*period;
int nlocal = atom->nlocal;
double **x = atom->x;
double **v = atom->v;
int *mask = atom->mask;
imageint *image = atom->image;
// verify owned atoms have valid numerical coords
// may not if computed pairwise force between 2 atoms at same location
double *coord;
int n3 = 3*nlocal;
coord = &x[0][0]; // note: x is always initialized to at least one element.
int flag = 0;
for (i = 0; i < n3; i++)
if (!ISFINITE(*coord++)) flag = 1;
if (flag) error->one(FLERR,"Non-numeric atom coords - simulation unstable");
// setup for PBC checks
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
}
// apply PBC to each owned atom
for (i = 0; i < nlocal; i++) {
if (xperiodic) {
if (x[i][0] < lo[0]) {
x[i][0] += period[0];
if (deform_vremap && mask[i] & deform_groupbit) v[i][0] += h_rate[0];
idim = image[i] & IMGMASK;
otherdims = image[i] ^ idim;
idim--;
idim &= IMGMASK;
image[i] = otherdims | idim;
}
if (x[i][0] >= hi[0]) {
x[i][0] -= period[0];
x[i][0] = MAX(x[i][0],lo[0]);
if (deform_vremap && mask[i] & deform_groupbit) v[i][0] -= h_rate[0];
idim = image[i] & IMGMASK;
otherdims = image[i] ^ idim;
idim++;
idim &= IMGMASK;
image[i] = otherdims | idim;
}
}
if (yperiodic) {
if (x[i][1] < lo[1]) {
x[i][1] += period[1];
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] += h_rate[5];
v[i][1] += h_rate[1];
}
idim = (image[i] >> IMGBITS) & IMGMASK;
otherdims = image[i] ^ (idim << IMGBITS);
idim--;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMGBITS);
}
if (x[i][1] >= hi[1]) {
x[i][1] -= period[1];
x[i][1] = MAX(x[i][1],lo[1]);
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] -= h_rate[5];
v[i][1] -= h_rate[1];
}
idim = (image[i] >> IMGBITS) & IMGMASK;
otherdims = image[i] ^ (idim << IMGBITS);
idim++;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMGBITS);
}
}
if (zperiodic) {
if (x[i][2] < lo[2]) {
x[i][2] += period[2];
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] += h_rate[4];
v[i][1] += h_rate[3];
v[i][2] += h_rate[2];
}
idim = image[i] >> IMG2BITS;
otherdims = image[i] ^ (idim << IMG2BITS);
idim--;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMG2BITS);
}
if (x[i][2] >= hi[2]) {
x[i][2] -= period[2];
x[i][2] = MAX(x[i][2],lo[2]);
if (deform_vremap && mask[i] & deform_groupbit) {
v[i][0] -= h_rate[4];
v[i][1] -= h_rate[3];
v[i][2] -= h_rate[2];
}
idim = image[i] >> IMG2BITS;
otherdims = image[i] ^ (idim << IMG2BITS);
idim++;
idim &= IMGMASK;
image[i] = otherdims | (idim << IMG2BITS);
}
}
}
}
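/* ----------------------------------------------------------------------
   illustrative sketch (not compiled; variable names are assumptions) of how
   the packed image flags manipulated above can be decoded into per-dimension
   image counts, using the IMGMASK/IMGBITS/IMG2BITS/IMGMAX constants from
   lmptype.h:
     int ix = (image[i] & IMGMASK) - IMGMAX;
     int iy = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
     int iz = (image[i] >> IMG2BITS) - IMGMAX;
------------------------------------------------------------------------- */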
/* ----------------------------------------------------------------------
check that point is inside box boundaries, in [lo,hi) sense
return 1 if true, 0 if false
------------------------------------------------------------------------- */
int Domain::inside(double* x)
{
double *lo,*hi;
double lamda[3];
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
if (x[0] < lo[0] || x[0] >= hi[0] ||
x[1] < lo[1] || x[1] >= hi[1] ||
x[2] < lo[2] || x[2] >= hi[2]) return 0;
else return 1;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
x2lamda(x,lamda);
if (lamda[0] < lo[0] || lamda[0] >= hi[0] ||
lamda[1] < lo[1] || lamda[1] >= hi[1] ||
lamda[2] < lo[2] || lamda[2] >= hi[2]) return 0;
else return 1;
}
}
/* ----------------------------------------------------------------------
check that point is inside nonperiodic boundaries, in [lo,hi) sense
return 1 if true, 0 if false
------------------------------------------------------------------------- */
int Domain::inside_nonperiodic(double* x)
{
double *lo,*hi;
double lamda[3];
if (xperiodic && yperiodic && zperiodic) return 1;
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
if (!xperiodic && (x[0] < lo[0] || x[0] >= hi[0])) return 0;
if (!yperiodic && (x[1] < lo[1] || x[1] >= hi[1])) return 0;
if (!zperiodic && (x[2] < lo[2] || x[2] >= hi[2])) return 0;
return 1;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
x2lamda(x,lamda);
if (!xperiodic && (lamda[0] < lo[0] || lamda[0] >= hi[0])) return 0;
if (!yperiodic && (lamda[1] < lo[1] || lamda[1] >= hi[1])) return 0;
if (!zperiodic && (lamda[2] < lo[2] || lamda[2] >= hi[2])) return 0;
return 1;
}
}
/* ----------------------------------------------------------------------
warn if image flags of any bonded atoms are inconsistent
could be a problem when using replicate or fix rigid
------------------------------------------------------------------------- */
void Domain::image_check()
{
int i,j,k,n,imol,iatom;
tagint tagprev;
// only need to check if system is molecular and some dimension is periodic
// if running verlet/split, don't check on KSpace partition since
// it has no ghost atoms and thus bond partners won't exist
if (!atom->molecular) return;
if (!xperiodic && !yperiodic && (dimension == 2 || !zperiodic)) return;
if (strncmp(update->integrate_style,"verlet/split",12) == 0 &&
universe->iworld != 0) return;
// communicate unwrapped position of owned atoms to ghost atoms
double **unwrap;
memory->create(unwrap,atom->nmax,3,"domain:unwrap");
double **x = atom->x;
imageint *image = atom->image;
int nlocal = atom->nlocal;
for (i = 0; i < nlocal; i++)
unmap(x[i],image[i],unwrap[i]);
comm->forward_comm_array(3,unwrap);
// compute unwrapped extent of each bond
// flag if any bond component is longer than 1/2 of periodic box length
// flag if any bond component is longer than non-periodic box length
// which means image flags in that dimension were different
int molecular = atom->molecular;
int *num_bond = atom->num_bond;
tagint **bond_atom = atom->bond_atom;
int **bond_type = atom->bond_type;
tagint *tag = atom->tag;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
Molecule **onemols = atom->avec->onemols;
double delx,dely,delz;
int lostbond = output->thermo->lostbond;
int nmissing = 0;
int flag = 0;
for (i = 0; i < nlocal; i++) {
if (molecular == 1) n = num_bond[i];
else {
if (molindex[i] < 0) continue;
imol = molindex[i];
iatom = molatom[i];
n = onemols[imol]->num_bond[iatom];
}
for (j = 0; j < n; j++) {
if (molecular == 1) {
if (bond_type[i][j] <= 0) continue;
k = atom->map(bond_atom[i][j]);
} else {
if (onemols[imol]->bond_type[iatom][j] < 0) continue;
tagprev = tag[i] - iatom - 1;
k = atom->map(onemols[imol]->bond_atom[iatom][j]+tagprev);
}
if (k == -1) {
nmissing++;
if (lostbond == ERROR)
error->one(FLERR,"Bond atom missing in image check");
continue;
}
delx = unwrap[i][0] - unwrap[k][0];
dely = unwrap[i][1] - unwrap[k][1];
delz = unwrap[i][2] - unwrap[k][2];
if (xperiodic && delx > xprd_half) flag = 1;
if (xperiodic && dely > yprd_half) flag = 1;
if (dimension == 3 && zperiodic && delz > zprd_half) flag = 1;
if (!xperiodic && delx > xprd) flag = 1;
if (!yperiodic && dely > yprd) flag = 1;
if (dimension == 3 && !zperiodic && delz > zprd) flag = 1;
}
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"Inconsistent image flags");
if (lostbond == WARN) {
int all;
MPI_Allreduce(&nmissing,&all,1,MPI_INT,MPI_SUM,world);
if (all && comm->me == 0)
error->warning(FLERR,"Bond atom missing in image check");
}
memory->destroy(unwrap);
}
/* ----------------------------------------------------------------------
warn if end atoms in any bonded interaction
are further apart than half a periodic box length
could cause problems when bonded neighbor list is built since
closest_image() could return wrong image
------------------------------------------------------------------------- */
void Domain::box_too_small_check()
{
int i,j,k,n,imol,iatom;
tagint tagprev;
// only need to check if system is molecular and some dimension is periodic
// if running verlet/split, don't check on KSpace partition since
// it has no ghost atoms and thus bond partners won't exist
if (!atom->molecular) return;
if (!xperiodic && !yperiodic && (dimension == 2 || !zperiodic)) return;
if (strncmp(update->integrate_style,"verlet/split",12) == 0 &&
universe->iworld != 0) return;
// maxbondall = longest current bond length
// if periodic box dim is tiny (less than 2 * bond-length),
// minimum_image() itself may compute bad bond lengths
// in this case, image_check() should warn,
// assuming 2 atoms have consistent image flags
int molecular = atom->molecular;
double **x = atom->x;
int *num_bond = atom->num_bond;
tagint **bond_atom = atom->bond_atom;
int **bond_type = atom->bond_type;
tagint *tag = atom->tag;
int *molindex = atom->molindex;
int *molatom = atom->molatom;
Molecule **onemols = atom->avec->onemols;
int nlocal = atom->nlocal;
double delx,dely,delz,rsq;
double maxbondme = 0.0;
int lostbond = output->thermo->lostbond;
int nmissing = 0;
for (i = 0; i < nlocal; i++) {
if (molecular == 1) n = num_bond[i];
else {
if (molindex[i] < 0) continue;
imol = molindex[i];
iatom = molatom[i];
n = onemols[imol]->num_bond[iatom];
}
for (j = 0; j < n; j++) {
if (molecular == 1) {
if (bond_type[i][j] <= 0) continue;
k = atom->map(bond_atom[i][j]);
} else {
if (onemols[imol]->bond_type[iatom][j] < 0) continue;
tagprev = tag[i] - iatom - 1;
k = atom->map(onemols[imol]->bond_atom[iatom][j]+tagprev);
}
if (k == -1) {
nmissing++;
if (lostbond == ERROR)
error->one(FLERR,"Bond atom missing in box size check");
continue;
}
delx = x[i][0] - x[k][0];
dely = x[i][1] - x[k][1];
delz = x[i][2] - x[k][2];
minimum_image(delx,dely,delz);
rsq = delx*delx + dely*dely + delz*delz;
maxbondme = MAX(maxbondme,rsq);
}
}
if (lostbond == WARN) {
int all;
MPI_Allreduce(&nmissing,&all,1,MPI_INT,MPI_SUM,world);
if (all && comm->me == 0)
error->warning(FLERR,"Bond atom missing in box size check");
}
double maxbondall;
MPI_Allreduce(&maxbondme,&maxbondall,1,MPI_DOUBLE,MPI_MAX,world);
maxbondall = sqrt(maxbondall);
// maxdelta = furthest apart 2 atoms in a bonded interaction can be
// include BONDSTRETCH factor to account for dynamics
double maxdelta = maxbondall * BONDSTRETCH;
if (atom->nangles) maxdelta = 2.0 * maxbondall * BONDSTRETCH;
if (atom->ndihedrals) maxdelta = 3.0 * maxbondall * BONDSTRETCH;
// warn if maxdelta > than half any periodic box length
// since atoms in the interaction could rotate into that dimension
int flag = 0;
if (xperiodic && maxdelta > xprd_half) flag = 1;
if (yperiodic && maxdelta > yprd_half) flag = 1;
if (dimension == 3 && zperiodic && maxdelta > zprd_half) flag = 1;
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && comm->me == 0)
error->warning(FLERR,
"Bond/angle/dihedral extent > half of periodic box length");
}
/* ----------------------------------------------------------------------
check and warn if any proc's subbox is smaller than thresh
since may lead to lost atoms in comm->exchange()
current callers set thresh = neighbor skin
------------------------------------------------------------------------- */
void Domain::subbox_too_small_check(double thresh)
{
int flag = 0;
if (!triclinic) {
if (subhi[0]-sublo[0] < thresh || subhi[1]-sublo[1] < thresh) flag = 1;
if (dimension == 3 && subhi[2]-sublo[2] < thresh) flag = 1;
} else {
double delta = subhi_lamda[0] - sublo_lamda[0];
if (delta*prd[0] < thresh) flag = 1;
delta = subhi_lamda[1] - sublo_lamda[1];
if (delta*prd[1] < thresh) flag = 1;
if (dimension == 3) {
delta = subhi_lamda[2] - sublo_lamda[2];
if (delta*prd[2] < thresh) flag = 1;
}
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_SUM,world);
if (flagall && comm->me == 0)
error->warning(FLERR,"Proc sub-domain size < neighbor skin, "
"could lead to lost atoms");
}
/* ----------------------------------------------------------------------
minimum image convention in periodic dimensions
use 1/2 of box size as test
for triclinic, also add/subtract tilt factors in other dims as needed
changed "if" to "while" to enable distance to
far-away ghost atom returned by atom->map() to be wrapped back into box
could be a problem for looking up atom IDs when cutoff > boxsize
+ this should not be used if atom has moved infinitely far outside box
+ b/c while could iterate forever
+ e.g. fix shake prediction of new position with highly overlapped atoms
+ use minimum_image_once() instead
------------------------------------------------------------------------- */
void Domain::minimum_image(double &dx, double &dy, double &dz)
{
if (triclinic == 0) {
if (xperiodic) {
while (fabs(dx) > xprd_half) {
if (dx < 0.0) dx += xprd;
else dx -= xprd;
}
}
if (yperiodic) {
while (fabs(dy) > yprd_half) {
if (dy < 0.0) dy += yprd;
else dy -= yprd;
}
}
if (zperiodic) {
while (fabs(dz) > zprd_half) {
if (dz < 0.0) dz += zprd;
else dz -= zprd;
}
}
} else {
if (zperiodic) {
while (fabs(dz) > zprd_half) {
if (dz < 0.0) {
dz += zprd;
dy += yz;
dx += xz;
} else {
dz -= zprd;
dy -= yz;
dx -= xz;
}
}
}
if (yperiodic) {
while (fabs(dy) > yprd_half) {
if (dy < 0.0) {
dy += yprd;
dx += xy;
} else {
dy -= yprd;
dx -= xy;
}
}
}
if (xperiodic) {
while (fabs(dx) > xprd_half) {
if (dx < 0.0) dx += xprd;
else dx -= xprd;
}
}
}
}
/* ----------------------------------------------------------------------
minimum image convention in periodic dimensions
use 1/2 of box size as test
for triclinic, also add/subtract tilt factors in other dims as needed
changed "if" to "while" to enable distance to
far-away ghost atom returned by atom->map() to be wrapped back into box
could be a problem for looking up atom IDs when cutoff > boxsize
+ this should not be used if atom has moved infinitely far outside box
+ b/c while could iterate forever
+ e.g. fix shake prediction of new position with highly overlapped atoms
+ use minimum_image_once() instead
------------------------------------------------------------------------- */
void Domain::minimum_image(double *delta)
{
if (triclinic == 0) {
if (xperiodic) {
while (fabs(delta[0]) > xprd_half) {
if (delta[0] < 0.0) delta[0] += xprd;
else delta[0] -= xprd;
}
}
if (yperiodic) {
while (fabs(delta[1]) > yprd_half) {
if (delta[1] < 0.0) delta[1] += yprd;
else delta[1] -= yprd;
}
}
if (zperiodic) {
while (fabs(delta[2]) > zprd_half) {
if (delta[2] < 0.0) delta[2] += zprd;
else delta[2] -= zprd;
}
}
} else {
if (zperiodic) {
while (fabs(delta[2]) > zprd_half) {
if (delta[2] < 0.0) {
delta[2] += zprd;
delta[1] += yz;
delta[0] += xz;
} else {
delta[2] -= zprd;
delta[1] -= yz;
delta[0] -= xz;
}
}
}
if (yperiodic) {
while (fabs(delta[1]) > yprd_half) {
if (delta[1] < 0.0) {
delta[1] += yprd;
delta[0] += xy;
} else {
delta[1] -= yprd;
delta[0] -= xy;
}
}
}
if (xperiodic) {
while (fabs(delta[0]) > xprd_half) {
if (delta[0] < 0.0) delta[0] += xprd;
else delta[0] -= xprd;
}
}
}
}
+/* ----------------------------------------------------------------------
+ minimum image convention in periodic dimensions
+ use 1/2 of box size as test
+ for triclinic, also add/subtract tilt factors in other dims as needed
+ only shift by one box length in each direction
+ this should not be used if multiple box shifts are required
+------------------------------------------------------------------------- */
+
+void Domain::minimum_image_once(double *delta)
+{
+ if (triclinic == 0) {
+ if (xperiodic) {
+ if (fabs(delta[0]) > xprd_half) {
+ if (delta[0] < 0.0) delta[0] += xprd;
+ else delta[0] -= xprd;
+ }
+ }
+ if (yperiodic) {
+ if (fabs(delta[1]) > yprd_half) {
+ if (delta[1] < 0.0) delta[1] += yprd;
+ else delta[1] -= yprd;
+ }
+ }
+ if (zperiodic) {
+ if (fabs(delta[2]) > zprd_half) {
+ if (delta[2] < 0.0) delta[2] += zprd;
+ else delta[2] -= zprd;
+ }
+ }
+
+ } else {
+ if (zperiodic) {
+ if (fabs(delta[2]) > zprd_half) {
+ if (delta[2] < 0.0) {
+ delta[2] += zprd;
+ delta[1] += yz;
+ delta[0] += xz;
+ } else {
+ delta[2] -= zprd;
+ delta[1] -= yz;
+ delta[0] -= xz;
+ }
+ }
+ }
+ if (yperiodic) {
+ if (fabs(delta[1]) > yprd_half) {
+ if (delta[1] < 0.0) {
+ delta[1] += yprd;
+ delta[0] += xy;
+ } else {
+ delta[1] -= yprd;
+ delta[0] -= xy;
+ }
+ }
+ }
+ if (xperiodic) {
+ if (fabs(delta[0]) > xprd_half) {
+ if (delta[0] < 0.0) delta[0] += xprd;
+ else delta[0] -= xprd;
+ }
+ }
+ }
+}
+
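/* ----------------------------------------------------------------------
   editor's note: illustrative usage sketch, not part of this diff.
   x1/x2 below are placeholder atom positions; a caller that may produce
   badly overlapped or even non-finite separations (e.g. the fix shake
   prediction mentioned above) applies at most one box shift per dim:
     double delta[3];
     delta[0] = x2[0] - x1[0];
     delta[1] = x2[1] - x1[1];
     delta[2] = x2[2] - x1[2];
     domain->minimum_image_once(delta);   // one shift max per dimension
   whereas minimum_image(delta) loops until the distance is inside half
   the box and so could iterate forever on such input
------------------------------------------------------------------------- */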
/* ----------------------------------------------------------------------
return local index of atom J or any of its images that is closest to atom I
if J is not a valid index like -1, just return it
------------------------------------------------------------------------- */
int Domain::closest_image(int i, int j)
{
if (j < 0) return j;
int *sametag = atom->sametag;
double **x = atom->x;
double *xi = x[i];
int closest = j;
double delx = xi[0] - x[j][0];
double dely = xi[1] - x[j][1];
double delz = xi[2] - x[j][2];
double rsqmin = delx*delx + dely*dely + delz*delz;
double rsq;
while (sametag[j] >= 0) {
j = sametag[j];
delx = xi[0] - x[j][0];
dely = xi[1] - x[j][1];
delz = xi[2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < rsqmin) {
rsqmin = rsq;
closest = j;
}
}
return closest;
}
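/* ----------------------------------------------------------------------
   editor's note: illustrative aside, not part of this diff. as used in
   the loop above, atom->sametag acts as a singly linked list over
   local + ghost indices sharing an atom ID: sametag[j] is the next
   image of atom j known to this proc and a negative value ends the
   chain, so every available periodic image of J is tested against I
------------------------------------------------------------------------- */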
/* ----------------------------------------------------------------------
return local index of atom J or any of its images that is closest to pos
if J is not a valid index like -1, just return it
------------------------------------------------------------------------- */
int Domain::closest_image(double *pos, int j)
{
if (j < 0) return j;
int *sametag = atom->sametag;
double **x = atom->x;
int closest = j;
double delx = pos[0] - x[j][0];
double dely = pos[1] - x[j][1];
double delz = pos[2] - x[j][2];
double rsqmin = delx*delx + dely*dely + delz*delz;
double rsq;
while (sametag[j] >= 0) {
j = sametag[j];
delx = pos[0] - x[j][0];
dely = pos[1] - x[j][1];
delz = pos[2] - x[j][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq < rsqmin) {
rsqmin = rsq;
closest = j;
}
}
return closest;
}
/* ----------------------------------------------------------------------
find and return Xj image = periodic image of Xj that is closest to Xi
for triclinic, add/subtract tilt factors in other dims as needed
not currently used (Jan 2017):
used to be called by pair TIP4P styles but no longer,
due to use of other closest_image() method
------------------------------------------------------------------------- */
void Domain::closest_image(const double * const xi, const double * const xj,
double * const xjimage)
{
double dx = xj[0] - xi[0];
double dy = xj[1] - xi[1];
double dz = xj[2] - xi[2];
if (triclinic == 0) {
if (xperiodic) {
if (dx < 0.0) {
while (dx < 0.0) dx += xprd;
if (dx > xprd_half) dx -= xprd;
} else {
while (dx > 0.0) dx -= xprd;
if (dx < -xprd_half) dx += xprd;
}
}
if (yperiodic) {
if (dy < 0.0) {
while (dy < 0.0) dy += yprd;
if (dy > yprd_half) dy -= yprd;
} else {
while (dy > 0.0) dy -= yprd;
if (dy < -yprd_half) dy += yprd;
}
}
if (zperiodic) {
if (dz < 0.0) {
while (dz < 0.0) dz += zprd;
if (dz > zprd_half) dz -= zprd;
} else {
while (dz > 0.0) dz -= zprd;
if (dz < -zprd_half) dz += zprd;
}
}
} else {
if (zperiodic) {
if (dz < 0.0) {
while (dz < 0.0) {
dz += zprd;
dy += yz;
dx += xz;
}
if (dz > zprd_half) {
dz -= zprd;
dy -= yz;
dx -= xz;
}
} else {
while (dz > 0.0) {
dz -= zprd;
dy -= yz;
dx -= xz;
}
if (dz < -zprd_half) {
dz += zprd;
dy += yz;
dx += xz;
}
}
}
if (yperiodic) {
if (dy < 0.0) {
while (dy < 0.0) {
dy += yprd;
dx += xy;
}
if (dy > yprd_half) {
dy -= yprd;
dx -= xy;
}
} else {
while (dy > 0.0) {
dy -= yprd;
dx -= xy;
}
if (dy < -yprd_half) {
dy += yprd;
dx += xy;
}
}
}
if (xperiodic) {
if (dx < 0.0) {
while (dx < 0.0) dx += xprd;
if (dx > xprd_half) dx -= xprd;
} else {
while (dx > 0.0) dx -= xprd;
if (dx < -xprd_half) dx += xprd;
}
}
}
xjimage[0] = xi[0] + dx;
xjimage[1] = xi[1] + dy;
xjimage[2] = xi[2] + dz;
}
/* ----------------------------------------------------------------------
remap the point into the periodic box no matter how far away
adjust 3 image flags encoded in image accordingly
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
for triclinic, point is converted to lamda coords (0-1) before doing remap
image = 10 bits for each dimension
increment/decrement in wrap-around fashion
------------------------------------------------------------------------- */
void Domain::remap(double *x, imageint &image)
{
double *lo,*hi,*period,*coord;
double lamda[3];
imageint idim,otherdims;
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
coord = x;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
x2lamda(x,lamda);
coord = lamda;
}
if (xperiodic) {
while (coord[0] < lo[0]) {
coord[0] += period[0];
idim = image & IMGMASK;
otherdims = image ^ idim;
idim--;
idim &= IMGMASK;
image = otherdims | idim;
}
while (coord[0] >= hi[0]) {
coord[0] -= period[0];
idim = image & IMGMASK;
otherdims = image ^ idim;
idim++;
idim &= IMGMASK;
image = otherdims | idim;
}
coord[0] = MAX(coord[0],lo[0]);
}
if (yperiodic) {
while (coord[1] < lo[1]) {
coord[1] += period[1];
idim = (image >> IMGBITS) & IMGMASK;
otherdims = image ^ (idim << IMGBITS);
idim--;
idim &= IMGMASK;
image = otherdims | (idim << IMGBITS);
}
while (coord[1] >= hi[1]) {
coord[1] -= period[1];
idim = (image >> IMGBITS) & IMGMASK;
otherdims = image ^ (idim << IMGBITS);
idim++;
idim &= IMGMASK;
image = otherdims | (idim << IMGBITS);
}
coord[1] = MAX(coord[1],lo[1]);
}
if (zperiodic) {
while (coord[2] < lo[2]) {
coord[2] += period[2];
idim = image >> IMG2BITS;
otherdims = image ^ (idim << IMG2BITS);
idim--;
idim &= IMGMASK;
image = otherdims | (idim << IMG2BITS);
}
while (coord[2] >= hi[2]) {
coord[2] -= period[2];
idim = image >> IMG2BITS;
otherdims = image ^ (idim << IMG2BITS);
idim++;
idim &= IMGMASK;
image = otherdims | (idim << IMG2BITS);
}
coord[2] = MAX(coord[2],lo[2]);
}
if (triclinic) lamda2x(coord,x);
}
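/* ----------------------------------------------------------------------
   editor's note: illustrative sketch of the image-flag packing used
   above, not part of this diff. assuming the default smallbig build
   (IMGBITS = 10, IMG2BITS = 20, IMGMASK = 1023, IMGMAX = 512), the
   three per-dimension image counts are decoded the same way unmap()
   does below:
     int xbox = (image & IMGMASK) - IMGMAX;
     int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
     int zbox = (image >> IMG2BITS) - IMGMAX;
   a freshly created atom stores IMGMAX in each 10-bit field, i.e.
   xbox = ybox = zbox = 0, and the while loops above increment or
   decrement exactly one field per box length crossed
------------------------------------------------------------------------- */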
/* ----------------------------------------------------------------------
remap the point into the periodic box no matter how far away
no image flag calculation
resulting coord must satisfy lo <= coord < hi
MAX is important since coord - prd < lo can happen when coord = hi
for triclinic, point is converted to lamda coords (0-1) before remap
------------------------------------------------------------------------- */
void Domain::remap(double *x)
{
double *lo,*hi,*period,*coord;
double lamda[3];
if (triclinic == 0) {
lo = boxlo;
hi = boxhi;
period = prd;
coord = x;
} else {
lo = boxlo_lamda;
hi = boxhi_lamda;
period = prd_lamda;
x2lamda(x,lamda);
coord = lamda;
}
if (xperiodic) {
while (coord[0] < lo[0]) coord[0] += period[0];
while (coord[0] >= hi[0]) coord[0] -= period[0];
coord[0] = MAX(coord[0],lo[0]);
}
if (yperiodic) {
while (coord[1] < lo[1]) coord[1] += period[1];
while (coord[1] >= hi[1]) coord[1] -= period[1];
coord[1] = MAX(coord[1],lo[1]);
}
if (zperiodic) {
while (coord[2] < lo[2]) coord[2] += period[2];
while (coord[2] >= hi[2]) coord[2] -= period[2];
coord[2] = MAX(coord[2],lo[2]);
}
if (triclinic) lamda2x(coord,x);
}
/* ----------------------------------------------------------------------
remap xnew to be within half box length of xold
do it directly, not iteratively, in case it is far away
for triclinic, both points are converted to lamda coords (0-1) before remap
------------------------------------------------------------------------- */
void Domain::remap_near(double *xnew, double *xold)
{
int n;
double *coordnew,*coordold,*period,*half;
double lamdanew[3],lamdaold[3];
if (triclinic == 0) {
period = prd;
half = prd_half;
coordnew = xnew;
coordold = xold;
} else {
period = prd_lamda;
half = prd_half_lamda;
x2lamda(xnew,lamdanew);
coordnew = lamdanew;
x2lamda(xold,lamdaold);
coordold = lamdaold;
}
// iterative form
// if (xperiodic) {
// while (coordnew[0]-coordold[0] > half[0]) coordnew[0] -= period[0];
// while (coordold[0]-coordnew[0] > half[0]) coordnew[0] += period[0];
// }
if (xperiodic) {
if (coordnew[0]-coordold[0] > period[0]) {
n = static_cast<int> ((coordnew[0]-coordold[0])/period[0]);
coordnew[0] -= n*period[0];
}
while (coordnew[0]-coordold[0] > half[0]) coordnew[0] -= period[0];
if (coordold[0]-coordnew[0] > period[0]) {
n = static_cast<int> ((coordold[0]-coordnew[0])/period[0]);
coordnew[0] += n*period[0];
}
while (coordold[0]-coordnew[0] > half[0]) coordnew[0] += period[0];
}
if (yperiodic) {
if (coordnew[1]-coordold[1] > period[1]) {
n = static_cast<int> ((coordnew[1]-coordold[1])/period[1]);
coordnew[1] -= n*period[1];
}
while (coordnew[1]-coordold[1] > half[1]) coordnew[1] -= period[1];
if (coordold[1]-coordnew[1] > period[1]) {
n = static_cast<int> ((coordold[1]-coordnew[1])/period[1]);
coordnew[1] += n*period[1];
}
while (coordold[1]-coordnew[1] > half[1]) coordnew[1] += period[1];
}
if (zperiodic) {
if (coordnew[2]-coordold[2] > period[2]) {
n = static_cast<int> ((coordnew[2]-coordold[2])/period[2]);
coordnew[2] -= n*period[2];
}
while (coordnew[2]-coordold[2] > half[2]) coordnew[2] -= period[2];
if (coordold[2]-coordnew[2] > period[2]) {
n = static_cast<int> ((coordold[2]-coordnew[2])/period[2]);
coordnew[2] += n*period[2];
}
while (coordold[2]-coordnew[2] > half[2]) coordnew[2] += period[2];
}
if (triclinic) lamda2x(coordnew,xnew);
}
/* ----------------------------------------------------------------------
unmap the point via image flags
x overwritten with result, don't reset image flag
for triclinic, use h[] to add in tilt factors in other dims as needed
------------------------------------------------------------------------- */
void Domain::unmap(double *x, imageint image)
{
int xbox = (image & IMGMASK) - IMGMAX;
int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image >> IMG2BITS) - IMGMAX;
if (triclinic == 0) {
x[0] += xbox*xprd;
x[1] += ybox*yprd;
x[2] += zbox*zprd;
} else {
x[0] += h[0]*xbox + h[5]*ybox + h[4]*zbox;
x[1] += h[1]*ybox + h[3]*zbox;
x[2] += h[2]*zbox;
}
}
/* ----------------------------------------------------------------------
unmap the point via image flags
result returned in y, don't reset image flag
for triclinic, use h[] to add in tilt factors in other dims as needed
------------------------------------------------------------------------- */
void Domain::unmap(const double *x, imageint image, double *y)
{
int xbox = (image & IMGMASK) - IMGMAX;
int ybox = (image >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image >> IMG2BITS) - IMGMAX;
if (triclinic == 0) {
y[0] = x[0] + xbox*xprd;
y[1] = x[1] + ybox*yprd;
y[2] = x[2] + zbox*zprd;
} else {
y[0] = x[0] + h[0]*xbox + h[5]*ybox + h[4]*zbox;
y[1] = x[1] + h[1]*ybox + h[3]*zbox;
y[2] = x[2] + h[2]*zbox;
}
}
/* ----------------------------------------------------------------------
adjust image flags due to triclinic box flip
flip operation is changing box vectors A,B,C to new A',B',C'
A' = A (A does not change)
B' = B + mA (B shifted by A)
C' = C + pB + nA (C shifted by B and/or A)
this requires the image flags change from (a,b,c) to (a',b',c')
so that x_unwrap for each atom is same before/after
x_unwrap_before = xlocal + aA + bB + cC
x_unwrap_after = xlocal + a'A' + b'B' + c'C'
this requires:
c' = c
b' = b - cp
a' = a - (b-cp)m - cn = a - b'm - cn
in other words, for xy flip, change in x flag depends on current y flag
this is b/c the xy flip dramatically changes which tiled image of
simulation box an unwrapped point maps to
------------------------------------------------------------------------- */
void Domain::image_flip(int m, int n, int p)
{
imageint *image = atom->image;
int nlocal = atom->nlocal;
for (int i = 0; i < nlocal; i++) {
int xbox = (image[i] & IMGMASK) - IMGMAX;
int ybox = (image[i] >> IMGBITS & IMGMASK) - IMGMAX;
int zbox = (image[i] >> IMG2BITS) - IMGMAX;
ybox -= p*zbox;
xbox -= m*ybox + n*zbox;
image[i] = ((imageint) (xbox + IMGMAX) & IMGMASK) |
(((imageint) (ybox + IMGMAX) & IMGMASK) << IMGBITS) |
(((imageint) (zbox + IMGMAX) & IMGMASK) << IMG2BITS);
}
}
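/* ----------------------------------------------------------------------
   editor's note: illustrative worked example, not part of this diff.
   for an xy flip (m = 1, n = p = 0) applied to an atom with image
   flags (a,b,c) = (2,1,0), the loop above yields
     c' = c              = 0
     b' = b - c*p        = 1
     a' = a - b'*m - c*n = 2 - 1 = 1
   and indeed xlocal + 1*A' + 1*B' = xlocal + 1*A + 1*(B + A)
   = xlocal + 2*A + 1*B, the same unwrapped point as before the flip
------------------------------------------------------------------------- */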
/* ----------------------------------------------------------------------
return 1 if this proc owns atom with coords x, else return 0
x is returned remapped into periodic box
if image flag is passed, flag is updated via remap(x,image)
if image = NULL is passed, x is remapped via remap(x) with no image flag update
if shrinkexceed, atom can be outside shrinkwrap boundaries
called from create_atoms() in library.cpp
------------------------------------------------------------------------- */
int Domain::ownatom(int id, double *x, imageint *image, int shrinkexceed)
{
double lamda[3];
double *coord,*blo,*bhi,*slo,*shi;
if (image) remap(x,*image);
else remap(x);
if (triclinic) {
x2lamda(x,lamda);
coord = lamda;
} else coord = x;
// box and subbox bounds for orthogonal vs triclinic
if (triclinic == 0) {
blo = boxlo;
bhi = boxhi;
slo = sublo;
shi = subhi;
} else {
blo = boxlo_lamda;
bhi = boxhi_lamda;
slo = sublo_lamda;
shi = subhi_lamda;
}
if (coord[0] >= slo[0] && coord[0] < shi[0] &&
coord[1] >= slo[1] && coord[1] < shi[1] &&
coord[2] >= slo[2] && coord[2] < shi[2]) return 1;
// check if atom did not return 1 only b/c it was
// outside a shrink-wrapped boundary
if (shrinkexceed) {
int outside = 0;
if (coord[0] < blo[0] && boundary[0][0] > 1) outside = 1;
if (coord[0] >= bhi[0] && boundary[0][1] > 1) outside = 1;
if (coord[1] < blo[1] && boundary[1][0] > 1) outside = 1;
if (coord[1] >= bhi[1] && boundary[1][1] > 1) outside = 1;
if (coord[2] < blo[2] && boundary[2][0] > 1) outside = 1;
if (coord[2] >= bhi[2] && boundary[2][1] > 1) outside = 1;
if (!outside) return 0;
// newcoord = coords pushed back to be on shrink-wrapped boundary
// newcoord is a copy, so caller's x[] is not affected
double newcoord[3];
if (coord[0] < blo[0] && boundary[0][0] > 1) newcoord[0] = blo[0];
else if (coord[0] >= bhi[0] && boundary[0][1] > 1) newcoord[0] = bhi[0];
else newcoord[0] = coord[0];
if (coord[1] < blo[1] && boundary[1][0] > 1) newcoord[1] = blo[1];
else if (coord[1] >= bhi[1] && boundary[1][1] > 1) newcoord[1] = bhi[1];
else newcoord[1] = coord[1];
if (coord[2] < blo[2] && boundary[2][0] > 1) newcoord[2] = blo[2];
else if (coord[2] >= bhi[2] && boundary[2][1] > 1) newcoord[2] = bhi[2];
else newcoord[2] = coord[2];
// re-test for newcoord inside my sub-domain
// use <= test for upper-boundary since may have just put atom at boxhi
if (newcoord[0] >= slo[0] && newcoord[0] <= shi[0] &&
newcoord[1] >= slo[1] && newcoord[1] <= shi[1] &&
newcoord[2] >= slo[2] && newcoord[2] <= shi[2]) return 1;
}
return 0;
}
/* ----------------------------------------------------------------------
create a lattice
------------------------------------------------------------------------- */
void Domain::set_lattice(int narg, char **arg)
{
if (lattice) delete lattice;
lattice = new Lattice(lmp,narg,arg);
}
/* ----------------------------------------------------------------------
create a new region
------------------------------------------------------------------------- */
void Domain::add_region(int narg, char **arg)
{
if (narg < 2) error->all(FLERR,"Illegal region command");
if (strcmp(arg[1],"delete") == 0) {
delete_region(narg,arg);
return;
}
if (find_region(arg[0]) >= 0) error->all(FLERR,"Reuse of region ID");
// extend Region list if necessary
if (nregion == maxregion) {
maxregion += DELTAREGION;
regions = (Region **)
memory->srealloc(regions,maxregion*sizeof(Region *),"domain:regions");
}
// create the Region
if (lmp->suffix_enable) {
if (lmp->suffix) {
char estyle[256];
sprintf(estyle,"%s/%s",arg[1],lmp->suffix);
if (region_map->find(estyle) != region_map->end()) {
RegionCreator region_creator = (*region_map)[estyle];
regions[nregion] = region_creator(lmp, narg, arg);
regions[nregion]->init();
nregion++;
return;
}
}
if (lmp->suffix2) {
char estyle[256];
sprintf(estyle,"%s/%s",arg[1],lmp->suffix2);
if (region_map->find(estyle) != region_map->end()) {
RegionCreator region_creator = (*region_map)[estyle];
regions[nregion] = region_creator(lmp, narg, arg);
regions[nregion]->init();
nregion++;
return;
}
}
}
if (strcmp(arg[1],"none") == 0) error->all(FLERR,"Unknown region style");
if (region_map->find(arg[1]) != region_map->end()) {
RegionCreator region_creator = (*region_map)[arg[1]];
regions[nregion] = region_creator(lmp, narg, arg);
}
else error->all(FLERR,"Unknown region style");
// initialize any region variables via init()
// in case region is used between runs, e.g. to print a variable
regions[nregion]->init();
nregion++;
}
/* ----------------------------------------------------------------------
one instance per region style in style_region.h
------------------------------------------------------------------------- */
template <typename T>
Region *Domain::region_creator(LAMMPS *lmp, int narg, char ** arg)
{
return new T(lmp, narg, arg);
}
/* ----------------------------------------------------------------------
delete a region
------------------------------------------------------------------------- */
void Domain::delete_region(int narg, char **arg)
{
if (narg != 2) error->all(FLERR,"Illegal region command");
int iregion = find_region(arg[0]);
if (iregion == -1) error->all(FLERR,"Delete region ID does not exist");
delete regions[iregion];
regions[iregion] = regions[nregion-1];
nregion--;
}
/* ----------------------------------------------------------------------
return region index if name matches existing region ID
return -1 if no such region
------------------------------------------------------------------------- */
int Domain::find_region(char *name)
{
for (int iregion = 0; iregion < nregion; iregion++)
if (strcmp(name,regions[iregion]->id) == 0) return iregion;
return -1;
}
/* ----------------------------------------------------------------------
(re)set boundary settings
flag = 0, called from the input script
flag = 1, called from change box command
------------------------------------------------------------------------- */
void Domain::set_boundary(int narg, char **arg, int flag)
{
if (narg != 3) error->all(FLERR,"Illegal boundary command");
char c;
for (int idim = 0; idim < 3; idim++)
for (int iside = 0; iside < 2; iside++) {
if (iside == 0) c = arg[idim][0];
else if (iside == 1 && strlen(arg[idim]) == 1) c = arg[idim][0];
else c = arg[idim][1];
if (c == 'p') boundary[idim][iside] = 0;
else if (c == 'f') boundary[idim][iside] = 1;
else if (c == 's') boundary[idim][iside] = 2;
else if (c == 'm') boundary[idim][iside] = 3;
else {
if (flag == 0) error->all(FLERR,"Illegal boundary command");
if (flag == 1) error->all(FLERR,"Illegal change_box command");
}
}
for (int idim = 0; idim < 3; idim++)
if ((boundary[idim][0] == 0 && boundary[idim][1]) ||
(boundary[idim][0] && boundary[idim][1] == 0))
error->all(FLERR,"Both sides of boundary must be periodic");
if (boundary[0][0] == 0) xperiodic = 1;
else xperiodic = 0;
if (boundary[1][0] == 0) yperiodic = 1;
else yperiodic = 0;
if (boundary[2][0] == 0) zperiodic = 1;
else zperiodic = 0;
periodicity[0] = xperiodic;
periodicity[1] = yperiodic;
periodicity[2] = zperiodic;
nonperiodic = 0;
if (xperiodic == 0 || yperiodic == 0 || zperiodic == 0) {
nonperiodic = 1;
if (boundary[0][0] >= 2 || boundary[0][1] >= 2 ||
boundary[1][0] >= 2 || boundary[1][1] >= 2 ||
boundary[2][0] >= 2 || boundary[2][1] >= 2) nonperiodic = 2;
}
}
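/* ----------------------------------------------------------------------
   editor's note: illustrative example of the parsing above, not part of
   this diff. a one-letter argument applies to both sides of a dim and a
   two-letter argument sets lo then hi, e.g.
     boundary p p fs
   gives periodic x and y (codes 0/0), a fixed z lo face (code 1) and a
   shrink-wrapped z hi face (code 2), so nonperiodic is finally set to 2
------------------------------------------------------------------------- */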
/* ----------------------------------------------------------------------
set domain attributes
------------------------------------------------------------------------- */
void Domain::set_box(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Illegal box command");
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"tilt") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal box command");
if (strcmp(arg[iarg+1],"small") == 0) tiltsmall = 1;
else if (strcmp(arg[iarg+1],"large") == 0) tiltsmall = 0;
else error->all(FLERR,"Illegal box command");
iarg += 2;
} else error->all(FLERR,"Illegal box command");
}
}
/* ----------------------------------------------------------------------
print box info, orthogonal or triclinic
------------------------------------------------------------------------- */
void Domain::print_box(const char *str)
{
if (comm->me == 0) {
if (screen) {
if (triclinic == 0)
fprintf(screen,"%sorthogonal box = (%g %g %g) to (%g %g %g)\n",
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2]);
else {
char *format = (char *)
"%striclinic box = (%g %g %g) to (%g %g %g) with tilt (%g %g %g)\n";
fprintf(screen,format,
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2],
xy,xz,yz);
}
}
if (logfile) {
if (triclinic == 0)
fprintf(logfile,"%sorthogonal box = (%g %g %g) to (%g %g %g)\n",
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2]);
else {
char *format = (char *)
"%striclinic box = (%g %g %g) to (%g %g %g) with tilt (%g %g %g)\n";
fprintf(logfile,format,
str,boxlo[0],boxlo[1],boxlo[2],boxhi[0],boxhi[1],boxhi[2],
xy,xz,yz);
}
}
}
}
/* ----------------------------------------------------------------------
format boundary string for output
assume str is 9 chars or more in length
------------------------------------------------------------------------- */
void Domain::boundary_string(char *str)
{
int m = 0;
for (int idim = 0; idim < 3; idim++) {
for (int iside = 0; iside < 2; iside++) {
if (boundary[idim][iside] == 0) str[m++] = 'p';
else if (boundary[idim][iside] == 1) str[m++] = 'f';
else if (boundary[idim][iside] == 2) str[m++] = 's';
else if (boundary[idim][iside] == 3) str[m++] = 'm';
}
str[m++] = ' ';
}
str[8] = '\0';
}
/* ----------------------------------------------------------------------
convert triclinic 0-1 lamda coords to box coords for all N atoms
x = H lamda + x0;
------------------------------------------------------------------------- */
void Domain::lamda2x(int n)
{
double **x = atom->x;
for (int i = 0; i < n; i++) {
x[i][0] = h[0]*x[i][0] + h[5]*x[i][1] + h[4]*x[i][2] + boxlo[0];
x[i][1] = h[1]*x[i][1] + h[3]*x[i][2] + boxlo[1];
x[i][2] = h[2]*x[i][2] + boxlo[2];
}
}
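/* ----------------------------------------------------------------------
   editor's note: illustrative summary, not part of this diff. as used
   here, h[] holds the upper-triangular box matrix in Voigt order
   (h[0..5] = xprd, yprd, zprd, yz, xz, xy), so x = H lamda + boxlo is
     H = | h[0] h[5] h[4] |   | xprd  xy   xz  |
         |  0   h[1] h[3] | = |  0   yprd  yz  |
         |  0    0   h[2] |   |  0    0   zprd |
   which matches the per-component arithmetic in lamda2x()/x2lamda()
------------------------------------------------------------------------- */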
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for all N atoms
lamda = H^-1 (x - x0)
------------------------------------------------------------------------- */
void Domain::x2lamda(int n)
{
double delta[3];
double **x = atom->x;
for (int i = 0; i < n; i++) {
delta[0] = x[i][0] - boxlo[0];
delta[1] = x[i][1] - boxlo[1];
delta[2] = x[i][2] - boxlo[2];
x[i][0] = h_inv[0]*delta[0] + h_inv[5]*delta[1] + h_inv[4]*delta[2];
x[i][1] = h_inv[1]*delta[1] + h_inv[3]*delta[2];
x[i][2] = h_inv[2]*delta[2];
}
}
/* ----------------------------------------------------------------------
convert triclinic 0-1 lamda coords to box coords for one atom
x = H lamda + x0;
lamda and x can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::lamda2x(double *lamda, double *x)
{
x[0] = h[0]*lamda[0] + h[5]*lamda[1] + h[4]*lamda[2] + boxlo[0];
x[1] = h[1]*lamda[1] + h[3]*lamda[2] + boxlo[1];
x[2] = h[2]*lamda[2] + boxlo[2];
}
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for one atom
lamda = H^-1 (x - x0)
x and lamda can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::x2lamda(double *x, double *lamda)
{
double delta[3];
delta[0] = x[0] - boxlo[0];
delta[1] = x[1] - boxlo[1];
delta[2] = x[2] - boxlo[2];
lamda[0] = h_inv[0]*delta[0] + h_inv[5]*delta[1] + h_inv[4]*delta[2];
lamda[1] = h_inv[1]*delta[1] + h_inv[3]*delta[2];
lamda[2] = h_inv[2]*delta[2];
}
/* ----------------------------------------------------------------------
convert box coords to triclinic 0-1 lamda coords for one atom
use my_boxlo & my_h_inv stored by caller for previous state of box
lamda = H^-1 (x - x0)
x and lamda can point to same 3-vector
------------------------------------------------------------------------- */
void Domain::x2lamda(double *x, double *lamda,
double *my_boxlo, double *my_h_inv)
{
double delta[3];
delta[0] = x[0] - my_boxlo[0];
delta[1] = x[1] - my_boxlo[1];
delta[2] = x[2] - my_boxlo[2];
lamda[0] = my_h_inv[0]*delta[0] + my_h_inv[5]*delta[1] + my_h_inv[4]*delta[2];
lamda[1] = my_h_inv[1]*delta[1] + my_h_inv[3]*delta[2];
lamda[2] = my_h_inv[2]*delta[2];
}
/* ----------------------------------------------------------------------
convert 8 lamda corner pts of lo/hi box to box coords
return bboxlo/hi = bounding box around 8 corner pts in box coords
------------------------------------------------------------------------- */
void Domain::bbox(double *lo, double *hi, double *bboxlo, double *bboxhi)
{
double x[3];
bboxlo[0] = bboxlo[1] = bboxlo[2] = BIG;
bboxhi[0] = bboxhi[1] = bboxhi[2] = -BIG;
x[0] = lo[0]; x[1] = lo[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = lo[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = hi[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = hi[1]; x[2] = lo[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = lo[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = lo[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = lo[0]; x[1] = hi[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
x[0] = hi[0]; x[1] = hi[1]; x[2] = hi[2];
lamda2x(x,x);
bboxlo[0] = MIN(bboxlo[0],x[0]); bboxhi[0] = MAX(bboxhi[0],x[0]);
bboxlo[1] = MIN(bboxlo[1],x[1]); bboxhi[1] = MAX(bboxhi[1],x[1]);
bboxlo[2] = MIN(bboxlo[2],x[2]); bboxhi[2] = MAX(bboxhi[2],x[2]);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of my triclinic sub-box
output is in corners, see ordering in lamda_box_corners
------------------------------------------------------------------------- */
void Domain::box_corners()
{
lamda_box_corners(boxlo_lamda,boxhi_lamda);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of my triclinic sub-box
output is in corners, see ordering in lamda_box_corners
------------------------------------------------------------------------- */
void Domain::subbox_corners()
{
lamda_box_corners(sublo_lamda,subhi_lamda);
}
/* ----------------------------------------------------------------------
compute 8 corner pts of any triclinic box with lo/hi in lamda coords
8 output corners are ordered with x changing fastest, then y, finally z
could be more efficient if just coded with xy,yz,xz explicitly
------------------------------------------------------------------------- */
void Domain::lamda_box_corners(double *lo, double *hi)
{
corners[0][0] = lo[0]; corners[0][1] = lo[1]; corners[0][2] = lo[2];
lamda2x(corners[0],corners[0]);
corners[1][0] = hi[0]; corners[1][1] = lo[1]; corners[1][2] = lo[2];
lamda2x(corners[1],corners[1]);
corners[2][0] = lo[0]; corners[2][1] = hi[1]; corners[2][2] = lo[2];
lamda2x(corners[2],corners[2]);
corners[3][0] = hi[0]; corners[3][1] = hi[1]; corners[3][2] = lo[2];
lamda2x(corners[3],corners[3]);
corners[4][0] = lo[0]; corners[4][1] = lo[1]; corners[4][2] = hi[2];
lamda2x(corners[4],corners[4]);
corners[5][0] = hi[0]; corners[5][1] = lo[1]; corners[5][2] = hi[2];
lamda2x(corners[5],corners[5]);
corners[6][0] = lo[0]; corners[6][1] = hi[1]; corners[6][2] = hi[2];
lamda2x(corners[6],corners[6]);
corners[7][0] = hi[0]; corners[7][1] = hi[1]; corners[7][2] = hi[2];
lamda2x(corners[7],corners[7]);
}
diff --git a/src/domain.h b/src/domain.h
index 22e319123..0f47a3c2c 100644
--- a/src/domain.h
+++ b/src/domain.h
@@ -1,281 +1,282 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_DOMAIN_H
#define LMP_DOMAIN_H
#include <math.h>
#include "pointers.h"
#include <map>
#include <string>
namespace LAMMPS_NS {
class Domain : protected Pointers {
public:
int box_exist; // 0 = not yet created, 1 = exists
int dimension; // 2 = 2d, 3 = 3d
int nonperiodic; // 0 = periodic in all 3 dims
// 1 = periodic or fixed in all 6
// 2 = shrink-wrap in any of 6
int xperiodic,yperiodic,zperiodic; // 0 = non-periodic, 1 = periodic
int periodicity[3]; // xyz periodicity as array
int boundary[3][2]; // settings for 6 boundaries
// 0 = periodic
// 1 = fixed non-periodic
// 2 = shrink-wrap non-periodic
// 3 = shrink-wrap non-per w/ min
int triclinic; // 0 = orthog box, 1 = triclinic
int tiltsmall; // 1 if limit tilt, else 0
// orthogonal box
double xprd,yprd,zprd; // global box dimensions
double xprd_half,yprd_half,zprd_half; // half dimensions
double prd[3]; // array form of dimensions
double prd_half[3]; // array form of half dimensions
// triclinic box
// xprd,xprd_half,prd,prd_half =
// same as if untilted
double prd_lamda[3]; // lamda box = (1,1,1)
double prd_half_lamda[3]; // lamda half box = (0.5,0.5,0.5)
double boxlo[3],boxhi[3]; // orthogonal box global bounds
// triclinic box
// boxlo/hi = same as if untilted
double boxlo_lamda[3],boxhi_lamda[3]; // lamda box = (0,1)
double boxlo_bound[3],boxhi_bound[3]; // bounding box of tilted domain
double corners[8][3]; // 8 corner points
// orthogonal box & triclinic box
double minxlo,minxhi; // minimum size of global box
double minylo,minyhi; // when shrink-wrapping
double minzlo,minzhi; // tri only possible for non-skew dims
// orthogonal box
double sublo[3],subhi[3]; // sub-box bounds on this proc
// triclinic box
// sublo/hi = undefined
double sublo_lamda[3],subhi_lamda[3]; // bounds of subbox in lamda
// triclinic box
double xy,xz,yz; // 3 tilt factors
double h[6],h_inv[6]; // shape matrix in Voigt notation
double h_rate[6],h_ratelo[3]; // rate of box size/shape change
int box_change; // 1 if any of next 3 flags are set, else 0
int box_change_size; // 1 if box size changes, 0 if not
int box_change_shape; // 1 if box shape changes, 0 if not
int box_change_domain; // 1 if proc sub-domains change, 0 if not
int deform_flag; // 1 if fix deform exist, else 0
int deform_vremap; // 1 if fix deform remaps v, else 0
int deform_groupbit; // atom group to perform v remap for
class Lattice *lattice; // user-defined lattice
int nregion; // # of defined Regions
int maxregion; // max # list can hold
class Region **regions; // list of defined Regions
int copymode;
typedef Region *(*RegionCreator)(LAMMPS *,int,char**);
typedef std::map<std::string,RegionCreator> RegionCreatorMap;
RegionCreatorMap *region_map;
Domain(class LAMMPS *);
virtual ~Domain();
virtual void init();
void set_initial_box(int expandflag=1);
virtual void set_global_box();
virtual void set_lamda_box();
virtual void set_local_box();
virtual void reset_box();
virtual void pbc();
void image_check();
void box_too_small_check();
void subbox_too_small_check(double);
void minimum_image(double &, double &, double &);
void minimum_image(double *);
+ void minimum_image_once(double *);
int closest_image(int, int);
int closest_image(double *, int);
void closest_image(const double * const, const double * const,
double * const);
void remap(double *, imageint &);
void remap(double *);
void remap_near(double *, double *);
void unmap(double *, imageint);
void unmap(const double *, imageint, double *);
void image_flip(int, int, int);
int ownatom(int, double *, imageint *, int);
void set_lattice(int, char **);
void add_region(int, char **);
void delete_region(int, char **);
int find_region(char *);
void set_boundary(int, char **, int);
void set_box(int, char **);
void print_box(const char *);
void boundary_string(char *);
virtual void lamda2x(int);
virtual void x2lamda(int);
virtual void lamda2x(double *, double *);
virtual void x2lamda(double *, double *);
int inside(double *);
int inside_nonperiodic(double *);
void x2lamda(double *, double *, double *, double *);
void bbox(double *, double *, double *, double *);
void box_corners();
void subbox_corners();
void lamda_box_corners(double *, double *);
// minimum image convention check
// return 1 if any distance > 1/2 of box size
// indicates a special neighbor is actually not in a bond,
// but is a far-away image that should be treated as an unbonded neighbor
// inline since called from neighbor build inner loop
inline int minimum_image_check(double dx, double dy, double dz) {
if (xperiodic && fabs(dx) > xprd_half) return 1;
if (yperiodic && fabs(dy) > yprd_half) return 1;
if (zperiodic && fabs(dz) > zprd_half) return 1;
return 0;
}
protected:
double small[3]; // fractions of box lengths
private:
template <typename T> static Region *region_creator(LAMMPS *,int,char**);
};
}
#endif
/* ERROR/WARNING messages:
E: Box bounds are invalid
The box boundaries specified in the read_data file are invalid. The
lo value must be less than the hi value for all 3 dimensions.
E: Cannot skew triclinic box in z for 2d simulation
Self-explanatory.
E: Triclinic box skew is too large
The displacement in a skewed direction must be less than half the box
length in that dimension. E.g. the xy tilt must be between -half and
+half of the x box length. This constraint can be relaxed by using
the box tilt command.
W: Triclinic box skew is large
The displacement in a skewed direction is normally required to be less
than half the box length in that dimension. E.g. the xy tilt must be
between -half and +half of the x box length. You have relaxed the
constraint using the box tilt command, but the warning means that a
LAMMPS simulation may be inefficient as a result.
E: Illegal simulation box
The lower bound of the simulation box is greater than the upper bound.
E: Bond atom missing in image check
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away.
W: Inconsistent image flags
The image flags for a pair of bonded atoms appear to be inconsistent.
Inconsistent means that when the coordinates of the two atoms are
unwrapped using the image flags, the two atoms are far apart.
Specifically they are further apart than half a periodic box length.
Or they are more than a box length apart in a non-periodic dimension.
This is usually due to the initial data file not having correct image
flags for the 2 atoms in a bond that straddles a periodic boundary.
They should be different by 1 in that case. This is a warning because
inconsistent image flags will not cause problems for dynamics or most
LAMMPS simulations. However they can cause problems when such atoms
are used with the fix rigid or replicate commands.
W: Bond atom missing in image check
The 2nd atom in a particular bond is missing on this processor.
Typically this is because the pairwise cutoff is set too short or the
bond has blown apart and an atom is too far away.
E: Bond atom missing in box size check
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away.
W: Bond atom missing in box size check
The 2nd atom needed to compute a particular bond is missing on this
processor. Typically this is because the pairwise cutoff is set too
short or the bond has blown apart and an atom is too far away.
W: Bond/angle/dihedral extent > half of periodic box length
This is a restriction because LAMMPS can be confused about which image
of an atom in the bonded interaction is the correct one to use.
"Extent" in this context means the maximum end-to-end length of the
bond/angle/dihedral. LAMMPS computes this by taking the maximum bond
length, multiplying by the number of bonds in the interaction (e.g. 3
for a dihedral) and adding a small amount of stretch.
W: Proc sub-domain size < neighbor skin, could lead to lost atoms
The decomposition of the physical domain (likely due to load
balancing) has led to a processor's sub-domain being smaller than the
neighbor skin in one or more dimensions. Since reneighboring is
triggered by atoms moving the skin distance, this may lead to lost
atoms, if an atom moves all the way across a neighboring processor's
sub-domain before reneighboring is triggered.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Reuse of region ID
A region ID cannot be used twice.
E: Unknown region style
The choice of region style is unknown.
E: Delete region ID does not exist
Self-explanatory.
E: Both sides of boundary must be periodic
Cannot specify a boundary as periodic only on the lo or hi side. Must
be periodic on both sides.
*/
diff --git a/src/min.cpp b/src/min.cpp
index 79d7d6a8b..d308efb84 100644
--- a/src/min.cpp
+++ b/src/min.cpp
@@ -1,821 +1,828 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author: Aidan Thompson (SNL)
improved CG and backtrack ls, added quadratic ls
Sources: Numerical Recipes frprmn routine
"Conjugate Gradient Method Without the Agonizing Pain" by
JR Shewchuk, http://www-2.cs.cmu.edu/~jrs/jrspapers.html#cg
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "min.h"
#include "atom.h"
#include "atom_vec.h"
#include "domain.h"
#include "comm.h"
#include "update.h"
#include "modify.h"
#include "fix_minimize.h"
#include "compute.h"
#include "neighbor.h"
#include "force.h"
#include "pair.h"
#include "bond.h"
#include "angle.h"
#include "dihedral.h"
#include "improper.h"
#include "kspace.h"
#include "output.h"
#include "thermo.h"
#include "timer.h"
#include "memory.h"
#include "error.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
Min::Min(LAMMPS *lmp) : Pointers(lmp)
{
dmax = 0.1;
searchflag = 0;
linestyle = 1;
elist_global = elist_atom = NULL;
vlist_global = vlist_atom = NULL;
nextra_global = 0;
fextra = NULL;
nextra_atom = 0;
xextra_atom = fextra_atom = NULL;
extra_peratom = extra_nlen = NULL;
extra_max = NULL;
requestor = NULL;
external_force_clear = 0;
}
/* ---------------------------------------------------------------------- */
Min::~Min()
{
delete [] elist_global;
delete [] elist_atom;
delete [] vlist_global;
delete [] vlist_atom;
delete [] fextra;
memory->sfree(xextra_atom);
memory->sfree(fextra_atom);
memory->destroy(extra_peratom);
memory->destroy(extra_nlen);
memory->destroy(extra_max);
memory->sfree(requestor);
}
/* ---------------------------------------------------------------------- */
void Min::init()
{
// create fix needed for storing atom-based quantities
// will delete it at end of run
char **fixarg = new char*[3];
fixarg[0] = (char *) "MINIMIZE";
fixarg[1] = (char *) "all";
fixarg[2] = (char *) "MINIMIZE";
modify->add_fix(3,fixarg);
delete [] fixarg;
fix_minimize = (FixMinimize *) modify->fix[modify->nfix-1];
// clear out extra global and per-atom dof
// will receive requests for new per-atom dof during pair init()
// can then add vectors to fix_minimize in setup()
nextra_global = 0;
delete [] fextra;
fextra = NULL;
nextra_atom = 0;
memory->sfree(xextra_atom);
memory->sfree(fextra_atom);
memory->destroy(extra_peratom);
memory->destroy(extra_nlen);
memory->destroy(extra_max);
memory->sfree(requestor);
xextra_atom = fextra_atom = NULL;
extra_peratom = extra_nlen = NULL;
extra_max = NULL;
requestor = NULL;
// virial_style:
// 1 if computed explicitly by pair->compute via sum over pair interactions
// 2 if computed implicitly by pair->virial_compute via sum over ghost atoms
if (force->newton_pair) virial_style = 2;
else virial_style = 1;
// setup lists of computes for global and per-atom PE and pressure
ev_setup();
// detect if fix omp is present for clearing force arrays
int ifix = modify->find_fix("package_omp");
if (ifix >= 0) external_force_clear = 1;
// set flags for arrays to clear in force_clear()
torqueflag = extraflag = 0;
if (atom->torque_flag) torqueflag = 1;
if (atom->avec->forceclearflag) extraflag = 1;
// allow pair and Kspace compute() to be turned off via modify flags
if (force->pair && force->pair->compute_flag) pair_compute_flag = 1;
else pair_compute_flag = 0;
if (force->kspace && force->kspace->compute_flag) kspace_compute_flag = 1;
else kspace_compute_flag = 0;
// orthogonal vs triclinic simulation box
triclinic = domain->triclinic;
// reset reneighboring criteria if necessary
neigh_every = neighbor->every;
neigh_delay = neighbor->delay;
neigh_dist_check = neighbor->dist_check;
if (neigh_every != 1 || neigh_delay != 0 || neigh_dist_check != 1) {
if (comm->me == 0)
error->warning(FLERR,
"Resetting reneighboring criteria during minimization");
}
neighbor->every = 1;
neighbor->delay = 0;
neighbor->dist_check = 1;
niter = neval = 0;
}
/* ----------------------------------------------------------------------
setup before run
------------------------------------------------------------------------- */
void Min::setup(int flag)
{
if (comm->me == 0 && screen) {
fprintf(screen,"Setting up %s style minimization ...\n",
update->minimize_style);
if (flag) {
fprintf(screen," Unit style : %s\n", update->unit_style);
+ fprintf(screen," Current step : " BIGINT_FORMAT "\n",
+ update->ntimestep);
timer->print_timeout(screen);
}
}
update->setupflag = 1;
// setup extra global dof due to fixes
// cannot be done in init() b/c update init() is before modify init()
nextra_global = modify->min_dof();
- if (nextra_global) fextra = new double[nextra_global];
+ if (nextra_global) {
+ fextra = new double[nextra_global];
+ if (comm->me == 0 && screen)
+ fprintf(screen,"WARNING: Energy due to %d extra global DOFs will"
+ " be included in minimizer energies\n",nextra_global);
+ }
// compute for potential energy
int id = modify->find_compute("thermo_pe");
if (id < 0) error->all(FLERR,"Minimization could not find thermo_pe compute");
pe_compute = modify->compute[id];
// style-specific setup does two tasks
// setup extra global dof vectors
// setup extra per-atom dof vectors due to requests from Pair classes
// cannot be done in init() b/c update init() is before modify/pair init()
setup_style();
// ndoftotal = total dof for entire minimization problem
// dof for atoms, extra per-atom, extra global
bigint ndofme = 3 * static_cast<bigint>(atom->nlocal);
for (int m = 0; m < nextra_atom; m++)
ndofme += extra_peratom[m]*atom->nlocal;
MPI_Allreduce(&ndofme,&ndoftotal,1,MPI_LMP_BIGINT,MPI_SUM,world);
ndoftotal += nextra_global;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
atom->setup();
modify->setup_pre_exchange();
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
if (atom->sortfreq > 0) atom->sort();
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
neighbor->build();
neighbor->ncalls = 0;
// remove these restrictions eventually
if (searchflag == 0) {
if (nextra_global)
error->all(FLERR,
"Cannot use a damped dynamics min style with fix box/relax");
if (nextra_atom)
error->all(FLERR,
"Cannot use a damped dynamics min style with per-atom DOF");
}
if (strcmp(update->minimize_style,"hftn") == 0) {
if (nextra_global)
error->all(FLERR, "Cannot use hftn min style with fix box/relax");
if (nextra_atom)
error->all(FLERR, "Cannot use hftn min style with per-atom DOF");
}
// atoms may have migrated in comm->exchange()
reset_vectors();
// compute all forces
force->setup();
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) force->pair->compute(eflag,vflag);
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
}
if (force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) force->kspace->compute(eflag,vflag);
else force->kspace->compute_dummy(eflag,vflag);
}
modify->setup_pre_reverse(eflag,vflag);
if (force->newton) comm->reverse_comm();
// update per-atom minimization variables stored by pair styles
if (nextra_atom)
for (int m = 0; m < nextra_atom; m++)
requestor[m]->min_xf_get(m);
modify->setup(vflag);
output->setup(flag);
update->setupflag = 0;
// stats for initial thermo output
ecurrent = pe_compute->compute_scalar();
if (nextra_global) ecurrent += modify->min_energy(fextra);
if (output->thermo->normflag) ecurrent /= atom->natoms;
einitial = ecurrent;
fnorm2_init = sqrt(fnorm_sqr());
fnorminf_init = fnorm_inf();
}
/* ----------------------------------------------------------------------
setup without output or one-time post-init setup
flag = 0 = just force calculation
flag = 1 = reneighbor and force calculation
------------------------------------------------------------------------- */
void Min::setup_minimal(int flag)
{
update->setupflag = 1;
// setup domain, communication and neighboring
// acquire ghosts
// build neighbor lists
if (flag) {
modify->setup_pre_exchange();
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
comm->exchange();
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
domain->image_check();
domain->box_too_small_check();
modify->setup_pre_neighbor();
neighbor->build();
neighbor->ncalls = 0;
}
// atoms may have migrated in comm->exchange()
reset_vectors();
// compute all forces
ev_set(update->ntimestep);
force_clear();
modify->setup_pre_force(vflag);
if (pair_compute_flag) force->pair->compute(eflag,vflag);
else if (force->pair) force->pair->compute_dummy(eflag,vflag);
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
}
if (force->kspace) {
force->kspace->setup();
if (kspace_compute_flag) force->kspace->compute(eflag,vflag);
else force->kspace->compute_dummy(eflag,vflag);
}
modify->setup_pre_reverse(eflag,vflag);
if (force->newton) comm->reverse_comm();
// update per-atom minimization variables stored by pair styles
if (nextra_atom)
for (int m = 0; m < nextra_atom; m++)
requestor[m]->min_xf_get(m);
modify->setup(vflag);
update->setupflag = 0;
// stats for Finish to print
ecurrent = pe_compute->compute_scalar();
if (nextra_global) ecurrent += modify->min_energy(fextra);
if (output->thermo->normflag) ecurrent /= atom->natoms;
einitial = ecurrent;
fnorm2_init = sqrt(fnorm_sqr());
fnorminf_init = fnorm_inf();
}
/* ----------------------------------------------------------------------
perform minimization, calling iterate() for N steps
------------------------------------------------------------------------- */
void Min::run(int n)
{
// minimizer iterations
stop_condition = iterate(n);
stopstr = stopstrings(stop_condition);
// if early exit from iterate loop:
// set update->nsteps to niter for Finish stats to print
// set output->next values to this timestep
// call energy_force() to ensure vflag is set when forces are computed
// output->write does final output for thermo, dump, restart files
// add ntimestep to all computes that store invocation times
// since we are hardwiring the call to thermo/dumps and computes may not be ready
if (stop_condition != MAXITER) {
update->nsteps = niter;
if (update->restrict_output == 0) {
for (int idump = 0; idump < output->ndump; idump++)
output->next_dump[idump] = update->ntimestep;
output->next_dump_any = update->ntimestep;
if (output->restart_flag) {
output->next_restart = update->ntimestep;
if (output->restart_every_single)
output->next_restart_single = update->ntimestep;
if (output->restart_every_double)
output->next_restart_double = update->ntimestep;
}
}
output->next_thermo = update->ntimestep;
modify->addstep_compute_all(update->ntimestep);
ecurrent = energy_force(0);
output->write(update->ntimestep);
}
}
/* ---------------------------------------------------------------------- */
void Min::cleanup()
{
modify->post_run();
// stats for Finish to print
efinal = ecurrent;
fnorm2_final = sqrt(fnorm_sqr());
fnorminf_final = fnorm_inf();
// reset reneighboring criteria
neighbor->every = neigh_every;
neighbor->delay = neigh_delay;
neighbor->dist_check = neigh_dist_check;
// delete fix at end of run, so its atom arrays won't persist
modify->delete_fix("MINIMIZE");
domain->box_too_small_check();
}
/* ----------------------------------------------------------------------
evaluate potential energy and forces
may migrate atoms due to reneighboring
return new energy, which should include nextra_global dof
return negative gradient stored in atom->f
return negative gradient for nextra_global dof in fextra
------------------------------------------------------------------------- */
double Min::energy_force(int resetflag)
{
// check for reneighboring
// always communicate since minimizer moved atoms
int nflag = neighbor->decide();
if (nflag == 0) {
timer->stamp();
comm->forward_comm();
timer->stamp(Timer::COMM);
} else {
if (modify->n_min_pre_exchange) {
timer->stamp();
modify->min_pre_exchange();
timer->stamp(Timer::MODIFY);
}
if (triclinic) domain->x2lamda(atom->nlocal);
domain->pbc();
if (domain->box_change) {
domain->reset_box();
comm->setup();
if (neighbor->style) neighbor->setup_bins();
}
timer->stamp();
comm->exchange();
if (atom->sortfreq > 0 &&
update->ntimestep >= atom->nextsort) atom->sort();
comm->borders();
if (triclinic) domain->lamda2x(atom->nlocal+atom->nghost);
timer->stamp(Timer::COMM);
if (modify->n_min_pre_neighbor) {
timer->stamp();
modify->min_pre_neighbor();
timer->stamp(Timer::MODIFY);
}
neighbor->build();
timer->stamp(Timer::NEIGH);
}
ev_set(update->ntimestep);
force_clear();
timer->stamp();
if (modify->n_min_pre_force) {
modify->min_pre_force(vflag);
timer->stamp(Timer::MODIFY);
}
if (pair_compute_flag) {
force->pair->compute(eflag,vflag);
timer->stamp(Timer::PAIR);
}
if (atom->molecular) {
if (force->bond) force->bond->compute(eflag,vflag);
if (force->angle) force->angle->compute(eflag,vflag);
if (force->dihedral) force->dihedral->compute(eflag,vflag);
if (force->improper) force->improper->compute(eflag,vflag);
timer->stamp(Timer::BOND);
}
if (kspace_compute_flag) {
force->kspace->compute(eflag,vflag);
timer->stamp(Timer::KSPACE);
}
if (modify->n_min_pre_reverse) {
modify->min_pre_reverse(eflag,vflag);
timer->stamp(Timer::MODIFY);
}
if (force->newton) {
comm->reverse_comm();
timer->stamp(Timer::COMM);
}
// update per-atom minimization variables stored by pair styles
if (nextra_atom)
for (int m = 0; m < nextra_atom; m++)
requestor[m]->min_xf_get(m);
// fixes that affect minimization
if (modify->n_min_post_force) {
timer->stamp();
modify->min_post_force(vflag);
timer->stamp(Timer::MODIFY);
}
// compute potential energy of system
// normalize if thermo PE does
double energy = pe_compute->compute_scalar();
if (nextra_global) energy += modify->min_energy(fextra);
if (output->thermo->normflag) energy /= atom->natoms;
// if reneighbored, atoms migrated
// if resetflag = 1, update x0 of atoms crossing PBC
// reset vectors used by lo-level minimizer
if (nflag) {
if (resetflag) fix_minimize->reset_coords();
reset_vectors();
}
return energy;
}
/* ----------------------------------------------------------------------
clear force on own & ghost atoms
clear other arrays as needed
------------------------------------------------------------------------- */
void Min::force_clear()
{
if (external_force_clear) return;
// clear global force array
// if either newton flag is set, also include ghosts
size_t nbytes = sizeof(double) * atom->nlocal;
if (force->newton) nbytes += sizeof(double) * atom->nghost;
if (nbytes) {
memset(&atom->f[0][0],0,3*nbytes);
if (torqueflag) memset(&atom->torque[0][0],0,3*nbytes);
if (extraflag) atom->avec->force_clear(0,nbytes);
}
}
/* ----------------------------------------------------------------------
pair style makes a request to add per-atom variables to minimization
requestor stores callback to pair class to invoke during min
to get current variable and forces on it and to update the variable
return flag that pair can use if it registers multiple variables
------------------------------------------------------------------------- */
int Min::request(Pair *pair, int peratom, double maxvalue)
{
int n = nextra_atom + 1;
xextra_atom = (double **) memory->srealloc(xextra_atom,n*sizeof(double *),
"min:xextra_atom");
fextra_atom = (double **) memory->srealloc(fextra_atom,n*sizeof(double *),
"min:fextra_atom");
memory->grow(extra_peratom,n,"min:extra_peratom");
memory->grow(extra_nlen,n,"min:extra_nlen");
memory->grow(extra_max,n,"min:extra_max");
requestor = (Pair **) memory->srealloc(requestor,n*sizeof(Pair *),
"min:requestor");
requestor[nextra_atom] = pair;
extra_peratom[nextra_atom] = peratom;
extra_max[nextra_atom] = maxvalue;
nextra_atom++;
return nextra_atom-1;
}
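/* ----------------------------------------------------------------------
   editor's note: illustrative sketch, not part of this diff. a pair
   style that minimizes one extra per-atom variable (e.g. a per-atom
   charge) would register it roughly as
     int index = update->minimize->request(this, 1, 0.1);
   where 1 is the per-atom length and 0.1 the maximum allowed change;
   the returned index is the one later handed back through the
   min_xf_get() calls seen elsewhere in this file
------------------------------------------------------------------------- */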
/* ---------------------------------------------------------------------- */
void Min::modify_params(int narg, char **arg)
{
if (narg == 0) error->all(FLERR,"Illegal min_modify command");
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"dmax") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal min_modify command");
dmax = force->numeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"line") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal min_modify command");
if (strcmp(arg[iarg+1],"backtrack") == 0) linestyle = 0;
else if (strcmp(arg[iarg+1],"quadratic") == 0) linestyle = 1;
else if (strcmp(arg[iarg+1],"forcezero") == 0) linestyle = 2;
else error->all(FLERR,"Illegal min_modify command");
iarg += 2;
} else error->all(FLERR,"Illegal min_modify command");
}
}
/* ----------------------------------------------------------------------
setup lists of computes for global and per-atom PE and pressure
------------------------------------------------------------------------- */
void Min::ev_setup()
{
delete [] elist_global;
delete [] elist_atom;
delete [] vlist_global;
delete [] vlist_atom;
elist_global = elist_atom = NULL;
vlist_global = vlist_atom = NULL;
nelist_global = nelist_atom = 0;
nvlist_global = nvlist_atom = 0;
for (int i = 0; i < modify->ncompute; i++) {
if (modify->compute[i]->peflag) nelist_global++;
if (modify->compute[i]->peatomflag) nelist_atom++;
if (modify->compute[i]->pressflag) nvlist_global++;
if (modify->compute[i]->pressatomflag) nvlist_atom++;
}
if (nelist_global) elist_global = new Compute*[nelist_global];
if (nelist_atom) elist_atom = new Compute*[nelist_atom];
if (nvlist_global) vlist_global = new Compute*[nvlist_global];
if (nvlist_atom) vlist_atom = new Compute*[nvlist_atom];
nelist_global = nelist_atom = 0;
nvlist_global = nvlist_atom = 0;
for (int i = 0; i < modify->ncompute; i++) {
if (modify->compute[i]->peflag)
elist_global[nelist_global++] = modify->compute[i];
if (modify->compute[i]->peatomflag)
elist_atom[nelist_atom++] = modify->compute[i];
if (modify->compute[i]->pressflag)
vlist_global[nvlist_global++] = modify->compute[i];
if (modify->compute[i]->pressatomflag)
vlist_atom[nvlist_atom++] = modify->compute[i];
}
}
/* ----------------------------------------------------------------------
set eflag,vflag for current iteration
invoke matchstep() on all timestep-dependent computes to clear their arrays
eflag/vflag based on computes that need info on this ntimestep
always set eflag_global = 1, since need energy every iteration
eflag = 0 = no energy computation
eflag = 1 = global energy only
eflag = 2 = per-atom energy only
eflag = 3 = both global and per-atom energy
vflag = 0 = no virial computation (pressure)
vflag = 1 = global virial with pair portion via sum of pairwise interactions
vflag = 2 = global virial with pair portion via F dot r including ghosts
vflag = 4 = per-atom virial only
vflag = 5 or 6 = both global and per-atom virial
------------------------------------------------------------------------- */
void Min::ev_set(bigint ntimestep)
{
int i,flag;
int eflag_global = 1;
for (i = 0; i < nelist_global; i++)
elist_global[i]->matchstep(ntimestep);
flag = 0;
int eflag_atom = 0;
for (i = 0; i < nelist_atom; i++)
if (elist_atom[i]->matchstep(ntimestep)) flag = 1;
if (flag) eflag_atom = 2;
if (eflag_global) update->eflag_global = update->ntimestep;
if (eflag_atom) update->eflag_atom = update->ntimestep;
eflag = eflag_global + eflag_atom;
flag = 0;
int vflag_global = 0;
for (i = 0; i < nvlist_global; i++)
if (vlist_global[i]->matchstep(ntimestep)) flag = 1;
if (flag) vflag_global = virial_style;
flag = 0;
int vflag_atom = 0;
for (i = 0; i < nvlist_atom; i++)
if (vlist_atom[i]->matchstep(ntimestep)) flag = 1;
if (flag) vflag_atom = 4;
if (vflag_global) update->vflag_global = update->ntimestep;
if (vflag_atom) update->vflag_atom = update->ntimestep;
vflag = vflag_global + vflag_atom;
}
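// worked example: if a per-atom PE compute and a global pressure compute
// both match this ntimestep, eflag = 1 + 2 = 3 (global + per-atom energy)
// and vflag = virial_style (1 or 2, global virial only)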
/* ----------------------------------------------------------------------
compute and return ||force||_2^2
------------------------------------------------------------------------- */
double Min::fnorm_sqr()
{
int i,n;
double *fatom;
double local_norm2_sqr = 0.0;
for (i = 0; i < nvec; i++) local_norm2_sqr += fvec[i]*fvec[i];
if (nextra_atom) {
for (int m = 0; m < nextra_atom; m++) {
fatom = fextra_atom[m];
n = extra_nlen[m];
for (i = 0; i < n; i++)
local_norm2_sqr += fatom[i]*fatom[i];
}
}
double norm2_sqr = 0.0;
MPI_Allreduce(&local_norm2_sqr,&norm2_sqr,1,MPI_DOUBLE,MPI_SUM,world);
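// fextra holds forces on global dof and is replicated on every proc,
// so its contribution is added after the Allreduce rather than being
// summed nprocs times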
if (nextra_global)
for (i = 0; i < nextra_global; i++)
norm2_sqr += fextra[i]*fextra[i];
return norm2_sqr;
}
/* ----------------------------------------------------------------------
compute and return ||force||_inf
------------------------------------------------------------------------- */
double Min::fnorm_inf()
{
int i,n;
double *fatom;
double local_norm_inf = 0.0;
for (i = 0; i < nvec; i++)
local_norm_inf = MAX(fabs(fvec[i]),local_norm_inf);
if (nextra_atom) {
for (int m = 0; m < nextra_atom; m++) {
fatom = fextra_atom[m];
n = extra_nlen[m];
for (i = 0; i < n; i++)
local_norm_inf = MAX(fabs(fatom[i]),local_norm_inf);
}
}
double norm_inf = 0.0;
MPI_Allreduce(&local_norm_inf,&norm_inf,1,MPI_DOUBLE,MPI_MAX,world);
if (nextra_global)
for (i = 0; i < nextra_global; i++)
norm_inf = MAX(fabs(fextra[i]),norm_inf);
return norm_inf;
}
/* ----------------------------------------------------------------------
possible stop conditions
------------------------------------------------------------------------- */
char *Min::stopstrings(int n)
{
const char *strings[] = {"max iterations",
"max force evaluations",
"energy tolerance",
"force tolerance",
"search direction is not downhill",
"linesearch alpha is zero",
"forces are zero",
"quadratic factors are zero",
"trust region too small",
"HFTN minimizer error",
"walltime limit reached"};
return (char *) strings[n];
}
diff --git a/src/min.h b/src/min.h
index 464018e82..021198bc0 100644
--- a/src/min.h
+++ b/src/min.h
@@ -1,147 +1,153 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_MIN_H
#define LMP_MIN_H
#include "pointers.h"
namespace LAMMPS_NS {
class Min : protected Pointers {
public:
double einitial,efinal,eprevious;
double fnorm2_init,fnorminf_init,fnorm2_final,fnorminf_final;
double alpha_final;
int niter,neval;
int stop_condition;
char *stopstr;
int searchflag; // 0 if damped dynamics, 1 if sub-cycles on local search
Min(class LAMMPS *);
virtual ~Min();
virtual void init();
void setup(int flag=1);
void setup_minimal(int);
void run(int);
void cleanup();
int request(class Pair *, int, double);
virtual bigint memory_usage() {return 0;}
void modify_params(int, char **);
double fnorm_sqr();
double fnorm_inf();
virtual void init_style() {}
virtual void setup_style() = 0;
virtual void reset_vectors() = 0;
virtual int iterate(int) = 0;
// possible return values of iterate() method
enum{MAXITER,MAXEVAL,ETOL,FTOL,DOWNHILL,ZEROALPHA,ZEROFORCE,
ZEROQUAD,TRSMALL,INTERROR,TIMEOUT};
protected:
int eflag,vflag; // flags for energy/virial computation
int virial_style; // compute virial explicitly or implicitly
int external_force_clear; // clear forces locally or externally
double dmax; // max dist to move any atom in one step
int linestyle; // 0 = backtrack, 1 = quadratic, 2 = forcezero
int nelist_global,nelist_atom; // # of PE,virial computes to check
int nvlist_global,nvlist_atom;
class Compute **elist_global; // lists of PE,virial Computes
class Compute **elist_atom;
class Compute **vlist_global;
class Compute **vlist_atom;
int triclinic; // 0 if domain is orthog, 1 if triclinic
int pairflag;
int torqueflag,extraflag;
int pair_compute_flag; // 0 if pair->compute is skipped
int kspace_compute_flag; // 0 if kspace->compute is skipped
int narray; // # of arrays stored by fix_minimize
class FixMinimize *fix_minimize; // fix that stores auxiliary data
class Compute *pe_compute; // compute for potential energy
double ecurrent; // current potential energy
bigint ndoftotal; // total dof for entire problem
int nvec; // local atomic dof = length of xvec
double *xvec; // variables for atomic dof, as 1d vector
double *fvec; // force vector for atomic dof, as 1d vector
int nextra_global; // # of extra global dof due to fixes
double *fextra; // force vector for extra global dof
// xextra is stored by fix
int nextra_atom; // # of extra per-atom variables
double **xextra_atom; // ptr to the variable
double **fextra_atom; // ptr to the force on the variable
int *extra_peratom; // # of values in variable, e.g. 3 in x
int *extra_nlen; // total local length of variable, e.g. 3*nlocal
double *extra_max; // max allowed change per iter for atom's var
class Pair **requestor; // Pair that stores/manipulates the variable
int neigh_every,neigh_delay,neigh_dist_check; // neighboring params
double energy_force(int);
void force_clear();
double compute_force_norm_sqr();
double compute_force_norm_inf();
void ev_setup();
void ev_set(bigint);
char *stopstrings(int);
};
}
#endif
/* ERROR/WARNING messages:
W: Resetting reneighboring criteria during minimization
Minimization requires that neigh_modify settings be delay = 0, every =
1, check = yes. Since these settings were not in place, LAMMPS
changed them and will restore them to their original values after the
minimization.
+W: Energy due to X extra global DOFs will be included in minimizer energies
+
+When using fixes like box/relax, the potential energy used by the minimizer
+is augmented by an additional energy provided by the fix. Thus the printed
+converged energy may be different from the total potential energy.
+
E: Minimization could not find thermo_pe compute
This compute is created by the thermo command. It must have been
explicitly deleted by an uncompute command.
E: Cannot use a damped dynamics min style with fix box/relax
This is a current restriction in LAMMPS. Use another minimizer
style.
E: Cannot use a damped dynamics min style with per-atom DOF
This is a current restriction in LAMMPS. Use another minimizer
style.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
*/
diff --git a/src/neighbor.cpp b/src/neighbor.cpp
index 4cd99b41d..1d12ef578 100644
--- a/src/neighbor.cpp
+++ b/src/neighbor.cpp
@@ -1,2420 +1,2420 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
/* ----------------------------------------------------------------------
Contributing author (triclinic and multi-neigh) : Pieter in 't Veld (SNL)
------------------------------------------------------------------------- */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include "neighbor.h"
#include "neigh_list.h"
#include "neigh_request.h"
#include "style_nbin.h"
#include "style_nstencil.h"
#include "style_npair.h"
#include "style_ntopo.h"
#include "atom.h"
#include "atom_vec.h"
#include "comm.h"
#include "force.h"
#include "pair.h"
#include "domain.h"
#include "group.h"
#include "modify.h"
#include "fix.h"
#include "compute.h"
#include "update.h"
#include "respa.h"
#include "output.h"
#include "citeme.h"
#include "memory.h"
#include "error.h"
#include <map>
using namespace LAMMPS_NS;
using namespace NeighConst;
#define RQDELTA 1
#define EXDELTA 1
#define BIG 1.0e20
enum{NSQ,BIN,MULTI}; // also in NBin, NeighList, NStencil
enum{NONE,ALL,PARTIAL,TEMPLATE};
static const char cite_neigh_multi[] =
"neighbor multi command:\n\n"
"@Article{Intveld08,\n"
" author = {P.{\\,}J.~in{\\,}'t~Veld and S.{\\,}J.~Plimpton"
" and G.{\\,}S.~Grest},\n"
" title = {Accurate and Efficient Methods for Modeling Colloidal\n"
" Mixtures in an Explicit Solvent using Molecular Dynamics},\n"
" journal = {Comp.~Phys.~Comm.},\n"
" year = 2008,\n"
" volume = 179,\n"
" pages = {320--329}\n"
"}\n\n";
//#define NEIGH_LIST_DEBUG 1
/* ---------------------------------------------------------------------- */
Neighbor::Neighbor(LAMMPS *lmp) : Pointers(lmp),
pairclass(NULL), pairnames(NULL), pairmasks(NULL)
{
MPI_Comm_rank(world,&me);
MPI_Comm_size(world,&nprocs);
firsttime = 1;
style = BIN;
every = 1;
delay = 10;
dist_check = 1;
pgsize = 100000;
oneatom = 2000;
binsizeflag = 0;
build_once = 0;
cluster_check = 0;
ago = -1;
cutneighmax = 0.0;
cutneighsq = NULL;
cutneighghostsq = NULL;
cuttype = NULL;
cuttypesq = NULL;
fixchecklist = NULL;
// pairwise neighbor lists and associated data structs
nlist = 0;
lists = NULL;
nbin = 0;
neigh_bin = NULL;
nstencil = 0;
neigh_stencil = NULL;
neigh_pair = NULL;
nstencil_perpetual = 0;
slist = NULL;
npair_perpetual = 0;
plist = NULL;
nrequest = maxrequest = 0;
requests = NULL;
old_nrequest = 0;
old_requests = NULL;
old_style = style;
old_triclinic = 0;
old_pgsize = pgsize;
old_oneatom = oneatom;
zeroes = NULL;
binclass = NULL;
binnames = NULL;
binmasks = NULL;
stencilclass = NULL;
stencilnames = NULL;
stencilmasks = NULL;
// topology lists
bondwhich = anglewhich = dihedralwhich = improperwhich = NONE;
neigh_bond = NULL;
neigh_angle = NULL;
neigh_dihedral = NULL;
neigh_improper = NULL;
// coords at last neighboring
maxhold = 0;
xhold = NULL;
lastcall = -1;
last_setup_bins = -1;
// pair exclusion list info
includegroup = 0;
nex_type = maxex_type = 0;
ex1_type = ex2_type = NULL;
ex_type = NULL;
nex_group = maxex_group = 0;
ex1_group = ex2_group = ex1_bit = ex2_bit = NULL;
nex_mol = maxex_mol = 0;
ex_mol_group = ex_mol_bit = ex_mol_intra = NULL;
// Kokkos setting
copymode = 0;
}
/* ---------------------------------------------------------------------- */
Neighbor::~Neighbor()
{
if (copymode) return;
memory->destroy(cutneighsq);
memory->destroy(cutneighghostsq);
delete [] cuttype;
delete [] cuttypesq;
delete [] fixchecklist;
for (int i = 0; i < nlist; i++) delete lists[i];
for (int i = 0; i < nbin; i++) delete neigh_bin[i];
for (int i = 0; i < nstencil; i++) delete neigh_stencil[i];
for (int i = 0; i < nlist; i++) delete neigh_pair[i];
delete [] lists;
delete [] neigh_bin;
delete [] neigh_stencil;
delete [] neigh_pair;
delete [] slist;
delete [] plist;
for (int i = 0; i < nlist; i++)
if (requests[i]) delete requests[i];
memory->sfree(requests);
for (int i = 0; i < old_nrequest; i++)
if (old_requests[i]) delete old_requests[i];
memory->sfree(old_requests);
delete [] zeroes;
delete [] binclass;
delete [] binnames;
delete [] binmasks;
delete [] stencilclass;
delete [] stencilnames;
delete [] stencilmasks;
delete [] pairclass;
delete [] pairnames;
delete [] pairmasks;
delete neigh_bond;
delete neigh_angle;
delete neigh_dihedral;
delete neigh_improper;
memory->destroy(xhold);
memory->destroy(ex1_type);
memory->destroy(ex2_type);
memory->destroy(ex_type);
memory->destroy(ex1_group);
memory->destroy(ex2_group);
delete [] ex1_bit;
delete [] ex2_bit;
memory->destroy(ex_mol_group);
delete [] ex_mol_bit;
memory->destroy(ex_mol_intra);
}
/* ---------------------------------------------------------------------- */
void Neighbor::init()
{
int i,j,n;
ncalls = ndanger = 0;
dimension = domain->dimension;
triclinic = domain->triclinic;
newton_pair = force->newton_pair;
// error check
if (delay > 0 && (delay % every) != 0)
error->all(FLERR,"Neighbor delay must be 0 or multiple of every setting");
if (pgsize < 10*oneatom)
error->all(FLERR,"Neighbor page size must be >= 10x the one atom setting");
// ------------------------------------------------------------------
// settings
// bbox lo/hi ptrs = bounding box of entire domain, stored by Domain
if (triclinic == 0) {
bboxlo = domain->boxlo;
bboxhi = domain->boxhi;
} else {
bboxlo = domain->boxlo_bound;
bboxhi = domain->boxhi_bound;
}
// set neighbor cutoffs (force cutoff + skin)
// trigger determines when atoms migrate and neighbor lists are rebuilt
// needs to be non-zero for migration distance check
// even if pair = NULL and no neighbor lists are used
// cutneigh = force cutoff + skin if cutforce > 0, else cutneigh = 0
// cutneighghost = pair cutghost if it requests it, else same as cutneigh
triggersq = 0.25*skin*skin;
boxcheck = 0;
if (domain->box_change && (domain->xperiodic || domain->yperiodic ||
(dimension == 3 && domain->zperiodic)))
boxcheck = 1;
n = atom->ntypes;
if (cutneighsq == NULL) {
if (lmp->kokkos) init_cutneighsq_kokkos(n);
else memory->create(cutneighsq,n+1,n+1,"neigh:cutneighsq");
memory->create(cutneighghostsq,n+1,n+1,"neigh:cutneighghostsq");
cuttype = new double[n+1];
cuttypesq = new double[n+1];
}
double cutoff,delta,cut;
cutneighmin = BIG;
cutneighmax = 0.0;
for (i = 1; i <= n; i++) {
cuttype[i] = cuttypesq[i] = 0.0;
for (j = 1; j <= n; j++) {
if (force->pair) cutoff = sqrt(force->pair->cutsq[i][j]);
else cutoff = 0.0;
if (cutoff > 0.0) delta = skin;
else delta = 0.0;
cut = cutoff + delta;
cutneighsq[i][j] = cut*cut;
cuttype[i] = MAX(cuttype[i],cut);
cuttypesq[i] = MAX(cuttypesq[i],cut*cut);
cutneighmin = MIN(cutneighmin,cut);
cutneighmax = MAX(cutneighmax,cut);
if (force->pair && force->pair->ghostneigh) {
cut = force->pair->cutghost[i][j] + skin;
cutneighghostsq[i][j] = cut*cut;
} else cutneighghostsq[i][j] = cut*cut;
}
}
cutneighmaxsq = cutneighmax * cutneighmax;
// rRESPA cutoffs
int respa = 0;
if (update->whichflag == 1 && strstr(update->integrate_style,"respa")) {
if (((Respa *) update->integrate)->level_inner >= 0) respa = 1;
if (((Respa *) update->integrate)->level_middle >= 0) respa = 2;
}
if (respa) {
double *cut_respa = ((Respa *) update->integrate)->cutoff;
cut_inner_sq = (cut_respa[1] + skin) * (cut_respa[1] + skin);
cut_middle_sq = (cut_respa[3] + skin) * (cut_respa[3] + skin);
cut_middle_inside_sq = (cut_respa[0] - skin) * (cut_respa[0] - skin);
if (cut_respa[0]-skin < 0) cut_middle_inside_sq = 0.0;
}
// fixchecklist = other classes that can induce reneighboring in decide()
restart_check = 0;
if (output->restart_flag) restart_check = 1;
delete [] fixchecklist;
fixchecklist = NULL;
fixchecklist = new int[modify->nfix];
fix_check = 0;
for (i = 0; i < modify->nfix; i++)
if (modify->fix[i]->force_reneighbor)
fixchecklist[fix_check++] = i;
must_check = 0;
if (restart_check || fix_check) must_check = 1;
// set special_flag for 1-2, 1-3, 1-4 neighbors
// flag[0] is not used, flag[1] = 1-2, flag[2] = 1-3, flag[3] = 1-4
// flag = 0 if both LJ/Coulomb special values are 0.0
// flag = 1 if both LJ/Coulomb special values are 1.0
// flag = 2 otherwise or if KSpace solver is enabled
// pairwise portion of KSpace solver uses all 1-2,1-3,1-4 neighbors
// or selected Coulomb-approximation pair styles require it
if (force->special_lj[1] == 0.0 && force->special_coul[1] == 0.0)
special_flag[1] = 0;
else if (force->special_lj[1] == 1.0 && force->special_coul[1] == 1.0)
special_flag[1] = 1;
else special_flag[1] = 2;
if (force->special_lj[2] == 0.0 && force->special_coul[2] == 0.0)
special_flag[2] = 0;
else if (force->special_lj[2] == 1.0 && force->special_coul[2] == 1.0)
special_flag[2] = 1;
else special_flag[2] = 2;
if (force->special_lj[3] == 0.0 && force->special_coul[3] == 0.0)
special_flag[3] = 0;
else if (force->special_lj[3] == 1.0 && force->special_coul[3] == 1.0)
special_flag[3] = 1;
else special_flag[3] = 2;
if (force->kspace || force->pair_match("coul/wolf",0) ||
force->pair_match("coul/dsf",0) || force->pair_match("thole",0))
special_flag[1] = special_flag[2] = special_flag[3] = 2;
// maxwt = max multiplicative factor on atom indices stored in neigh list
maxwt = 0;
if (special_flag[1] == 2) maxwt = 2;
if (special_flag[2] == 2) maxwt = 3;
if (special_flag[3] == 2) maxwt = 4;
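// e.g. "special_bonds lj/coul 0.0 0.0 0.5" gives special_flag[1] = 0,
// special_flag[2] = 0, special_flag[3] = 2, and thus maxwt = 4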
// ------------------------------------------------------------------
// xhold array
// free if not needed for this run
if (dist_check == 0) {
memory->destroy(xhold);
maxhold = 0;
xhold = NULL;
}
// first time allocation
if (dist_check) {
if (maxhold == 0) {
maxhold = atom->nmax;
memory->create(xhold,maxhold,3,"neigh:xhold");
}
}
// ------------------------------------------------------------------
// exclusion lists
// depend on type, group, molecule settings from neigh_modify
// warn if exclusions used with KSpace solver
n = atom->ntypes;
if (nex_type == 0 && nex_group == 0 && nex_mol == 0) exclude = 0;
else exclude = 1;
if (nex_type) {
if (lmp->kokkos)
init_ex_type_kokkos(n);
else {
memory->destroy(ex_type);
memory->create(ex_type,n+1,n+1,"neigh:ex_type");
}
for (i = 1; i <= n; i++)
for (j = 1; j <= n; j++)
ex_type[i][j] = 0;
for (i = 0; i < nex_type; i++) {
if (ex1_type[i] <= 0 || ex1_type[i] > n ||
ex2_type[i] <= 0 || ex2_type[i] > n)
error->all(FLERR,"Invalid atom type in neighbor exclusion list");
ex_type[ex1_type[i]][ex2_type[i]] = 1;
ex_type[ex2_type[i]][ex1_type[i]] = 1;
}
}
if (nex_group) {
if (lmp->kokkos)
init_ex_bit_kokkos();
else {
delete [] ex1_bit;
delete [] ex2_bit;
ex1_bit = new int[nex_group];
ex2_bit = new int[nex_group];
}
for (i = 0; i < nex_group; i++) {
ex1_bit[i] = group->bitmask[ex1_group[i]];
ex2_bit[i] = group->bitmask[ex2_group[i]];
}
}
if (nex_mol) {
if (lmp->kokkos)
init_ex_mol_bit_kokkos();
else {
delete [] ex_mol_bit;
ex_mol_bit = new int[nex_mol];
}
for (i = 0; i < nex_mol; i++)
ex_mol_bit[i] = group->bitmask[ex_mol_group[i]];
}
if (exclude && force->kspace && me == 0)
error->warning(FLERR,"Neighbor exclusions used with KSpace solver "
"may give inconsistent Coulombic energies");
// ------------------------------------------------------------------
// create pairwise lists
// one-time call to init_styles() to scan style files and setup
// init_pair() creates auxiliary classes: NBin, NStencil, NPair
if (firsttime) init_styles();
firsttime = 0;
int same = init_pair();
// invoke copy_neighbor_info() in Bin,Stencil,Pair classes
// copied once per run in case any cutoff, exclusion, special info changed
for (i = 0; i < nbin; i++) neigh_bin[i]->copy_neighbor_info();
for (i = 0; i < nstencil; i++) neigh_stencil[i]->copy_neighbor_info();
for (i = 0; i < nlist; i++)
if (neigh_pair[i]) neigh_pair[i]->copy_neighbor_info();
if (!same && comm->me == 0) print_pairwise_info();
// can now delete requests so next run can make new ones
// print_pairwise_info() made use of requests
// set of NeighLists now stores all needed info
for (int i = 0; i < nrequest; i++) {
delete requests[i];
requests[i] = NULL;
}
nrequest = 0;
// ------------------------------------------------------------------
// create topology lists
// instantiated topo styles can change from run to run
init_topology();
}
/* ----------------------------------------------------------------------
create and initialize lists of Nbin, Nstencil, NPair classes
lists have info on all classes in 3 style*.h files
cannot do this in constructor, b/c too early to instantiate classes
------------------------------------------------------------------------- */
void Neighbor::init_styles()
{
// extract info from NBin classes listed in style_nbin.h
nbclass = 0;
#define NBIN_CLASS
#define NBinStyle(key,Class,bitmasks) nbclass++;
#include "style_nbin.h"
#undef NBinStyle
#undef NBIN_CLASS
binclass = new BinCreator[nbclass];
binnames = new char*[nbclass];
binmasks = new int[nbclass];
nbclass = 0;
#define NBIN_CLASS
#define NBinStyle(key,Class,bitmasks) \
binnames[nbclass] = (char *) #key; \
binclass[nbclass] = &bin_creator<Class>; \
binmasks[nbclass++] = bitmasks;
#include "style_nbin.h"
#undef NBinStyle
#undef NBIN_CLASS
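// the two passes above work by redefining the NBinStyle macro: the first
// include of style_nbin.h expands each entry to a bare counter increment
// (to size the arrays), the second to the registration statements that
// record the key string, a factory function, and the bitmask; the
// NStencil and NPair sections below follow the same pattern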
// extract info from NStencil classes listed in style_nstencil.h
nsclass = 0;
#define NSTENCIL_CLASS
#define NStencilStyle(key,Class,bitmasks) nsclass++;
#include "style_nstencil.h"
#undef NStencilStyle
#undef NSTENCIL_CLASS
stencilclass = new StencilCreator[nsclass];
stencilnames = new char*[nsclass];
stencilmasks = new int[nsclass];
nsclass = 0;
#define NSTENCIL_CLASS
#define NStencilStyle(key,Class,bitmasks) \
stencilnames[nsclass] = (char *) #key; \
stencilclass[nsclass] = &stencil_creator<Class>; \
stencilmasks[nsclass++] = bitmasks;
#include "style_nstencil.h"
#undef NStencilStyle
#undef NSTENCIL_CLASS
// extract info from NPair classes listed in style_npair.h
npclass = 0;
#define NPAIR_CLASS
#define NPairStyle(key,Class,bitmasks) npclass++;
#include "style_npair.h"
#undef NPairStyle
#undef NPAIR_CLASS
pairclass = new PairCreator[npclass];
pairnames = new char*[npclass];
pairmasks = new int[npclass];
npclass = 0;
#define NPAIR_CLASS
#define NPairStyle(key,Class,bitmasks) \
pairnames[npclass] = (char *) #key; \
pairclass[npclass] = &pair_creator<Class>; \
pairmasks[npclass++] = bitmasks;
#include "style_npair.h"
#undef NPairStyle
#undef NPAIR_CLASS
}
/* ----------------------------------------------------------------------
create and initialize NPair classes
------------------------------------------------------------------------- */
int Neighbor::init_pair()
{
int i,j,k,m;
// test if pairwise lists need to be re-created
// no need to re-create if:
// neigh style, triclinic, pgsize, oneatom have not changed
// current requests = old requests
// so just return:
// delete requests so next run can make new ones
// current set of NeighLists already stores all needed info
// requests are compared via identical() before:
// any requests are morphed using logic below
// any requests are added below, e.g. as parents of pair hybrid skip lists
// copy them via requests_new2old() BEFORE any changes made to requests
// necessary b/c morphs can change requestor settings (see comment below)
int same = 1;
if (style != old_style) same = 0;
if (triclinic != old_triclinic) same = 0;
if (pgsize != old_pgsize) same = 0;
if (oneatom != old_oneatom) same = 0;
if (nrequest != old_nrequest) same = 0;
else
for (i = 0; i < nrequest; i++)
if (requests[i]->identical(old_requests[i]) == 0) same = 0;
#ifdef NEIGH_LIST_DEBUG
if (comm->me == 0) printf("SAME flag %d\n",same);
#endif
if (same) return same;
requests_new2old();
// delete old lists since creating new ones
for (i = 0; i < nlist; i++) delete lists[i];
for (i = 0; i < nbin; i++) delete neigh_bin[i];
for (i = 0; i < nstencil; i++) delete neigh_stencil[i];
for (i = 0; i < nlist; i++) delete neigh_pair[i];
delete [] lists;
delete [] neigh_bin;
delete [] neigh_stencil;
delete [] neigh_pair;
// morph requests in various ways
// purpose is to avoid duplicate or inefficient builds
// may add new requests if a needed request to derive from does not exist
// methods:
// (1) other = point history and rRESPA lists at their partner lists
// (2) skip = create any new non-skip lists needed by pair hybrid skip lists
// (3) granular = adjust parent and skip lists for granular onesided usage
// (4) h/f = pair up any matching half/full lists
// (5) copy = convert as many lists as possible to copy lists
// order of morph methods matters:
// (1) before (2), b/c (2) needs to know history partner pairings
// (2) after (1), b/c (2) may also need to create new history lists
// (3) after (2), b/c it adjusts lists created by (2)
// (4) after (2) and (3),
// b/c (2) may create new full lists, (3) may change them
// (5) last, after all lists are finalized, so all possible copies found
int nrequest_original = nrequest;
morph_other();
morph_skip();
morph_granular(); // this method can change flags set by requestor
morph_halffull();
morph_copy();
// create new lists, one per request including added requests
// wait to allocate initial pages until copy lists are detected
- // NOTE: can I allocation now, instead of down below?
+ // NOTE: can I allocate now, instead of down below?
nlist = nrequest;
lists = new NeighList*[nrequest];
neigh_bin = new NBin*[nrequest];
neigh_stencil = new NStencil*[nrequest];
neigh_pair = new NPair*[nrequest];
// allocate new lists
// pass list ptr back to requestor (except for Command class)
// only for original requests, not ones added by Neighbor class
for (i = 0; i < nrequest; i++) {
if (requests[i]->kokkos_host || requests[i]->kokkos_device)
create_kokkos_list(i);
else lists[i] = new NeighList(lmp);
lists[i]->index = i;
if (requests[i]->pair && i < nrequest_original) {
Pair *pair = (Pair *) requests[i]->requestor;
pair->init_list(requests[i]->id,lists[i]);
} else if (requests[i]->fix && i < nrequest_original) {
Fix *fix = (Fix *) requests[i]->requestor;
fix->init_list(requests[i]->id,lists[i]);
} else if (requests[i]->compute && i < nrequest_original) {
Compute *compute = (Compute *) requests[i]->requestor;
compute->init_list(requests[i]->id,lists[i]);
}
}
// invoke post_constructor() for all lists
// copies info from requests to lists, sets ptrs to related lists
for (i = 0; i < nrequest; i++)
lists[i]->post_constructor(requests[i]);
// assign Bin,Stencil,Pair style to each list
int flag;
for (i = 0; i < nrequest; i++) {
flag = choose_bin(requests[i]);
lists[i]->bin_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor bin option does not exist");
flag = choose_stencil(requests[i]);
lists[i]->stencil_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor stencil method does not exist");
flag = choose_pair(requests[i]);
lists[i]->pair_method = flag;
if (flag < 0)
error->all(FLERR,"Requested neighbor pair method does not exist");
}
// instantiate unique Bin,Stencil classes in neigh_bin & neigh_stencil vecs
// unique = only one of its style, or request unique flag set (custom cutoff)
nbin = 0;
for (i = 0; i < nrequest; i++) {
requests[i]->index_bin = -1;
flag = lists[i]->bin_method;
if (flag == 0) continue;
for (j = 0; j < nbin; j++)
if (neigh_bin[j]->istyle == flag) break;
if (j < nbin && !requests[i]->unique) {
requests[i]->index_bin = j;
continue;
}
BinCreator bin_creator = binclass[flag-1];
neigh_bin[nbin] = bin_creator(lmp);
neigh_bin[nbin]->post_constructor(requests[i]);
neigh_bin[nbin]->istyle = flag;
requests[i]->index_bin = nbin;
nbin++;
}
nstencil = 0;
for (i = 0; i < nrequest; i++) {
requests[i]->index_stencil = -1;
flag = lists[i]->stencil_method;
if (flag == 0) continue;
for (j = 0; j < nstencil; j++)
if (neigh_stencil[j]->istyle == flag) break;
if (j < nstencil && !requests[i]->unique) {
requests[i]->index_stencil = j;
continue;
}
StencilCreator stencil_creator = stencilclass[flag-1];
neigh_stencil[nstencil] = stencil_creator(lmp);
neigh_stencil[nstencil]->post_constructor(requests[i]);
neigh_stencil[nstencil]->istyle = flag;
if (lists[i]->bin_method > 0) {
neigh_stencil[nstencil]->nb = neigh_bin[requests[i]->index_bin];
if (neigh_stencil[nstencil]->nb == NULL)
error->all(FLERR,"Could not assign bin method to neighbor stencil");
}
requests[i]->index_stencil = nstencil;
nstencil++;
}
// instantiate one Pair class per list in neigh_pair vec
for (i = 0; i < nrequest; i++) {
requests[i]->index_pair = -1;
flag = lists[i]->pair_method;
if (flag == 0) {
neigh_pair[i] = NULL;
continue;
}
PairCreator pair_creator = pairclass[flag-1];
neigh_pair[i] = pair_creator(lmp);
neigh_pair[i]->post_constructor(requests[i]);
neigh_pair[i]->istyle = flag;
if (lists[i]->bin_method > 0) {
neigh_pair[i]->nb = neigh_bin[requests[i]->index_bin];
if (neigh_pair[i]->nb == NULL)
error->all(FLERR,"Could not assign bin method to neighbor pair");
}
if (lists[i]->stencil_method > 0) {
neigh_pair[i]->ns = neigh_stencil[requests[i]->index_stencil];
if (neigh_pair[i]->ns == NULL)
error->all(FLERR,"Could not assign stencil method to neighbor pair");
}
requests[i]->index_pair = i;
}
// allocate initial pages for each list, except if copy flag set
// allocate dnum vector of zeroes if set
int dnummax = 0;
for (i = 0; i < nlist; i++) {
if (lists[i]->copy) continue;
lists[i]->setup_pages(pgsize,oneatom);
dnummax = MAX(dnummax,lists[i]->dnum);
}
if (dnummax) {
delete [] zeroes;
zeroes = new double[dnummax];
for (i = 0; i < dnummax; i++) zeroes[i] = 0.0;
}
// first-time allocation of per-atom data for lists that are built and store
// lists that are not built: granhistory, respa inner/middle (no neigh_pair)
// lists that do not store: copy
// use atom->nmax for both grow() args
// i.e. grow first time to expanded size to avoid future reallocs
// also Kokkos list initialization
int maxatom = atom->nmax;
for (i = 0; i < nlist; i++)
if (neigh_pair[i] && !lists[i]->copy) lists[i]->grow(maxatom,maxatom);
// plist = indices of perpetual NPair classes
// perpetual = non-occasional, re-built at every reneighboring
// slist = indices of perpetual NStencil classes
// perpetual = used by any perpetual NPair class
delete [] slist;
delete [] plist;
nstencil_perpetual = npair_perpetual = 0;
slist = new int[nstencil];
plist = new int[nlist];
for (i = 0; i < nlist; i++) {
if (lists[i]->occasional == 0 && lists[i]->pair_method)
plist[npair_perpetual++] = i;
}
for (i = 0; i < nstencil; i++) {
flag = 0;
for (j = 0; j < npair_perpetual; j++)
if (lists[plist[j]]->stencil_method == neigh_stencil[i]->istyle)
flag = 1;
if (flag) slist[nstencil_perpetual++] = i;
}
// reorder plist vector if necessary
// relevant for lists that are derived from a parent list:
// half-full,copy,skip
// the child index must appear in plist after the parent index
// swap two indices within plist when dependency is mis-ordered
// start double loop check again whenever a swap is made
// done when entire double loop test results in no swaps
NeighList *ptr;
int done = 0;
while (!done) {
done = 1;
for (i = 0; i < npair_perpetual; i++) {
for (k = 0; k < 3; k++) {
ptr = NULL;
if (k == 0) ptr = lists[plist[i]]->listcopy;
if (k == 1) ptr = lists[plist[i]]->listskip;
if (k == 2) ptr = lists[plist[i]]->listfull;
if (ptr == NULL) continue;
for (m = 0; m < nrequest; m++)
if (ptr == lists[m]) break;
for (j = 0; j < npair_perpetual; j++)
if (m == plist[j]) break;
if (j < i) continue;
int tmp = plist[i]; // swap I,J indices
plist[i] = plist[j];
plist[j] = tmp;
done = 0;
break;
}
if (!done) break;
}
}
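// e.g. if plist = {3,5} and list 3 is a copy of list 5, one swap gives
// plist = {5,3}, so the parent list is built before its copy at each
// reneighboring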
// debug output
#ifdef NEIGH_LIST_DEBUG
for (i = 0; i < nrequest; i++) lists[i]->print_attributes();
#endif
return same;
}
/* ----------------------------------------------------------------------
scan NeighRequests to set additional flags
only for history, respaouter, custom cutoff lists
------------------------------------------------------------------------- */
void Neighbor::morph_other()
{
NeighRequest *irq;
for (int i = 0; i < nrequest; i++) {
irq = requests[i];
// if history, point this list and partner list at each other
if (irq->history) {
irq->historylist = i-1;
requests[i-1]->history_partner = 1;
requests[i-1]->historylist = i;
}
// if respaouter, point all associated rRESPA lists at each other
if (irq->respaouter) {
if (requests[i-1]->respainner) {
irq->respainnerlist = i-1;
requests[i-1]->respaouterlist = i;
} else {
irq->respamiddlelist = i-1;
requests[i-1]->respaouterlist = i;
requests[i-1]->respainnerlist = i-1;
irq->respainnerlist = i-2;
requests[i-2]->respaouterlist = i;
requests[i-2]->respamiddlelist = i-1;
}
}
// if cut flag set by requestor, set unique flag
// this forces Pair,Stencil,Bin styles to be instantiated separately
if (irq->cut) irq->unique = 1;
}
}
/* ----------------------------------------------------------------------
scan NeighRequests to process all skip lists
look for a matching non-skip list
if one exists, point at it via skiplist
else make new parent via copy_request() and point at it
------------------------------------------------------------------------- */
void Neighbor::morph_skip()
{
int i,j,inewton,jnewton;
NeighRequest *irq,*jrq,*nrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// only processing skip lists
if (!irq->skip) continue;
// these lists are created other ways, no need for skipping
// halffull list and its full parent may both skip,
// but are checked to ensure matching skip info
if (irq->history) continue;
if (irq->respainner || irq->respamiddle) continue;
if (irq->halffull) continue;
if (irq->copy) continue;
// check all other lists
for (j = 0; j < nrequest; j++) {
if (i == j) continue;
jrq = requests[j];
// can only skip from a perpetual non-skip list
if (jrq->occasional) continue;
if (jrq->skip) continue;
// both lists must be half, or both full
if (irq->half != jrq->half) continue;
if (irq->full != jrq->full) continue;
// both lists must be newton on, or both newton off
// IJ newton = 1 for newton on, 2 for newton off
inewton = irq->newton;
if (inewton == 0) inewton = force->newton_pair ? 1 : 2;
jnewton = jrq->newton;
if (jnewton == 0) jnewton = force->newton_pair ? 1 : 2;
if (inewton != jnewton) continue;
// these flags must be same,
// else 2 lists do not store same pairs
// or their data structures are different
// this includes custom cutoff set by requestor
// no need to check respaouter b/c it stores same pairs
// no need to check dnum b/c only set for history
// NOTE: need check for 2 Kokkos flags?
if (irq->ghost != jrq->ghost) continue;
if (irq->size != jrq->size) continue;
if (irq->bond != jrq->bond) continue;
if (irq->omp != jrq->omp) continue;
if (irq->intel != jrq->intel) continue;
if (irq->kokkos_host != jrq->kokkos_host) continue;
if (irq->kokkos_device != jrq->kokkos_device) continue;
if (irq->ssa != jrq->ssa) continue;
if (irq->cut != jrq->cut) continue;
if (irq->cutoff != jrq->cutoff) continue;
// 2 lists are a match
break;
}
// if matching list exists, point to it
// else create a new identical list except non-skip
// for new list, set neigh = 1, skip = 0, no skip vec/array,
// copy unique flag (since copy_request() will not do it)
// note: parents of skip lists do not have associated history list
// b/c child skip lists store their own history info
if (j < nrequest) irq->skiplist = j;
else {
int newrequest = request(this,-1);
irq->skiplist = newrequest;
nrq = requests[newrequest];
nrq->copy_request(irq,0);
nrq->pair = nrq->fix = nrq->compute = nrq->command = 0;
nrq->neigh = 1;
nrq->skip = 0;
if (irq->unique) nrq->unique = 1;
}
}
}
/* ----------------------------------------------------------------------
scan NeighRequests just added by morph_skip for hybrid granular
adjust newton/onesided parent settings if children require onesided skipping
also set children off2on flag if parent becomes a newton off list
this is needed because line/gran and tri/gran pair styles
require onesided neigh lists and system newton on,
but parent list must be newton off to enable the onesided skipping
------------------------------------------------------------------------- */
void Neighbor::morph_granular()
{
int i,j;
NeighRequest *irq,*jrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// only examine NeighRequests added by morph_skip()
// only those with size attribute for granular systems
if (!irq->neigh) continue;
if (!irq->size) continue;
// check children of this list
int onesided = -1;
for (j = 0; j < nrequest; j++) {
jrq = requests[j];
// only consider JRQ pair, size lists that skip from Irq list
if (!jrq->pair) continue;
if (!jrq->size) continue;
if (!jrq->skip || jrq->skiplist != i) continue;
// onesided = -1 if no children
// onesided = 0/1 = child granonesided value if same for all children
// onesided = 2 if children have different granonesided values
if (onesided < 0) onesided = jrq->granonesided;
else if (onesided != jrq->granonesided) onesided = 2;
if (onesided == 2) break;
}
// if onesided = 2, parent has children with both granonesided = 0/1
// force parent newton off (newton = 2) to enable onesided skip by child
// set parent granonesided = 0, so it stores all neighs in usual manner
// set off2on = 1 for all children, since they expect newton on lists
// this is b/c granonesided only set by line/gran and tri/gran which
// both require system newton on
if (onesided == 2) {
irq->newton = 2;
irq->granonesided = 0;
for (j = 0; j < nrequest; j++) {
jrq = requests[j];
// only consider JRQ pair, size lists that skip from Irq list
if (!jrq->pair) continue;
if (!jrq->size) continue;
if (!jrq->skip || jrq->skiplist != i) continue;
jrq->off2on = 1;
}
}
}
}
/* ----------------------------------------------------------------------
scan NeighRequests for possible half lists to derive from full lists
if 2 requests match, set half list to derive from full list
------------------------------------------------------------------------- */
void Neighbor::morph_halffull()
{
int i,j;
NeighRequest *irq,*jrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// only processing half lists
if (!irq->half) continue;
// Kokkos doesn't yet support half from full
if (irq->kokkos_host) continue;
if (irq->kokkos_device) continue;
// these lists are created other ways, no need for halffull
// do want to process skip lists
if (irq->history) continue;
if (irq->respainner || irq->respamiddle) continue;
if (irq->copy) continue;
// check all other lists
for (j = 0; j < nrequest; j++) {
if (i == j) continue;
jrq = requests[j];
// can only derive from a perpetual full list
// newton setting of derived list does not matter
if (jrq->occasional) continue;
if (!jrq->full) continue;
// these flags must be same,
// else 2 lists do not store same pairs
// or their data structures are different
// this includes custom cutoff set by requestor
// no need to check respaouter b/c it stores same pairs
// no need to check dnum b/c only set for history
if (irq->ghost != jrq->ghost) continue;
if (irq->size != jrq->size) continue;
if (irq->bond != jrq->bond) continue;
if (irq->omp != jrq->omp) continue;
if (irq->intel != jrq->intel) continue;
if (irq->kokkos_host != jrq->kokkos_host) continue;
if (irq->kokkos_device != jrq->kokkos_device) continue;
if (irq->ssa != jrq->ssa) continue;
if (irq->cut != jrq->cut) continue;
if (irq->cutoff != jrq->cutoff) continue;
// skip flag must be same
// if both are skip lists, skip info must match
if (irq->skip != jrq->skip) continue;
if (irq->skip && irq->same_skip(jrq) == 0) continue;
// 2 lists are a match
break;
}
// if matching list exists, point to it
if (j < nrequest) {
irq->halffull = 1;
irq->halffulllist = j;
}
}
}
/* ----------------------------------------------------------------------
scan NeighRequests for possible copies
if 2 requests match, turn one into a copy of the other
------------------------------------------------------------------------- */
void Neighbor::morph_copy()
{
int i,j,inewton,jnewton;
NeighRequest *irq,*jrq;
for (i = 0; i < nrequest; i++) {
irq = requests[i];
// this list is already a copy list due to another morph method
if (irq->copy) continue;
// these lists are created other ways, no need to copy
// skip lists are eligible to become a copy list
if (irq->history) continue;
if (irq->respainner || irq->respamiddle) continue;
// check all other lists
- for (j = 0; j < i; j++) {
+ for (j = 0; j < nrequest; j++) {
if (i == j) continue;
jrq = requests[j];
// other list is already copied from this one
if (jrq->copy && jrq->copylist == i) continue;
// parent list must be perpetual
// copied list can be perpetual or occasional
if (jrq->occasional) continue;
// both lists must be half, or both full
if (irq->half != jrq->half) continue;
if (irq->full != jrq->full) continue;
// both lists must be newton on, or both newton off
// IJ newton = 1 for newton on, 2 for newton off
inewton = irq->newton;
if (inewton == 0) inewton = force->newton_pair ? 1 : 2;
jnewton = jrq->newton;
if (jnewton == 0) jnewton = force->newton_pair ? 1 : 2;
if (inewton != jnewton) continue;
// ok for non-ghost list to copy from ghost list, but not vice versa
if (irq->ghost && !jrq->ghost) continue;
// these flags must be same,
// else 2 lists do not store same pairs
// or their data structures are different
// this includes custom cutoff set by requestor
// no need to check respaouter b/c it stores same pairs
// no need to check omp b/c it stores same pairs
// no need to check dnum b/c only set for history
// NOTE: need check for 2 Kokkos flags?
if (irq->size != jrq->size) continue;
if (irq->bond != jrq->bond) continue;
if (irq->intel != jrq->intel) continue;
if (irq->kokkos_host != jrq->kokkos_host) continue;
if (irq->kokkos_device != jrq->kokkos_device) continue;
if (irq->ssa != jrq->ssa) continue;
if (irq->cut != jrq->cut) continue;
if (irq->cutoff != jrq->cutoff) continue;
// skip flag must be same
// if both are skip lists, skip info must match
if (irq->skip != jrq->skip) continue;
if (irq->skip && irq->same_skip(jrq) == 0) continue;
// 2 lists are a match
break;
}
// turn list I into a copy of list J
// do not copy a list from another copy list, but from its parent list
- if (j < i) {
+ if (j < nrequest) {
irq->copy = 1;
if (jrq->copy) irq->copylist = jrq->copylist;
else irq->copylist = j;
}
}
}
/* ----------------------------------------------------------------------
create and initialize NTopo classes
------------------------------------------------------------------------- */
void Neighbor::init_topology()
{
int i,m;
if (!atom->molecular) return;
// set flags that determine which topology neighbor classes to use
// these settings could change from run to run, depending on fixes defined
// bonds, etc. can only be broken for atom->molecular = 1, not 2
// SHAKE sets bonds and angles negative
// gcmc sets all bonds, angles, etc negative
// bond_quartic sets bonds to 0
// delete_bonds sets all interactions negative
int bond_off = 0;
int angle_off = 0;
for (i = 0; i < modify->nfix; i++)
if ((strcmp(modify->fix[i]->style,"shake") == 0)
|| (strcmp(modify->fix[i]->style,"rattle") == 0))
bond_off = angle_off = 1;
if (force->bond && force->bond_match("quartic")) bond_off = 1;
if (atom->avec->bonds_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (bond_off) break;
for (m = 0; m < atom->num_bond[i]; m++)
if (atom->bond_type[i][m] <= 0) bond_off = 1;
}
}
if (atom->avec->angles_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (angle_off) break;
for (m = 0; m < atom->num_angle[i]; m++)
if (atom->angle_type[i][m] <= 0) angle_off = 1;
}
}
int dihedral_off = 0;
if (atom->avec->dihedrals_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (dihedral_off) break;
for (m = 0; m < atom->num_dihedral[i]; m++)
if (atom->dihedral_type[i][m] <= 0) dihedral_off = 1;
}
}
int improper_off = 0;
if (atom->avec->impropers_allow && atom->molecular == 1) {
for (i = 0; i < atom->nlocal; i++) {
if (improper_off) break;
for (m = 0; m < atom->num_improper[i]; m++)
if (atom->improper_type[i][m] <= 0) improper_off = 1;
}
}
for (i = 0; i < modify->nfix; i++)
if ((strcmp(modify->fix[i]->style,"gcmc") == 0))
bond_off = angle_off = dihedral_off = improper_off = 1;
// sync on/off settings across all procs
int onoff = bond_off;
MPI_Allreduce(&onoff,&bond_off,1,MPI_INT,MPI_MAX,world);
onoff = angle_off;
MPI_Allreduce(&onoff,&angle_off,1,MPI_INT,MPI_MAX,world);
onoff = dihedral_off;
MPI_Allreduce(&onoff,&dihedral_off,1,MPI_INT,MPI_MAX,world);
onoff = improper_off;
MPI_Allreduce(&onoff,&improper_off,1,MPI_INT,MPI_MAX,world);
// instantiate NTopo classes
if (atom->avec->bonds_allow) {
int old_bondwhich = bondwhich;
if (atom->molecular == 2) bondwhich = TEMPLATE;
else if (bond_off) bondwhich = PARTIAL;
else bondwhich = ALL;
if (!neigh_bond || bondwhich != old_bondwhich) {
delete neigh_bond;
if (bondwhich == ALL)
neigh_bond = new NTopoBondAll(lmp);
else if (bondwhich == PARTIAL)
neigh_bond = new NTopoBondPartial(lmp);
else if (bondwhich == TEMPLATE)
neigh_bond = new NTopoBondTemplate(lmp);
}
}
if (atom->avec->angles_allow) {
int old_anglewhich = anglewhich;
if (atom->molecular == 2) anglewhich = TEMPLATE;
else if (angle_off) anglewhich = PARTIAL;
else anglewhich = ALL;
if (!neigh_angle || anglewhich != old_anglewhich) {
delete neigh_angle;
if (anglewhich == ALL)
neigh_angle = new NTopoAngleAll(lmp);
else if (anglewhich == PARTIAL)
neigh_angle = new NTopoAnglePartial(lmp);
else if (anglewhich == TEMPLATE)
neigh_angle = new NTopoAngleTemplate(lmp);
}
}
if (atom->avec->dihedrals_allow) {
int old_dihedralwhich = dihedralwhich;
if (atom->molecular == 2) dihedralwhich = TEMPLATE;
else if (dihedral_off) dihedralwhich = PARTIAL;
else dihedralwhich = ALL;
if (!neigh_dihedral || dihedralwhich != old_dihedralwhich) {
delete neigh_dihedral;
if (dihedralwhich == ALL)
neigh_dihedral = new NTopoDihedralAll(lmp);
else if (dihedralwhich == PARTIAL)
neigh_dihedral = new NTopoDihedralPartial(lmp);
else if (dihedralwhich == TEMPLATE)
neigh_dihedral = new NTopoDihedralTemplate(lmp);
}
}
if (atom->avec->impropers_allow) {
int old_improperwhich = improperwhich;
if (atom->molecular == 2) improperwhich = TEMPLATE;
else if (improper_off) improperwhich = PARTIAL;
else improperwhich = ALL;
if (!neigh_improper || improperwhich != old_improperwhich) {
delete neigh_improper;
if (improperwhich == ALL)
neigh_improper = new NTopoImproperAll(lmp);
else if (improperwhich == PARTIAL)
neigh_improper = new NTopoImproperPartial(lmp);
else if (improperwhich == TEMPLATE)
neigh_improper = new NTopoImproperTemplate(lmp);
}
}
}
/* ----------------------------------------------------------------------
output summary of pairwise neighbor list info
only called by proc 0
------------------------------------------------------------------------- */
void Neighbor::print_pairwise_info()
{
int i,m;
char str[128];
NeighRequest *rq;
FILE *out;
const double cutghost = MAX(cutneighmax,comm->cutghostuser);
double binsize, bbox[3];
bbox[0] = bboxhi[0]-bboxlo[0];
bbox[1] = bboxhi[1]-bboxlo[1];
bbox[2] = bboxhi[2]-bboxlo[2];
if (binsizeflag) binsize = binsize_user;
else if (style == BIN) binsize = 0.5*cutneighmax;
else binsize = 0.5*cutneighmin;
if (binsize == 0.0) binsize = bbox[0];
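// e.g. with cutneighmax = 12.0, no user binsize, and style BIN,
// binsize = 6.0, so a cubic 60x60x60 box is reported as bins = 10 10 10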
int nperpetual = 0;
int noccasional = 0;
int nextra = 0;
for (i = 0; i < nlist; i++) {
if (lists[i]->pair_method == 0) nextra++;
else if (lists[i]->occasional) noccasional++;
else nperpetual++;
}
for (m = 0; m < 2; m++) {
if (m == 0) out = screen;
else out = logfile;
if (out) {
fprintf(out,"Neighbor list info ...\n");
fprintf(out," update every %d steps, delay %d steps, check %s\n",
every,delay,dist_check ? "yes" : "no");
fprintf(out," max neighbors/atom: %d, page size: %d\n",
oneatom, pgsize);
fprintf(out," master list distance cutoff = %g\n",cutneighmax);
fprintf(out," ghost atom cutoff = %g\n",cutghost);
if (style != NSQ)
fprintf(out," binsize = %g, bins = %g %g %g\n",binsize,
ceil(bbox[0]/binsize), ceil(bbox[1]/binsize),
ceil(bbox[2]/binsize));
fprintf(out," %d neighbor lists, "
"perpetual/occasional/extra = %d %d %d\n",
nlist,nperpetual,noccasional,nextra);
for (i = 0; i < nlist; i++) {
rq = requests[i];
if (rq->pair) {
char *pname = force->pair_match_ptr((Pair *) rq->requestor);
sprintf(str," (%d) pair %s",i+1,pname);
} else if (rq->fix) {
sprintf(str," (%d) fix %s",i+1,((Fix *) rq->requestor)->style);
} else if (rq->compute) {
sprintf(str," (%d) compute %s",i+1,
((Compute *) rq->requestor)->style);
} else if (rq->command) {
sprintf(str," (%d) command %s",i+1,rq->command_style);
} else if (rq->neigh) {
sprintf(str," (%d) neighbor class addition",i+1);
}
fprintf(out,"%s",str);
if (rq->occasional) fprintf(out,", occasional");
else fprintf(out,", perpetual");
// order these to get single output of most relevant
if (rq->history)
fprintf(out,", history for (%d)",rq->historylist+1);
else if (rq->copy)
fprintf(out,", copy from (%d)",rq->copylist+1);
else if (rq->halffull)
fprintf(out,", half/full from (%d)",rq->halffulllist+1);
else if (rq->skip)
fprintf(out,", skip from (%d)",rq->skiplist+1);
fprintf(out,"\n");
// list of neigh list attributes
fprintf(out," attributes: ");
if (rq->half) fprintf(out,"half");
else if (rq->full) fprintf(out,"full");
if (rq->newton == 0) {
if (force->newton_pair) fprintf(out,", newton on");
else fprintf(out,", newton off");
} else if (rq->newton == 1) fprintf(out,", newton on");
else if (rq->newton == 2) fprintf(out,", newton off");
if (rq->ghost) fprintf(out,", ghost");
if (rq->size) fprintf(out,", size");
if (rq->history) fprintf(out,", history");
if (rq->granonesided) fprintf(out,", onesided");
if (rq->respainner) fprintf(out,", respa inner");
if (rq->respamiddle) fprintf(out,", respa middle");
if (rq->respaouter) fprintf(out,", respa outer");
if (rq->bond) fprintf(out,", bond");
if (rq->omp) fprintf(out,", omp");
if (rq->intel) fprintf(out,", intel");
if (rq->kokkos_device) fprintf(out,", kokkos_device");
if (rq->kokkos_host) fprintf(out,", kokkos_host");
if (rq->ssa) fprintf(out,", ssa");
if (rq->cut) fprintf(out,", cut %g",rq->cutoff);
if (rq->off2on) fprintf(out,", off2on");
fprintf(out,"\n");
fprintf(out," ");
if (lists[i]->pair_method == 0) fprintf(out,"pair build: none\n");
else fprintf(out,"pair build: %s\n",pairnames[lists[i]->pair_method-1]);
fprintf(out," ");
if (lists[i]->stencil_method == 0) fprintf(out,"stencil: none\n");
else fprintf(out,"stencil: %s\n",
stencilnames[lists[i]->stencil_method-1]);
fprintf(out," ");
if (lists[i]->bin_method == 0) fprintf(out,"bin: none\n");
else fprintf(out,"bin: %s\n",binnames[lists[i]->bin_method-1]);
}
/*
fprintf(out," %d stencil methods\n",nstencil);
for (i = 0; i < nstencil; i++)
fprintf(out," (%d) %s\n",
i+1,stencilnames[neigh_stencil[i]->istyle-1]);
fprintf(out," %d bin methods\n",nbin);
for (i = 0; i < nbin; i++)
fprintf(out," (%d) %s\n",i+1,binnames[neigh_bin[i]->istyle-1]);
*/
}
}
}
/* ----------------------------------------------------------------------
make copy of current requests and Neighbor params
used to compare to when next run occurs
------------------------------------------------------------------------- */
void Neighbor::requests_new2old()
{
for (int i = 0; i < old_nrequest; i++) delete old_requests[i];
memory->sfree(old_requests);
old_nrequest = nrequest;
old_requests = (NeighRequest **)
memory->smalloc(old_nrequest*sizeof(NeighRequest *),
"neighbor:old_requests");
for (int i = 0; i < old_nrequest; i++) {
old_requests[i] = new NeighRequest(lmp);
old_requests[i]->copy_request(requests[i],1);
}
old_style = style;
old_triclinic = triclinic;
old_pgsize = pgsize;
old_oneatom = oneatom;
}
/* ----------------------------------------------------------------------
assign NBin class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known Nbin classes
return index+1 of match in list of masks
return 0 for no binning
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_bin(NeighRequest *rq)
{
// no binning needed
if (style == NSQ) return 0;
if (rq->skip || rq->copy || rq->halffull) return 0;
if (rq->history) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// use request settings to match exactly one NBin class mask
// checks are bitwise using NeighConst bit masks
int mask;
for (int i = 0; i < nbclass; i++) {
mask = binmasks[i];
// require match of these request flags and mask bits
// (!A != !B) is effectively a logical xor
if (!rq->intel != !(mask & NB_INTEL)) continue;
if (!rq->ssa != !(mask & NB_SSA)) continue;
if (!rq->kokkos_device != !(mask & NB_KOKKOS_DEVICE)) continue;
if (!rq->kokkos_host != !(mask & NB_KOKKOS_HOST)) continue;
return i+1;
}
// error return if matched none
return -1;
}
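// e.g. a plain request (intel = ssa = kokkos_host = kokkos_device = 0)
// matches only a mask with none of those bits set (typically the
// standard NBin class); a Kokkos device request matches only a mask
// with NB_KOKKOS_DEVICE set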
/* ----------------------------------------------------------------------
assign NStencil class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known NStencil classes
return index+1 of match in list of masks
return 0 for no stencil creation
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_stencil(NeighRequest *rq)
{
// no stencil creation needed
if (style == NSQ) return 0;
if (rq->skip || rq->copy || rq->halffull) return 0;
if (rq->history) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// convert newton request to newtflag = on or off
int newtflag;
if (rq->newton == 0 && newton_pair) newtflag = 1;
else if (rq->newton == 0 && !newton_pair) newtflag = 0;
else if (rq->newton == 1) newtflag = 1;
else if (rq->newton == 2) newtflag = 0;
//printf("STENCIL RQ FLAGS: hff %d %d n %d g %d s %d newtflag %d\n",
// rq->half,rq->full,rq->newton,rq->ghost,rq->ssa,
// newtflag);
// use request and system settings to match exactly one NStencil class mask
// checks are bitwise using NeighConst bit masks
int mask;
for (int i = 0; i < nsclass; i++) {
mask = stencilmasks[i];
//printf("III %d: half %d full %d newton %d newtoff %d ghost %d ssa %d\n",
// i,mask & NS_HALF,mask & NS_FULL,mask & NS_NEWTON,
// mask & NS_NEWTOFF,mask & NS_GHOST,mask & NS_SSA);
// exactly one of half or full is set and must match
if (rq->half) {
if (!(mask & NS_HALF)) continue;
} else if (rq->full) {
if (!(mask & NS_FULL)) continue;
}
// newtflag is on or off and must match
if (newtflag) {
if (!(mask & NS_NEWTON)) continue;
} else if (!newtflag) {
if (!(mask & NS_NEWTOFF)) continue;
}
// require match of these request flags and mask bits
// (!A != !B) is effectively a logical xor
if (!rq->ghost != !(mask & NS_GHOST)) continue;
if (!rq->ssa != !(mask & NS_SSA)) continue;
// neighbor style is BIN or MULTI and must match
if (style == BIN) {
if (!(mask & NS_BIN)) continue;
} else if (style == MULTI) {
if (!(mask & NS_MULTI)) continue;
}
// dimension is 2 or 3 and must match
if (dimension == 2) {
if (!(mask & NS_2D)) continue;
} else if (dimension == 3) {
if (!(mask & NS_3D)) continue;
}
// domain triclinic flag is on or off and must match
if (triclinic) {
if (!(mask & NS_TRI)) continue;
} else if (!triclinic) {
if (!(mask & NS_ORTHO)) continue;
}
return i+1;
}
// error return if matched none
return -1;
}
/* ----------------------------------------------------------------------
assign NPair class to a NeighList
use neigh request settings to build mask
match mask to list of masks of known NPair classes
return index+1 of match in list of masks
return 0 for no list build
return -1 if no match
------------------------------------------------------------------------- */
int Neighbor::choose_pair(NeighRequest *rq)
{
// no neighbor list build performed
if (rq->history) return 0;
if (rq->respainner || rq->respamiddle) return 0;
// error check for includegroup with ghost neighbor request
if (includegroup && rq->ghost)
error->all(FLERR,"Neighbor include group not allowed with ghost neighbors");
// convert newton request to newtflag = on or off
int newtflag;
if (rq->newton == 0 && newton_pair) newtflag = 1;
else if (rq->newton == 0 && !newton_pair) newtflag = 0;
else if (rq->newton == 1) newtflag = 1;
else if (rq->newton == 2) newtflag = 0;
int molecular = atom->molecular;
//printf("PAIR RQ FLAGS: hf %d %d n %d g %d sz %d gos %d r %d b %d o %d i %d "
// "kk %d %d ss %d dn %d sk %d cp %d hf %d oo %d\n",
// rq->half,rq->full,rq->newton,rq->ghost,rq->size,
// rq->granonesided,rq->respaouter,rq->bond,rq->omp,rq->intel,
// rq->kokkos_host,rq->kokkos_device,rq->ssa,rq->dnum,
// rq->skip,rq->copy,rq->halffull,rq->off2on);
// use request and system settings to match exactly one NPair class mask
// checks are bitwise using NeighConst bit masks
int mask;
for (int i = 0; i < npclass; i++) {
mask = pairmasks[i];
//printf(" PAIR NAMES i %d %d name %s mask %d\n",i,nrequest,
// pairnames[i],pairmasks[i]);
// if copy request, no further checks needed, just return or continue
// Kokkos device/host flags must also match in order to copy
if (rq->copy) {
if (!(mask & NP_COPY)) continue;
if (!rq->kokkos_device != !(mask & NP_KOKKOS_DEVICE)) continue;
if (!rq->kokkos_host != !(mask & NP_KOKKOS_HOST)) continue;
return i+1;
}
// exactly one of half or full is set and must match
if (rq->half) {
if (!(mask & NP_HALF)) continue;
} else if (rq->full) {
if (!(mask & NP_FULL)) continue;
}
// newtflag is on or off and must match
if (newtflag) {
if (!(mask & NP_NEWTON)) continue;
} else if (!newtflag) {
if (!(mask & NP_NEWTOFF)) continue;
}
// if molecular on, do not match ATOMONLY (b/c a MOLONLY Npair exists)
// if molecular off, do not match MOLONLY (b/c an ATOMONLY Npair exists)
if (molecular) {
if (mask & NP_ATOMONLY) continue;
} else if (!molecular) {
if (mask & NP_MOLONLY) continue;
}
// require match of these request flags and mask bits
// (!A != !B) is effectively a logical xor
if (!rq->ghost != !(mask & NP_GHOST)) continue;
if (!rq->size != !(mask & NP_SIZE)) continue;
if (!rq->respaouter != !(mask & NP_RESPA)) continue;
if (!rq->granonesided != !(mask & NP_ONESIDE)) continue;
if (!rq->bond != !(mask & NP_BOND)) continue;
if (!rq->omp != !(mask & NP_OMP)) continue;
if (!rq->intel != !(mask & NP_INTEL)) continue;
if (!rq->kokkos_device != !(mask & NP_KOKKOS_DEVICE)) continue;
if (!rq->kokkos_host != !(mask & NP_KOKKOS_HOST)) continue;
if (!rq->ssa != !(mask & NP_SSA)) continue;
if (!rq->skip != !(mask & NP_SKIP)) continue;
if (!rq->halffull != !(mask & NP_HALF_FULL)) continue;
if (!rq->off2on != !(mask & NP_OFF2ON)) continue;
// neighbor style is one of NSQ,BIN,MULTI and must match
if (style == NSQ) {
if (!(mask & NP_NSQ)) continue;
} else if (style == BIN) {
if (!(mask & NP_BIN)) continue;
} else if (style == MULTI) {
if (!(mask & NP_MULTI)) continue;
}
// domain triclinic flag is on or off and must match
if (triclinic) {
if (!(mask & NP_TRI)) continue;
} else if (!triclinic) {
if (!(mask & NP_ORTHO)) continue;
}
return i+1;
}
// error return if matched none
return -1;
}
/* ----------------------------------------------------------------------
called by other classes to request a pairwise neighbor list
------------------------------------------------------------------------- */
int Neighbor::request(void *requestor, int instance)
{
if (nrequest == maxrequest) {
maxrequest += RQDELTA;
requests = (NeighRequest **)
memory->srealloc(requests,maxrequest*sizeof(NeighRequest *),
"neighbor:requests");
}
requests[nrequest] = new NeighRequest(lmp);
requests[nrequest]->index = nrequest;
requests[nrequest]->requestor = requestor;
requests[nrequest]->requestor_instance = instance;
nrequest++;
return nrequest-1;
}
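/* ----------------------------------------------------------------------
   usage sketch (illustration only, not part of this file): a pair style
   typically calls request() from its init_style() and then adjusts the
   returned NeighRequest, e.g. to ask for a full instead of a half list:

     int irequest = neighbor->request(this,instance_me);
     neighbor->requests[irequest]->half = 0;
     neighbor->requests[irequest]->full = 1;
------------------------------------------------------------------------- */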
/* ----------------------------------------------------------------------
one instance per entry in style_neigh_bin.h
------------------------------------------------------------------------- */
template <typename T>
NBin *Neighbor::bin_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
one instance per entry in style_neigh_stencil.h
------------------------------------------------------------------------- */
template <typename T>
NStencil *Neighbor::stencil_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
one instance per entry in style_neigh_pair.h
------------------------------------------------------------------------- */
template <typename T>
NPair *Neighbor::pair_creator(LAMMPS *lmp)
{
return new T(lmp);
}
/* ----------------------------------------------------------------------
setup neighbor binning and neighbor stencils
called before run and every reneighbor if box size/shape changes
only operates on perpetual lists
build_one() operates on occasional lists
------------------------------------------------------------------------- */
void Neighbor::setup_bins()
{
// invoke setup_bins() for all NBin
// actual binning is performed in build()
for (int i = 0; i < nbin; i++)
neigh_bin[i]->setup_bins(style);
// invoke create_setup() and create() for all perpetual NStencil
// same ops performed for occasional lists in build_one()
for (int i = 0; i < nstencil_perpetual; i++) {
neigh_stencil[slist[i]]->create_setup();
neigh_stencil[slist[i]]->create();
}
last_setup_bins = update->ntimestep;
}
/* ---------------------------------------------------------------------- */
int Neighbor::decide()
{
if (must_check) {
bigint n = update->ntimestep;
if (restart_check && n == output->next_restart) return 1;
for (int i = 0; i < fix_check; i++)
if (n == modify->fix[fixchecklist[i]]->next_reneighbor) return 1;
}
ago++;
if (ago >= delay && ago % every == 0) {
if (build_once) return 0;
if (dist_check == 0) return 1;
return check_distance();
} else return 0;
}
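/* ----------------------------------------------------------------------
   illustrative note (not part of LAMMPS): with "neigh_modify every 10
   delay 20 check yes", decide() returns 0 until at least 20 steps have
   passed since the last build and the step count since that build is a
   multiple of 10; only then does check_distance() get a chance to
   trigger a rebuild
------------------------------------------------------------------------- */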
/* ----------------------------------------------------------------------
if any atom moved trigger distance (half of neighbor skin) return 1
shrink trigger distance if box size has changed
conservative shrink procedure:
compute distance each of 8 corners of box has moved since last reneighbor
reduce skin distance by sum of 2 largest of the 8 values
new trigger = 1/2 of reduced skin distance
for orthogonal box, only need 2 lo/hi corners
for triclinic, need all 8 corners since deformations can displace all 8
------------------------------------------------------------------------- */
int Neighbor::check_distance()
{
double delx,dely,delz,rsq;
double delta,deltasq,delta1,delta2;
if (boxcheck) {
if (triclinic == 0) {
delx = bboxlo[0] - boxlo_hold[0];
dely = bboxlo[1] - boxlo_hold[1];
delz = bboxlo[2] - boxlo_hold[2];
delta1 = sqrt(delx*delx + dely*dely + delz*delz);
delx = bboxhi[0] - boxhi_hold[0];
dely = bboxhi[1] - boxhi_hold[1];
delz = bboxhi[2] - boxhi_hold[2];
delta2 = sqrt(delx*delx + dely*dely + delz*delz);
delta = 0.5 * (skin - (delta1+delta2));
deltasq = delta*delta;
} else {
domain->box_corners();
delta1 = delta2 = 0.0;
for (int i = 0; i < 8; i++) {
delx = corners[i][0] - corners_hold[i][0];
dely = corners[i][1] - corners_hold[i][1];
delz = corners[i][2] - corners_hold[i][2];
delta = sqrt(delx*delx + dely*dely + delz*delz);
if (delta > delta1) delta1 = delta;
else if (delta > delta2) delta2 = delta;
}
delta = 0.5 * (skin - (delta1+delta2));
deltasq = delta*delta;
}
} else deltasq = triggersq;
double **x = atom->x;
int nlocal = atom->nlocal;
if (includegroup) nlocal = atom->nfirst;
int flag = 0;
for (int i = 0; i < nlocal; i++) {
delx = x[i][0] - xhold[i][0];
dely = x[i][1] - xhold[i][1];
delz = x[i][2] - xhold[i][2];
rsq = delx*delx + dely*dely + delz*delz;
if (rsq > deltasq) flag = 1;
}
int flagall;
MPI_Allreduce(&flag,&flagall,1,MPI_INT,MPI_MAX,world);
if (flagall && ago == MAX(every,delay)) ndanger++;
return flagall;
}
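/* ----------------------------------------------------------------------
   worked example (illustration only): with skin = 2.0 the nominal trigger
   distance is 1.0; if since the last reneighbor the two tracked box
   corners have moved 0.3 and 0.2, the skin is reduced by their sum to 1.5
   and the trigger becomes 0.5 * 1.5 = 0.75, so any atom displaced by more
   than 0.75 forces a rebuild
------------------------------------------------------------------------- */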
/* ----------------------------------------------------------------------
build perpetual neighbor lists
called at setup and every few timesteps during run or minimization
topology lists also built if topoflag = 1 (Kokkos calls with topoflag=0)
------------------------------------------------------------------------- */
void Neighbor::build(int topoflag)
{
int i,m;
ago = 0;
ncalls++;
lastcall = update->ntimestep;
int nlocal = atom->nlocal;
int nall = nlocal + atom->nghost;
// check that using special bond flags will not overflow neigh lists
if (nall > NEIGHMASK)
error->one(FLERR,"Too many local+ghost atoms for neighbor list");
// store current atom positions and box size if needed
if (dist_check) {
double **x = atom->x;
if (includegroup) nlocal = atom->nfirst;
if (atom->nmax > maxhold) {
maxhold = atom->nmax;
memory->destroy(xhold);
memory->create(xhold,maxhold,3,"neigh:xhold");
}
for (i = 0; i < nlocal; i++) {
xhold[i][0] = x[i][0];
xhold[i][1] = x[i][1];
xhold[i][2] = x[i][2];
}
if (boxcheck) {
if (triclinic == 0) {
boxlo_hold[0] = bboxlo[0];
boxlo_hold[1] = bboxlo[1];
boxlo_hold[2] = bboxlo[2];
boxhi_hold[0] = bboxhi[0];
boxhi_hold[1] = bboxhi[1];
boxhi_hold[2] = bboxhi[2];
} else {
domain->box_corners();
corners = domain->corners;
for (i = 0; i < 8; i++) {
corners_hold[i][0] = corners[i][0];
corners_hold[i][1] = corners[i][1];
corners_hold[i][2] = corners[i][2];
}
}
}
}
// bin atoms for all NBin instances
// not just NBin associated with perpetual lists
// b/c cannot wait to bin occasional lists in build_one() call
// if binning were deferred until then, atoms may have moved outside of proc domain & bin extent,
// leading to errors or even a crash
if (style != NSQ) {
for (int i = 0; i < nbin; i++) {
neigh_bin[i]->bin_atoms_setup(nall);
neigh_bin[i]->bin_atoms();
}
}
// build pairwise lists for all perpetual NPair/NeighList
// grow() with nlocal/nall args so that only realloc if have to
for (i = 0; i < npair_perpetual; i++) {
m = plist[i];
if (!lists[m]->copy) lists[m]->grow(nlocal,nall);
neigh_pair[m]->build_setup();
neigh_pair[m]->build(lists[m]);
}
// build topology lists for bonds/angles/etc
if (atom->molecular && topoflag) build_topology();
}
/* ----------------------------------------------------------------------
build topology neighbor lists: bond, angle, dihedral, improper
copy their list info back to Neighbor for access by bond/angle/etc classes
------------------------------------------------------------------------- */
void Neighbor::build_topology()
{
if (force->bond) {
neigh_bond->build();
nbondlist = neigh_bond->nbondlist;
bondlist = neigh_bond->bondlist;
}
if (force->angle) {
neigh_angle->build();
nanglelist = neigh_angle->nanglelist;
anglelist = neigh_angle->anglelist;
}
if (force->dihedral) {
neigh_dihedral->build();
ndihedrallist = neigh_dihedral->ndihedrallist;
dihedrallist = neigh_dihedral->dihedrallist;
}
if (force->improper) {
neigh_improper->build();
nimproperlist = neigh_improper->nimproperlist;
improperlist = neigh_improper->improperlist;
}
}
/* ----------------------------------------------------------------------
build a single occasional pairwise neighbor list indexed by I
called by other classes
------------------------------------------------------------------------- */
void Neighbor::build_one(class NeighList *mylist, int preflag)
{
// check if list structure is initialized
if (mylist == NULL)
error->all(FLERR,"Trying to build an occasional neighbor list "
"before initialization completed");
// build_one() should never be invoked on a perpetual list
if (!mylist->occasional)
error->all(FLERR,"Neighbor build one invoked on perpetual list");
// no need to build if already built since last re-neighbor
// preflag is set by fix bond/create and fix bond/swap
// b/c they invoke build_one() on same step neigh list is re-built,
// but before re-build, so need to use ">" instead of ">="
NPair *np = neigh_pair[mylist->index];
if (preflag) {
if (np->last_build > lastcall) return;
} else {
if (np->last_build >= lastcall) return;
}
// if this is copy list and parent is occasional list,
// or this is halffull and parent is occasional list,
// insure parent is current
if (mylist->listcopy && mylist->listcopy->occasional)
build_one(mylist->listcopy,preflag);
if (mylist->listfull && mylist->listfull->occasional)
build_one(mylist->listfull,preflag);
// create stencil if hasn't been created since last setup_bins() call
NStencil *ns = np->ns;
if (ns && ns->last_stencil < last_setup_bins) {
ns->create_setup();
ns->create();
}
// build the list
np->build_setup();
np->build(mylist);
}
/* ----------------------------------------------------------------------
set neighbor style and skin distance
------------------------------------------------------------------------- */
void Neighbor::set(int narg, char **arg)
{
if (narg != 2) error->all(FLERR,"Illegal neighbor command");
skin = force->numeric(FLERR,arg[0]);
if (skin < 0.0) error->all(FLERR,"Illegal neighbor command");
if (strcmp(arg[1],"nsq") == 0) style = NSQ;
else if (strcmp(arg[1],"bin") == 0) style = BIN;
else if (strcmp(arg[1],"multi") == 0) style = MULTI;
else error->all(FLERR,"Illegal neighbor command");
if (style == MULTI && lmp->citeme) lmp->citeme->add(cite_neigh_multi);
}
/* ----------------------------------------------------------------------
reset timestamps in all NBin, NStencil, NPair classes
so that neighbor lists will rebuild properly with timestep change
ditto for lastcall and last_setup_bins
------------------------------------------------------------------------- */
void Neighbor::reset_timestep(bigint ntimestep)
{
for (int i = 0; i < nbin; i++)
neigh_bin[i]->last_bin = -1;
for (int i = 0; i < nstencil; i++)
neigh_stencil[i]->last_stencil = -1;
for (int i = 0; i < nlist; i++) {
if (!neigh_pair[i]) continue;
neigh_pair[i]->last_build = -1;
}
lastcall = -1;
last_setup_bins = -1;
}
/* ----------------------------------------------------------------------
modify parameters of the pair-wise neighbor build
------------------------------------------------------------------------- */
void Neighbor::modify_params(int narg, char **arg)
{
int iarg = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"every") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
every = force->inumeric(FLERR,arg[iarg+1]);
if (every <= 0) error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"delay") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
delay = force->inumeric(FLERR,arg[iarg+1]);
if (delay < 0) error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"check") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) dist_check = 1;
else if (strcmp(arg[iarg+1],"no") == 0) dist_check = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"once") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) build_once = 1;
else if (strcmp(arg[iarg+1],"no") == 0) build_once = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"page") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
old_pgsize = pgsize;
pgsize = force->inumeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"one") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
old_oneatom = oneatom;
oneatom = force->inumeric(FLERR,arg[iarg+1]);
iarg += 2;
} else if (strcmp(arg[iarg],"binsize") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
binsize_user = force->numeric(FLERR,arg[iarg+1]);
if (binsize_user <= 0.0) binsizeflag = 0;
else binsizeflag = 1;
iarg += 2;
} else if (strcmp(arg[iarg],"cluster") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"yes") == 0) cluster_check = 1;
else if (strcmp(arg[iarg+1],"no") == 0) cluster_check = 0;
else error->all(FLERR,"Illegal neigh_modify command");
iarg += 2;
} else if (strcmp(arg[iarg],"include") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
includegroup = group->find(arg[iarg+1]);
if (includegroup < 0)
error->all(FLERR,"Invalid group ID in neigh_modify command");
if (includegroup && (atom->firstgroupname == NULL ||
strcmp(arg[iarg+1],atom->firstgroupname) != 0))
error->all(FLERR,
"Neigh_modify include group != atom_modify first group");
iarg += 2;
} else if (strcmp(arg[iarg],"exclude") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (strcmp(arg[iarg+1],"type") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (nex_type == maxex_type) {
maxex_type += EXDELTA;
memory->grow(ex1_type,maxex_type,"neigh:ex1_type");
memory->grow(ex2_type,maxex_type,"neigh:ex2_type");
}
ex1_type[nex_type] = force->inumeric(FLERR,arg[iarg+2]);
ex2_type[nex_type] = force->inumeric(FLERR,arg[iarg+3]);
nex_type++;
iarg += 4;
} else if (strcmp(arg[iarg+1],"group") == 0) {
if (iarg+4 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (nex_group == maxex_group) {
maxex_group += EXDELTA;
memory->grow(ex1_group,maxex_group,"neigh:ex1_group");
memory->grow(ex2_group,maxex_group,"neigh:ex2_group");
}
ex1_group[nex_group] = group->find(arg[iarg+2]);
ex2_group[nex_group] = group->find(arg[iarg+3]);
if (ex1_group[nex_group] == -1 || ex2_group[nex_group] == -1)
error->all(FLERR,"Invalid group ID in neigh_modify command");
nex_group++;
iarg += 4;
} else if (strcmp(arg[iarg+1],"molecule/inter") == 0 ||
strcmp(arg[iarg+1],"molecule/intra") == 0) {
if (iarg+3 > narg) error->all(FLERR,"Illegal neigh_modify command");
if (atom->molecule_flag == 0)
error->all(FLERR,"Neigh_modify exclude molecule "
"requires atom attribute molecule");
if (nex_mol == maxex_mol) {
maxex_mol += EXDELTA;
memory->grow(ex_mol_group,maxex_mol,"neigh:ex_mol_group");
if (lmp->kokkos)
grow_ex_mol_intra_kokkos();
else
memory->grow(ex_mol_intra,maxex_mol,"neigh:ex_mol_intra");
}
ex_mol_group[nex_mol] = group->find(arg[iarg+2]);
if (ex_mol_group[nex_mol] == -1)
error->all(FLERR,"Invalid group ID in neigh_modify command");
if (strcmp(arg[iarg+1],"molecule/intra") == 0)
ex_mol_intra[nex_mol] = 1;
else
ex_mol_intra[nex_mol] = 0;
nex_mol++;
iarg += 3;
} else if (strcmp(arg[iarg+1],"none") == 0) {
nex_type = nex_group = nex_mol = 0;
iarg += 2;
} else error->all(FLERR,"Illegal neigh_modify command");
} else error->all(FLERR,"Illegal neigh_modify command");
}
}
/* ----------------------------------------------------------------------
remove the first group-group exclusion matching group1, group2
------------------------------------------------------------------------- */
void Neighbor::exclusion_group_group_delete(int group1, int group2)
{
int m, mlast;
for (m = 0; m < nex_group; m++)
if (ex1_group[m] == group1 && ex2_group[m] == group2 )
break;
mlast = m;
if (mlast == nex_group)
error->all(FLERR,"Unable to find group-group exclusion");
for (m = mlast+1; m < nex_group; m++) {
ex1_group[m-1] = ex1_group[m];
ex2_group[m-1] = ex2_group[m];
ex1_bit[m-1] = ex1_bit[m];
ex2_bit[m-1] = ex2_bit[m];
}
nex_group--;
}
/* ----------------------------------------------------------------------
return the value of exclude - used to check compatibility with GPU
------------------------------------------------------------------------- */
int Neighbor::exclude_setting()
{
return exclude;
}
/* ----------------------------------------------------------------------
return # of bytes of allocated memory
------------------------------------------------------------------------- */
bigint Neighbor::memory_usage()
{
bigint bytes = 0;
bytes += memory->usage(xhold,maxhold,3);
for (int i = 0; i < nlist; i++)
if (lists[i]) bytes += lists[i]->memory_usage();
for (int i = 0; i < nstencil; i++)
bytes += neigh_stencil[i]->memory_usage();
for (int i = 0; i < nbin; i++)
bytes += neigh_bin[i]->memory_usage();
if (neigh_bond) bytes += neigh_bond->memory_usage();
if (neigh_angle) bytes += neigh_angle->memory_usage();
if (neigh_dihedral) bytes += neigh_dihedral->memory_usage();
if (neigh_improper) bytes += neigh_improper->memory_usage();
return bytes;
}
diff --git a/src/pair.h b/src/pair.h
index 3f66c6095..dd859e5f2 100644
--- a/src/pair.h
+++ b/src/pair.h
@@ -1,362 +1,362 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifndef LMP_PAIR_H
#define LMP_PAIR_H
#include "pointers.h"
#include "accelerator_kokkos.h"
namespace LAMMPS_NS {
class Pair : protected Pointers {
friend class AngleSDK;
friend class AngleSDKOMP;
friend class BondQuartic;
friend class BondQuarticOMP;
friend class DihedralCharmm;
friend class DihedralCharmmOMP;
friend class FixGPU;
friend class FixOMP;
friend class ThrOMP;
friend class Info;
public:
static int instance_total; // # of Pair classes ever instantiated
double eng_vdwl,eng_coul; // accumulated energies
double virial[6]; // accumulated virial
double *eatom,**vatom; // accumulated per-atom energy/virial
double cutforce; // max cutoff for all atom pairs
double **cutsq; // cutoff sq for each atom pair
int **setflag; // 0/1 = whether each i,j has been set
int comm_forward; // size of forward communication (0 if none)
int comm_reverse; // size of reverse communication (0 if none)
int comm_reverse_off; // size of reverse comm even if newton off
int single_enable; // 1 if single() routine exists
int restartinfo; // 1 if pair style writes restart info
int respa_enable; // 1 if inner/middle/outer rRESPA routines
int one_coeff; // 1 if allows only one coeff * * call
int manybody_flag; // 1 if a manybody potential
int no_virial_fdotr_compute; // 1 if does not invoke virial_fdotr_compute()
int writedata; // 1 if writes coeffs to data file
int ghostneigh; // 1 if pair style needs neighbors of ghosts
double **cutghost; // cutoff for each ghost pair
int ewaldflag; // 1 if compatible with Ewald solver
int pppmflag; // 1 if compatible with PPPM solver
int msmflag; // 1 if compatible with MSM solver
int dispersionflag; // 1 if compatible with LJ/dispersion solver
int tip4pflag; // 1 if compatible with TIP4P solver
int dipoleflag; // 1 if compatible with dipole solver
int reinitflag; // 1 if compatible with fix adapt and alike
int tail_flag; // pair_modify flag for LJ tail correction
double etail,ptail; // energy/pressure tail corrections
double etail_ij,ptail_ij;
int evflag; // energy,virial settings
int eflag_either,eflag_global,eflag_atom;
int vflag_either,vflag_global,vflag_atom;
int ncoultablebits; // size of Coulomb table, accessed by KSpace
int ndisptablebits; // size of dispersion table
double tabinnersq;
double tabinnerdispsq;
double *rtable,*drtable,*ftable,*dftable,*ctable,*dctable;
double *etable,*detable,*ptable,*dptable,*vtable,*dvtable;
double *rdisptable, *drdisptable, *fdisptable, *dfdisptable;
double *edisptable, *dedisptable;
int ncoulshiftbits,ncoulmask;
int ndispshiftbits, ndispmask;
int nextra; // # of extra quantities pair style calculates
double *pvector; // vector of extra pair quantities
int single_extra; // number of extra single values calculated
double *svector; // vector of extra single quantities
class NeighList *list; // standard neighbor list used by most pairs
class NeighList *listhalf; // half list used by some pairs
class NeighList *listfull; // full list used by some pairs
class NeighList *listhistory; // neighbor history list used by some pairs
class NeighList *listinner; // rRESPA lists used by some pairs
class NeighList *listmiddle;
class NeighList *listouter;
int allocated; // 0/1 = whether arrays are allocated
// public so external driver can check
int compute_flag; // 0 if skip compute()
// KOKKOS host/device flag and data masks
ExecutionSpace execution_space;
unsigned int datamask_read,datamask_modify;
Pair(class LAMMPS *);
virtual ~Pair();
// top-level Pair methods
void init();
virtual void reinit();
virtual void setup() {}
double mix_energy(double, double, double, double);
double mix_distance(double, double);
void write_file(int, char **);
void init_bitmap(double, double, int, int &, int &, int &, int &);
virtual void modify_params(int, char **);
void compute_dummy(int, int);
// need to be public, so can be called by pair_style reaxc
void v_tally(int, double *, double *);
void ev_tally(int, int, int, int, double, double, double,
double, double, double);
void ev_tally3(int, int, int, double, double,
double *, double *, double *, double *);
void v_tally3(int, int, int, double *, double *, double *, double *);
void v_tally4(int, int, int, int, double *, double *, double *,
double *, double *, double *);
void ev_tally_xyz(int, int, int, int, double, double,
double, double, double, double, double, double);
// general child-class methods
virtual void compute(int, int) = 0;
virtual void compute_inner() {}
virtual void compute_middle() {}
virtual void compute_outer(int, int) {}
virtual double single(int, int, int, int,
double, double, double,
double& fforce) {
fforce = 0.0;
return 0.0;
}
virtual void settings(int, char **) = 0;
virtual void coeff(int, char **) = 0;
virtual void init_style();
virtual void init_list(int, class NeighList *);
virtual double init_one(int, int) {return 0.0;}
virtual void init_tables(double, double *);
virtual void init_tables_disp(double);
virtual void free_tables();
virtual void free_disp_tables();
virtual void write_restart(FILE *) {}
virtual void read_restart(FILE *) {}
virtual void write_restart_settings(FILE *) {}
virtual void read_restart_settings(FILE *) {}
virtual void write_data(FILE *) {}
virtual void write_data_all(FILE *) {}
virtual int pack_forward_comm(int, int *, double *, int, int *) {return 0;}
virtual void unpack_forward_comm(int, int, double *) {}
virtual int pack_forward_comm_kokkos(int, DAT::tdual_int_2d,
int, DAT::tdual_xfloat_1d &,
int, int *) {return 0;};
virtual void unpack_forward_comm_kokkos(int, int, DAT::tdual_xfloat_1d &) {}
virtual int pack_reverse_comm(int, int, double *) {return 0;}
virtual void unpack_reverse_comm(int, int *, double *) {}
virtual double memory_usage();
void set_copymode(int value) {copymode = value;}
// specific child-class methods for certain Pair styles
virtual void *extract(const char *, int &) {return NULL;}
virtual void swap_eam(double *, double **) {}
virtual void reset_dt() {}
virtual void min_xf_pointers(int, double **, double **) {}
virtual void min_xf_get(int) {}
virtual void min_x_set(int) {}
// management of callbacks to be run from ev_tally()
protected:
int num_tally_compute;
class Compute **list_tally_compute;
public:
- void add_tally_callback(class Compute *);
- void del_tally_callback(class Compute *);
+ virtual void add_tally_callback(class Compute *);
+ virtual void del_tally_callback(class Compute *);
protected:
int instance_me; // which Pair class instantiation I am
enum{GEOMETRIC,ARITHMETIC,SIXTHPOWER}; // mixing options
int special_lj[4]; // copied from force->special_lj for Kokkos
int suffix_flag; // suffix compatibility flag
// pair_modify settings
int offset_flag,mix_flag; // flags for offset and mixing
double tabinner; // inner cutoff for Coulomb table
double tabinner_disp; // inner cutoff for dispersion table
// custom data type for accessing Coulomb tables
typedef union {int i; float f;} union_int_float_t;
int vflag_fdotr;
int maxeatom,maxvatom;
int copymode; // if set, do not deallocate during destruction
// required when classes are used as functors by Kokkos
virtual void ev_setup(int, int, int alloc = 1);
void ev_unset();
void ev_tally_full(int, double, double, double, double, double, double);
void ev_tally_xyz_full(int, double, double,
double, double, double, double, double, double);
void ev_tally4(int, int, int, int, double,
double *, double *, double *, double *, double *, double *);
void ev_tally_tip4p(int, int *, double *, double, double);
void v_tally2(int, int, double, double *);
void v_tally_tensor(int, int, int, int,
double, double, double, double, double, double);
void virial_fdotr_compute();
// union data struct for packing 32-bit and 64-bit ints into double bufs
// see atom_vec.h for documentation
union ubuf {
double d;
int64_t i;
ubuf(double arg) : d(arg) {}
ubuf(int64_t arg) : i(arg) {}
ubuf(int arg) : i(arg) {}
};
inline int sbmask(int j) {
return j >> SBBITS & 3;
}
};
}
#endif
/* ERROR/WARNING messages:
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Too many total bits for bitmapped lookup table
Table size specified via pair_modify command is too large. Note that
a value of N generates a 2^N size table.
E: Cannot have both pair_modify shift and tail set to yes
These 2 options are contradictory.
E: Cannot use pair tail corrections with 2d simulations
The correction factors are only currently defined for 3d systems.
W: Using pair tail corrections with nonperiodic system
This is probably a bogus thing to do, since tail corrections are
computed by integrating the density of a periodic system out to
infinity.
W: Using pair tail corrections with pair_modify compute no
The tail corrections will thus not be computed.
W: Using pair potential shift with pair_modify compute no
The shift effects will thus not be computed.
W: Using a manybody potential with bonds/angles/dihedrals and special_bond exclusions
This is likely not what you want to do. The exclusion settings will
eliminate neighbors in the neighbor list, which the manybody potential
needs to calculate its terms correctly.
E: All pair coeffs are not set
All pair coefficients must be set in the data file or by the
pair_coeff command before running a simulation.
E: Fix adapt interface to this pair style not supported
New coding for the pair style would need to be done.
E: Pair style requires a KSpace style
No kspace style is defined.
E: Cannot yet use compute tally with Kokkos
This feature is not yet supported.
E: Pair style does not support pair_write
The pair style does not have a single() function, so it can
not be invoked by pair write.
E: Invalid atom types in pair_write command
Atom types must range from 1 to Ntypes inclusive.
E: Invalid style in pair_write command
Self-explanatory. Check the input script.
E: Invalid cutoffs in pair_write command
Inner cutoff must be larger than 0.0 and less than outer cutoff.
E: Cannot open pair_write file
The specified output file for pair energies and forces cannot be
opened. Check that the path and name are correct.
E: Bitmapped lookup tables require int/float be same size
Cannot use pair tables on this machine, because of word sizes. Use
the pair_modify command with table 0 instead.
W: Table inner cutoff >= outer cutoff
You specified an inner cutoff for a Coulombic table that is longer
than the global cutoff. Probably not what you wanted.
E: Too many exponent bits for lookup table
Table size specified via pair_modify command does not work with your
machine's floating point representation.
E: Too many mantissa bits for lookup table
Table size specified via pair_modify command does not work with your
machine's floating point representation.
E: Too few bits for lookup table
Table size specified via pair_modify command does not work with your
machine's floating point representation.
*/
diff --git a/src/pair_hybrid.cpp b/src/pair_hybrid.cpp
index 03e55006f..fa79f1cf9 100644
--- a/src/pair_hybrid.cpp
+++ b/src/pair_hybrid.cpp
@@ -1,933 +1,968 @@
/* ----------------------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "pair_hybrid.h"
#include "atom.h"
#include "force.h"
#include "pair.h"
#include "neighbor.h"
#include "neigh_request.h"
#include "update.h"
#include "comm.h"
#include "memory.h"
#include "error.h"
#include "respa.h"
using namespace LAMMPS_NS;
/* ---------------------------------------------------------------------- */
PairHybrid::PairHybrid(LAMMPS *lmp) : Pair(lmp),
styles(NULL), keywords(NULL), multiple(NULL), nmap(NULL),
- map(NULL), special_lj(NULL), special_coul(NULL)
+ map(NULL), special_lj(NULL), special_coul(NULL), compute_tally(NULL)
{
nstyles = 0;
outerflag = 0;
respaflag = 0;
if (lmp->kokkos)
error->all(FLERR,"Cannot yet use pair hybrid with Kokkos");
}
/* ---------------------------------------------------------------------- */
PairHybrid::~PairHybrid()
{
if (nstyles) {
for (int m = 0; m < nstyles; m++) {
delete styles[m];
delete [] keywords[m];
if (special_lj[m]) delete [] special_lj[m];
if (special_coul[m]) delete [] special_coul[m];
}
}
delete [] styles;
delete [] keywords;
delete [] multiple;
delete [] special_lj;
delete [] special_coul;
+ delete [] compute_tally;
delete [] svector;
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cutghost);
memory->destroy(nmap);
memory->destroy(map);
}
}
/* ----------------------------------------------------------------------
call each sub-style's compute() or compute_outer() function
accumulate sub-style global/peratom energy/virial in hybrid
for global vflag = 1:
each sub-style computes own virial[6]
sum sub-style virial[6] to hybrid's virial[6]
for global vflag = 2:
call sub-style with adjusted vflag to prevent it calling
virial_fdotr_compute()
hybrid calls virial_fdotr_compute() on final accumulated f
------------------------------------------------------------------------- */
void PairHybrid::compute(int eflag, int vflag)
{
int i,j,m,n;
// if no_virial_fdotr_compute is set and global component of
// incoming vflag = 2, then
// reset vflag as if global component were 1
// necessary since one or more sub-styles cannot compute virial as F dot r
if (no_virial_fdotr_compute && vflag % 4 == 2) vflag = 1 + vflag/4 * 4;
if (eflag || vflag) ev_setup(eflag,vflag);
else evflag = vflag_fdotr = eflag_global = vflag_global =
eflag_atom = vflag_atom = 0;
// check if global component of incoming vflag = 2
// if so, reset vflag passed to substyle as if it were 0
// necessary so substyle will not invoke virial_fdotr_compute()
int vflag_substyle;
if (vflag % 4 == 2) vflag_substyle = vflag/4 * 4;
else vflag_substyle = vflag;
double *saved_special = save_special();
// check if we are running with r-RESPA using the hybrid keyword
Respa *respa = NULL;
respaflag = 0;
if (strstr(update->integrate_style,"respa")) {
respa = (Respa *) update->integrate;
if (respa->nhybrid_styles > 0) respaflag = 1;
}
for (m = 0; m < nstyles; m++) {
set_special(m);
if (!respaflag || (respaflag && respa->hybrid_compute[m])) {
// invoke compute() unless compute flag is turned off or
// outerflag is set and sub-style has a compute_outer() method
if (styles[m]->compute_flag == 0) continue;
if (outerflag && styles[m]->respa_enable)
styles[m]->compute_outer(eflag,vflag_substyle);
else styles[m]->compute(eflag,vflag_substyle);
}
restore_special(saved_special);
// jump to next sub-style if r-RESPA does not want global accumulated data
if (respaflag && !respa->tally_global) continue;
if (eflag_global) {
eng_vdwl += styles[m]->eng_vdwl;
eng_coul += styles[m]->eng_coul;
}
if (vflag_global) {
for (n = 0; n < 6; n++) virial[n] += styles[m]->virial[n];
}
if (eflag_atom) {
n = atom->nlocal;
if (force->newton_pair) n += atom->nghost;
double *eatom_substyle = styles[m]->eatom;
for (i = 0; i < n; i++) eatom[i] += eatom_substyle[i];
}
if (vflag_atom) {
n = atom->nlocal;
if (force->newton_pair) n += atom->nghost;
double **vatom_substyle = styles[m]->vatom;
for (i = 0; i < n; i++)
for (j = 0; j < 6; j++)
vatom[i][j] += vatom_substyle[i][j];
}
}
delete [] saved_special;
if (vflag_fdotr) virial_fdotr_compute();
}
+
+/* ---------------------------------------------------------------------- */
+
+void PairHybrid::add_tally_callback(Compute *ptr)
+{
+ for (int m = 0; m < nstyles; m++)
+ if (compute_tally[m]) styles[m]->add_tally_callback(ptr);
+}
+
+/* ---------------------------------------------------------------------- */
+
+void PairHybrid::del_tally_callback(Compute *ptr)
+{
+ for (int m = 0; m < nstyles; m++)
+ if (compute_tally[m]) styles[m]->del_tally_callback(ptr);
+}
+
/* ---------------------------------------------------------------------- */
void PairHybrid::compute_inner()
{
for (int m = 0; m < nstyles; m++)
if (styles[m]->respa_enable) styles[m]->compute_inner();
}
/* ---------------------------------------------------------------------- */
void PairHybrid::compute_middle()
{
for (int m = 0; m < nstyles; m++)
if (styles[m]->respa_enable) styles[m]->compute_middle();
}
/* ---------------------------------------------------------------------- */
void PairHybrid::compute_outer(int eflag, int vflag)
{
outerflag = 1;
compute(eflag,vflag);
outerflag = 0;
}
/* ----------------------------------------------------------------------
allocate all arrays
------------------------------------------------------------------------- */
void PairHybrid::allocate()
{
allocated = 1;
int n = atom->ntypes;
memory->create(setflag,n+1,n+1,"pair:setflag");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
setflag[i][j] = 0;
memory->create(cutsq,n+1,n+1,"pair:cutsq");
memory->create(cutghost,n+1,n+1,"pair:cutghost");
memory->create(nmap,n+1,n+1,"pair:nmap");
memory->create(map,n+1,n+1,nstyles,"pair:map");
for (int i = 1; i <= n; i++)
for (int j = i; j <= n; j++)
nmap[i][j] = 0;
}
/* ----------------------------------------------------------------------
create one pair style for each arg in list
------------------------------------------------------------------------- */
void PairHybrid::settings(int narg, char **arg)
{
if (narg < 1) error->all(FLERR,"Illegal pair_style command");
// delete old lists, since cannot just change settings
if (nstyles) {
for (int m = 0; m < nstyles; m++) delete styles[m];
delete [] styles;
for (int m = 0; m < nstyles; m++) delete [] keywords[m];
delete [] keywords;
}
if (allocated) {
memory->destroy(setflag);
memory->destroy(cutsq);
memory->destroy(cutghost);
memory->destroy(nmap);
memory->destroy(map);
}
allocated = 0;
// allocate list of sub-styles as big as possibly needed if no extra args
styles = new Pair*[narg];
keywords = new char*[narg];
multiple = new int[narg];
special_lj = new double*[narg];
special_coul = new double*[narg];
+ compute_tally = new int[narg];
+
// allocate each sub-style
// allocate uses suffix, but don't store suffix version in keywords,
// else syntax in coeff() will not match
// call settings() with set of args that are not pair style names
// use force->pair_map to determine which args these are
int iarg,jarg,dummy;
iarg = 0;
nstyles = 0;
while (iarg < narg) {
if (strcmp(arg[iarg],"hybrid") == 0)
error->all(FLERR,"Pair style hybrid cannot have hybrid as an argument");
if (strcmp(arg[iarg],"none") == 0)
error->all(FLERR,"Pair style hybrid cannot have none as an argument");
styles[nstyles] = force->new_pair(arg[iarg],1,dummy);
force->store_style(keywords[nstyles],arg[iarg],0);
special_lj[nstyles] = special_coul[nstyles] = NULL;
+ compute_tally[nstyles] = 1;
jarg = iarg + 1;
while (jarg < narg && !force->pair_map->count(arg[jarg])) jarg++;
styles[nstyles]->settings(jarg-iarg-1,&arg[iarg+1]);
iarg = jarg;
nstyles++;
}
// multiple[i] = 1 to M if sub-style used multiple times, else 0
for (int i = 0; i < nstyles; i++) {
int count = 0;
for (int j = 0; j < nstyles; j++) {
if (strcmp(keywords[j],keywords[i]) == 0) count++;
if (j == i) multiple[i] = count;
}
if (count == 1) multiple[i] = 0;
}
// set pair flags from sub-style flags
flags();
}
/* ----------------------------------------------------------------------
set top-level pair flags from sub-style flags
------------------------------------------------------------------------- */
void PairHybrid::flags()
{
int m;
// set comm_forward, comm_reverse, comm_reverse_off to max of any sub-style
for (m = 0; m < nstyles; m++) {
if (styles[m]) comm_forward = MAX(comm_forward,styles[m]->comm_forward);
if (styles[m]) comm_reverse = MAX(comm_reverse,styles[m]->comm_reverse);
if (styles[m]) comm_reverse_off = MAX(comm_reverse_off,
styles[m]->comm_reverse_off);
}
// single_enable = 1 if any sub-style is set
// respa_enable = 1 if any sub-style is set
// manybody_flag = 1 if any sub-style is set
// no_virial_fdotr_compute = 1 if any sub-style is set
// ghostneigh = 1 if any sub-style is set
// ewaldflag, pppmflag, msmflag, dipoleflag, dispersionflag, tip4pflag = 1
// if any sub-style is set
// compute_flag = 1 if any sub-style is set
single_enable = 0;
compute_flag = 0;
for (m = 0; m < nstyles; m++) {
if (styles[m]->single_enable) single_enable = 1;
if (styles[m]->respa_enable) respa_enable = 1;
if (styles[m]->manybody_flag) manybody_flag = 1;
if (styles[m]->no_virial_fdotr_compute) no_virial_fdotr_compute = 1;
if (styles[m]->ghostneigh) ghostneigh = 1;
if (styles[m]->ewaldflag) ewaldflag = 1;
if (styles[m]->pppmflag) pppmflag = 1;
if (styles[m]->msmflag) msmflag = 1;
if (styles[m]->dipoleflag) dipoleflag = 1;
if (styles[m]->dispersionflag) dispersionflag = 1;
if (styles[m]->tip4pflag) tip4pflag = 1;
if (styles[m]->compute_flag) compute_flag = 1;
}
// single_extra = min of all sub-style single_extra
// allocate svector
single_extra = styles[0]->single_extra;
for (m = 1; m < nstyles; m++)
single_extra = MIN(single_extra,styles[m]->single_extra);
if (single_extra) {
delete [] svector;
svector = new double[single_extra];
}
}
/* ----------------------------------------------------------------------
set coeffs for one or more type pairs
------------------------------------------------------------------------- */
void PairHybrid::coeff(int narg, char **arg)
{
if (narg < 3) error->all(FLERR,"Incorrect args for pair coefficients");
if (!allocated) allocate();
int ilo,ihi,jlo,jhi;
force->bounds(FLERR,arg[0],atom->ntypes,ilo,ihi);
force->bounds(FLERR,arg[1],atom->ntypes,jlo,jhi);
// 3rd arg = pair sub-style name
// 4th arg = pair sub-style index if name used multiple times
// allow for "none" as valid sub-style name
int multflag;
int m;
for (m = 0; m < nstyles; m++) {
multflag = 0;
if (strcmp(arg[2],keywords[m]) == 0) {
if (multiple[m]) {
multflag = 1;
if (narg < 4) error->all(FLERR,"Incorrect args for pair coefficients");
if (!isdigit(arg[3][0]))
error->all(FLERR,"Incorrect args for pair coefficients");
int index = force->inumeric(FLERR,arg[3]);
if (index == multiple[m]) break;
else continue;
} else break;
}
}
int none = 0;
if (m == nstyles) {
if (strcmp(arg[2],"none") == 0) none = 1;
else error->all(FLERR,"Pair coeff for hybrid has invalid style");
}
// move 1st/2nd args to 2nd/3rd args
// if multflag: move 1st/2nd args to 3rd/4th args
// just copy ptrs, since arg[] points into original input line
arg[2+multflag] = arg[1];
arg[1+multflag] = arg[0];
// invoke sub-style coeff() starting with 1st remaining arg
if (!none) styles[m]->coeff(narg-1-multflag,&arg[1+multflag]);
// if sub-style only allows one pair coeff call (with * * and type mapping)
// then unset setflag/map assigned to that style before setting it below
// in case pair coeff for this sub-style is being called for 2nd time
if (!none && styles[m]->one_coeff)
for (int i = 1; i <= atom->ntypes; i++)
for (int j = i; j <= atom->ntypes; j++)
if (nmap[i][j] && map[i][j][0] == m) {
setflag[i][j] = 0;
nmap[i][j] = 0;
}
// set setflag and which type pairs map to which sub-style
// if sub-style is none: set hybrid setflag, wipe out map
// else: set hybrid setflag & map only if substyle setflag is set
// previous mappings are wiped out
int count = 0;
for (int i = ilo; i <= ihi; i++) {
for (int j = MAX(jlo,i); j <= jhi; j++) {
if (none) {
setflag[i][j] = 1;
nmap[i][j] = 0;
count++;
} else if (styles[m]->setflag[i][j]) {
setflag[i][j] = 1;
nmap[i][j] = 1;
map[i][j][0] = m;
count++;
}
}
}
if (count == 0) error->all(FLERR,"Incorrect args for pair coefficients");
}
/* ----------------------------------------------------------------------
init specific to this pair style
------------------------------------------------------------------------- */
void PairHybrid::init_style()
{
int i,m,itype,jtype,used,istyle,skip;
// error if a sub-style is not used
int ntypes = atom->ntypes;
for (istyle = 0; istyle < nstyles; istyle++) {
used = 0;
for (itype = 1; itype <= ntypes; itype++)
for (jtype = itype; jtype <= ntypes; jtype++)
for (m = 0; m < nmap[itype][jtype]; m++)
if (map[itype][jtype][m] == istyle) used = 1;
if (used == 0) error->all(FLERR,"Pair hybrid sub-style is not used");
}
// check if special_lj/special_coul overrides are compatible
for (istyle = 0; istyle < nstyles; istyle++) {
if (special_lj[istyle]) {
for (i = 1; i < 4; ++i) {
if (((force->special_lj[i] == 0.0) || (force->special_lj[i] == 1.0))
&& (force->special_lj[i] != special_lj[istyle][i]))
error->all(FLERR,"Pair_modify special setting for pair hybrid "
"incompatible with global special_bonds setting");
}
}
if (special_coul[istyle]) {
for (i = 1; i < 4; ++i) {
if (((force->special_coul[i] == 0.0)
|| (force->special_coul[i] == 1.0))
&& (force->special_coul[i] != special_coul[istyle][i]))
error->all(FLERR,"Pair_modify special setting for pair hybrid "
"incompatible with global special_bonds setting");
}
}
}
// each sub-style makes its neighbor list request(s)
for (istyle = 0; istyle < nstyles; istyle++) styles[istyle]->init_style();
// create skip lists inside each pair neigh request
// any kind of list can have its skip flag set in this loop
for (i = 0; i < neighbor->nrequest; i++) {
if (!neighbor->requests[i]->pair) continue;
// istyle = associated sub-style for the request
for (istyle = 0; istyle < nstyles; istyle++)
if (styles[istyle] == neighbor->requests[i]->requestor) break;
// allocate iskip and ijskip
// initialize so as to skip all pair types
// set ijskip = 0 if type pair matches any entry in sub-style map
// set ijskip = 0 if mixing will assign type pair to this sub-style
// will occur if type pair is currently unassigned
// and both I,I and J,J are assigned to single sub-style
// and sub-style for both I,I and J,J match istyle
// set iskip = 1 only if all ijskip for itype are 1
int *iskip = new int[ntypes+1];
int **ijskip;
memory->create(ijskip,ntypes+1,ntypes+1,"pair_hybrid:ijskip");
for (itype = 1; itype <= ntypes; itype++)
for (jtype = 1; jtype <= ntypes; jtype++)
ijskip[itype][jtype] = 1;
for (itype = 1; itype <= ntypes; itype++)
for (jtype = itype; jtype <= ntypes; jtype++) {
for (m = 0; m < nmap[itype][jtype]; m++)
if (map[itype][jtype][m] == istyle)
ijskip[itype][jtype] = ijskip[jtype][itype] = 0;
if (nmap[itype][jtype] == 0 &&
nmap[itype][itype] == 1 && map[itype][itype][0] == istyle &&
nmap[jtype][jtype] == 1 && map[jtype][jtype][0] == istyle)
ijskip[itype][jtype] = ijskip[jtype][itype] = 0;
}
for (itype = 1; itype <= ntypes; itype++) {
iskip[itype] = 1;
for (jtype = 1; jtype <= ntypes; jtype++)
if (ijskip[itype][jtype] == 0) iskip[itype] = 0;
}
// if any skipping occurs
// set request->skip and copy iskip and ijskip into request
// else delete iskip and ijskip
// no skipping if pair style assigned to all type pairs
skip = 0;
for (itype = 1; itype <= ntypes; itype++)
for (jtype = 1; jtype <= ntypes; jtype++)
if (ijskip[itype][jtype] == 1) skip = 1;
if (skip) {
neighbor->requests[i]->skip = 1;
neighbor->requests[i]->iskip = iskip;
neighbor->requests[i]->ijskip = ijskip;
} else {
delete [] iskip;
memory->destroy(ijskip);
}
}
}
/* ----------------------------------------------------------------------
init for one type pair i,j and corresponding j,i
------------------------------------------------------------------------- */
double PairHybrid::init_one(int i, int j)
{
// if I,J is not set explicitly:
// perform mixing only if I,I sub-style = J,J sub-style
// also require I,I and J,J are both assigned to single sub-style
if (setflag[i][j] == 0) {
if (nmap[i][i] != 1 || nmap[j][j] != 1 || map[i][i][0] != map[j][j][0])
error->one(FLERR,"All pair coeffs are not set");
nmap[i][j] = 1;
map[i][j][0] = map[i][i][0];
}
// call init/mixing for all sub-styles of I,J
// set cutsq in sub-style just as Pair::init() does via call to init_one()
// set cutghost for I,J and J,I just as sub-style does
// sum tail corrections for I,J
// return max cutoff of all sub-styles assigned to I,J
// if no sub-styles assigned to I,J (pair_coeff none), cutmax = 0.0 returned
double cutmax = 0.0;
cutghost[i][j] = cutghost[j][i] = 0.0;
if (tail_flag) etail_ij = ptail_ij = 0.0;
nmap[j][i] = nmap[i][j];
for (int k = 0; k < nmap[i][j]; k++) {
map[j][i][k] = map[i][j][k];
double cut = styles[map[i][j][k]]->init_one(i,j);
styles[map[i][j][k]]->cutsq[i][j] =
styles[map[i][j][k]]->cutsq[j][i] = cut*cut;
if (styles[map[i][j][k]]->ghostneigh)
cutghost[i][j] = cutghost[j][i] =
MAX(cutghost[i][j],styles[map[i][j][k]]->cutghost[i][j]);
if (tail_flag) {
etail_ij += styles[map[i][j][k]]->etail_ij;
ptail_ij += styles[map[i][j][k]]->ptail_ij;
}
cutmax = MAX(cutmax,cut);
}
return cutmax;
}
/* ----------------------------------------------------------------------
invoke setup for each sub-style
------------------------------------------------------------------------- */
void PairHybrid::setup()
{
for (int m = 0; m < nstyles; m++) styles[m]->setup();
}
/* ----------------------------------------------------------------------
proc 0 writes to restart file
------------------------------------------------------------------------- */
void PairHybrid::write_restart(FILE *fp)
{
fwrite(&nstyles,sizeof(int),1,fp);
// each sub-style writes its settings, but no coeff info
int n;
for (int m = 0; m < nstyles; m++) {
n = strlen(keywords[m]) + 1;
fwrite(&n,sizeof(int),1,fp);
fwrite(keywords[m],sizeof(char),n,fp);
styles[m]->write_restart_settings(fp);
// write out per style special settings, if present
n = (special_lj[m] == NULL) ? 0 : 1;
fwrite(&n,sizeof(int),1,fp);
if (n) fwrite(special_lj[m],sizeof(double),4,fp);
n = (special_coul[m] == NULL) ? 0 : 1;
fwrite(&n,sizeof(int),1,fp);
if (n) fwrite(special_coul[m],sizeof(double),4,fp);
}
}
/* ----------------------------------------------------------------------
proc 0 reads from restart file, bcasts
------------------------------------------------------------------------- */
void PairHybrid::read_restart(FILE *fp)
{
int me = comm->me;
if (me == 0) fread(&nstyles,sizeof(int),1,fp);
MPI_Bcast(&nstyles,1,MPI_INT,0,world);
// allocate list of sub-styles
styles = new Pair*[nstyles];
keywords = new char*[nstyles];
multiple = new int[nstyles];
special_lj = new double*[nstyles];
special_coul = new double*[nstyles];
// each sub-style is created via new_pair()
// each reads its settings, but no coeff info
int n,dummy;
for (int m = 0; m < nstyles; m++) {
if (me == 0) fread(&n,sizeof(int),1,fp);
MPI_Bcast(&n,1,MPI_INT,0,world);
keywords[m] = new char[n];
if (me == 0) fread(keywords[m],sizeof(char),n,fp);
MPI_Bcast(keywords[m],n,MPI_CHAR,0,world);
styles[m] = force->new_pair(keywords[m],0,dummy);
styles[m]->read_restart_settings(fp);
// read back per style special settings, if present
special_lj[m] = special_coul[m] = NULL;
if (me == 0) fread(&n,sizeof(int),1,fp);
MPI_Bcast(&n,1,MPI_INT,0,world);
if (n > 0 ) {
special_lj[m] = new double[4];
if (me == 0) fread(special_lj[m],sizeof(double),4,fp);
MPI_Bcast(special_lj[m],4,MPI_DOUBLE,0,world);
}
if (me == 0) fread(&n,sizeof(int),1,fp);
MPI_Bcast(&n,1,MPI_INT,0,world);
if (n > 0 ) {
special_coul[m] = new double[4];
if (me == 0) fread(special_coul[m],sizeof(double),4,fp);
MPI_Bcast(special_coul[m],4,MPI_DOUBLE,0,world);
}
}
// multiple[i] = 1 to M if sub-style used multiple times, else 0
for (int i = 0; i < nstyles; i++) {
int count = 0;
for (int j = 0; j < nstyles; j++) {
if (strcmp(keywords[j],keywords[i]) == 0) count++;
if (j == i) multiple[i] = count;
}
if (count == 1) multiple[i] = 0;
}
// set pair flags from sub-style flags
flags();
}
/* ----------------------------------------------------------------------
call sub-style to compute single interaction
error if sub-style does not support single() call
since overlay could have multiple sub-styles, sum results explicitly
------------------------------------------------------------------------- */
double PairHybrid::single(int i, int j, int itype, int jtype,
double rsq, double factor_coul, double factor_lj,
double &fforce)
{
if (nmap[itype][jtype] == 0)
error->one(FLERR,"Invoked pair single on pair style none");
double fone;
fforce = 0.0;
double esum = 0.0;
for (int m = 0; m < nmap[itype][jtype]; m++) {
if (rsq < styles[map[itype][jtype][m]]->cutsq[itype][jtype]) {
if (styles[map[itype][jtype][m]]->single_enable == 0)
error->one(FLERR,"Pair hybrid sub-style does not support single call");
if ((special_lj[map[itype][jtype][m]] != NULL) ||
(special_coul[map[itype][jtype][m]] != NULL))
error->one(FLERR,"Pair hybrid single calls do not support"
" per sub-style special bond values");
esum += styles[map[itype][jtype][m]]->
single(i,j,itype,jtype,rsq,factor_coul,factor_lj,fone);
fforce += fone;
// copy substyle extra values into hybrid's svector
// use index n so the outer sub-style loop variable m is not clobbered
if (single_extra && styles[map[itype][jtype][m]]->single_extra)
for (int n = 0; n < single_extra; n++)
svector[n] = styles[map[itype][jtype][m]]->svector[n];
}
}
return esum;
}
/* ----------------------------------------------------------------------
modify parameters of the pair style and its sub-styles
------------------------------------------------------------------------- */
void PairHybrid::modify_params(int narg, char **arg)
{
if (narg == 0) error->all(FLERR,"Illegal pair_modify command");
// if 1st keyword is pair, apply other keywords to one sub-style
if (strcmp(arg[0],"pair") == 0) {
if (narg < 2) error->all(FLERR,"Illegal pair_modify command");
int m;
for (m = 0; m < nstyles; m++)
if (strcmp(arg[1],keywords[m]) == 0) break;
if (m == nstyles) error->all(FLERR,"Unknown pair_modify hybrid sub-style");
int iarg = 2;
if (multiple[m]) {
if (narg < 3) error->all(FLERR,"Illegal pair_modify command");
int multiflag = force->inumeric(FLERR,arg[2]);
for (m = 0; m < nstyles; m++)
if (strcmp(arg[1],keywords[m]) == 0 && multiflag == multiple[m]) break;
if (m == nstyles)
error->all(FLERR,"Unknown pair_modify hybrid sub-style");
iarg = 3;
}
// if 2nd keyword (after pair) is special:
// invoke modify_special() for the sub-style
if (iarg < narg && strcmp(arg[iarg],"special") == 0) {
if (narg < iarg+5)
error->all(FLERR,"Illegal pair_modify special command");
modify_special(m,narg-iarg,&arg[iarg+1]);
iarg += 5;
}
+ // if 2nd keyword (after pair) is compute/tally:
+ // set flag to register USER-TALLY computes accordingly
+
+ if (iarg < narg && strcmp(arg[iarg],"compute/tally") == 0) {
+ if (narg < iarg+2)
+ error->all(FLERR,"Illegal pair_modify compute/tally command");
+ if (strcmp(arg[iarg+1],"yes") == 0) {
+ compute_tally[m] = 1;
+ } else if (strcmp(arg[iarg+1],"no") == 0) {
+ compute_tally[m] = 0;
+ } else error->all(FLERR,"Illegal pair_modify compute/tally command");
+ iarg += 2;
+ }
+
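// illustrative input-script usage of the keywords handled above (a sketch,
// not part of this patch; subject to the usual special_bonds restrictions):
//   pair_modify pair lj/cut compute/tally no
//   pair_modify pair lj/charmm/coul/long special lj/coul 0.0 0.0 0.5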
// apply the remaining keywords to both the hybrid pair style itself and
// the selected sub-style, excluding the already-consumed "pair", "special",
// and "compute/tally" keywords.
// applying them to the hybrid itself matters for keywords like "tail" or "compute"
if (narg-iarg > 0) {
Pair::modify_params(narg-iarg,&arg[iarg]);
styles[m]->modify_params(narg-iarg,&arg[iarg]);
}
// apply all keywords to pair hybrid itself and every sub-style
} else {
Pair::modify_params(narg,arg);
for (int m = 0; m < nstyles; m++) styles[m]->modify_params(narg,arg);
}
}
/* ----------------------------------------------------------------------
store a local per pair style override for special_lj and special_coul
------------------------------------------------------------------------- */
void PairHybrid::modify_special(int m, int narg, char **arg)
{
double special[4];
int i;
special[0] = 1.0;
special[1] = force->numeric(FLERR,arg[1]);
special[2] = force->numeric(FLERR,arg[2]);
special[3] = force->numeric(FLERR,arg[3]);
if (strcmp(arg[0],"lj/coul") == 0) {
if (!special_lj[m]) special_lj[m] = new double[4];
if (!special_coul[m]) special_coul[m] = new double[4];
for (i = 0; i < 4; ++i)
special_lj[m][i] = special_coul[m][i] = special[i];
} else if (strcmp(arg[0],"lj") == 0) {
if (!special_lj[m]) special_lj[m] = new double[4];
for (i = 0; i < 4; ++i)
special_lj[m][i] = special[i];
} else if (strcmp(arg[0],"coul") == 0) {
if (!special_coul[m]) special_coul[m] = new double[4];
for (i = 0; i < 4; ++i)
special_coul[m][i] = special[i];
} else error->all(FLERR,"Illegal pair_modify special command");
}
/* ----------------------------------------------------------------------
override global special bonds settings with per substyle values
------------------------------------------------------------------------- */
void PairHybrid::set_special(int m)
{
int i;
if (special_lj[m])
for (i = 0; i < 4; ++i) force->special_lj[i] = special_lj[m][i];
if (special_coul[m])
for (i = 0; i < 4; ++i) force->special_coul[i] = special_coul[m][i];
}
/* ----------------------------------------------------------------------
store global special settings
------------------------------------------------------------------------- */
double * PairHybrid::save_special()
{
double *saved = new double[8];
for (int i = 0; i < 4; ++i) {
saved[i] = force->special_lj[i];
saved[i+4] = force->special_coul[i];
}
return saved;
}
/* ----------------------------------------------------------------------
restore global special settings from saved data
------------------------------------------------------------------------- */
void PairHybrid::restore_special(double *saved)
{
for (int i = 0; i < 4; ++i) {
force->special_lj[i] = saved[i];
force->special_coul[i] = saved[i+4];
}
}
/* ----------------------------------------------------------------------
extract a ptr to a particular quantity stored by pair
pass request thru to sub-styles
return first non-NULL result except for cut_coul request
for cut_coul, insure all non-NULL results are equal since required by Kspace
------------------------------------------------------------------------- */
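/* example (illustrative, not part of the original source): with
   "pair_style hybrid/overlay lj/cut/coul/long 10.0 coul/long 10.0"
   both sub-styles report cut_coul = 10.0, so a KSpace solver can use it;
   mismatched Coulomb cutoffs trigger the error below */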
void *PairHybrid::extract(const char *str, int &dim)
{
void *cutptr = NULL;
void *ptr;
double cutvalue = 0.0;
for (int m = 0; m < nstyles; m++) {
ptr = styles[m]->extract(str,dim);
if (ptr && strcmp(str,"cut_coul") == 0) {
double *p_newvalue = (double *) ptr;
double newvalue = *p_newvalue;
if (cutptr && newvalue != cutvalue)
error->all(FLERR,
"Coulomb cutoffs of pair hybrid sub-styles do not match");
cutptr = ptr;
cutvalue = newvalue;
} else if (ptr) return ptr;
}
if (strcmp(str,"cut_coul") == 0) return cutptr;
return NULL;
}
/* ---------------------------------------------------------------------- */
void PairHybrid::reset_dt()
{
for (int m = 0; m < nstyles; m++) styles[m]->reset_dt();
}
/* ----------------------------------------------------------------------
check if itype,jtype maps to sub-style
------------------------------------------------------------------------- */
int PairHybrid::check_ijtype(int itype, int jtype, char *substyle)
{
for (int m = 0; m < nmap[itype][jtype]; m++)
if (strcmp(keywords[map[itype][jtype][m]],substyle) == 0) return 1;
return 0;
}
/* ----------------------------------------------------------------------
memory usage of each sub-style
------------------------------------------------------------------------- */
double PairHybrid::memory_usage()
{
double bytes = maxeatom * sizeof(double);
bytes += maxvatom*6 * sizeof(double);
for (int m = 0; m < nstyles; m++) bytes += styles[m]->memory_usage();
return bytes;
}
diff --git a/src/pair_hybrid.h b/src/pair_hybrid.h
index e3de3b022..b8b9af5f4 100644
--- a/src/pair_hybrid.h
+++ b/src/pair_hybrid.h
@@ -1,154 +1,158 @@
/* -*- c++ -*- ----------------------------------------------------------
LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
http://lammps.sandia.gov, Sandia National Laboratories
Steve Plimpton, sjplimp@sandia.gov
Copyright (2003) Sandia Corporation. Under the terms of Contract
DE-AC04-94AL85000 with Sandia Corporation, the U.S. Government retains
certain rights in this software. This software is distributed under
the GNU General Public License.
See the README file in the top-level LAMMPS directory.
------------------------------------------------------------------------- */
#ifdef PAIR_CLASS
PairStyle(hybrid,PairHybrid)
#else
#ifndef LMP_PAIR_HYBRID_H
#define LMP_PAIR_HYBRID_H
#include <stdio.h>
#include "pair.h"
namespace LAMMPS_NS {
class PairHybrid : public Pair {
friend class FixGPU;
friend class FixIntel;
friend class FixOMP;
friend class Force;
friend class Respa;
friend class Info;
public:
PairHybrid(class LAMMPS *);
virtual ~PairHybrid();
void compute(int, int);
void settings(int, char **);
virtual void coeff(int, char **);
void init_style();
double init_one(int, int);
void setup();
void write_restart(FILE *);
void read_restart(FILE *);
double single(int, int, int, int, double, double, double, double &);
void modify_params(int narg, char **arg);
double memory_usage();
void compute_inner();
void compute_middle();
void compute_outer(int, int);
void *extract(const char *, int &);
void reset_dt();
int check_ijtype(int, int, char *);
+ virtual void add_tally_callback(class Compute *);
+ virtual void del_tally_callback(class Compute *);
+
protected:
int nstyles; // # of sub-styles
Pair **styles; // list of Pair style classes
char **keywords; // style name of each Pair style
int *multiple; // 0 if style used once, else Mth instance
int outerflag; // toggle compute() when invoked by outer()
int respaflag; // 1 if different substyles are assigned to
// different r-RESPA levels
int **nmap; // # of sub-styles itype,jtype points to
int ***map; // list of sub-styles itype,jtype points to
double **special_lj; // list of per style LJ exclusion factors
double **special_coul; // list of per style Coulomb exclusion factors
+ int *compute_tally; // list of on/off flags for tally computes
void allocate();
void flags();
void modify_special(int, int, char**);
double *save_special();
void set_special(int);
void restore_special(double *);
};
}
#endif
#endif
/* ERROR/WARNING messages:
E: Cannot yet use pair hybrid with Kokkos
This feature is not yet supported.
E: Illegal ... command
Self-explanatory. Check the input script syntax and compare to the
documentation for the command. You can use -echo screen as a
command-line option when running LAMMPS to see the offending line.
E: Pair style hybrid cannot have hybrid as an argument
Self-explanatory.
E: Pair style hybrid cannot have none as an argument
Self-explanatory.
E: Incorrect args for pair coefficients
Self-explanatory. Check the input script or data file.
E: Pair coeff for hybrid has invalid style
Style in pair coeff must have been listed in pair_style command.
E: Pair hybrid sub-style is not used
No pair_coeff command used a sub-style specified in the pair_style
command.
E: Pair_modify special setting for pair hybrid incompatible with global special_bonds setting
Cannot override a setting of 0.0 or 1.0 or change a setting between
0.0 and 1.0.
E: All pair coeffs are not set
All pair coefficients must be set in the data file or by the
pair_coeff command before running a simulation.
E: Invoked pair single on pair style none
A command (e.g. a dump) attempted to invoke the single() function on a
pair style none, which is illegal. You are probably attempting to
compute per-atom quantities with an undefined pair style.
E: Pair hybrid sub-style does not support single call
You are attempting to invoke a single() call on a pair style
that doesn't support it.
E: Pair hybrid single calls do not support per sub-style special bond values
Self-explanatory.
E: Unknown pair_modify hybrid sub-style
The choice of sub-style is unknown.
E: Coulomb cutoffs of pair hybrid sub-styles do not match
If using a Kspace solver, all Coulomb cutoffs of long pair styles must
be the same.
*/
diff --git a/src/version.h b/src/version.h
index e6ffb22dc..7ae7ec487 100644
--- a/src/version.h
+++ b/src/version.h
@@ -1 +1 @@
-#define LAMMPS_VERSION "11 Apr 2017"
+#define LAMMPS_VERSION "4 May 2017"
diff --git a/tools/msi2lmp/README b/tools/msi2lmp/README
index a20f6e893..db9b1aca5 100644
--- a/tools/msi2lmp/README
+++ b/tools/msi2lmp/README
@@ -1,227 +1,240 @@
-Axel Kohlmeyer is the current maintainer of the msi2lmp tool.
-Please send any inquiries about msi2lmp to the lammps-users mailing list.
-06 Oct 2016 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Improved whitespace handling in parsing topology and force field
-files to avoid bogus warnings about type name truncation.
-
-24 Oct 2015 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added check to make certain that force field files
-are consistent with the notation of non-bonded parameters
-that the msi2lmp code expects. For Class 1 and OPLS-AA
-the A-B notation with geometric mixing is expected and for
-Class 2 the r-eps notation with sixthpower mixing.
-
-11 Sep 2014 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Refactored ReadMdfFile.c so it more consistently honors
-the MAX_NAME and MAX_STRING string length defines and
-potentially handles inputs with long names better.
-
-27 May 2014 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added TopoTools style type hints as comments to all Mass, PairCoeff,
-BondCoeff, AngleCoeff, DihedralCoeff, ImproperCoeff entries.
-This should make it easier to identify force field entries with
-the structure and force field map in the data file later.
-
-06 Mar 2014 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Fixed a bug in handling of triclinic cells, where the matrices to
-convert to and from fractional coordinates were incorrectly built.
-
-26 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Implemented writing out force field style hints in generated data
-files for improved consistency checking when reading those files.
-Also added writing out CGCMM style comments to identify atom types.
+ msi2lmp.exe
-08 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+This code has several known limitations listed below under "LIMITATIONS"
+(and possibly some unknown ones, too) and is no longer under active
+development. Only the occasional bugfix is applied.
-Fixed a memory access violation with Class 2 force fields.
-Free all allocated memory to better detection of memory errors.
-Print out version number and data with all print levels > 0.
-Added valgrind checks to the regression tests
+Please send any inquiries about msi2lmp to the lammps-users
+mailing list and not to individual people.
-08 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+------------------------------------------------------------------------
-Fixed a memory access violation with Class 2 force fields.
-Free all allocated memory to better detection of memory errors.
-Print out version number and data with all print levels > 0.
-Added valgrind checks to the regression tests
+OVERVIEW
-02 Aug 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added rudimentary support for OPLS-AA based on
-input provided by jeff greathouse.
+This is the third version of a program that generates a LAMMPS data file
+based on the information in MSI .car (atom coordinates), .mdf (molecular
+topology) and .frc (forcefield) files. The .car and .mdf files are
+specific to a molecular system while the .frc file is specific to a
+forcefield version. The only coherency needed between .frc and
+.car/.mdf files is the atom types.
-18 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Added support for writing out image flags
-Improved accuracy of atom masses
-Added flag for shifting the entire system
-Fixed some minor logic bugs and prepared
-for supporting other force fields and morse style bonds.
-
-12 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
-
-Fixed the bug that caused improper coefficients to be wrong
-Cleaned up the handling of box parameters and center the box
-by default around the system/molecule. Added a flag to make
-this step optional and center the box around the origin instead.
-Added a regression test script with examples.
-
-1 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+The first version was written by Steve Lustig at Dupont, but required
+using Discover to derive internal coordinates and forcefield parameters.
-Cleanup and improved port to windows.
-Removed some more static string limits.
-Added print level 3 for additional output.
-Make code stop at missing force field parameters
-and added -i flag to override this.
-Safer argument checking.
-Provide short versions for all flags.
+The second version was written by Michael Peachey while an intern in the
+Cray Chemistry Applications Group managed by John Carpenter. This
+version derived internal coordinates from the mdf file and looked up
+parameters in the frc file, thus eliminating the need for Discover.
-23 Sep 2011
+The third version was written by John Carpenter to optimize the
+performance of the program for large molecular systems (the original
+code for deriving atom numbers was quadratic in time) and to make the
+program fully dynamic. The second version used fixed dimension arrays
+for the internal coordinates.
-added support for triclinic boxes
-see msi2lmp/TriclinicModification.pdf doc for details
+The third version was revised in Fall 2011 by Stephanie Teich-McGoldrick
+to add support for non-orthogonal cells.
------------------------------
+The next revision was started in Summer/Fall 2013 by Axel Kohlmeyer to
+improve portability to Windows compilers, clean up command line parsing
+and improve compatibility with the then current LAMMPS versions. This
+revision removes compatibility with the obsolete LAMMPS version written
+in Fortran 90.
- msi2lmp V3.6 4/10/2005
+INSTALLATION & USAGE
- This program uses the .car and .mdf files from MSI/Biosyms's INSIGHT
+This program uses the .car and .mdf files from MSI/Biosym's INSIGHT
program to produce a LAMMPS data file.
1. Building msi2lmp
Use the Makefile in the src directory. It is
currently set up for gcc. You will have to modify
it to use a different compiler.
2. Testing the program
There are several pairs of input test files in the format generated
by Materials Studio or compatible programs (one .car and one .mdf
file each) in the test directory. There is also a LAMMPS input to
run a minimization for each and write out the resulting system as
a data file. With the runtests.sh script all of those inputs are
converted via msi2lmp, then the minimization with LAMMPS is run
and the generated data files are compared with the corresponding
files in the reference folder. This script assumes you are on a
unix/linux system and that you have compiled a serial LAMMPS executable
called lmp_serial with make serial. The tests are grouped by the
force fields they use.
3. To run the program
The program is started by supplying information at the command prompt
according to the usage described below.
USAGE: msi2lmp.exe <ROOTNAME> {-print #} {-class #} {-frc FRC_FILE}
{-ignore} {-nocenter} {-shift # # #}
-- msi2lmp.exe is the name of the executable
-- <ROOTNAME> is the base name of the .car and .mdf files
-- -2001
Output lammps files for LAMMPS version 2001 (F90 version)
Default is to write output for the C++ version of LAMMPS
-- -print (or -p)
# is the print level 0 - silent except for error messages
1 - minimal (default)
2 - verbose (usual for developing and
checking new data files for consistency)
3 - even more verbose (additional debug info)
-- -ignore (or -i) ignore errors about missing force field parameters
and treat them as warnings instead.
-- -nocenter (or -n) do not recenter the simulation box around the
geometrical center of the provided geometry but
rather around the origin
-- -oldstyle (or -o) write out a data file without style hints
(to be compatible with older LAMMPS versions)
-- -shift (or -s) translate the entire system (box and coordinates)
by a vector (default: 0.0 0.0 0.0)
-- -class (or -c)
# is the class of forcefield to use (I or 1 = Class I e.g., CVFF)
(O or 0 = OPLS-AA)
(II or 2 = Class II e.g., CFFx)
default is -class I
-- -frc (or -f) specifies name of the forcefield file (e.g., cff91)
If the file name includes a directory component (or drive letter
on Windows), then the name is used as is. Otherwise, the program
looks for the forcefield file in $MSI2LMP_LIBRARY (or %MSI2LMP_LIBRARY%
on Windows). If $MSI2LMP_LIBRARY is not set, ../frc_files is used
(for testing). If the file name does not end in .frc, then .frc
is appended to the name.
For example, -frc cvff (assumes cvff.frc is in $MSI2LMP_LIBRARY
or ../frc_files)
-frc cff/cff91 (assumes cff91.frc is in cff)
-frc /usr/local/forcefields/cff95
(assumes cff95.frc is in /usr/local/forcefields/)
By default, the program uses $MSI2LMP_LIBRARY/cvff.frc or
../frc_files/cvff.frc depending on whether MSI2LMP_LIBRARY is set.
-- the LAMMPS data file is written to <ROOTNAME>.data;
protocol and error information is written to the screen.
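Example (illustrative only; "mysystem" is a placeholder for your own <ROOTNAME>):

  msi2lmp.exe mysystem -class I -frc cvff -print 2

This reads mysystem.car and mysystem.mdf from the current directory, looks
up cvff.frc via $MSI2LMP_LIBRARY (or ../frc_files), and writes mysystem.data.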
-****************************************************************
-*
-* msi2lmp
-*
-* This is the third version of a program that generates a LAMMPS
-* data file based on the information in MSI .car (atom
-* coordinates), .mdf (molecular topology) and .frc (forcefield)
-* files. The .car and .mdf files are specific to a molecular
-* system while the .frc file is specific to a forcefield version.
-* The only coherency needed between .frc and .car/.mdf files are
-* the atom types.
-*
-* The first version was written by Steve Lustig at Dupont, but
-* required using Discover to derive internal coordinates and
-* forcefield parameters
-*
-* The second version was written by Michael Peachey while an
-* intern in the Cray Chemistry Applications Group managed
-* by John Carpenter. This version derived internal coordinates
-* from the mdf file and looked up parameters in the frc file
-* thus eliminating the need for Discover.
-*
-* The third version was written by John Carpenter to optimize
-* the performance of the program for large molecular systems
-* (the original code for deriving atom numbers was quadratic in time)
-* and to make the program fully dynamic. The second version used
-* fixed dimension arrays for the internal coordinates.
-*
-* The current maintainer is only reluctantly doing so because John Mayo no longer
-* needs this code.
-*
-* V3.2 corresponds to adding code to MakeLists.c to gracefully deal with
-* systems that may only be molecules of 1 to 3 atoms. In V3.1, the values
-* for number_of_dihedrals, etc. could be unpredictable in these systems.
-*
-* V3.3 was generated in response to a strange error reading a MDF file generated by
-* Accelys' Materials Studio GUI. Simply rewriting the input part of ReadMdfFile.c
-* seems to have fixed the problem.
-*
-* V3.4 and V3.5 are minor upgrades to fix bugs associated mostly with .car and .mdf files
-* written by Accelys' Materials Studio GUI.
-*
-* V3.6 outputs to LAMMPS 2005 (C++ version).
-*
-* Contact: Kelly L. Anderson, kelly.anderson@cantab.net
-*
-* April 2005
+------------------------------------------------------------------------
+
+LIMITATIONS
+
+msi2lmp has the following known limitations:
+
+- there is no support for selecting Morse bonds over harmonic bonds
+- there is no support for auto-equivalences to supplement fully
+ parameterized interactions with heuristic ones
+- there is no support for bond increments
+
+------------------------------------------------------------------------
+
+CHANGELOG
+
+06 Oct 2016 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Improved whitespace handling in parsing topology and force field
+files to avoid bogus warnings about type name truncation.
+
+24 Oct 2015 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added check to make certain that force field files are consistent with
+the notation of non-bonded parameters that the msi2lmp code expects.
+For Class 1 and OPLS-AA the A-B notation with geometric mixing is
+expected and for Class 2 the r-eps notation with sixthpower mixing.
+
+11 Sep 2014 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Refactored ReadMdfFile.c so it more consistently honors the MAX_NAME
+and MAX_STRING string length defines and potentially handles inputs
+with long names better.
+
+27 May 2014 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added TopoTools style type hints as comments to all Mass, PairCoeff,
+BondCoeff, AngleCoeff, DihedralCoeff, ImproperCoeff entries.
+This should make it easier to identify force field entries with
+the structure and force field map in the data file later.
+
+06 Mar 2014 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Fixed a bug in handling of triclinic cells, where the matrices to
+convert to and from fractional coordinates were incorrectly built.
+
+26 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Implemented writing out force field style hints in generated data
+files for improved consistency checking when reading those files.
+Also added writing out CGCMM style comments to identify atom types.
+
+08 Oct 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Fixed a memory access violation with Class 2 force fields. Free all
+allocated memory to allow better detection of memory errors. Print out
+version number and data with all print levels > 0. Added valgrind
+checks to the regression tests.
+
+02 Aug 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added rudimentary support for OPLS-AA based on input provided
+by Jeff Greathouse.
+
+18 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Added support for writing out image flags. Improved accuracy of atom
+masses. Added flag for shifting the entire system. Fixed some minor
+logic bugs and prepared for supporting other force fields and morse
+style bonds.
+
+12 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Fixed the bug that caused improper coefficients to be wrong. Cleaned up
+the handling of box parameters and centered the box by default around the
+system/molecule. Added a flag to make this step optional and center the
+box around the origin instead. Added a regression test script with
+examples.
+
+1 Jul 2013 Axel Kohlmeyer <akohlmey@gmail.com>
+
+Cleaned up and improved the port to Windows. Removed some more static
+string limits. Added print level 3 for additional output. Made the code
+stop at missing force field parameters and added the -i flag to override
+this. Safer argument checking. Provided short versions for all flags.
+
+23 Sep 2011
+
+Added support for triclinic boxes.
+
+V3.6 outputs to LAMMPS 2005 (C++ version).
+
+Contact: Kelly L. Anderson, kelly.anderson@cantab.net
+
+V3.4 and V3.5 are minor upgrades to fix bugs associated mostly with .car
+and .mdf files written by Accelrys' Materials Studio GUI. (April 2005)
+
+V3.3 was generated in response to a strange error reading an MDF file
+generated by Accelrys' Materials Studio GUI. Simply rewriting the input
+part of ReadMdfFile.c seems to have fixed the problem.
+
+V3.2 corresponds to adding code to MakeLists.c to gracefully deal with
+systems that may only be molecules of 1 to 3 atoms. In V3.1, the values
+for number_of_dihedrals, etc. could be unpredictable in these systems.
+
+-----------------------------
+
+ msi2lmp v3.9.8 6/10/2016
+
diff --git a/tools/msi2lmp/src/GetParameters.c b/tools/msi2lmp/src/GetParameters.c
index e183c529e..192b4d296 100644
--- a/tools/msi2lmp/src/GetParameters.c
+++ b/tools/msi2lmp/src/GetParameters.c
@@ -1,1310 +1,1310 @@
#include "msi2lmp.h"
#include "Forcefield.h"
#include <string.h>
#include <stdlib.h>
#include <math.h>
static int find_improper_body_data(char [][5],struct FrcFieldItem,int *);
static void rearrange_improper(int,int);
static int find_trigonal_body_data(char [][5],struct FrcFieldItem);
static int find_angleangle_data(char [][5],struct FrcFieldItem,int[]);
static int find_match(int, char [][5],struct FrcFieldItem,int *);
static int match_types(int,int,char [][5],char [][5],int *);
static double get_r0(int,int);
static double get_t0(int,int,int);
static int quo_cp();
static void get_equivs(int,char [][5],char[][5]);
static int find_equiv_type(char[]);
/**********************************************************************/
/* */
/* GetParameters is a long routine for searching the forcefield */
/* parameters (read in by ReadFrcFile) for parameters corresponding */
/* to the different internal coordinate types derived by MakeLists */
/* */
/**********************************************************************/
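/* general lookup pattern used throughout this routine (summary added for
   clarity, inferred from the code below rather than separate documentation):
     1. build the potential type names of the internal coordinate
     2. find_match() against those exact types (wildcards allowed on retry)
     3. on failure, translate the types via get_equivs() and search again
     4. if still unmatched, print a message and condexit(), which aborts
        unless missing parameters were downgraded to warnings with -i */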
void GetParameters()
{
int i,j,k,backwards,cp_type,rearrange;
int kloc[3],multiplicity;
char potential_types[4][5];
char equiv_types[4][5];
double rab,rbc,rcd,tabc,tbcd,tabd,tcbd;
if (pflag > 1) fprintf(stderr," Trying Atom Equivalences if needed\n");
/**********************************************************************/
/* */
/* Find masses of atom types */
/* */
/**********************************************************************/
for (i=0; i < no_atom_types; i++) {
backwards = -1;
strncpy(potential_types[0],atomtypes[i].potential,5);
k = find_match(1,potential_types,ff_atomtypes,&backwards);
if (k < 0) {
printf(" Unable to find mass for %s\n",atomtypes[i].potential);
condexit(10);
} else {
atomtypes[i].mass = ff_atomtypes.data[k].ff_param[0];
}
}
/**********************************************************************/
/* */
/* Find VDW parameters for atom types */
/* */
/**********************************************************************/
for (i=0; i < no_atom_types; i++) {
backwards = 0;
for (j=0; j < 2; j++) atomtypes[i].params[j] = 0.0;
strncpy(potential_types[0],atomtypes[i].potential,5);
k = find_match(1,potential_types,ff_vdw,&backwards);
if (k < 0) {
get_equivs(1,potential_types,equiv_types);
if (pflag > 2) printf(" Using equivalences for VDW %s -> %s\n",
potential_types[0],equiv_types[0]);
k = find_match(1,equiv_types,ff_vdw,&backwards);
}
if (k < 0) {
printf(" Unable to find vdw data for %s\n",atomtypes[i].potential);
condexit(11);
} else {
if (ljtypeflag == 0) {
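/* A-B (12-6) parameters relate to epsilon/sigma via A = 4*eps*sigma^12 and
   B = 4*eps*sigma^6, hence eps = B*B/(4*A) and sigma = (A/B)^(1/6) as
   computed below, assuming ff_param[0] = A and ff_param[1] = B as read
   from the .frc file (explanatory note, not part of the original source) */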
if((ff_vdw.data[k].ff_param[0] != 0.0 ) &&
(ff_vdw.data[k].ff_param[1] != 0.0)) {
atomtypes[i].params[0] =
(ff_vdw.data[k].ff_param[1]*
ff_vdw.data[k].ff_param[1])/(4.0*ff_vdw.data[k].ff_param[0]);
atomtypes[i].params[1] = pow((ff_vdw.data[k].ff_param[0]/
ff_vdw.data[k].ff_param[1]),
(1.0/6.0));
}
} else if (ljtypeflag == 1) {
atomtypes[i].params[0] = ff_vdw.data[k].ff_param[1];
atomtypes[i].params[1] = ff_vdw.data[k].ff_param[0];
} else {
printf(" Unknown LJ parameter type %d\n",ljtypeflag);
exit(111);
}
}
}
if (pflag > 2) {
printf("\n Atom Types, Masses and VDW Parameters\n");
for (i=0; i < no_atom_types; i++) {
printf(" %3s %8.4f %8.4f %8.4f\n",
atomtypes[i].potential,atomtypes[i].mass, atomtypes[i].params[0],atomtypes[i].params[1]);
}
}
/**********************************************************************/
/* */
/* Find parameters for bond types */
/* */
/**********************************************************************/
for (i=0; i < no_bond_types; i++) {
backwards = 0;
for (j=0; j < 4; j++) bondtypes[i].params[j] = 0.0;
for (j=0; j < 2; j++)
strncpy(potential_types[j],
atomtypes[bondtypes[i].types[j]].potential,5);
k = find_match(2,potential_types,ff_bond,&backwards);
if (k < 0) {
get_equivs(2,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for bond %s %s -> %s %s\n",
potential_types[0],potential_types[1],
equiv_types[0],equiv_types[1]);
}
k = find_match(2,equiv_types,ff_bond,&backwards);
}
if (k < 0) {
printf(" Unable to find bond data for %s %s\n",
potential_types[0],potential_types[1]);
condexit(12);
} else {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
bondtypes[i].params[0] = ff_bond.data[k].ff_param[1];
bondtypes[i].params[1] = ff_bond.data[k].ff_param[0];
- }
+ }
if (forcefield & FF_TYPE_CLASS2) {
for (j=0; j < 4; j++)
bondtypes[i].params[j] = ff_bond.data[k].ff_param[j];
}
}
}
if (pflag > 2) {
printf("\n Bond Types and Parameters\n");
for (i=0; i < no_bond_types; i++) {
for (j=0; j < 2; j++)
printf(" %-3s",atomtypes[bondtypes[i].types[j]].potential);
for (j=0; j < 4; j++)
printf(" %8.4f",bondtypes[i].params[j]);
printf("\n");
}
}
/**********************************************************************/
/* */
/* Find parameters for angle types including bondbond, */
/* and bondangle parameters if Class II */
/* */
/* Each of the cross terms is searched separately even though */
/* they share a given angle type. This allows parameters to be */
/* in different order in the forcefield for each cross term or */
/* maybe not even there. */
/* */
/**********************************************************************/
for (i=0; i < no_angle_types; i++) {
backwards = 0;
for (j=0; j < 4; j++) angletypes[i].params[j] = 0.0;
for (j=0; j < 3; j++)
strncpy(potential_types[j],atomtypes[angletypes[i].types[j]].potential,5);
k = find_match(3,potential_types,ff_ang,&backwards);
if (k < 0) {
get_equivs(3,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for angle %s %s %s -> %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],
equiv_types[0],equiv_types[1],
equiv_types[2]);
}
k = find_match(3,equiv_types,ff_ang,&backwards);
}
if (k < 0) {
printf(" Unable to find angle data for %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2]);
condexit(13);
} else {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
angletypes[i].params[0] = ff_ang.data[k].ff_param[1];
angletypes[i].params[1] = ff_ang.data[k].ff_param[0];
}
if (forcefield & FF_TYPE_CLASS2) {
for (j=0; j < 4; j++)
angletypes[i].params[j] = ff_ang.data[k].ff_param[j];
}
}
if (forcefield & FF_TYPE_CLASS2) {
get_equivs(3,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for 3 body cross terms %s %s %s -> %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2],
equiv_types[0],equiv_types[1],equiv_types[2]);
}
for (j=0; j < 3; j++) angletypes[i].bondbond_cross_term[j] = 0.0;
for (j=0; j < 4; j++) angletypes[i].bondangle_cross_term[j] = 0.0;
rab = get_r0(angletypes[i].types[0],angletypes[i].types[1]);
rbc = get_r0(angletypes[i].types[1],angletypes[i].types[2]);
angletypes[i].bondbond_cross_term[1] = rab;
angletypes[i].bondbond_cross_term[2] = rbc;
angletypes[i].bondangle_cross_term[2] = rab;
angletypes[i].bondangle_cross_term[3] = rbc;
k = find_match(3,potential_types,ff_bonbon,&backwards);
if (k < 0) {
k = find_match(3,equiv_types,ff_bonbon,&backwards);
}
if (k < 0) {
printf(" Unable to find bondbond data for %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2]);
condexit(14);
} else {
angletypes[i].bondbond_cross_term[0] = ff_bonbon.data[k].ff_param[0];
}
k = find_match(3,potential_types,ff_bonang,&backwards);
if (k < 0) {
k = find_match(3,equiv_types,ff_bonang,&backwards);
}
if (k < 0) {
printf(" Unable to find bondangle data for %s %s %s\n",
potential_types[0],potential_types[1],potential_types[2]);
condexit(15);
} else {
if (backwards) {
angletypes[i].bondangle_cross_term[0] = ff_bonang.data[k].ff_param[1];
angletypes[i].bondangle_cross_term[1] = ff_bonang.data[k].ff_param[0];
} else {
angletypes[i].bondangle_cross_term[0] = ff_bonang.data[k].ff_param[0];
angletypes[i].bondangle_cross_term[1] = ff_bonang.data[k].ff_param[1];
}
}
}
}
if (pflag > 2) {
printf("\n Angle Types and Parameters\n");
for (i=0; i < no_angle_types; i++) {
for (j=0; j < 3; j++)
printf(" %-3s", atomtypes[angletypes[i].types[j]].potential);
for (j=0; j < 4; j++) printf(" %8.4f",angletypes[i].params[j]);
printf("\n");
}
if (forcefield & FF_TYPE_CLASS2) {
printf("\n BondBond Types and Parameters\n");
for (i=0; i < no_angle_types; i++) {
for (j=0; j < 3; j++)
printf(" %-3s",atomtypes[angletypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",angletypes[i].bondbond_cross_term[j]);
printf("\n");
}
printf("\n BondAngle Types and Parameters\n");
for (i=0; i < no_angle_types; i++) {
for (j=0; j < 3; j++)
printf(" %-3s",atomtypes[angletypes[i].types[j]].potential);
for (j=0; j < 4; j++)
printf(" %8.4f",angletypes[i].bondangle_cross_term[j]);
printf("\n");
}
}
}
/**********************************************************************/
/* */
/* Find parameters for dihedral types including endbonddihedral, */
/* midbonddihedral, angledihedral, angleangledihedral and */
/* bondbond13 parameters if Class II */
/* */
/* Each of the cross terms is searched separately even though */
/* they share a given dihedral type. This allows parameters to be */
/* in different order in the forcefield for each cross term or */
/* maybe not even there. */
/* */
/**********************************************************************/
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 6; j++)
dihedraltypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[dihedraltypes[i].types[j]].potential,5);
backwards = 0;
k = find_match(4,potential_types,ff_tor,&backwards);
if (k < 0) {
get_equivs(4,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for dihedral %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_match(4,equiv_types,ff_tor,&backwards);
}
if (k < 0) {
printf(" Unable to find torsion data for %s %s %s %s\n",
potential_types[0],
potential_types[1],
potential_types[2],
potential_types[3]);
condexit(16);
} else {
if (forcefield & FF_TYPE_CLASS1) {
multiplicity = 1;
if (ff_tor.data[k].ff_types[0][0] == '*')
multiplicity =
atomtypes[dihedraltypes[i].types[1]].no_connect-1;
if (ff_tor.data[k].ff_types[3][0] == '*')
multiplicity *=
atomtypes[dihedraltypes[i].types[2]].no_connect-1;
dihedraltypes[i].params[0] = ff_tor.data[k].ff_param[0]/(double) multiplicity;
if (ff_tor.data[k].ff_param[2] == 0.0)
dihedraltypes[i].params[1] = 1.0;
else if (ff_tor.data[k].ff_param[2] == 180.0)
dihedraltypes[i].params[1] = -1.0;
else {
printf(" Non planar phi0 for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
dihedraltypes[i].params[1] = 0.0;
}
dihedraltypes[i].params[2] = ff_tor.data[k].ff_param[1];
}
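/* note (added for clarity, not in the original source): for Class I the
   cvff-style entry (Kphi, n, phi0) is mapped above onto K = Kphi/multiplicity,
   d = +1 for phi0 = 0 or d = -1 for phi0 = 180, and the periodicity n,
   matching the harmonic dihedral form E = K*[1 + d*cos(n*phi)]; the
   multiplicity accounts for wildcard entries that apply to every torsion
   around the central bond */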
if (forcefield & FF_TYPE_OPLSAA) {
for (j=0; j < 4; j++)
dihedraltypes[i].params[j] = ff_tor.data[k].ff_param[j];
}
if (forcefield & FF_TYPE_CLASS2) {
for (j=0; j < 6; j++)
dihedraltypes[i].params[j] = ff_tor.data[k].ff_param[j];
}
}
if (forcefield & FF_TYPE_CLASS2) {
get_equivs(4,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for linear 4 body cross terms %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
for (j=0; j < 8; j++)
dihedraltypes[i].endbonddihedral_cross_term[j] = 0.0;
for (j=0; j < 4; j++)
dihedraltypes[i].midbonddihedral_cross_term[j] = 0.0;
for (j=0; j < 8; j++)
dihedraltypes[i].angledihedral_cross_term[j] = 0.0;
for (j=0; j < 3; j++)
dihedraltypes[i].angleangledihedral_cross_term[j] = 0.0;
for (j=0; j < 3; j++)
dihedraltypes[i].bond13_cross_term[j] = 0.0;
rab = get_r0(dihedraltypes[i].types[0],dihedraltypes[i].types[1]);
rbc = get_r0(dihedraltypes[i].types[1],dihedraltypes[i].types[2]);
rcd = get_r0(dihedraltypes[i].types[2],dihedraltypes[i].types[3]);
tabc = get_t0(dihedraltypes[i].types[0],
dihedraltypes[i].types[1],
dihedraltypes[i].types[2]);
tbcd = get_t0(dihedraltypes[i].types[1],
dihedraltypes[i].types[2],
dihedraltypes[i].types[3]);
dihedraltypes[i].endbonddihedral_cross_term[6] = rab;
dihedraltypes[i].endbonddihedral_cross_term[7] = rcd;
dihedraltypes[i].midbonddihedral_cross_term[3] = rbc;
dihedraltypes[i].angledihedral_cross_term[6] = tabc;
dihedraltypes[i].angledihedral_cross_term[7] = tbcd;
dihedraltypes[i].angleangledihedral_cross_term[1] = tabc;
dihedraltypes[i].angleangledihedral_cross_term[2] = tbcd;
dihedraltypes[i].bond13_cross_term[1] = rab;
dihedraltypes[i].bond13_cross_term[2] = rcd;
backwards = 0;
k = find_match(4,potential_types,ff_endbontor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_endbontor,&backwards);
}
if (k < 0) {
printf(" Unable to find endbonddihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(17);
} else {
if (backwards) {
dihedraltypes[i].endbonddihedral_cross_term[0] =
ff_endbontor.data[k].ff_param[3];
dihedraltypes[i].endbonddihedral_cross_term[1] =
ff_endbontor.data[k].ff_param[4];
dihedraltypes[i].endbonddihedral_cross_term[2] =
ff_endbontor.data[k].ff_param[5];
dihedraltypes[i].endbonddihedral_cross_term[3] =
ff_endbontor.data[k].ff_param[0];
dihedraltypes[i].endbonddihedral_cross_term[4] =
ff_endbontor.data[k].ff_param[1];
dihedraltypes[i].endbonddihedral_cross_term[5] =
ff_endbontor.data[k].ff_param[2];
} else {
dihedraltypes[i].endbonddihedral_cross_term[0] =
ff_endbontor.data[k].ff_param[0];
dihedraltypes[i].endbonddihedral_cross_term[1] =
ff_endbontor.data[k].ff_param[1];
dihedraltypes[i].endbonddihedral_cross_term[2] =
ff_endbontor.data[k].ff_param[2];
dihedraltypes[i].endbonddihedral_cross_term[3] =
ff_endbontor.data[k].ff_param[3];
dihedraltypes[i].endbonddihedral_cross_term[4] =
ff_endbontor.data[k].ff_param[4];
dihedraltypes[i].endbonddihedral_cross_term[5] =
ff_endbontor.data[k].ff_param[5];
}
}
backwards = 0;
k = find_match(4,potential_types,ff_midbontor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_midbontor,&backwards);
}
if (k < 0) {
printf(" Unable to find midbonddihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(18);
} else {
dihedraltypes[i].midbonddihedral_cross_term[0] =
ff_midbontor.data[k].ff_param[0];
dihedraltypes[i].midbonddihedral_cross_term[1] =
ff_midbontor.data[k].ff_param[1];
dihedraltypes[i].midbonddihedral_cross_term[2] =
ff_midbontor.data[k].ff_param[2];
}
backwards = 0;
k = find_match(4,potential_types,ff_angtor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_angtor,&backwards);
}
if (k < 0) {
printf(" Unable to find angledihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(19);
} else {
if (backwards) {
dihedraltypes[i].angledihedral_cross_term[0] =
ff_angtor.data[k].ff_param[3];
dihedraltypes[i].angledihedral_cross_term[1] =
ff_angtor.data[k].ff_param[4];
dihedraltypes[i].angledihedral_cross_term[2] =
ff_angtor.data[k].ff_param[5];
dihedraltypes[i].angledihedral_cross_term[3] =
ff_angtor.data[k].ff_param[0];
dihedraltypes[i].angledihedral_cross_term[4] =
ff_angtor.data[k].ff_param[1];
dihedraltypes[i].angledihedral_cross_term[5] =
ff_angtor.data[k].ff_param[2];
} else {
dihedraltypes[i].angledihedral_cross_term[0] =
ff_angtor.data[k].ff_param[0];
dihedraltypes[i].angledihedral_cross_term[1] =
ff_angtor.data[k].ff_param[1];
dihedraltypes[i].angledihedral_cross_term[2] =
ff_angtor.data[k].ff_param[2];
dihedraltypes[i].angledihedral_cross_term[3] =
ff_angtor.data[k].ff_param[3];
dihedraltypes[i].angledihedral_cross_term[4] =
ff_angtor.data[k].ff_param[4];
dihedraltypes[i].angledihedral_cross_term[5] =
ff_angtor.data[k].ff_param[5];
}
}
backwards = 0;
k = find_match(4,potential_types,ff_angangtor,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_angangtor,&backwards);
}
if (k < 0) {
printf(" Unable to find angleangledihedral data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(20);
} else {
dihedraltypes[i].angleangledihedral_cross_term[0] =
ff_angangtor.data[k].ff_param[0];
}
cp_type = quo_cp();
if ((cp_type >= 0) &&
((dihedraltypes[i].types[0] == cp_type) ||
(dihedraltypes[i].types[1] == cp_type) ||
(dihedraltypes[i].types[2] == cp_type) ||
(dihedraltypes[i].types[3] == cp_type) )) {
backwards = 0;
k = find_match(4,potential_types,ff_bonbon13,&backwards);
if (k < 0) {
k = find_match(4,equiv_types,ff_bonbon13,&backwards);
}
if (k < 0) {
printf(" Unable to find bond13 data for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
condexit(21);
} else {
dihedraltypes[i].bond13_cross_term[0] =
ff_bonbon13.data[k].ff_param[0];
}
}
}
}
if (pflag > 2) {
printf("\n Dihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 6; j++)
printf(" %8.4f",dihedraltypes[i].params[j]);
printf("\n");
}
if (forcefield & FF_TYPE_CLASS2) {
printf("\n EndBondDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 8; j++)
printf(" %8.4f",dihedraltypes[i].endbonddihedral_cross_term[j]);
printf("\n");
}
printf("\n MidBondDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 4; j++)
printf(" %8.4f",dihedraltypes[i].midbonddihedral_cross_term[j]);
printf("\n");
}
printf("\n AngleDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 8; j++)
printf(" %8.4f",dihedraltypes[i].angledihedral_cross_term[j]);
printf("\n");
}
printf("\n AngleAngleDihedral Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",dihedraltypes[i].angleangledihedral_cross_term[j]);
printf("\n");
}
printf("\n Bond13 Types and Parameters\n");
for (i=0; i < no_dihedral_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[dihedraltypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",dihedraltypes[i].bond13_cross_term[j]);
printf("\n");
}
}
}
/**********************************************************************/
/* */
/* Find parameters for oop types */
/* */
/* This is the most complicated of all the types because */
/* the class I oop is actually an improper torsion and does */
/* not have the permutation symmetry of a well defined oop */
/* The net result is that if one does not find the current */
/* atom type ordering in the forcefield file then one must try each */
/* of the next permutations (6 in total) and when a match is found */
/* the program must go back and rearrange the oop type AND the atom */
/* ordering in the oop lists for those with the current type */
/* */
/* The Class II oop types are easier but also tedious since the */
/* program has to try all permutations of the a c and d atom */
/* types to find a match. A special routine is used to do this. */
/* */
/* Fortunately, there are typically few oop types */
/* */
/**********************************************************************/
if (forcefield & FF_TYPE_CLASS1) {
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 3; j++) ooptypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[ooptypes[i].types[j]].potential,5);
k = find_improper_body_data(potential_types,ff_oop,&rearrange);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for oop %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_improper_body_data(equiv_types,ff_oop,&rearrange);
}
if (k < 0) {
printf(" Unable to find oop data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(22);
} else {
ooptypes[i].params[0] = ff_oop.data[k].ff_param[0];
if (ff_oop.data[k].ff_param[2] == 0.0)
ooptypes[i].params[1] = 1.0;
else if (ff_oop.data[k].ff_param[2] == 180.0)
ooptypes[i].params[1] = -1.0;
else {
printf(" Non planar phi0 for %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3]);
ooptypes[i].params[1] = 0.0;
}
ooptypes[i].params[2] = ff_oop.data[k].ff_param[1];
if (rearrange > 0) rearrange_improper(i,rearrange);
}
}
}
if (forcefield & FF_TYPE_CLASS2) {
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 3; j++)
ooptypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[ooptypes[i].types[j]].potential,5);
k = find_trigonal_body_data(potential_types,ff_oop);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for oop %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_trigonal_body_data(equiv_types,ff_oop);
}
if (k < 0) {
printf(" Unable to find oop data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(23);
} else {
for (j=0; j < 2; j++)
ooptypes[i].params[j] = ff_oop.data[k].ff_param[j];
}
}
}
if (pflag > 2) {
printf("\n OOP Types and Parameters\n");
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[ooptypes[i].types[j]].potential);
for (j=0; j < 3; j++)
printf(" %8.4f",ooptypes[i].params[j]);
printf("\n");
}
}
/**********************************************************************/
/* */
/* Find parameters for angleangle types (Class II only) */
/* */
/* This is somewhat complicated in that one set of four types */
/* a b c d has three angleangle combinations so for each type */
/* the program needs to find three sets of parameters by */
/* progressively looking for data for different permutations of */
/* a c and d */
/* */
/**********************************************************************/
if (forcefield & FF_TYPE_CLASS2) {
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 6; j++) ooptypes[i].angleangle_params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[ooptypes[i].types[j]].potential,5);
tabc = get_t0(ooptypes[i].types[0],
ooptypes[i].types[1],
ooptypes[i].types[2]);
tabd = get_t0(ooptypes[i].types[0],
ooptypes[i].types[1],
ooptypes[i].types[3]);
tcbd = get_t0(ooptypes[i].types[2],
ooptypes[i].types[1],
ooptypes[i].types[3]);
ooptypes[i].angleangle_params[3] = tabc;
ooptypes[i].angleangle_params[4] = tcbd;
ooptypes[i].angleangle_params[5] = tabd;
k = find_angleangle_data(potential_types,ff_angang,kloc);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf(" Using equivalences for angleangle %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
// retry the search with equivalent types, independent of the print level,
// in the same way as the angleangle type loop further below
k = find_angleangle_data(equiv_types,ff_angang,kloc);
}
if (k < 0) {
printf(" Unable to find angleangle data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(24);
} else {
for (j=0; j < 3; j++) {
if (kloc[j] > -1)
ooptypes[i].angleangle_params[j] = ff_angang.data[kloc[j]].ff_param[0];
}
}
}
for (i=0; i < no_angleangle_types; i++) {
for (j=0; j < 6; j++) angleangletypes[i].params[j] = 0.0;
for (j=0; j < 4; j++)
strncpy(potential_types[j],
atomtypes[angleangletypes[i].types[j]].potential,5);
tabc = get_t0(angleangletypes[i].types[0],
angleangletypes[i].types[1],
angleangletypes[i].types[2]);
tabd = get_t0(angleangletypes[i].types[0],
angleangletypes[i].types[1],
angleangletypes[i].types[3]);
tcbd = get_t0(angleangletypes[i].types[2],
angleangletypes[i].types[1],
angleangletypes[i].types[3]);
angleangletypes[i].params[3] = tabc;
angleangletypes[i].params[4] = tcbd;
angleangletypes[i].params[5] = tabd;
k = find_angleangle_data(potential_types,ff_angang,kloc);
if (k < 0) {
get_equivs(5,potential_types,equiv_types);
if (pflag > 2) {
printf("Using equivalences for angleangle %s %s %s %s -> %s %s %s %s\n",
potential_types[0],potential_types[1],
potential_types[2],potential_types[3],
equiv_types[0],equiv_types[1],
equiv_types[2],equiv_types[3]);
}
k = find_angleangle_data(equiv_types,ff_angang,kloc);
}
if (k < 0) {
printf(" Unable to find angleangle data for %s %s %s %s\n",
potential_types[0],
potential_types[1],potential_types[2],potential_types[3]);
condexit(25);
} else {
for (j=0; j < 3; j++) {
if (kloc[j] > -1)
angleangletypes[i].params[j] =
ff_angang.data[kloc[j]].ff_param[0];
}
}
}
if (pflag > 2) {
printf("\n AngleAngle Types and Parameters\n");
for (i=0; i < no_oop_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[ooptypes[i].types[j]].potential);
for (j=0; j < 6; j++)
printf(" %8.4f",ooptypes[i].angleangle_params[j]);
printf("\n");
}
for (i=0; i < no_angleangle_types; i++) {
for (j=0; j < 4; j++)
printf(" %-3s",atomtypes[angleangletypes[i].types[j]].potential);
for (j=0; j < 6; j++) printf(" %8.4f",angleangletypes[i].params[j]);
printf("\n");
}
}
}
}
int find_improper_body_data(char types1[][5],struct FrcFieldItem item,
int *rearrange_ptr)
{
int k,backwards;
char mirror_types[4][5];
backwards = 0;
/* a b c d */
*rearrange_ptr = 0;
k = find_match(4,types1,item,&backwards);
if (k >= 0) return k;
/* a b d c */
*rearrange_ptr = 1;
strncpy(mirror_types[0],types1[0],5);
strncpy(mirror_types[1],types1[1],5);
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b a c */
*rearrange_ptr = 2;
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b c a */
*rearrange_ptr = 3;
strncpy(mirror_types[2],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b a d */
*rearrange_ptr = 4;
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[3],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b d a */
*rearrange_ptr = 5;
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
return k;
}
void rearrange_improper(int ooptype,int rearrange)
{
int i,j,temp[4];
for (i=0; i < 4; i++) temp[i] = ooptypes[ooptype].types[i];
switch (rearrange) {
case 1:
ooptypes[ooptype].types[0] = temp[0];
ooptypes[ooptype].types[2] = temp[3];
ooptypes[ooptype].types[3] = temp[2];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[2] = temp[3];
oops[i].members[3] = temp[2];
}
}
break;
case 2:
ooptypes[ooptype].types[0] = temp[3];
ooptypes[ooptype].types[2] = temp[0];
ooptypes[ooptype].types[3] = temp[2];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[3];
oops[i].members[2] = temp[0];
oops[i].members[3] = temp[2];
}
}
break;
case 3:
ooptypes[ooptype].types[0] = temp[3];
ooptypes[ooptype].types[2] = temp[2];
ooptypes[ooptype].types[3] = temp[0];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[3];
oops[i].members[2] = temp[2];
oops[i].members[3] = temp[0];
}
}
break;
case 4:
ooptypes[ooptype].types[0] = temp[2];
ooptypes[ooptype].types[2] = temp[0];
ooptypes[ooptype].types[3] = temp[3];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[2];
oops[i].members[2] = temp[0];
oops[i].members[3] = temp[3];
}
}
break;
case 5:
ooptypes[ooptype].types[0] = temp[2];
ooptypes[ooptype].types[2] = temp[3];
ooptypes[ooptype].types[3] = temp[0];
for (i=0; i < total_no_oops; i++) {
if (oops[i].type == ooptype) {
for (j=0; j < 4; j++) temp[j] = oops[i].members[j];
oops[i].members[0] = temp[2];
oops[i].members[2] = temp[3];
oops[i].members[3] = temp[0];
}
}
break;
default:
break;
}
}
int find_trigonal_body_data(char types1[][5],struct FrcFieldItem item)
{
int k,backwards;
char mirror_types[4][5];
backwards = -1;
/* a b c d */
k = find_match(4,types1,item,&backwards);
if (k >= 0) return k;
/* a b d c */
strncpy(mirror_types[0],types1[0],5);
strncpy(mirror_types[1],types1[1],5);
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b a c */
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* d b c a */
strncpy(mirror_types[2],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b a d */
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[3],5);
k = find_match(4,mirror_types,item,&backwards);
if (k >= 0) return k;
/* c b d a */
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
return k;
}
int find_angleangle_data(char types1[][5],struct FrcFieldItem item,int kloc[3])
{
int k,backwards = -1;
char mirror_types[4][5];
strncpy(mirror_types[1],types1[1],5);
/* go for first parameter a b c d or d b c a */
k = find_match(4,types1,item,&backwards);
if (k < 0) {
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
}
kloc[0] = k;
/* go for second parameter d b a c or c b a d */
strncpy(mirror_types[0],types1[3],5);
strncpy(mirror_types[2],types1[0],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k < 0) {
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[3],types1[3],5);
k = find_match(4,mirror_types,item,&backwards);
}
kloc[1] = k;
/* go for third parameter a b d c or c b d a */
strncpy(mirror_types[0],types1[0],5);
strncpy(mirror_types[2],types1[3],5);
strncpy(mirror_types[3],types1[2],5);
k = find_match(4,mirror_types,item,&backwards);
if (k < 0) {
strncpy(mirror_types[0],types1[2],5);
strncpy(mirror_types[3],types1[0],5);
k = find_match(4,mirror_types,item,&backwards);
}
kloc[2] = k;
k = 0;
if ((kloc[0] < 0) && (kloc[1] < 0) && (kloc[2] < 0)) k = -1;
return k;
}
int find_match(int n, char types1[][5],struct FrcFieldItem item,int
*backwards_ptr)
{
int k,match;
match = 0;
k=0;
/* Try for an exact match (no wildcards) first */
while (!match && (k < item.entries)) {
if (match_types(n, 0,types1,item.data[k].ff_types,backwards_ptr) == 1)
match = 1;
else
k++;
}
/* Try again - allow wildcard matching */
if (!match) {
k=0;
while (!match && (k < item.entries)) {
if (match_types(n,1,types1,item.data[k].ff_types,backwards_ptr) == 1)
match = 1;
else
k++;
}
}
if (match) return k;
else return -1;
}
int match_types(int n,int wildcard,char types1[][5],char types2[][5],
int *backwards_ptr)
{
int k,match;
/* Routine to match short arrays of character strings which contain
atom potential types. The arrays range in length from 1 to 4 (VDW or
equivalences, bond, angle, dihedral or oop types). There are potentially
four ways the arrays can match: exact match (forwards), exact match when
one array is run backwards (backwards), forwards with wildcard character
matching allowed (forwards *) and finally backwards with wildcard
character matching (backwards *). If the variable backwards (pointed to
by backwards_ptr) is -1, then the backwards options are not used (such
as when matching oop types).
*/
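/* illustrative example (not from the original source): with n = 2,
   types1 = {"c","o"} and a forcefield entry listing {"o","c"}, the forwards
   pass fails but the backwards pass succeeds, so the routine returns 1 and
   sets *backwards_ptr to 1, letting the caller reverse parameter order
   where that matters (e.g. asymmetric bond-angle cross terms) */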
if (wildcard == 0) {
/* forwards */
k=0;
match = 1;
while (match && (k < n)) {
if (strncmp(types1[k],types2[k],5) == 0)
k++;
else
match = 0;
}
} else {
/* forwards * */
k=0;
match = 1;
while (match && (k < n)) {
if ((strncmp(types1[k],types2[k],5) == 0) ||
(types2[k][0] == '*'))
k++;
else
match = 0;
}
}
if (match) {
*backwards_ptr = 0;
return 1;
}
if ((n < 2) || (*backwards_ptr == -1)) return 0;
if (wildcard == 0) {
/* backwards */
k=0;
match = 1;
while (match && (k < n)) {
if (strncmp(types1[n-k-1],types2[k],5) == 0)
k++;
else
match = 0;
}
} else {
/* backwards * */
k=0;
match = 1;
while (match && (k < n)) {
if ((strncmp(types1[n-k-1],types2[k],5) == 0) ||
(types2[k][0] == '*') )
k++;
else
match = 0;
}
}
if (match) {
*backwards_ptr = 1;
return 1;
} else return 0;
}
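/* Look up the reference bond length r0 for a pair of (numeric) atom types in
   the bondtypes[] table, trying both orderings of the pair; returns 0.0 and
   prints a warning when no bond type matches. */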
double get_r0(int typei,int typej)
{
int k,match;
double r;
k=0;
match=0;
r = 0.0;
while (!match && (k < no_bond_types)) {
if (((typei == bondtypes[k].types[0]) &&
(typej == bondtypes[k].types[1])) ||
((typej == bondtypes[k].types[0]) &&
(typei == bondtypes[k].types[1])) ) {
r = bondtypes[k].params[0];
match = 1;
} else k++;
}
if (match == 0)
printf(" Unable to find r0 for types %d %d\n",typei,typej);
return r;
}
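/* Look up the reference angle theta0 for an atom type triple in the
   angletypes[] table, trying the i-j-k and k-j-i orderings; returns 0.0 and
   prints a warning when no angle type matches. */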
double get_t0(int typei,int typej,int typek)
{
int k,match;
double theta;
k=0;
match=0;
theta = 0.0;
while (!match && (k < no_angle_types)) {
if (((typei == angletypes[k].types[0]) &&
(typej == angletypes[k].types[1]) &&
(typek == angletypes[k].types[2])) ||
((typek == angletypes[k].types[0]) &&
(typej == angletypes[k].types[1]) &&
(typei == angletypes[k].types[2])) ) {
theta = angletypes[k].params[0];
match = 1;
} else k++;
}
if (match == 0)
printf(" Unable to find t0 for types %d %d %d\n",
typei,typej,typek);
return theta;
}
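/* Return the index of the first atom type whose potential name begins with
   "cp " (in cvff-style force fields this is the aromatic carbon type), or -1
   if no such type is present. */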
int quo_cp()
{
char cp[] = "cp ";
int i,type,found;
i = 0;
type = -1;
found = 0;
while (!found && (i < no_atom_types)) {
if (strncmp(atomtypes[i].potential,cp,2) == 0) {
found = 1;
type = i;
} else i++;
}
return type;
}
void get_equivs(int ic,char potential_types[][5],char equiv_types[][5])
{
int i,k;
switch (ic) {
case 1:
k = find_equiv_type(potential_types[0]);
if (k > -1) strncpy(equiv_types[0],equivalence.data[k].ff_types[1],5);
break;
case 2:
for (i=0; i < 2; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1) strncpy(equiv_types[i],equivalence.data[k].ff_types[2],5);
}
break;
case 3:
for (i=0; i < 3; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1) strncpy(equiv_types[i],equivalence.data[k].ff_types[3],5);
}
break;
case 4:
for (i=0; i < 4; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1) strncpy(equiv_types[i],equivalence.data[k].ff_types[4],5);
}
break;
case 5:
for (i=0; i < 4; i++) {
k = find_equiv_type(potential_types[i]);
if (k > -1)
strncpy(equiv_types[i],equivalence.data[k].ff_types[5],5);
}
break;
default:
printf(" Requesting equivalences of unsupported type: %d\n",ic);
condexit(26);
break;
}
return;
}
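/* Note on get_equivs() above (a sketch, assuming the usual #equivalence
   column order NonB, Bond, Angle, Torsion, OOP): ic selects which column of
   equivalence.data[].ff_types[] replaces the supplied potential types
   (1 = nonbond ... 5 = out-of-plane); types without an equivalence entry are
   left unchanged. */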
int find_equiv_type(char potential_type[5])
{
int j,k,match;
j = -1;
k = 0;
match = 0;
while (!match && (k < equivalence.entries)) {
if (strncmp(potential_type,
equivalence.data[k].ff_types[0],5) == 0) {
match = 1;
j = k;
} else {
k++;
}
}
if (j < 0)
printf(" Unable to find equivalent type for %s\n",potential_type);
return j;
}
diff --git a/tools/msi2lmp/src/InitializeItems.c b/tools/msi2lmp/src/InitializeItems.c
index 4df9fd0f1..1e3363691 100644
--- a/tools/msi2lmp/src/InitializeItems.c
+++ b/tools/msi2lmp/src/InitializeItems.c
@@ -1,140 +1,140 @@
/*
* This function fills in the keyword field, the number of members for each
* item and the number of parameters for each item
*
*/
#include "msi2lmp.h"
#include "Forcefield.h"
#include <string.h>
void InitializeItems(void)
{
/* ATOM TYPES */
strcpy(ff_atomtypes.keyword,"#atom_types");
ff_atomtypes.number_of_members = 1;
ff_atomtypes.number_of_parameters = 1;
/* EQUIVALENCE */
strcpy(equivalence.keyword,"#equivalence");
equivalence.number_of_members = 6;
equivalence.number_of_parameters = 0;
/* NON-BOND */
strcpy(ff_vdw.keyword,"#nonbond");
ff_vdw.number_of_members = 1;
ff_vdw.number_of_parameters = 2;
/* BOND */
ff_bond.number_of_members = 2;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_bond.keyword,"#quadratic_bond");
ff_bond.number_of_parameters = 2;
}
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_bond.keyword,"#quartic_bond");
ff_bond.number_of_parameters = 4;
}
/* MORSE */
if (forcefield & FF_TYPE_CLASS1) {
ff_morse.number_of_members = 2;
strcpy(ff_morse.keyword,"#morse_bond");
ff_morse.number_of_parameters = 3;
}
/* ANGLE */
ff_ang.number_of_members = 3;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_ang.keyword,"#quadratic_angle");
ff_ang.number_of_parameters = 2;
}
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_ang.keyword,"#quartic_angle");
ff_ang.number_of_parameters = 4;
}
/* TORSION */
ff_tor.number_of_members = 4;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_tor.keyword,"#torsion_1");
ff_tor.number_of_parameters = 3;
- }
+ }
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_tor.keyword,"#torsion_3");
ff_tor.number_of_parameters = 6;
}
/* OOP */
ff_oop.number_of_members = 4;
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) {
strcpy(ff_oop.keyword,"#out_of_plane");
ff_oop.number_of_parameters = 3;
}
if (forcefield & FF_TYPE_CLASS2) {
strcpy(ff_oop.keyword,"#wilson_out_of_plane");
ff_oop.number_of_parameters = 2;
}
if (forcefield & FF_TYPE_CLASS2) {
/* BOND-BOND */
strcpy(ff_bonbon.keyword,"#bond-bond");
ff_bonbon.number_of_members = 3;
ff_bonbon.number_of_parameters = 1;
/* BOND-ANGLE */
strcpy(ff_bonang.keyword,"#bond-angle");
ff_bonang.number_of_members = 3;
ff_bonang.number_of_parameters = 2;
/* ANGLE-TORSION */
strcpy(ff_angtor.keyword,"#angle-torsion_3");
ff_angtor.number_of_members = 4;
ff_angtor.number_of_parameters = 6;
/* ANGLE-ANGLE-TORSION */
strcpy(ff_angangtor.keyword,"#angle-angle-torsion_1");
ff_angangtor.number_of_members = 4;
ff_angangtor.number_of_parameters = 1;
/* END-BOND-TORSION */
strcpy(ff_endbontor.keyword,"#end_bond-torsion_3");
ff_endbontor.number_of_members = 4;
ff_endbontor.number_of_parameters = 6;
/* MID-BOND-TORSION */
strcpy(ff_midbontor.keyword,"#middle_bond-torsion_3");
ff_midbontor.number_of_members = 4;
ff_midbontor.number_of_parameters = 3;
/* ANGLE-ANGLE */
strcpy(ff_angang.keyword,"#angle-angle");
ff_angang.number_of_members = 4;
ff_angang.number_of_parameters = 1;
/* BOND-BOND-1-3 */
strcpy(ff_bonbon13.keyword,"#bond-bond_1_3");
ff_bonbon13.number_of_members = 4;
ff_bonbon13.number_of_parameters = 1;
}
}
diff --git a/tools/msi2lmp/src/WriteDataFile.c b/tools/msi2lmp/src/WriteDataFile.c
index 498978406..c03eba71c 100644
--- a/tools/msi2lmp/src/WriteDataFile.c
+++ b/tools/msi2lmp/src/WriteDataFile.c
@@ -1,478 +1,478 @@
/*
* This function creates and writes the data file to be used with LAMMPS
*/
#include "msi2lmp.h"
#include "Forcefield.h"
#include <stdlib.h>
void WriteDataFile(char *nameroot)
{
int i,j,k,m;
char line[MAX_LINE_LENGTH];
FILE *DatF;
/* Open data file */
sprintf(line,"%s.data",rootname);
if (pflag > 0) {
printf(" Writing LAMMPS data file %s.data",rootname);
if (forcefield & FF_TYPE_CLASS1) puts(" for Class I force field");
if (forcefield & FF_TYPE_CLASS2) puts(" for Class II force field");
if (forcefield & FF_TYPE_OPLSAA) puts(" for OPLS-AA force field");
}
if ((DatF = fopen(line,"w")) == NULL ) {
printf("Cannot open %s\n",line);
exit(62);
}
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA)) total_no_angle_angles = 0;
if (hintflag) fprintf(DatF, "LAMMPS data file. msi2lmp " MSI2LMP_VERSION
" / CGCMM for %s\n\n", nameroot);
else fprintf(DatF, "LAMMPS data file. msi2lmp " MSI2LMP_VERSION
" for %s\n\n", nameroot);
fprintf(DatF, " %6d atoms\n", total_no_atoms);
fprintf(DatF, " %6d bonds\n", total_no_bonds);
fprintf(DatF, " %6d angles\n",total_no_angles);
fprintf(DatF, " %6d dihedrals\n", total_no_dihedrals);
fprintf(DatF, " %6d impropers\n", total_no_oops+total_no_angle_angles);
fputs("\n",DatF);
fprintf(DatF, " %3d atom types\n", no_atom_types);
if (no_bond_types > 0)
fprintf(DatF, " %3d bond types\n", no_bond_types);
if (no_angle_types> 0)
fprintf(DatF, " %3d angle types\n", no_angle_types);
if (no_dihedral_types > 0) fprintf (DatF," %3d dihedral types\n",
no_dihedral_types);
if (forcefield & FF_TYPE_CLASS1) {
if (no_oop_types > 0)
fprintf (DatF, " %3d improper types\n", no_oop_types);
}
if (forcefield & FF_TYPE_CLASS2) {
if ((no_oop_types + no_angleangle_types) > 0)
fprintf (DatF, " %3d improper types\n",
no_oop_types + no_angleangle_types);
}
/* Modified by SLTM to print out triclinic box types 10/05/10 - lines 56-68 */
if (TriclinicFlag == 0) {
fputs("\n",DatF);
fprintf(DatF, " %15.9f %15.9f xlo xhi\n", box[0][0], box[1][0]);
fprintf(DatF, " %15.9f %15.9f ylo yhi\n", box[0][1], box[1][1]);
fprintf(DatF, " %15.9f %15.9f zlo zhi\n", box[0][2], box[1][2]);
} else {
fputs("\n",DatF);
fprintf(DatF, " %15.9f %15.9f xlo xhi\n", box[0][0], box[1][0]);
fprintf(DatF, " %15.9f %15.9f ylo yhi\n", box[0][1], box[1][1]);
fprintf(DatF, " %15.9f %15.9f zlo zhi\n", box[0][2], box[1][2]);
fprintf(DatF, " %15.9f %15.9f %15.9f xy xz yz\n",box[2][0], box[2][1], box[2][2]);
}
/* MASSES */
fprintf(DatF, "\nMasses\n\n");
for(k=0; k < no_atom_types; k++) {
if (hintflag) fprintf(DatF, " %3d %10.6f # %s\n",k+1,atomtypes[k].mass,atomtypes[k].potential);
else fprintf(DatF, " %3d %10.6f\n",k+1,atomtypes[k].mass);
}
fputs("\n",DatF);
/* COEFFICIENTS */
fputs("Pair Coeffs",DatF);
if (hintflag) {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA))
fputs(" # lj/cut/coul/long\n\n",DatF);
else if (forcefield & FF_TYPE_CLASS2)
fputs(" # lj/class2/coul/long\n\n",DatF);
} else fputs("\n\n",DatF);
for (i=0; i < no_atom_types; i++) {
fprintf(DatF, " %3i ", i+1);
for ( j = 0; j < 2; j++)
fprintf(DatF, "%14.10f ",atomtypes[i].params[j]);
if (hintflag) fprintf(DatF, "# %s\n",atomtypes[i].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
if (no_bond_types > 0) {
m = 0;
if (forcefield & FF_TYPE_CLASS1) m = 2;
if (forcefield & FF_TYPE_OPLSAA) m = 2;
if (forcefield & FF_TYPE_CLASS2) m = 4;
fputs("Bond Coeffs",DatF);
if (hintflag) {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA))
fputs(" # harmonic\n\n",DatF);
else if (forcefield & FF_TYPE_CLASS2)
fputs(" # class2\n\n",DatF);
} else fputs("\n\n",DatF);
for (i=0; i < no_bond_types; i++) {
fprintf(DatF, " %3i", i+1);
for ( j = 0; j < m; j++)
fprintf(DatF, " %10.4f", bondtypes[i].params[j]);
if (hintflag) fprintf(DatF," # %s-%s\n",atomtypes[bondtypes[i].types[0]].potential,
atomtypes[bondtypes[i].types[1]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
if (no_angle_types > 0) {
m = 0;
if (forcefield & FF_TYPE_CLASS1) m = 2;
if (forcefield & FF_TYPE_OPLSAA) m = 2;
if (forcefield & FF_TYPE_CLASS2) m = 4;
fputs("Angle Coeffs",DatF);
if (hintflag) {
if (forcefield & (FF_TYPE_CLASS1|FF_TYPE_OPLSAA))
fputs(" # harmonic\n\n",DatF);
else if (forcefield & FF_TYPE_CLASS2)
fputs(" # class2\n\n",DatF);
} else fputs("\n\n",DatF);
-
+
for (i=0; i < no_angle_types; i++) {
fprintf(DatF, " %3i", i+1);
for ( j = 0; j < m; j++)
fprintf(DatF, " %10.4f", angletypes[i].params[j]);
if (hintflag) fprintf(DatF," # %s-%s-%s\n",
atomtypes[angletypes[i].types[0]].potential,
atomtypes[angletypes[i].types[1]].potential,
atomtypes[angletypes[i].types[2]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
if (no_dihedral_types > 0) {
fputs("Dihedral Coeffs",DatF);
if (forcefield & FF_TYPE_CLASS1) {
if (hintflag) fputs(" # harmonic\n\n",DatF);
else fputs("\n\n",DatF);
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i %10.4f %3i %3i", i+1,
dihedraltypes[i].params[0],
(int) dihedraltypes[i].params[1],
(int) dihedraltypes[i].params[2]);
if (hintflag) fprintf(DatF," # %s-%s-%s-%s\n",
atomtypes[dihedraltypes[i].types[0]].potential,
atomtypes[dihedraltypes[i].types[1]].potential,
atomtypes[dihedraltypes[i].types[2]].potential,
atomtypes[dihedraltypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
} else if (forcefield & FF_TYPE_OPLSAA) {
if (hintflag) fputs(" # opls\n\n",DatF);
else fputs("\n\n",DatF);
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, " %3i",i+1);
for ( j = 0; j < 4; j++)
fprintf(DatF, " %10.4f",dihedraltypes[i].params[j]);
if (hintflag) fprintf(DatF," # %s-%s-%s-%s\n",
atomtypes[dihedraltypes[i].types[0]].potential,
atomtypes[dihedraltypes[i].types[1]].potential,
atomtypes[dihedraltypes[i].types[2]].potential,
atomtypes[dihedraltypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
} else if (forcefield & FF_TYPE_CLASS2) {
if (hintflag) fputs(" # class2\n\n",DatF);
else fputs("\n\n",DatF);
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, " %3i",i+1);
for ( j = 0; j < 6; j++)
fprintf(DatF, " %10.4f",dihedraltypes[i].params[j]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[dihedraltypes[i].types[0]].potential,
atomtypes[dihedraltypes[i].types[1]].potential,
atomtypes[dihedraltypes[i].types[2]].potential,
atomtypes[dihedraltypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
}
if (forcefield & FF_TYPE_CLASS1) {
if (no_oop_types > 0) {
/* cvff improper coeffs are: type K0 d n */
if (hintflag) fputs("Improper Coeffs # cvff\n\n",DatF);
else fputs("Improper Coeffs\n\n",DatF);
for (i=0; i < no_oop_types; i++) {
fprintf(DatF,"%5i %10.4f %3i %3i ",i+1,
ooptypes[i].params[0], (int) ooptypes[i].params[1],
(int) ooptypes[i].params[2]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[ooptypes[i].types[0]].potential,
atomtypes[ooptypes[i].types[1]].potential,
atomtypes[ooptypes[i].types[2]].potential,
atomtypes[ooptypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
} else if (forcefield & FF_TYPE_OPLSAA) {
if (no_oop_types > 0) {
/* opls improper coeffs are like cvff: type K0 d(=-1) n(=2) */
if (hintflag) fputs("Improper Coeffs # cvff\n\n",DatF);
else fputs("Improper Coeffs\n\n",DatF);
for (i=0; i < no_oop_types; i++) {
fprintf(DatF,"%5i %10.4f %3i %3i ",i+1,
ooptypes[i].params[0], (int) ooptypes[i].params[1],
(int) ooptypes[i].params[2]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[ooptypes[i].types[0]].potential,
atomtypes[ooptypes[i].types[1]].potential,
atomtypes[ooptypes[i].types[2]].potential,
atomtypes[ooptypes[i].types[3]].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
}
} else if (forcefield & FF_TYPE_CLASS2) {
if ((no_oop_types + no_angleangle_types) > 0) {
if (hintflag) fputs("Improper Coeffs # class2\n\n",DatF);
else fputs("Improper Coeffs\n\n",DatF);
for (i=0; i < no_oop_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 2; j++)
fprintf(DatF, "%10.4f ", ooptypes[i].params[j]);
if (hintflag) fprintf(DatF,"# %s-%s-%s-%s\n",
atomtypes[ooptypes[i].types[0]].potential,
atomtypes[ooptypes[i].types[1]].potential,
atomtypes[ooptypes[i].types[2]].potential,
atomtypes[ooptypes[i].types[3]].potential);
else fputs("\n",DatF);
}
for (i=0; i < no_angleangle_types; i++) {
fprintf(DatF, "%3i ", i+no_oop_types+1);
for ( j = 0; j < 2; j++)
fprintf(DatF, "%10.4f ", 0.0);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
}
if (forcefield & FF_TYPE_CLASS2) {
if (no_angle_types > 0) {
fprintf(DatF,"BondBond Coeffs\n\n");
for (i=0; i < no_angle_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 3; j++)
fprintf(DatF, "%10.4f ", angletypes[i].bondbond_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"BondAngle Coeffs\n\n");
for (i=0; i < no_angle_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 4; j++)
fprintf(DatF, "%10.4f ",angletypes[i].bondangle_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
if ((no_oop_types+no_angleangle_types) > 0) {
fprintf(DatF,"AngleAngle Coeffs\n\n");
for (i=0; i < no_oop_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 6; j++)
fprintf(DatF, "%10.4f ", ooptypes[i].angleangle_params[j]);
fputs("\n",DatF);
}
for (i=0; i < no_angleangle_types; i++) {
fprintf(DatF, "%3i ", i+no_oop_types+1);
for ( j = 0; j < 6; j++)
fprintf(DatF, "%10.4f ", angleangletypes[i].params[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
if (no_dihedral_types > 0) {
fprintf(DatF,"AngleAngleTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 3; j++)
fprintf(DatF,"%10.4f ",
dihedraltypes[i].angleangledihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"EndBondTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%i ", i+1);
for ( j = 0; j < 8; j++)
fprintf(DatF, "%10.4f ",
dihedraltypes[i].endbonddihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"MiddleBondTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 4; j++)
fprintf(DatF,"%10.4f ",
dihedraltypes[i].midbonddihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"BondBond13 Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 3; j++)
fprintf(DatF, "%10.4f ",
dihedraltypes[i].bond13_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
fprintf(DatF,"AngleTorsion Coeffs\n\n");
for (i=0; i < no_dihedral_types; i++) {
fprintf(DatF, "%3i ", i+1);
for ( j = 0; j < 8; j++)
fprintf(DatF, "%10.4f ",
dihedraltypes[i].angledihedral_cross_term[j]);
fputs("\n",DatF);
}
fputs("\n",DatF);
}
}
/*--------------------------------------------------------------------*/
/* ATOMS */
if (hintflag) fputs("Atoms # full\n\n",DatF);
else fputs("Atoms\n\n",DatF);
for(k=0; k < total_no_atoms; k++) {
int typ = atoms[k].type;
fprintf(DatF," %6i %6i %3i %9.6f %15.9f %15.9f %15.9f %3i %3i %3i",
k+1,
atoms[k].molecule,
typ+1,
atoms[k].q,
atoms[k].x[0],
atoms[k].x[1],
atoms[k].x[2],
atoms[k].image[0],
atoms[k].image[1],
atoms[k].image[2]);
if (hintflag) fprintf(DatF," # %s\n",atomtypes[typ].potential);
else fputs("\n",DatF);
}
fputs("\n",DatF);
/***** BONDS *****/
if (total_no_bonds > 0) {
fprintf(DatF, "Bonds\n\n");
for(k=0; k < total_no_bonds; k++)
fprintf(DatF, "%6i %3i %6i %6i\n",k+1,
bonds[k].type+1,
bonds[k].members[0]+1,
bonds[k].members[1]+1);
fputs("\n",DatF);
}
/***** ANGLES *****/
if (total_no_angles > 0) {
fprintf(DatF, "Angles\n\n");
for(k=0; k < total_no_angles; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i\n",k+1,
angles[k].type+1,
angles[k].members[0]+1,
angles[k].members[1]+1,
angles[k].members[2]+1);
fputs("\n",DatF);
}
/***** TORSIONS *****/
if (total_no_dihedrals > 0) {
fprintf(DatF,"Dihedrals\n\n");
for(k=0; k < total_no_dihedrals; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i %6i\n",k+1,
dihedrals[k].type+1,
dihedrals[k].members[0]+1,
dihedrals[k].members[1]+1,
dihedrals[k].members[2]+1,
dihedrals[k].members[3]+1);
fputs("\n",DatF);
}
/***** OUT-OF-PLANES *****/
if (total_no_oops+total_no_angle_angles > 0) {
fprintf(DatF,"Impropers\n\n");
for (k=0; k < total_no_oops; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i %6i \n", k+1,
oops[k].type+1,
oops[k].members[0]+1,
oops[k].members[1]+1,
oops[k].members[2]+1,
oops[k].members[3]+1);
if (forcefield & FF_TYPE_CLASS2) {
for (k=0; k < total_no_angle_angles; k++)
fprintf(DatF, "%6i %3i %6i %6i %6i %6i \n",k+total_no_oops+1,
angleangles[k].type+no_oop_types+1,
angleangles[k].members[0]+1,
angleangles[k].members[1]+1,
angleangles[k].members[2]+1,
angleangles[k].members[3]+1);
}
fputs("\n",DatF);
}
/* Close data file */
if (fclose(DatF) !=0) {
printf("Error closing %s.lammps05\n", rootname);
exit(61);
}
}
diff --git a/tools/msi2lmp/src/msi2lmp.c b/tools/msi2lmp/src/msi2lmp.c
index c94d4b4d7..15cfddd25 100644
--- a/tools/msi2lmp/src/msi2lmp.c
+++ b/tools/msi2lmp/src/msi2lmp.c
@@ -1,442 +1,439 @@
/*
*
* msi2lmp.exe
*
* v3.9.8 AK- Improved whitespace handling in parsing topology and force
* field files to avoid bogus warnings about type name truncation
*
* v3.9.7 AK- Add check to enforce that Class1/OPLS-AA use A-B parameter
* conventions in the force field file and Class2 uses r-eps conventions
*
* v3.9.6 AK- Refactoring of MDF file parser with more consistent
* handling of compile time constants MAX_NAME and MAX_STRING
*
* v3.9.5 AK- Add TopoTools style force field parameter type hints
*
* v3.9.4 AK- Make force field style hints optional with a flag
*
* v3.9.3 AK- Bugfix for triclinic cells.
*
* v3.9.2 AK- Support for writing out force field style hints
*
* v3.9.1 AK- Bugfix for Class2. Free allocated memory. Print version number.
*
* v3.9 AK - Rudimentary support for OPLS-AA
*
* v3.8 AK - Some refactoring and cleanup of global variables
* - Bugfixes for argument parsing and improper definitions
* - improved handling of box dimensions and image flags
* - port to compiling on windows using MinGW
* - more consistent print level handling
* - more consistent handling of missing parameters
* - Added a regression test script with examples.
*
* V3.7 STM - Added support for triclinic cells
*
* v3.6 KLA - Changes to output to either lammps 2001 (F90 version) or to
* lammps 2005 (C++ version)
*
* v3.4 JEC - a number of minor changes due to the way newline and EOF are generated
* on Materials Studio generated .car and .mdf files as well as odd
* behavior out of newer Linux IO libraries. ReadMdfFile was restructured
* in the process.
*
* v3.1 JEC - changed IO interface to standard in/out, forcefield file
* location can be indicated by an environment variable; added
* printing options, consistency checks and forcefield
* parameter version sensitivity (the highest version is used)
*
* v3.0 JEC - program substantially rewritten to reduce execution time
* and be 98 % dynamic in memory use (still fixed limits on
* number of parameter types for different internal coordinate
* sets)
*
* v2.0 MDP - got internal coordinate information from mdf file and
* forcefield parameters from frc file thus eliminating
* need for Discover
*
* V1.0 SL - original version. Used .car file and internal coordinate
* information from Discover to produce LAMMPS data file.
*
* This program uses the .car and .mdf files from MSI/Biosym's INSIGHT
* program to produce a LAMMPS data file.
*
* The program is started by supplying information at the command prompt
* according to the usage described below.
*
* USAGE: msi2lmp3 ROOTNAME {-print #} {-class #} {-frc FRC_FILE} {-ignore} {-nocenter} {-oldstyle} {-shift # # #}
*
* -- msi2lmp3 is the name of the executable
* -- ROOTNAME is the base name of the .car and .mdf files
* -- all other flags are optional and can be abbreviated (e.g. -p instead of -print)
*
* -- -print
* # is the print level: 0 - silent except for errors
* 1 - minimal (default)
* 2 - more verbose
* 3 - even more verbose
* -- -class
* # is the class of forcefield to use (I or 1 = Class I e.g., CVFF, clayff)
* (II or 2 = Class II e.g., CFFx, COMPASS)
* (O or 0 = OPLS-AA)
* default is -class I
*
* -- -ignore - tells msi2lmp to ignore warnings and errors and keep going
*
* -- -nocenter - tells msi2lmp to not center the box around the (geometrical)
* center of the atoms, but around the origin
*
* -- -oldstyle - tells msi2lmp to write out a data file without style hints
* (to be compatible with older LAMMPS versions)
*
* -- -shift - tells msi2lmp to shift the entire system (box and coordinates)
* by a vector (default: 0.0 0.0 0.0)
*
* -- -frc - specifies name of the forcefield file (e.g., cff91)
*
* If the name includes a hard wired directory (i.e., if the name
* starts with . or /), then the name is used alone. Otherwise,
* the program looks for the forcefield file in $MSI2LMP_LIBRARY.
* If $MSI2LMP_LIBRARY is not set, then the current directory is
* used.
*
* If the file name does not include a dot after the first
* character, then .frc is appended to the name.
*
* For example, -frc cvff (assumes cvff.frc is in $MSI2LMP_LIBRARY
* or .)
*
* -frc cff/cff91 (assumes cff91.frc is in
* $MSI2LMP_LIBRARY/cff or ./cff)
*
* -frc /usr/local/forcefields/cff95 (absolute
* location)
*
* By default, the program uses $MSI2LMP_LIBRARY/cvff.frc
*
* -- output is written to a file called ROOTNAME.data
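*
* Example invocation (illustrative): msi2lmp benzene -class II -frc pcff
* reads benzene.car and benzene.mdf, looks up pcff.frc and writes benzene.data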
*
*
****************************************************************
*
* msi2lmp
*
* This is the third version of a program that generates a LAMMPS
* data file based on the information in a MSI car file (atom
* coordinates) and mdf file (molecular topology). A key part of
* the program looks up forcefield parameters from an MSI frc file.
*
* The first version was written by Steve Lustig at Dupont, but
* required using Discover to derive internal coordinates and
* forcefield parameters
*
* The second version was written by Michael Peachey while an
* intern in the Cray Chemistry Applications Group managed
* by John Carpenter. This version derived internal coordinates
* from the mdf file and looked up parameters in the frc file
* thus eliminating the need for Discover.
*
* The third version was written by John Carpenter to optimize
* the performance of the program for large molecular systems
* (the original code for deriving atom numbers was quadratic in time)
* and to make the program fully dynamic. The second version used
* fixed dimension arrays for the internal coordinates.
*
-* John Carpenter can be contacted by sending email to
-* jec374@earthlink.net
-*
* November 2000
*/
#include "msi2lmp.h"
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
/* global variables */
char *rootname;
double pbc[6];
double box[3][3];
double shift[3];
int periodic = 1;
int TriclinicFlag = 0;
int forcefield = 0;
int centerflag = 1;
int hintflag = 1;
int ljtypeflag = 0;
int pflag;
int iflag;
int *no_atoms;
int no_molecules;
int replicate[3];
int total_no_atoms = 0;
int total_no_bonds = 0;
int total_no_angles = 0;
int total_no_dihedrals = 0;
int total_no_angle_angles = 0;
int total_no_oops = 0;
int no_atom_types = 0;
int no_bond_types = 0;
int no_angle_types = 0;
int no_dihedral_types = 0;
int no_oop_types = 0;
int no_angleangle_types = 0;
char *FrcFileName = NULL;
FILE *CarF = NULL;
FILE *FrcF = NULL;
FILE *PrmF = NULL;
FILE *MdfF = NULL;
FILE *RptF = NULL;
struct Atom *atoms = NULL;
struct MoleculeList *molecule = NULL;
struct BondList *bonds = NULL;
struct AngleList *angles = NULL;
struct DihedralList *dihedrals = NULL;
struct OOPList *oops = NULL;
struct AngleAngleList *angleangles = NULL;
struct AtomTypeList *atomtypes = NULL;
struct BondTypeList *bondtypes = NULL;
struct AngleTypeList *angletypes = NULL;
struct DihedralTypeList *dihedraltypes = NULL;
struct OOPTypeList *ooptypes = NULL;
struct AngleAngleTypeList *angleangletypes = NULL;
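/* Exit with the given status unless the -ignore flag (iflag) was set. */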
void condexit(int val)
{
if (iflag == 0) exit(val);
}
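/* Sanity check for option arguments (sketch of intent): report an error and
   return nonzero when the expected argument is missing or itself looks like
   another flag. */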
static int check_arg(char **arg, const char *flag, int num, int argc)
{
if (num >= argc) {
printf("Missing argument to \"%s\" flag\n",flag);
return 1;
}
if (arg[num][0] == '-') {
printf("Incorrect argument to \"%s\" flag: %s\n",flag,arg[num]);
return 1;
}
return 0;
}
int main (int argc, char *argv[])
{
int n,i,found_sep;
const char *frc_dir_name = NULL;
const char *frc_file_name = NULL;
pflag = 1;
iflag = 0;
forcefield = FF_TYPE_CLASS1 | FF_TYPE_COMMON;
shift[0] = shift[1] = shift[2] = 0.0;
frc_dir_name = getenv("MSI2LMP_LIBRARY");
if (argc < 2) {
printf("usage: %s <rootname> [-class <I|1|II|2>] [-frc <path to frc file>] [-print #] [-ignore] [-nocenter] [-oldstyle]\n",argv[0]);
return 1;
} else { /* rootname was supplied as first argument, copy to rootname */
int len = strlen(argv[1]) + 1;
rootname = (char *)malloc(len);
strcpy(rootname,argv[1]);
}
n = 2;
while (n < argc) {
if (strncmp(argv[n],"-c",2) == 0) {
n++;
if (check_arg(argv,"-class",n,argc))
return 2;
if ((strcmp(argv[n],"I") == 0) || (strcmp(argv[n],"1") == 0)) {
forcefield = FF_TYPE_CLASS1 | FF_TYPE_COMMON;
} else if ((strcmp(argv[n],"II") == 0) || (strcmp(argv[n],"2") == 0)) {
forcefield = FF_TYPE_CLASS2 | FF_TYPE_COMMON;
} else if ((strcmp(argv[n],"O") == 0) || (strcmp(argv[n],"0") == 0)) {
forcefield = FF_TYPE_OPLSAA | FF_TYPE_COMMON;
} else {
printf("Unrecognized Forcefield class: %s\n",argv[n]);
return 3;
}
} else if (strncmp(argv[n],"-f",2) == 0) {
n++;
if (check_arg(argv,"-frc",n,argc))
return 4;
frc_file_name = argv[n];
} else if (strncmp(argv[n],"-s",2) == 0) {
if (n+3 > argc) {
printf("Missing argument(s) to \"-shift\" flag\n");
return 1;
}
shift[0] = atof(argv[++n]);
shift[1] = atof(argv[++n]);
shift[2] = atof(argv[++n]);
} else if (strncmp(argv[n],"-i",2) == 0 ) {
iflag = 1;
} else if (strncmp(argv[n],"-n",4) == 0 ) {
centerflag = 0;
} else if (strncmp(argv[n],"-o",2) == 0 ) {
hintflag = 0;
} else if (strncmp(argv[n],"-p",2) == 0) {
n++;
if (check_arg(argv,"-print",n,argc))
return 5;
pflag = atoi(argv[n]);
} else {
printf("Unrecognized option: %s\n",argv[n]);
return 6;
}
n++;
}
/* set defaults, if nothing else was given */
if (frc_dir_name == NULL)
#if (_WIN32)
frc_dir_name = "..\\frc_files";
#else
frc_dir_name = "../frc_files";
#endif
if (frc_file_name == NULL)
frc_file_name = "cvff.frc";
found_sep=0;
#ifdef _WIN32
if (isalpha(frc_file_name[0]) && (frc_file_name[1] == ':'))
found_sep=1; /* windows drive letter => full path. */
#endif
n = strlen(frc_file_name);
for (i=0; i < n; ++i) {
#ifdef _WIN32
if ((frc_file_name[i] == '/') || (frc_file_name[i] == '\\'))
found_sep=1+i;
#else
if (frc_file_name[i] == '/')
found_sep=1+i;
#endif
}
/* full pathname given */
if (found_sep) {
i = 0;
/* need to append extension? */
if ((n < 5) || (strcmp(frc_file_name+n-4,".frc") !=0))
i=1;
FrcFileName = (char *)malloc(n+1+i*4);
strcpy(FrcFileName,frc_file_name);
if (i) strcat(FrcFileName,".frc");
} else {
i = 0;
/* need to append extension? */
if ((n < 5) || (strcmp(frc_file_name+n-4,".frc") !=0))
i=1;
FrcFileName = (char *)malloc(n+2+i*4+strlen(frc_dir_name));
strcpy(FrcFileName,frc_dir_name);
#ifdef _WIN32
strcat(FrcFileName,"\\");
#else
strcat(FrcFileName,"/");
#endif
strcat(FrcFileName,frc_file_name);
if (i) strcat(FrcFileName,".frc");
}
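/* Illustrative outcome of the name resolution above: "-frc cvff" becomes
   "$MSI2LMP_LIBRARY/cvff.frc" (or "../frc_files/cvff.frc" when the
   environment variable is unset), while a name containing a path separator
   is used as given; ".frc" is appended whenever it is missing. */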
if (pflag > 0) {
puts("\nRunning msi2lmp " MSI2LMP_VERSION "\n");
if (forcefield & FF_TYPE_CLASS1) puts(" Forcefield: Class I");
if (forcefield & FF_TYPE_CLASS2) puts(" Forcefield: Class II");
if (forcefield & FF_TYPE_OPLSAA) puts(" Forcefield: OPLS-AA");
printf(" Forcefield file name: %s\n",FrcFileName);
if (centerflag) puts(" Output is recentered around geometrical center");
if (hintflag) puts(" Output contains style flag hints");
else puts(" Style flag hints disabled");
- printf(" System translated by: %g %g %g\n",shift[0],shift[1],shift[2]);
+ printf(" System translated by: %g %g %g\n",shift[0],shift[1],shift[2]);
}
n = 0;
if (forcefield & FF_TYPE_CLASS1) {
if (strstr(FrcFileName,"cvff") != NULL) ++n;
if (strstr(FrcFileName,"clayff") != NULL) ++n;
} else if (forcefield & FF_TYPE_OPLSAA) {
if (strstr(FrcFileName,"oplsaa") != NULL) ++n;
} else if (forcefield & FF_TYPE_CLASS2) {
if (strstr(FrcFileName,"pcff") != NULL) ++n;
if (strstr(FrcFileName,"cff91") != NULL) ++n;
if (strstr(FrcFileName,"compass") != NULL) ++n;
}
if (n == 0) {
if (iflag > 0) fputs(" WARNING",stderr);
else fputs(" Error ",stderr);
-
+
fputs("- forcefield name and class appear to be inconsistent\n\n",stderr);
if (iflag == 0) return 7;
}
/* Read in .car file */
ReadCarFile();
/*Read in .mdf file */
ReadMdfFile();
/* Define bonds, angles, etc...*/
if (pflag > 0)
printf("\n Building internal coordinate lists \n");
MakeLists();
/* Read .frc file into memory */
if (pflag > 0)
printf("\n Reading forcefield file \n");
ReadFrcFile();
/* Get forcefield parameters */
if (pflag > 0)
printf("\n Get force field parameters for this system\n");
GetParameters();
/* Do internal check of internal coordinate lists */
if (pflag > 0)
printf("\n Check parameters for internal consistency\n");
CheckLists();
/* Write out the final data */
WriteDataFile(rootname);
/* free up memory to detect possible memory corruption */
free(rootname);
free(FrcFileName);
ClearFrcData();
for (n=0; n < no_molecules; n++) {
free(molecule[n].residue);
}
free(no_atoms);
free(molecule);
free(atoms);
free(atomtypes);
if (bonds) free(bonds);
if (bondtypes) free(bondtypes);
if (angles) free(angles);
if (angletypes) free(angletypes);
if (dihedrals) free(dihedrals);
if (dihedraltypes) free(dihedraltypes);
if (oops) free(oops);
if (ooptypes) free(ooptypes);
if (angleangles) free(angleangles);
if (angleangletypes) free(angleangletypes);
if (pflag > 0)
printf("\nNormal program termination\n");
return 0;
}
diff --git a/tools/msi2lmp/src/msi2lmp.h b/tools/msi2lmp/src/msi2lmp.h
index 377ab1a6c..4716f719d 100644
--- a/tools/msi2lmp/src/msi2lmp.h
+++ b/tools/msi2lmp/src/msi2lmp.h
@@ -1,228 +1,228 @@
/********************************
*
* Header file for msi2lmp conversion program.
*
* This is the header file for the third version of a program
* that generates a LAMMPS data file based on the information
* in an MSI car file (atom coordinates) and mdf file (molecular
* topology). A key part of the program looks up forcefield parameters
* from an MSI frc file.
*
* The first version was written by Steve Lustig at Dupont, but
* required using Discover to derive internal coordinates and
* forcefield parameters
*
* The second version was written by Michael Peachey while an
* intern in the Cray Chemistry Applications Group managed
* by John Carpenter. This version derived internal coordinates
* from the mdf file and looked up parameters in the frc file
* thus eliminating the need for Discover.
*
* The third version was written by John Carpenter to optimize
* the performance of the program for large molecular systems
* (the original code for deriving atom numbers was quadratic in time)
* and to make the program fully dynamic. The second version used
* fixed dimension arrays for the internal coordinates.
*
-* The thrid version was revised in Fall 2011 by
+* The third version was revised in Fall 2011 by
* Stephanie Teich-McGoldrick to add support for non-orthogonal cells.
*
* The next revision was started in Summer/Fall 2013 by
* Axel Kohlmeyer to improve portability to Windows compilers,
* clean up command line parsing and improve compatibility with
-* the then current LAMMPS versions. This revision removes
+* the then current LAMMPS versions. This revision removes
* compatibility with the obsolete LAMMPS version written in Fortran 90.
*/
# include <stdio.h>
#define MSI2LMP_VERSION "v3.9.8 / 06 Oct 2016"
#define PI_180 0.01745329251994329576
#define MAX_LINE_LENGTH 256
#define MAX_CONNECTIONS 8
#define MAX_STRING 64
#define MAX_NAME 16
#define WHITESPACE " \t\r\n\f"
#define MAX_ATOM_TYPES 100
#define MAX_BOND_TYPES 200
#define MAX_ANGLE_TYPES 300
#define MAX_DIHEDRAL_TYPES 400
#define MAX_OOP_TYPES 400
#define MAX_ANGLEANGLE_TYPES 400
#define MAX_TYPES 12000
#define FF_TYPE_COMMON 1<<0
#define FF_TYPE_CLASS1 1<<1
#define FF_TYPE_CLASS2 1<<2
#define FF_TYPE_OPLSAA 1<<3
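/* Usage sketch (illustrative): the force field class is stored as a bit mask,
   e.g. forcefield = FF_TYPE_CLASS1 | FF_TYPE_COMMON, and sections of the
   converter are selected with tests such as (forcefield & FF_TYPE_CLASS2). */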
struct ResidueList {
int start;
int end;
char name[MAX_NAME];
};
struct MoleculeList {
int start;
int end;
int no_residues;
struct ResidueList *residue;
};
/* Internal coordinate Lists */
struct BondList {
int type;
int members[2];
};
struct AngleList {
int type;
int members[3];
};
struct DihedralList {
int type;
int members[4];
};
struct OOPList {
int type;
int members[4];
};
struct AngleAngleList {
int type;
int members[4];
};
/* Internal coordinate Types Lists */
struct AtomTypeList
{
char potential[5];
double mass;
double params[2];
int no_connect;
};
struct BondTypeList {
int types[2];
double params[4];
};
struct AngleTypeList {
int types[3];
double params[4];
double bondangle_cross_term[4];
double bondbond_cross_term[3];
};
struct DihedralTypeList {
int types[4];
double params[6];
double endbonddihedral_cross_term[8];
double midbonddihedral_cross_term[4];
double angledihedral_cross_term[8];
double angleangledihedral_cross_term[3];
double bond13_cross_term[3];
};
struct OOPTypeList {
int types[4];
double params[3];
double angleangle_params[6];
};
struct AngleAngleTypeList {
int types[4];
double params[6];
};
/* ---------------------------------------------- */
struct Atom {
int molecule; /* molecule id */
int no; /* atom id */
char name[MAX_NAME]; /* atom name */
double x[3]; /* position vector */
int image[3]; /* image flag */
char potential[6]; /* atom potential type */
char element[4]; /* atom element */
double q; /* charge */
char residue_string[MAX_NAME]; /* residue string */
int no_connect; /* number of connections to atom */
char connections[MAX_CONNECTIONS][MAX_STRING]; /* long form, connection name*/
double bond_order[MAX_CONNECTIONS];
int conn_no[MAX_CONNECTIONS]; /* Atom number to which atom is connected */
int type;
};
extern char *rootname;
extern char *FrcFileName;
extern double pbc[6]; /* A, B, C, alpha, beta, gamma */
extern double box[3][3]; /* hi/lo for x/y/z and xy, xz, yz for triclinic */
extern double shift[3]; /* shift vector for all coordinates and box positions */
extern int periodic; /* 0= nonperiodic 1= 3-D periodic */
extern int TriclinicFlag; /* 0= Orthogonal 1= Triclinic */
extern int forcefield; /* BitMask: the value FF_TYPE_COMMON is set for common components of the options below,
* FF_TYPE_CLASS1 = ClassI, FF_TYPE_CLASS2 = ClassII, FF_TYPE_OPLSAA = OPLS-AA*/
extern int ljtypeflag; /* how LJ parameters are stored: 0 = A-B, 1 = r-eps */
extern int centerflag; /* 1= center box 0= keep box */
extern int hintflag; /* 1= print style hint comments 0= no hints */
extern int pflag; /* print level: 0, 1, 2, 3 */
extern int iflag; /* 0 stop at errors 1 = ignore errors */
extern int *no_atoms;
extern int no_molecules;
extern int replicate[3];
extern int total_no_atoms;
extern int total_no_bonds;
extern int total_no_angles;
extern int total_no_dihedrals;
extern int total_no_angle_angles;
extern int total_no_oops;
extern int no_atom_types;
extern int no_bond_types;
extern int no_angle_types;
extern int no_dihedral_types;
extern int no_oop_types;
extern int no_angleangle_types;
extern FILE *CarF;
extern FILE *FrcF;
extern FILE *PrmF;
extern FILE *MdfF;
extern FILE *RptF;
extern struct Atom *atoms;
extern struct MoleculeList *molecule;
extern struct BondList *bonds;
extern struct AngleList *angles;
extern struct DihedralList *dihedrals;
extern struct OOPList *oops;
extern struct AngleAngleList *angleangles;
extern struct AtomTypeList *atomtypes;
extern struct BondTypeList *bondtypes;
extern struct AngleTypeList *angletypes;
extern struct DihedralTypeList *dihedraltypes;
extern struct OOPTypeList *ooptypes;
extern struct AngleAngleTypeList *angleangletypes;
extern void FrcMenu();
extern void ReadCarFile();
extern void ReadMdfFile();
extern void ReadFrcFile();
extern void ClearFrcData();
extern void MakeLists();
extern void GetParameters();
extern void CheckLists();
extern void WriteDataFile(char *);
extern void set_box(double box[3][3], double *h, double *h_inv);
extern void lamda2x(double *lamda, double *x, double *h, double *boxlo);
extern void x2lamda(double *x, double *lamda, double *h_inv, double *boxlo);
extern void condexit(int);
